My favorite joke in college was about a spherical cow.

A farmer has a cow that’s stopped giving milk, and this farmer calls a bunch of scientists for their opinions. There’s a dig at biologists and chemists, but the punchline is reserved for the physicist, who, when asked for thoughts, responds, “I think I can solve the problem. We start by assuming the cow is a sphere…”

Admittedly, it’s not a knee-slapper, but at a time when all of my problem sets started with the words, “Assume the _____ is a sphere…”, I found it hilarious. I acknowledge that this says something about me. Don’t judge.

In retrospect, I think it's an interesting analogy for the way we tend to approach complicated analytics – simplifying a problem until we reach a simple answer. It makes sense: simple answers drive straightforward actions that are easily implemented.

Complexity in a system should not be ignored – in fact, I think complexity is where some of the most valuable knowledge can be gained.

There are lots of examples of this across an organization. I spend much of my time thinking about Interana's user groups. I mentally map them by their analytic goals, their statistical sophistication, their role within their organization, their data priorities, and their engagement with Interana. Each of these is a spectrum, and there's a lot of overlap in our user base. I want to leverage the complexity of our user space to help our customers get the most value out of our software, to help our product team develop our tool to address new customer challenges, and to help our sales team as they interact with new prospects.

Additionally, my definition of a user group might be different from a product manager's, or an engineer's. The same general question, viewed through the lens of available actions or as a function of the available data, can profoundly change the direction of an analysis and the answers it yields. The value of a question is often limited by the data I can access to answer it, and the value of an answer is defined by my ability to act on it.

In my experience, as we think about classification problems, we tend to look for large-scale phenomena. It's easier for me, as I seek to describe the space of Interana users, to define five general categories of users than twenty-five. Somewhere between treating every user individually and treating all users the same lies a balance between the specificity of the user experience and the scale my resources can support. Finding that balance may be a function of resources, but knowing what the spectrum of complexity looks like helps me allocate those resources better, and prevents the kind of generalizations that frustrate attempts at more thoughtful classification.
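To make that tradeoff tangible, here's a minimal Python sketch – with entirely invented users and scores – that buckets users along two of the spectra mentioned above at coarse and fine granularity. Nothing here reflects Interana's actual data or methods; it just shows how the group count grows as the bins get finer.

```python
from collections import Counter

# Hypothetical users scored 0.0-1.0 on two spectra
# (say, statistical sophistication and engagement). All invented.
users = {
    "u1": (0.9, 0.8), "u2": (0.1, 0.2), "u3": (0.5, 0.9),
    "u4": (0.2, 0.7), "u5": (0.85, 0.15),
}

def classify(users, bins):
    """Bucket each spectrum into `bins` levels; more bins = finer groups."""
    return {
        uid: tuple(min(int(score * bins), bins - 1) for score in scores)
        for uid, scores in users.items()
    }

coarse = classify(users, bins=2)  # at most 4 groups: cheap, but lumpy
fine = classify(users, bins=5)    # at most 25 groups: specific, but costly
print(Counter(coarse.values()))   # several users share a group
print(Counter(fine.values()))     # here, every user lands in its own group
```

At two bins per axis, users with very different profiles get lumped together; at five, every user in this toy set becomes its own group – the "treat each user individually" end of the spectrum, which no team has the resources to serve.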

This seems obvious, but it's something I find myself continually confronted with, because of the language we assume others understand. Take the concept of retention, for example. If I use that word, I have a specific concept in mind, related to continued user interaction. However, that continued interaction can take many forms:

  • Any and all continued touch points (events logged) by a user, perhaps after their initial account creation or after a specific outreach effort on my part
  • Specific kinds of touch points by a user, or use of a particular feature
  • Levels of engagement, knowing that a user has been active X days out of seven, or similar measures
  • Defining a cohort, or group, of users, and tracking the fall off in the entire group over time

All of these can be described as retention; they are variations on a theme. Settling on a specific variation drives a distinct analytic path, as the sketch below illustrates.
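To make these distinctions concrete, here's a minimal Python sketch of the first three variations over a toy event log. The events, field layout, and seven-day window are all hypothetical stand-ins, not Interana's schema or query language; the point is that each definition is a different computation, and each yields a different set of "retained" users.

```python
from datetime import date, timedelta
from collections import defaultdict

# Hypothetical event log: (user_id, event_name, day) tuples. All invented.
events = [
    ("u1", "signup", date(2017, 3, 1)),
    ("u1", "login",  date(2017, 3, 2)),
    ("u1", "export", date(2017, 3, 5)),
    ("u2", "signup", date(2017, 3, 1)),
    ("u2", "login",  date(2017, 3, 9)),
    ("u3", "signup", date(2017, 3, 2)),
]

def any_touch_retained(events, window=timedelta(days=7)):
    """Variation 1: any event after a user's first event, within a window."""
    first_seen, retained = {}, set()
    for user, _, day in sorted(events, key=lambda e: e[2]):
        if user not in first_seen:
            first_seen[user] = day
        elif day - first_seen[user] <= window:
            retained.add(user)
    return retained

def feature_retained(events, feature="export"):
    """Variation 2: retention defined as use of one particular feature."""
    return {user for user, name, _ in events if name == feature}

def engagement_level(events, days=7):
    """Variation 3: distinct active days per user in the trailing window."""
    latest = max(day for _, _, day in events)
    active = defaultdict(set)
    for user, _, day in events:
        if latest - day < timedelta(days=days):
            active[user].add(day)
    return {user: len(seen) for user, seen in active.items()}

print(any_touch_retained(events))  # {'u1'} -- u2's return fell outside the window
print(feature_retained(events))    # {'u1'} -- only u1 touched the feature
print(engagement_level(events))    # {'u1': 1, 'u2': 1}
```

The fourth variation, cohort fall-off, extends the same idea: fix the group of users up front, then recompute one of these measures per period. Same word, four different queries, four potentially different conclusions.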

So how do we get to actionable insight? How do I identify the meaningful subpopulations within a large, complex data set?

I think it starts with data. This is where we get to what drew me to Interana – speed and flexibility in defining queries on a data set provide enormous power in gaining intuitive insights about a system, in focusing advanced analytics, and in making business-relevant decisions. By concentrating on data that is organized around events first and the actor second, I can relinquish some of the assumptions derived from my meta-knowledge about an actor. I can build an understanding from the sequences of actions taken, as opposed to more general actor characteristics, which means I'm more likely to recognize when behavioral patterns counter my expectations. And that's all to the good when those are the patterns I'm using to drive my day-to-day interactions with specific customers, my general engagement strategy with user groups across customers, and plans for broadening adoption to new users, new data consumers, and new business challenges.
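As a sketch of what "events first, actor second" buys me, consider reconstructing behavior directly from an event stream. The stream, actors, and actions below are invented; the point is that the behavioral paths emerge from the sequence of events themselves, with no actor attributes consulted, so an unexpected path has nowhere to hide.

```python
from collections import Counter, defaultdict

# Hypothetical event stream: (timestamp, actor_id, action). All invented.
stream = [
    (1, "a1", "login"), (2, "a1", "search"), (3, "a1", "export"),
    (4, "a2", "login"), (5, "a2", "export"),
    (6, "a3", "login"), (7, "a3", "search"), (8, "a3", "export"),
]

# Events first: reconstruct each actor's ordered sequence of actions,
# rather than starting from what we think we know about the actor.
sequences = defaultdict(list)
for ts, actor, action in sorted(stream):
    sequences[actor].append(action)

# Which behavioral paths actually occur, and how often?
paths = Counter(tuple(seq) for seq in sequences.values())
for path, count in paths.most_common():
    print(" -> ".join(path), count)
# login -> search -> export 2
# login -> export 1
```

Here the surprise is in the second line: someone exports without ever searching. If I had started from actor attributes, I might never have asked the question that surfaces that path.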

Complexity is good. My world got a whole lot more interesting when I stopped assuming everything was a sphere.