You can’t throw a stone these days without hitting an article about “big data.” There’s endless advice on how you can be more “data-driven” (we prefer “data-informed”); how data can drive you and your business to infinite success; how X or Y analytics product will make your data dreams a reality.
People talk a lot about how “big” data can be, but much less about how “big” a problem data itself presents. That’s a problem we’ve only barely begun to tackle.
I don’t mean the day-to-day hard things we deal with (the janitorial slog of cleaning data, formulating SQL, building charts, getting your data infra right); I mean the larger picture. How do we go about the business of translating the bits we collect at (often) massive scale into actual human knowledge? Into, say, answers to any question your mind can concoct? Into something that consistently allows you to be more informed than you were yesterday? What would it mean for us if we could do this? The answer is so much bigger than the incremental gains touted in those hundreds of articles about big data.
In the 1980s, we experienced a personal computing revolution. Computing went from the domain of NASA mathematicians and government code breakers (if you go solely by the Academy Award-nominated motion pictures I've seen) to the domain of my grand aunt documenting her golden retriever’s diet.
What’s fascinating is that as much as the “masters of the computing universe” foresaw the vast potential of personal computing, they dramatically underestimated it. (Side note: the must-see prediction for how computers would change the world, at least from a salesmanship perspective, is unsurprisingly Steve Jobs in 1980 with his “Computers are a Bicycle for Your Mind” pitch. Consider how you’d explain computers to the layman pre-Macintosh, and see if you could do better.)
On a similar note, despite all the hype and hysteria you hear about the potential of “big data,” it is still incredibly understated. Just as computing power is power, data is power. Just as bringing computation closer to people had a transformative impact, so too will our ability to bring people closer to data, making learning from it orders of magnitude easier.
In fact, you see the same evolution with data that we saw in computing. In the beginning, there were mainframes in MIT basements, with high priests alone having access by way of stacks of punch cards (we can compare this to a data scientist writing a MapReduce job on Hadoop). Later, select wizards who knew how to work the machines granted mortals indirect access through layers of hoops, so that they too could benefit. Now, children are able to hold devices in their hands and play Minecraft, and my aunt can taunt her golden retriever with her phone (note: neither should be considered computers anymore, nor should we consider good data engines databases).
Framed like this, data is a pretty cool and massive problem to solve. It’s a challenge that intersects UX, technology, infrastructure, and languages. When you look at the state of data at most companies today, it doesn’t really seem like we’ve even made a dent. A BI tool on Redshift doesn’t seem to cut it. Having to ask your data team basic English questions about the product you’re developing and then waiting, or having a few pre-canned charts and metrics about retention, or having to write SQL queries: it’s like drinking an In-N-Out shake through a skinny straw.
At Interana, we think that to get it right, you need to push the limits on four pillars: UX, speed, scale, and flexibility/expressibility of your data engine. Speed not for speed’s sake but for interactivity. UX not just for pretty colors and charts, but for ease and power and expression. The goal is to strip away the crap ton of things we should never have to think about with data, like indexes, schemas, tables, and performance. Each of those pillars is a hard problem on its own, and in combination they’re multiplicatively harder, but if you can solve for all of them at once, it’s magic.
When we talk about problems being hard, there’s good hard, and there’s crappy hard. Data being hard because you have to write SQL to answer a seven-word English question is just pure crap hard. Data being hard because schemas are messy, or because you have to go through layers of technical or human indirection, is dumb.
Good hard would be using all the information at your disposal (like how people use your product) to focus on figuring out how you can build the best version of your product. It’s having your conventional wisdom aggressively challenged by the answers to the questions you ask of your data, spurring thinking, and forcing you to adapt. Good hard forces you to understand your users and product better, and makes you better at your job. That doesn’t happen when there’s too much of that crappy hard stuff.
At Interana, our “good” hard problem is making data easy (or, as I like to think of it, making “crap” hard things easy is often “good” hard). We’re trying to tackle the problem above, and we think we can be an order of magnitude better than what exists, if not more. We think we can give people way more visibility into their product and business, and bring data closer to everyone. What’s cool about the hard problem of data is that we’re never really done. The bar can always be lower. There are always more ways to strip away friction, and to leverage the human ability to see patterns and iterate, to make data even easier.
I must say, this is a pretty satisfying “good” problem to work on. Hopefully, it’s one that also enables a lot of other people and organizations to focus on their particular “good” problems and produce some really great outcomes.