Data politics hinders progress and growth because it prevents you from making business decisions that fully leverage your data assets. Data politics impacts almost all organizations, especially ones that use powerful algorithms to support operations, track performance, analyze data, or gain competitive advantage. Such businesses allow only certain individuals or groups access to the algorithms. These restrictions influence how data is valued, who owns it, and who gets to use it.
But if your organization is committed to making decisions based on data instead of hunches, you’ll need maximum transparency into the “black box” of your algorithms. Engage teams across the organization in a candid conversation about what they want to learn from the data, how they want to apply it to make their jobs easier, and how it can improve decision-making.
Start with the following questions:
- What data do I want to use and how much do I need for meaningful analytics?
- What constitutes reliable and valuable data?
- Who owns the data and is it necessary to restrict access to it?
Choosing the data to use
A behavioral analytics solution should give you the option to explore all your data, even the data you used to leave out because it was inaccessible, complicated, or too time-consuming to access. It should also enable you to go as deep as you want into each data source, so you can analyze every single row of your raw data, preferably in seconds so you don’t have to wait hours for answers.
You may not need to analyze all your data sources for every single query. Your analytics platform can help you decide what’s most relevant for a given query or decision and how deep you need to go to get a representative sample. Be sure your solution samples at query time, not at ingest. This avoids the drawbacks of setting up arbitrary samples during ingest, before you know the full scope of queries that will be made.
If you need to run the query across the full dataset, the platform will take into account the complete population that matches the query filters. Running a query across unsampled data takes a little longer, but the results are worth the wait because you’re asking the right question and including all relevant data in the answer.
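To make the difference between the two sampling points concrete, here is a minimal Python sketch. It is an illustration only, not how any particular analytics platform works: the event fields, the function names, and the simple random-sampling scheme are all assumptions chosen for the example.

```python
import random

def make_events(n, seed=0):
    """Generate hypothetical raw events (field names are illustrative)."""
    rng = random.Random(seed)
    actions = ["view", "click", "purchase"]
    return [
        {"user_id": rng.randrange(1000),
         "action": rng.choice(actions),
         "amount": rng.randrange(1, 100)}
        for _ in range(n)
    ]

def ingest_time_sample(events, rate, seed=0):
    """Sample before storing: rows that are dropped here are gone for
    every future query, whatever that query turns out to need."""
    rng = random.Random(seed)
    return [e for e in events if rng.random() < rate]

def query_time_sample(events, predicate, rate, seed=0):
    """Keep all raw rows; apply the query filter first, then sample
    only the rows that match. A rate of 1.0 means unsampled."""
    matching = [e for e in events if predicate(e)]
    if rate >= 1.0:
        return matching
    rng = random.Random(seed)
    return [e for e in matching if rng.random() < rate]

def estimate_count(sample, rate):
    """Scale a sampled row count back up to an estimate for the full set."""
    return round(len(sample) / rate)

if __name__ == "__main__":
    events = make_events(100_000, seed=1)
    is_purchase = lambda e: e["action"] == "purchase"

    exact = query_time_sample(events, is_purchase, rate=1.0)
    sampled = query_time_sample(events, is_purchase, rate=0.1, seed=2)
    print("exact matches:", len(exact))
    print("estimate from 10% query-time sample:", estimate_count(sampled, 0.1))
```

Because the raw events are retained in the query-time approach, the same query can be rerun with `rate=1.0` whenever an exact answer matters. With ingest-time sampling, that option no longer exists.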
What constitutes reliable and valuable data?
Today’s complex and huge data stores can be overwhelming, but once you start exploring your data, you’ll begin to know what you need. And don’t worry about asking wrong questions—there are no wrong questions.
Assess data across these three areas:
- Coverage: what population might provide insights into a particular topic?
- Quality: there’s no point running analyses across data that isn’t reliable. Make sure you understand the data source you’re using, and how it captures data.
- Timeliness: make sure the data covers enough of a time span that it can reveal how patterns and trends emerge and change. You may need up-to-the-minute data as well as data that stretches back several years.
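The three areas above can be turned into simple numeric checks. The following Python sketch is one possible way to do that, under assumed conventions: each row is a dictionary with hypothetical `user_id` and `ts` fields, "quality" is approximated as the share of rows with no missing required fields, and the function name `assess` is invented for this example.

```python
from datetime import datetime, timedelta

def assess(rows, population_ids, required_fields, now):
    """Return rough coverage, quality, and timeliness indicators.

    coverage: share of the target population that appears in the data
    quality:  share of rows where every required field is present
    span:     time between the oldest and newest row
    lag:      time between the newest row and `now`
    """
    population = set(population_ids)
    seen = {r["user_id"] for r in rows if r.get("user_id") is not None}
    coverage = len(seen & population) / len(population) if population else 0.0

    complete = sum(
        1 for r in rows
        if all(r.get(f) is not None for f in required_fields)
    )
    quality = complete / len(rows) if rows else 0.0

    stamps = [r["ts"] for r in rows if r.get("ts") is not None]
    span = max(stamps) - min(stamps) if stamps else timedelta(0)
    lag = now - max(stamps) if stamps else None

    return {"coverage": coverage, "quality": quality, "span": span, "lag": lag}

if __name__ == "__main__":
    rows = [
        {"user_id": 1, "ts": datetime(2024, 1, 1), "action": "view"},
        {"user_id": 2, "ts": datetime(2024, 1, 9), "action": None},
    ]
    report = assess(rows, [1, 2, 3, 4], ["user_id", "ts", "action"],
                    now=datetime(2024, 1, 10))
    print(report)
```

Completeness is only a proxy for quality. It does not replace understanding how the source captures its data.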
Who owns the data and should access to it be restricted?
As data grows and becomes more diverse, businesses have a harder time valuing it and figuring out who should own it. Businesses tend to turn to IT and data scientists to manage data access. This can introduce serious delays because these data gatekeepers become overwhelmed with requests. Try not to let data politics and algorithm “ownership” dictate who owns and accesses data. Data is a corporate asset that should be leveraged across multiple teams to align goals within the business and help produce solid growth.
Make the right decisions for your business—analyze data without the blinders of data politics
Interana puts an end to data politics with its self-service model and mission to make data a part of everyone’s day. Its visual query builder offers an intuitive way for anyone to construct behavioral analytics queries using features such as filters, cohorts, metrics, sessions, and funnels. It integrates event data from all sources, eliminating the technical barriers to universal data access. And it runs queries rapidly against 100 percent of raw data or a subset sampled at query time to produce analysis that you can have confidence in.
This is the second of four blog posts about the WSJ CIO Network conference. The first post in the series can be found here: "How Leading Companies Are Tackling the Challenges of Digital Transformation."