Scottish novelist and folklorist Andrew Lang once wrote, “I shall try not to use statistics as a drunken man uses lamp-posts, for support rather than for illumination.” Unfortunately, many organizations that consider themselves “data-driven” are like drunkards with lamp-posts: they use data to support rather than challenge their assumptions and beliefs, and to obscure their ignorance rather than to learn.
An organization that uses data more for support than illumination poses a real challenge to its data science team, because leadership is likely to dismiss as bad information anything the team discovers that contradicts long-held beliefs. Leadership may even discourage questions that seem to threaten organizational beliefs.
Your data science team needs to use data for discovery, which keeps the team from falling into the trap of using data merely to support what’s already known or, worse, to support misconceptions. As a saying often attributed to Mark Twain goes, “What gets us into trouble is not what we don’t know. It’s what we know for sure that just ain’t so.” A major benefit of data science is that it challenges accepted beliefs, especially misconceptions considered to be established truths.
Keep in mind that if your organization is relying on knowledge that’s not backed up by good data, it’s likely to run into trouble. Garbage in, garbage out; if the organization’s leaders are making decisions based on misconceptions and false assumptions, they’re probably making bad decisions. Imagine trying to navigate your way through New York City with a map of Chicago!
Three Areas of Responsibility
One way to ensure that your data science team remains true to its mission is to maintain some separation between its three areas of responsibility:
- Hypothesis: Asking questions, developing hypotheses
- Research: Conducting research and analytics
- Implementation: Sharing the team’s discoveries and implementing change
Note that the three areas have some overlap. In these areas of overlap, the team engages in a continuous three-step process:
- Question (hypothesis)
- Research (research)
- Learn (implementation)
Hypothesis is the process of asking questions and making educated guesses that can be tested through experimentation with the data. This is primarily the role of the research lead. She knows the business, has a skeptical and creative mind, and has a knack for asking compelling questions. Broad business knowledge is key, because it provides sufficient background to feed the research lead’s curiosity. Imagine how difficult it would be to ask questions about scuba diving, for example, if you had never scuba dived. You wouldn’t even know the vocabulary needed to formulate an intelligent question about it.
A skeptical mind is also crucial to performing well in this role. While a research lead is wise to communicate with others across and at all levels of the organization, a skeptical mind prevents her from succumbing to groupthink and accepting as fact any deeply ingrained false assumptions or beliefs. The research lead should also be given the freedom to ask questions, regardless of how uncomfortable those questions are for the organization. As the research lead communicates with others in the organization, she needs to stay true to the data and not be swayed by politics, biases, or other pressures.
Research is the foundation for the data science team’s work. It often provides the basis for questions and follow-up questions, as well as the answers to them. Research is the realm of the data analyst, who is the only one on the team who works directly with the data. The data analyst works with the research lead to come up with interesting questions. He then mines the data in various ways to find answers and delivers the results via a report, which is typically illustrated with data visualizations to clearly convey the information and insights.
The data analyst works closely with both the research lead and project manager, but the two relationships are independent and differ significantly. His work with the research lead focuses mainly on exploring the data. He then works with the project manager to pass along knowledge and insights and prepare reports and presentations to share the team’s findings with the rest of the organization.
Implementation is the process of sharing what the data science team discovers with the rest of the organization and putting what the team learned into practice. This is the realm of the project manager, who must be sure that the team produces actionable intelligence. She then delivers the team’s discoveries to stakeholders across the organization, typically in the form of reports and presentations.
An Independent Unit
Although I break down the data science team into three areas of responsibility, in practice, the team functions as a unit. Everyone on the team works together to ask and answer questions and share the team’s discoveries with the organization. To a large degree, the team should function as an independent investigation and service agency within the organization. It should serve the business intelligence (BI) needs of various divisions and departments without being influenced by their assumptions or beliefs or any of the pressures under which they function.