Building a data science team is not as simple as hiring a database administrator and a few data analysts. You want to democratize your data — you want the organization’s data and the tools for analyzing it in the hands of everyone in the organization. You want your entire organization to think about your data in creative and interesting ways and put the newly acquired information and insights into action.
Even so, your organization should have a small data science team that’s focused exclusively on extracting knowledge and insights from the organization’s data. Approach data science as a team endeavor: small groups of people with different backgrounds experimenting with that data.
Keep the team small (three to five members, max). You need to fill the following three positions:
- Research lead
- Data analyst
- Project manager
In the following sections, I describe these roles in greater detail.
Note: When building a data science team, you’re essentially breaking down the role of data scientist into three separate positions. Finding a single individual who knows the business, understands the data, is familiar with analytical tools and techniques, and is an effective project manager is often an insurmountable challenge. Creating a team enables you to distribute the workload while ensuring that the data is examined from different perspectives.
The research lead has three areas of responsibility:
- Know the industry and the business
- Identify assumptions
- Drive questions
The research lead should be someone from the business side — someone who knows the industry in which the business operates, the business itself, and the unique intelligence needs of the business. He or she must recognize the role that the data science team plays in supporting the organization’s strategic initiatives and enabling data-driven decision-making at all levels.
A good research lead is curious, skeptical, and innovative. Specialized training is not required. In fact, a child could fill this role. For example, Edwin Land invented the Polaroid instant camera to answer an interesting question asked by his three-year-old daughter. When they were on vacation in New Mexico, after he took a picture with a conventional camera, his daughter asked, “Why do we have to wait for the picture?”
Asking compelling, sometimes obvious, questions sounds easy, but it’s not. Such questions only seem easy and obvious after someone else asks them.
Of course, asking compelling questions is something everyone in your organization should be doing. Certainly everyone on the data science team should be involved in the process. However, having one person in charge of questions provides the team with some direction.
Maintaining separation between the people asking the questions and the people looking for possible answers is also beneficial. Otherwise, you’re likely to encounter a conflict of interest; for example, if the people in charge of answering questions are working with a small data set, they may be inclined to limit the scope of their questions to the available data. A research lead, on the other hand, is more likely to think outside that box and ask questions that can’t be answered with the current data. Such questions would challenge the team to capture other data or procure data from a third-party provider.
Your data science team should have one to three data analysts to work with the research lead to answer questions, discover solutions to problems, and use data in creative ways to support the organization’s operations and strategy. Responsibilities of a data analyst include the following:
- Identify, obtain, cleanse, and aggregate the data in preparation for storage and analysis
- Select/develop software and techniques for extracting meaning from data
- Summarize/analyze the data
- Communicate knowledge and insights extracted from the data in the most effective ways to stakeholders in the organization — presentations may include stories, slide shows, tables, charts, maps, and other visualizations
Note: The data analyst on the team should be familiar with software development. Many of the best data visualization tools require some software coding.
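To make the analyst's first three responsibilities a little more concrete, here is a minimal Python sketch of a cleanse, aggregate, and summarize pass. Everything in it is an assumption made for illustration: the sample data, the `region` and `amount` column names, and the function names are hypothetical, and a real team would more likely pull from a data warehouse and use dedicated analysis tools than the standard library.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw export; the columns and values are invented for illustration.
RAW_CSV = """region,amount
North,120.50
north ,80.00
South,
South,200.00
East,n/a
East,50.25
"""


def cleanse(rows):
    """Normalize region names and drop rows whose amount is missing or non-numeric."""
    clean = []
    for row in rows:
        region = row["region"].strip().title()
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # skip rows that cannot be analyzed
        clean.append({"region": region, "amount": amount})
    return clean


def aggregate(rows):
    """Group the cleansed amounts by region."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["region"]].append(row["amount"])
    return dict(grouped)


def summarize(grouped):
    """Compute a count, total, and mean for each region."""
    return {
        region: {
            "count": len(amounts),
            "total": round(sum(amounts), 2),
            "mean": round(mean(amounts), 2),
        }
        for region, amounts in grouped.items()
    }


def run_pipeline(raw_text):
    """Run the three steps in order on CSV text."""
    rows = csv.DictReader(io.StringIO(raw_text))
    return summarize(aggregate(cleanse(rows)))


if __name__ == "__main__":
    for region, stats in sorted(run_pipeline(RAW_CSV).items()):
        print(region, stats)
```

Keeping each step in its own function mirrors the list above: the cleansing rules, the grouping, and the summary statistics can each be changed as the research lead's questions change, without touching the other steps.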
The primary purpose of a project manager is to protect the data science team from the increasing demands placed on it by the rest of the organization. For example, I once worked for an organization that had a very creative data science team. They were coming up with new and interesting ways to use the company’s vast credit card data. During the first few months, the data science team was mostly left alone to explore the data. As their insights became more interesting, the rest of the organization became more curious. Departments started calling on team members to give presentations. These meetings increased interest across the organization, which led to even more meetings. After a few months, some people on the data science team were in meetings for up to twenty hours a week! They shifted roles from analysts to presenters.
As a result, the team spent much less time analyzing data. The same departments that were requesting the meetings started asking why output from the data science team was dwindling.
An effective project manager serves as a shield to protect the team from too many meetings and as a bulldozer to break down barriers to the data. In this role, the project manager has the following responsibilities:
- Democratize the data: Democratizing the data means providing data access to everyone in the organization, so they can query the data warehouse and conduct analytics to some degree on their own — typically through the use of business intelligence (BI) “dashboards.”
- Gain access to data silos: In organizations without a central data warehouse, various divisions or departments may have their own databases, which, for whatever reason, may be made off limits to the data science team. The project manager is responsible for convincing various groups to share their data with the team.
- Share the results: The project manager attends the meetings and delivers the presentations, so the data science team can continue to focus on analyzing the data.
- Enforce organizational learning: The project manager works closely with the research lead to ensure that the data science team’s insights are translated into actionable items. At the end of the day, the team will still be evaluated by what the organization learns. Someone needs to follow through and turn the insights into products or changes.
Working together, the research lead, analysts, and project manager function as a well-oiled machine — asking and answering questions, uncovering solutions to problems, developing creative ways to use the organization’s data to further its competitive strategy, and working with other groups and individuals throughout the organization to implement data-driven changes.