As I explained in a previous post, “Building a Top Notch Data Science Team,” a data science team should consist of three to five members, including the following:
- Research lead: Knows the business, identifies assumptions, and drives questions.
- Data analyst: Prepares data, selects BI tools, and presents the team’s findings.
- Project manager: Distributes results, democratizes data, and enforces learning.
Together, the members of the data science team engage in a cyclical step-by-step process that generally goes like this:
- Question: The research lead or other members of the team ask compelling questions related to the organization’s strategy or objectives, a problem that needs to be solved, or an opportunity the organization may want to pursue.
- Research: The data analyst, with input from other team members, identifies the data sets required to answer the questions and the tools and techniques necessary to analyze the data. The data analyst conducts the analysis and presents the results to the team.
- Learn: The team meets to evaluate and discuss the results. Based on what they learn from the results, they ask more questions (back to the Question step). They continue the cycle until they reach consensus or arrive at a dead end and realize that they’ve been asking the wrong questions.
- Communicate and implement: The project manager communicates what the data science team learned to stakeholders in the organization who then work to enforce the learning or implement recommended changes.
Data science teams also commonly run experiments on data to enhance their learning. Experiments generally follow the scientific method:
- Ask a question.
- Perform background research.
- Construct a hypothesis.
- Test with an experiment.
- Analyze the results and draw conclusions.
- Record and communicate the results.
Suppose your data science team works for an online magazine. At the end of each story posted on the site is a link that allows readers to share the article. The data analyst on the team ranks the stories from most shared to least shared and presents the ranking to the team for discussion.
The research lead asks, “What makes the top-ranked articles so popular? Are articles on certain topics more likely to be shared? Do certain key phrases trigger sharing? Are longer or shorter articles more likely to be shared?”
Your team works together to create a model that reveals correlations between the number of shares and a number of variables, including the following:
- Specific key words or phrases
- Article length
- Graphics used
- Article tone (for example, serious or humorous)
The research lead is critical here because she knows the most about the business. She may know that certain writers are more popular than others or that the magazine receives more positive feedback when it publishes on certain topics. She may also be best at coming up with key words and phrases to include in the correlation analysis; for example, “sneak peek,” “insider,” or “whisper” may suggest an article about rumors in the industry that readers tend to find compelling.
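To make the correlation step concrete, here is a minimal sketch in Python of how a team might check which variables move together with share counts. Everything in it is an assumption for illustration: the article records are invented toy data, the field names (`words`, `images`, `humorous`, `shares`) are hypothetical, and a simple Pearson correlation stands in for whatever model a real team would build.

```python
from math import sqrt

# Key phrases suggested by the research lead (taken from the example above).
KEY_PHRASES = ("sneak peek", "insider", "whisper")

# Invented toy records for illustration only; not real magazine data.
ARTICLES = [
    {"title": "Sneak peek at next year's phones", "words": 600, "images": 4, "humorous": 0, "shares": 910},
    {"title": "Quarterly earnings roundup", "words": 1500, "images": 1, "humorous": 0, "shares": 120},
    {"title": "An insider view of the merger", "words": 900, "images": 2, "humorous": 0, "shares": 640},
    {"title": "Ten office habits we all secretly have", "words": 700, "images": 5, "humorous": 1, "shares": 480},
    {"title": "A whisper about the new console", "words": 500, "images": 3, "humorous": 1, "shares": 1020},
    {"title": "Regulatory changes explained", "words": 1800, "images": 0, "humorous": 0, "shares": 95},
]


def pearson(xs, ys):
    """Return the Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def has_key_phrase(title):
    """Return 1 if the title contains any of the key phrases, else 0."""
    lowered = title.lower()
    return int(any(phrase in lowered for phrase in KEY_PHRASES))


def share_correlations(articles):
    """Correlate each candidate variable with the number of shares."""
    shares = [a["shares"] for a in articles]
    features = {
        "key_phrase": [has_key_phrase(a["title"]) for a in articles],
        "words": [a["words"] for a in articles],
        "images": [a["images"] for a in articles],
        "humorous": [a["humorous"] for a in articles],
    }
    return {name: pearson(values, shares) for name, values in features.items()}


if __name__ == "__main__":
    for name, r in share_correlations(ARTICLES).items():
        print(f"{name:>10}: {r:+.2f}")
```

A value near +1 or -1 only flags a variable worth discussing in the next Learn meeting. It does not show that the variable causes sharing, which is why the research lead's business knowledge still matters.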
Based on the results, the analyst develops a predictive analytics model to be used to forecast the number of shares for any new articles. He tests the model on a subset of previous articles, tweaks it, tests it again, and continues this process until the model produces accurate “forecasts” on past articles.
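The test-and-tweak loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a one-variable least-squares line as a stand-in for the analyst's predictive model, and the function names (`fit_line`, `holdout_evaluate`) and the every-fourth-article split are hypothetical choices, not a prescribed method.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when every x value is the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Forecast shares for one article from a fitted (slope, intercept)."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Average size of the forecast error, in shares."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


def holdout_evaluate(xs, ys, test_every=4):
    """Fit on most past articles and score on the held-out remainder.

    Every `test_every`-th article (starting with the first) is held out,
    so the model is judged on articles it never saw while fitting.
    """
    pairs = list(zip(xs, ys))
    train = [p for i, p in enumerate(pairs) if i % test_every != 0]
    test = [p for i, p in enumerate(pairs) if i % test_every == 0]
    train_x, train_y = zip(*train)
    test_x, test_y = zip(*test)
    model = fit_line(train_x, train_y)
    return model, mean_absolute_error(model, test_x, test_y)
```

In use, the analyst would call `holdout_evaluate` with one variable from past articles (for example, image counts) and their share counts, look at the error, change the model or its inputs, and run it again. Scoring on held-out articles, rather than on the same articles used for fitting, is what keeps the “forecasts” honest.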
At this point, the project manager steps in to communicate the team’s findings and make the model available to the organization’s editors, so it can be used to evaluate future article submissions. She may even recommend the model to the marketing department as a tool for determining how to charge for advertising placements. For example, the magazine might charge more for ads positioned alongside articles that readers are more likely to share.
Striving for Innovation
Although you generally want to keep your data science team small, you also want people on the team who approach projects with different perspectives and have diverse opinions. Depending on the project, consider adding people to the team temporarily from different parts of the organization. If you run your team solely with data scientists, you’re likely to lack a significant diversity of opinion. Team member backgrounds and training will be too similar. They’ll be more likely to quickly come to consensus and sing in a chorus of monotones.
I once worked with a graduate school that was trying to increase its graduation rate by looking at past data. The best idea came from a project manager who was an avid scuba diver. He looked at the demographic data and suggested that a buddy system (a common safety precaution in the world of scuba diving) might have a positive impact. No one could have planned his insight. It came from his life experience.
This form of creative discovery is much more common than most organizations realize. In fact, a report from the patent office suggests that almost half of all discoveries are the result of simple serendipity. A team sets out to solve one problem, and then someone’s insight or experience leads in an entirely new direction.