In a previous post, “Conducting a Data Science ‘Project’,” I point out some of the key differences that separate data science from traditional project management. While traditional project management is focused more on goals, planning, and tangible deliverables, data science is a more open-ended operation with the focus on discovery and innovation — less tangible, but no less valuable, deliverables.
Data Science Challenges
To arrive at a deeper understanding of the differences between traditional project management and data science, consider the unique challenges of a data science project:
- Unlike traditional projects, data science “projects” have a much broader scope and are much less constrained by cost and schedule requirements. As a result, data science teams are more susceptible to wandering: losing focus and spending too much time trying to answer irrelevant or unimportant questions. A narrow scope, a limited budget, and strict deadlines just aren’t compatible with the scientific method that data science teams should follow, but these teams still need to produce something of value to the organization.
- While traditional projects can benefit from having a narrow and well-defined scope, data science teams often must resist forces in the organization that attempt to “box them in.” The process must be empirical and exploratory. A data science team functioning as it should thinks outside the box. If a team is forced to engage in setting goals and achieving milestones, it is likely to look for what it already knows. A team is unlikely to discover anything new when it is forced to explore within the confines of a well-defined box.
- Data science teams must also break away from traditional organizational structure and language. The language of most organizations still hinges on terms such as “mission,” “objectives,” and “outcomes.” Meetings still usually revolve around setting goals and objectives, planning, and progress reports. Many organizations find it difficult to imagine a team devoted solely to exploration and discovery. As a result, data science teams often struggle to swim upstream against a very strong current.
Comparing a Traditional and a Data Science “Project”
Let’s look at a traditional project and compare it to what a data science team does. Then, we’ll look at what often happens when traditional project management is applied to a data science team.
Consider a typical software project. Your organization wants to develop a human resources (HR) self-help portal for its employees. The project charter is to create the portal as a way to lower costs and improve overall employee satisfaction. The project will have a set cost, but the organization will save money by reducing HR costs and employee turnover. The estimated return on investment (ROI) for this project is substantial. The plan lays out all the features in a requirements document and includes a development schedule and detailed budget. The project manager will oversee development and update the plan to account for any changes in schedule, budget, or product requirements.
In contrast, consider how a data science team operates. The team is small (four to five people), including a research lead, a couple of data analysts, and a project manager. Their “mission” is to help the organization come to a better understanding of the customers’ needs and behaviors in the hope that this deeper understanding reveals opportunities to generate more revenue.
The research lead starts by asking questions such as these:
- What do we know about our customer?
- What do we assume about our customer?
- Why does our customer shop with us instead of our competitors?
- What might make our customers shop with us even more?
The data analysts do their job — analyze the data — to come up with answers to these questions. They deliver the answers in the form of data visualizations — graphic summaries of the data. For example, the data visualizations may be graphs that shed light on customer income and spend, as shown here. The x-axis (horizontal) represents income, and the y-axis (vertical) represents spending. Note that customers with higher incomes don’t necessarily spend more. Those who have an income around $20k–$30k seem to spend the most.
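To make the income-versus-spending comparison concrete, here is a minimal sketch of how an analyst might group customers into income brackets and compare average spend. The customer records, the $10k bracket width, and the function names are all assumptions made up for illustration; they are not the data behind the chart described above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (income, annual spend) pairs in dollars.
# These values are invented purely for illustration.
customers = [
    (15_000, 400), (22_000, 900), (27_000, 1_100), (35_000, 700),
    (48_000, 650), (62_000, 600), (29_000, 1_000), (81_000, 550),
]


def income_bracket(income, width=10_000):
    """Return the lower bound of the bracket an income falls in."""
    return (income // width) * width


def mean_spend_by_bracket(rows, width=10_000):
    """Group spend by income bracket and average each group."""
    groups = defaultdict(list)
    for income, spend in rows:
        groups[income_bracket(income, width)].append(spend)
    return {bracket: mean(spends) for bracket, spends in sorted(groups.items())}


summary = mean_spend_by_bracket(customers)

# The bracket with the highest average spend in this made-up sample.
top_bracket = max(summary, key=summary.get)

for bracket, avg in summary.items():
    print(f"${bracket:,}-${bracket + 10_000:,}: average spend ${avg:,.0f}")
```

A summary like this only says which bracket spends the most. It does not say why, which is exactly the kind of follow-up question the team would ask next.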
The analysts could also look at data from social media platforms and create a word cloud of feedback from thousands of customers, as shown below. For example, some of the largest words in the word cloud are “travel,” “recipe,” and “restaurant.”
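A word cloud is essentially a picture of word counts: the more often a word appears, the larger it is drawn. The sketch below shows one possible way to compute those counts, assuming a handful of invented comments and a small, hypothetical stop-word list. Real social media data would need far more cleanup than this.

```python
import re
from collections import Counter

# Hypothetical customer comments, invented for illustration only.
comments = [
    "Loved the travel tips and the recipe section",
    "Great restaurant guide for travel",
    "The recipe ideas helped me plan travel meals",
]

# A tiny, assumed stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "for", "me", "a", "of", "to"}


def word_frequencies(texts, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


freq = word_frequencies(comments)

# The most frequent words would be drawn largest in a word cloud.
top_words = freq.most_common(3)
print(top_words)
```

The counts could then be handed to whatever charting tool the team uses to draw the cloud itself.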
Based on the knowledge and insight gleaned from these data visualizations, the team is likely to ask more questions, such as “Why do customers in a certain income bracket spend more than customers in higher or lower income brackets?” and “Why do our customers like to travel?” and “When our customers travel, where are they most likely to go?”
As you can imagine, knowing more about customers can lead to higher sales. The team could then share its discoveries with others in the organization. Marketing may decide to advertise more in travel magazines. Product development may shift its focus to products that are more closely related to travel. Sales might focus more of its efforts on customers in a specific income bracket.
Then again, the team may hit a dead end. A data visualization created to analyze spending patterns among customers who travel and those who don’t is inconclusive, as shown below. It reveals only that customers who travel outspend those who don’t by a relatively small margin. Customers who do travel visit a variety of destinations around the world, and the total spend by customers who travel to those destinations is no greater than the total spend by customers who don’t travel. The data visualization doesn’t provide sufficient evidence to support a change in what the company is doing, so the team abandons this line of enquiry and shifts direction.
Applying Traditional Project Management to Data Science
Imagine trying to shoehorn data science into a traditional project management framework. How would you define the scope of the project when your exploration can lead you in so many different directions? How can you meet predetermined milestones when you’re building an ever-increasing body of knowledge and insight about your customers? How can you possibly meet a deadline when you don’t know, specifically, what you’re looking for? How do you budget for time when you have no idea how long it will take to find the answers?
Data science is all about learning, and “learning” is a verb. Specifically, it is a verb in the form of a present participle, which conveys continuous action. Data science is engaged in ongoing discovery and innovation. It doesn’t conform to the traditional project management framework. Don’t try to force it to.