Comparing Software Projects to Data Science Projects

In my previous post, “Conducting a Data Science ‘Project’,” I pointed out the differences between project management and data science. These differences are summarized in the following table:

Project Management            | Data Science
Planning                      | Exploring and experimenting
Goals and objectives          | Discovery and knowledge
Schedule- and budget-driven   | Data-driven
Certainty                     | Curiosity
Execution                     | Innovation

You can see how these differences play out when comparing traditional software projects to typical data science projects, as presented in the following table. While traditional software projects focus on achieving a goal and delivering an end product, data science projects are more exploratory and open-ended. Both have deliverables, but software deliverables are more tangible and deadline-oriented, whereas data science tends to deliver a less tangible, ever-growing body of knowledge and insights, which may be of even greater value to the organization.

Traditional Software Project                        | Typical Data Science Project
Develop a new customer self-help portal             | Better understand a customer’s needs
Create new software based on customer feedback      | Create a data model to predict churn
Install a new server farm to increase scalability   | Discover new markets and opportunities
Convert legacy code into updated software           | Verify assumptions about customer behaviors

Despite their differences, software project management is fast becoming more like data science with the growing popularity of agile software development methodologies, such as Scrum, Extreme Programming (XP), Lean and Kanban, and Dynamic Systems Development Method (DSDM).

Like data science, many of these newer software development methodologies follow the scientific method, at least to some degree. That is, they often begin with research to assess the customer’s (end user’s) needs, and they build the software gradually in multiple, iterative cycles (called “sprints” in Scrum). Team members are encouraged to experiment during these cycles to innovate and build knowledge that the team can draw on to achieve continuous improvement, both in the product being developed and in the process used to create that product.

In many cases, the software development cycle never ends: the software is in continuous development, improving with each development cycle and each new release. As with data science, the focus is more on the process than the product, and the work is open-ended, a continuing cycle of building knowledge and insight and driving innovation. In the case of software development, this knowledge and insight are applied to continuously improve the software. With data science, they are applied to continuously improve the organization.

Spotify, the digital music, podcast, and video streaming service, follows this same iterative approach in the development of its platform. The company nurtures a creative, failure-friendly culture, as reflected in its values:

  • Agile > Scrum
  • Chaos > Bureaucracy
  • Community > Structure
  • Cross pollination > Standardization
  • Enable > Serve
  • Failure recovery > Failure avoidance
  • Impact > Velocity
  • Innovation > Predictability
  • People > * (anything else)
  • Principles > Practices
  • Servant > Master
  • Trust > Control

Spotify’s approach to software development is rooted in the Lean Startup approach and follows a cycle of “Think it, build it, ship it, tweak it.” The organization even hosts “hack days” and “hack weeks,” encouraging its development teams (called “squads”) to spend ten percent of their time building whatever they want with whomever they want.

Squads are given a great deal of creative license to develop and test new features, on the condition that they try to “limit the blast radius.” They accomplish this in two ways: by decoupling the architecture so that each squad or “tribe” (a collection of squads) works on an isolated part of the platform and any mistakes are limited to that part, and by rolling out new features gradually to more and more users.
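To make the gradual-rollout idea concrete, here is a minimal sketch of one common way to expose a feature to a growing share of users. It is not Spotify's implementation. The function names, the feature name, and the choice of hashing users into 100 buckets are all assumptions made for illustration.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for a given feature.

    Hypothetical helper: hashing the feature name together with the user ID
    means the same user lands in different buckets for different features.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id: str, feature: str, rollout_percent: int) -> bool:
    """Return True if the feature is switched on for this user.

    A user's bucket never changes, so raising rollout_percent only adds
    users; nobody who already has the feature loses it.
    """
    return rollout_bucket(user_id, feature) < rollout_percent
```

A team could start with a small value of `rollout_percent`, watch for problems, and raise it step by step. If something goes wrong, only the users in the enabled buckets are affected, and setting the value back to 0 turns the feature off for everyone.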

Spotify also places an emphasis on “capturing the learning.” Teams experiment with new tools, features, and methods and then discuss the results to figure out ways to improve both product and process. They document what they learn and share it with other teams, so everyone in the organization is better equipped to make data-driven decisions instead of decisions driven by authority, ego, or opinion.

Organizations would be wise to follow Spotify’s lead not only in developing new software but also in managing their data science teams or, even better, in allowing and enabling those teams to manage themselves. Your organization’s data science team should feel free to ask questions, challenge assumptions, formulate and test its own hypotheses, and cross-pollinate (reach out to others in the organization for insight and feedback). The team’s mission should be more about exploration, innovation, and discovery than about setting goals, meeting milestones, and staying on budget or on schedule.
