In my previous post, “Building a Data Science Life Cycle (DSLC),” I encouraged you to adopt a structure for your data team’s activities that is conducive to the type of work it does: exploration. I refer to this structure as the Data Science Life Cycle (DSLC), illustrated below.
At first glance, the DSLC appears to be a linear process, starting with identification and ending with learning, but the process is actually cyclical: learning leads to more questions that return the team to the beginning of the process. In addition, mini-cycles often form within the DSLC as research and analysis results prompt questions that require additional research and analysis to answer, as shown below.
In this post, I drill down to illustrate how data science teams can function more effectively and efficiently within the DSLC framework by employing the following techniques:
- Working in sprints: relatively brief, intensive, iterative work sessions
- Using question boards
- Conducting productive meetings
- Breaking down the work
- Telling interesting stories
Iterating through DSLC Sprints
The DSLC isn’t designed to cycle over a long period of time. Two weeks is sufficient for one cycle (a sprint), giving the team enough time to prepare and analyze the data and compose a story that reveals the knowledge and insight extracted from the data and its significance to the organization. With short cycles, if a specific line of inquiry proves fruitless, the team can change course and head in a different direction or tackle a new challenge.
You may have heard of sprints in the context of agile software development methodologies, such as Scrum, but the term actually originated in product development. A sprint is a consistent, fixed period of time during which the team runs through an entire life cycle. Each sprint should run through all six stages of the DSLC, as shown below.
Using Question Boards
As I explained in an earlier post, “Building a Top-Notch Data Science Team,” teams should be small (four to five individuals) and include a research lead, data analyst, and project manager. Although every member of the team should be asking compelling questions, the research lead is primarily responsible for that task.
One of the most effective ways to inspire and share interesting questions is a question board, usually a large whiteboard positioned near the data science team, on which team members and others in the organization post questions or challenges. The board should have plenty of open space, with a short stack of sticky notes in one of the corners. You may want to include a large arrow pointing down to the stack of sticky notes with the caption, “Ask a question.”
The question board should be open to everyone in the organization, including the research lead, other data science team members, executives, managers, and employees. Try to make your question board look as enticing as possible. Anyone in the organization should be able to walk by, grab a sticky note, and post a quick question.
Conducting Team Meetings
Given only two weeks to complete each sprint, your data science team should limit the amount of time it spends in meetings and keep each meeting focused on a specific purpose. I recommend that teams hold five types of meetings over the course of a two-week sprint, each with a specific purpose and a time limit that the team agrees upon in advance:
- Research planning: During this meeting, typically about two hours long, the team chooses the questions or problems it wants to research, and the research lead and data analysts develop a research agenda.
- Question breakdown: During each sprint, the data science team should have at least two one-hour question breakdown meetings, during which they ask questions, evaluate and prioritize questions for the next sprint, and clear uninteresting questions from the board.
- Visualization design: During this meeting, typically one hour, the research lead and data analysts formulate rough-draft data visualizations to begin extracting knowledge and insight from the data.
- Storytelling session: During this meeting, typically one hour, the data science team presents a story about what the team learned during the sprint. They present more polished versions of their data visualizations, discuss questions on the board, and tell stories about those questions.
- Team improvement: At the end of each sprint, the team should have a two-hour post-mortem meeting to discuss challenges they encountered during the sprint and talk about improving the process moving forward.
Breaking Down Your Work
Breaking down your work involves allocating sufficient time to all six stages of the DSLC. What often happens is that data science teams get caught up in the research stage, specifically in the process of capturing, cleaning, and consolidating the data in preparation for analysis. Given only two weeks per sprint to deliver a story, the data science team has little time to prep the data. Like agile software development teams, the data science team should aim to create a minimum viable product (MVP) during each sprint. In data science, this would be a minimum viable data set: just enough data to get the job done.
Remember, at the end of a sprint, stakeholders in the organization will want to know “What do we know now that we didn’t know before?” If your team gets caught up in data prep, it won’t be able to answer that question.
Telling an Interesting Story
Organizations that make significant investments in any initiative want to see a return on investment (ROI), typically in the form of a deliverable. In data science, that deliverable is usually an interesting story that reveals both the meaning and the significance of the team’s discoveries. Unlike a presentation or data visualization, which merely conveys what the team sees, a story conveys what the team believes. A good story provides context for understanding the data, along with guidance on how that understanding can benefit the organization.
An effective story accomplishes the following goals:
- Extracts meaning and insight from the data and presents them simply.
- Makes the meaning and insight extracted from the data relevant to the organization and to specific questions or challenges.
- Engages the audience and leaves a lasting impression. While most people quickly forget a presentation, they typically remember a good story.
- Persuades the audience to take action. A good story ends with a call to action, even if that call to action is to “stay tuned” because the data science team is on to something interesting and needs more time to explore. At the end of your story, you don’t want your audience asking, “So what?” or, even worse, “Who cares?”