In a previous post, "Structuring Your Data Story," I provide guidance on the big picture of storytelling — nailing down the five key elements of a story: characters, setting, plot, conflict, and resolution. However, if you've ever heard someone tell a story, you know it takes more than those five elements to make it interesting. The devil is in the details. Skilled storytellers embellish their stories with plenty of details that feed the imagination and stimulate the senses. They make you feel as though you're watching the action unfold before your eyes.
In a similar fashion, your data science team should include plenty of details in every story it tells to flesh it out and make it more memorable. Details are like little mental sticky notes that help the audience remember the characters, setting, plot, conflict, and resolution. In addition, the details provide supporting evidence for the larger observations and claims the team presents.
Shots and Needles
An organization I once worked for was struggling to get enough people to participate in its medical studies. The data science team was called in to figure out why. The team conducted some research and discovered that some people are afraid of needles, others are afraid of having their blood drawn, and some are afraid of both. That overlapping group turned out to be large.
The data science team asked some good questions and made some interesting discoveries. One such discovery was that people who had participated in, and had a positive experience with, a medical study that did not involve needles or blood draws were more inclined to participate in future studies that did involve needles or blood draws.
The research lead (a nurse) had a great idea on how to tell that story with impact. She would start with a case study, changing the participant's name and a few details to protect the patient's anonymity. Her story went something like this:
When I was a nurse I could always tell who was afraid of needles. They always crossed their arms in a certain way. They grabbed both of their elbows as a way to protect themselves from the poke in the arm. There are a lot of people out there like that, and we need them to participate in our medical studies. So I'm going to tell you a little bit about someone I found in one of our reports.
Let's call her Tracy. She participated in one of our medical studies for a drug being developed to help people sleep. The first day of the study she showed up with her own pillow. She must've been optimistic about how well it would work. She was hoping that this new pill would help her since she had some trouble sleeping during periods of high stress.
It turned out that Tracy was one of the participants who didn't get any benefit from the drug. When she left, she told the nurse that her father was a doctor, so she felt some obligation to participate in medical studies. She said she could never be a doctor because she was scared of blood and needles. A few months later she decided to participate in a flu vaccine trial. The study required needles for the vaccination and for later blood tests.
So why did Tracy decide to participate?
The obvious answer to the research lead's question is that Tracy participated because she felt an obligation to do so. After all, she didn't actually benefit in any way from the sleep study. She felt as though she couldn't contribute to helping others with their health issues directly by being a doctor or nurse, so she would do her part by participating in studies.
Now, think about the story you just read. What do you recall? Clearest in your mind are probably the details — the description of how people held their arms when they were afraid of needles, Tracy's name, Tracy bringing her pillow to the sleep study, what her dad did for a living, the trials she participated in, and so on. All of these details make it easier to remember the story and to remember the conclusion drawn from the story — that Tracy participated in medical studies because she felt obligated to do so.
When you tell a data science story, try to use details to paint a picture in words. They help your audience connect to characters, setting, plot, conflict, and resolution.
Avoid the Temptation to Deliver a Presentation
Data science is a combination of science and art. The data science team follows the scientific method to explore and discover — to add to the organization's growing body of knowledge and insight. The team then uses the art of storytelling to convey that knowledge and insight to people across the organization in a compelling and memorable way.
Business presentations are boring. They're not structured to be interesting. They're static. They communicate the current state of affairs. They’re like a verbal “reply all” to the organization's stakeholders. That’s usually fine for status meetings, but it falls short when you need to convey a point, make it stick, and transform the audience in a positive way.
Avoid the temptation to merely deliver reports or presentations. Use the data and the findings from your analysis to tell a compelling story. And be sure to include the details.
In a previous post, "Facilitating Better Data Analytics Questions," I stress the importance of asking compelling questions when serving as a member of a data science team. After all, questions are the impetus for exploration and discovery. In that post and a subsequent post, "Three Places to Look for Data Analytics Questions," I recommend several techniques for initiating question sessions.
However, the techniques I recommend aren't helpful unless you and others on your data science team are comfortable asking questions. In this post, I present four common reasons that data science team members may be uncomfortable asking questions. Simply by recognizing the common barriers to asking questions, you are better equipped to overcome those barriers on your own.
Asking questions may be very uncomfortable, especially when you're asking someone who's in a position of authority and especially when the person you're asking has an intimidating presence. After all, your question may be perceived as dumb or as challenging or threatening the other person. No doubt about it: some people have even been fired for asking very good questions.
As a result, many employees, even those who serve on a data science team, may be reluctant to ask compelling questions. They have a natural desire to protect themselves. Nobody wants to seem dumb, wrong, or confrontational.
Overcoming this barrier requires working up the courage to ask compelling questions. Sometimes, you just need to do it — force yourself. If you can't work up the courage, try the opposite tactic — fear. Remind yourself that your job is to ask good questions. If you don't ask, you're not doing your job. And if you don't do your job, your team will fail, and you'll all end up in the unemployment line.
The good news is that over time and with lots of practice, asking tough questions becomes second nature. When you begin to see that asking questions isn't a threat, and you begin to reap the benefits of asking good questions, any fear you may have had quickly disappears.
Some data science teams just don't have enough time and energy to ask compelling questions. Asking questions is hard work; it's exhausting, especially when you're just getting started on a project. It might seem as though each question meeting gets longer. Instead of feeling as though you're making progress toward an answer or solution, you may feel as though you're getting further and further from it. At this point, the team can quickly become discouraged and stop asking.
Many data science teams fall into this trap, and as soon as they stop asking questions, they turn their attention to routine work, such as capturing and cleaning data or implementing new data analytics and visualization tools.
Often, the rest of the organization celebrates this shift from what's perceived as esoteric to more practical endeavors — real work. Many organizations prefer a busy team over an effective one. When this happens, everyone gets so focused on rowing that no one takes the time to question where the ship is headed and why.
Remember that there is no prize for the most data, the cleanest data set, or the best data analytics and visualizations. Prizes are given out for delivering insights and creating business value. You can't do that unless you spend quality time coming up with compelling and relevant questions.
Some data science teams struggle to ask questions simply because they have little experience doing so. This is especially prevalent when team members are engineers, software developers, or project managers — people who have built their careers on answering questions and solving problems. These people want to do, not ask. Team members who come from science or academia tend to have an easier time making the transition.
Nothing is wrong with answers and solutions. In fact, a data science team often needs its members to propose answers and solutions, so those can be tested. However, during question sessions, the team needs to find a way to transform some statements into questions. For example, a team member who is unaccustomed to asking questions may say something like, "I see that more women than men are buying running shoes on our website. Maybe it's because our marketing department caters mostly to women.” The team could easily convert those statements into a question: "Why do more women than men buy running shoes on our website?"
Remember: statements don't spark discussion. Usually, the only option is for the other person to agree or disagree. With a question, the team can begin to consider a range of possibilities and discuss the data it needs to examine for answers.
A Corporate Culture That Stifles Questions
Some data science teams are stifled by a corporate culture that discourages employees from asking questions. In his book The Magic of Dialogue: Transforming Conflict into Cooperation, social scientist Daniel Yankelovich points out that most organizations in the U.S. have a culture of action. When they encounter a problem, their first instinct is to fix what's broken, and asking questions is seen as something that impedes progress.
Quick, decisive action is often needed in organizations, but it's counterproductive in data science, where the focus is on learning and innovation. One thing you don’t want to see the data science team doing is getting wrapped up in routine work to accomplish something practical. You don’t want the research lead saying something like, “You can ask questions once you finish uploading all the data to the cluster.” The team shouldn't be focused on completing projects but on coming up with new insights.
When you’re working on a data science team, watch out for an individual or organizational bias against questions. Questioning is one of the first steps toward discovery. If you skip this step, your team, and the organization overall, will have trouble learning anything new.
In my previous post, "Challenging Evidence and Conclusions in Data Science," I encourage data science teams to be skeptical of any claims and of the evidence that supports them, and I provide several techniques for challenging claims and evidence.
However, missing data can be just as misleading as wrong data, if not more so. One of the big problems with missing data is that people can't see what's not there. When you have data, you can check for errors and validate it. With missing data, you have nothing to check. You may not even think to ask about it or look for it.
For example, suppose you see the following graph with the headline: “Major Heat Wave in Atlanta!"
Your initial reaction might be that temperatures are rising precipitously in Atlanta and something must be done to reverse this dangerous trend. What's missing from this graph? The months along the horizontal axis: January through July. Of course monthly temperatures are going to rise dramatically over the spring and summer months!
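The missing piece of context here is a seasonal baseline. A minimal sketch in Python, assuming invented monthly temperatures (none of these numbers are real Atlanta observations), contrasts the raw January-to-July rise that the headline graph shows with each month's departure from its own historical normal.

```python
# Hypothetical monthly average high temperatures (°F) for Atlanta.
# These numbers are invented for illustration, not real observations.
this_year = {"Jan": 54, "Feb": 58, "Mar": 66, "Apr": 74,
             "May": 81, "Jun": 88, "Jul": 90}

# Hypothetical long-term averages (the "normal") for the same months.
normals = {"Jan": 53, "Feb": 57, "Mar": 65, "Apr": 73,
           "May": 81, "Jun": 87, "Jul": 89}


def raw_rise(series: dict[str, int]) -> int:
    """Change from the first month to the last: what the headline graph shows."""
    months = list(series)
    return series[months[-1]] - series[months[0]]


def anomalies(observed: dict[str, int], baseline: dict[str, int]) -> dict[str, int]:
    """Departure of each month from its own historical normal."""
    return {month: observed[month] - baseline[month] for month in observed}


print("Rise from January to July:", raw_rise(this_year))
for month, delta in anomalies(this_year, normals).items():
    print(f"{month}: {delta:+} vs. normal")
```

With these made-up values, the graph would show a 36-degree climb, yet no month is more than one degree above its normal. The dramatic rise is the season, not a heat wave.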
I once worked for an organization that was trying to figure out why more men than women were participating in its medication trials. A report from the company's labs showed that 60 percent of its study participants were men compared to only 40 percent who were women. The data science team was assigned the job of finding out why men are more likely than women to participate in the company's medication studies.
When team members received this report, they asked, "What significant information are we missing?" and "What does it mean that men are more likely than women to participate?" Does that mean that more men applied, or that equal numbers of men and women applied but a greater number of men were accepted? Or does it mean that equal numbers of men and women applied and were accepted but more men actually participated?
This additional data would shift the team's exploration in different directions. If more men applied, the next question would be "Why are men more likely than women to apply for our medication studies?" If equal numbers of men and women applied but more men were accepted, the next question would be "Why are more men being accepted?" or "Why are more women being rejected?" If equal numbers of men and women applied and were accepted but more men actually participated, the next question would be "Why are men more likely to follow through?" As you can see, the missing data has a significant impact on where the team directs its future exploration.
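Breaking the single 60/40 statistic into stages is a simple funnel analysis. The Python sketch below uses invented counts (the post does not give the real ones) and reports the first stage at which the share of men drifts away from parity. The helper names `share_of_men` and `stage_where_gap_opens` are hypothetical.

```python
from typing import Optional

# Hypothetical counts at each stage of recruitment, invented for illustration.
funnel = {
    "applied":      {"men": 500, "women": 500},
    "accepted":     {"men": 450, "women": 300},
    "participated": {"men": 360, "women": 240},
}
STAGES = ["applied", "accepted", "participated"]


def share_of_men(stage: str) -> float:
    counts = funnel[stage]
    return counts["men"] / (counts["men"] + counts["women"])


def stage_where_gap_opens(stages: list[str], tolerance: float = 0.01) -> Optional[str]:
    """Return the first stage where the men's share moves away from the previous stage.

    Parity (0.5) is the starting point, so a skew already present among
    applicants is reported at the "applied" stage.
    """
    previous = 0.5
    for stage in stages:
        current = share_of_men(stage)
        if abs(current - previous) > tolerance:
            return stage
        previous = current
    return None


for stage in STAGES:
    print(f"{stage:>12}: {share_of_men(stage):.0%} men")
print("Gap first appears at:", stage_where_gap_opens(STAGES))
```

In this made-up scenario, men and women apply in equal numbers and accepted applicants of both sexes follow through at the same 80 percent rate, so the skew enters at acceptance. The next question would be "Why are more men being accepted?" rather than "Why are men more likely to apply?"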
When you encounter a scenario like this, consider both what data might be missing and why it might be missing:
This last question turned out to be significant. The benefit of having more women participate in the company's studies is that young women are more likely to be on prescription medication, which would make the studies more comprehensive. The medication studies would be able to test for a greater number of drug interactions. The flip side is that many women couldn't participate because they were taking a prescription medication that disqualified them from the study. The statistic could then be rephrased as "60 percent of those who are allowed to participate in our medication studies are men." This tells an entirely different story.
Data science teams need to remain vigilant regarding missing information. If a claim seems too good or too bad to be true, the team needs to question it and ask, "What's the rest of the story? What's missing? What's been omitted, intentionally or not?" The team also should always be asking, "Do we have all the relevant data?"
Data drives the data science team's exploration and discovery, so the team must be on the constant lookout for bad data, which can lead the team astray or result in erroneous conclusions. In this post, I present several ways to challenge the data the team is provided, both to ensure that the team is working with accurate information and to generate additional questions that may lead to valuable discoveries.
Questioning the "Facts"
Many organizations rely on what they believe to be facts in their daily operations. Questioning these "facts" may be taboo for the rest of the organization, but they are fair game for the data science team. After all, one of the data science team's key obligations is to challenge assumptions.
Whenever your data science team encounters a "fact," it should challenge the claim by asking the following questions:
When you're working on the data science team, you'll see all kinds of well-established "facts." The sources of these "facts" are numerous and varied: intuition, personal experiences, examples, expert opinions, analogies, tradition, whitepapers, and so on. Part of your job as a member of the data science team is to question these "facts," not reject them outright. As you explore, you may find evidence to support the "fact," evidence to refute it, a lack of evidence, or a mix of inconclusive evidence. Keep an open mind as you gather and examine the evidence.
Considering Alternate Causes
It's easy to say that correlation doesn't imply causation (just because one event follows another doesn't mean that the first event caused the second), but distinguishing between correlation and causation is not always easy. Sometimes it is: if you bump your head and it hurts, you know the pain was caused by bumping your head.
However, sometimes, it is not so easy. For example, when a doctor noticed that many children were developing autism after receiving a vaccination to protect against measles, mumps, and rubella, he and some of his colleagues found it very tempting to suggest a possible cause-effect relationship between the vaccination and autism. Later research disproved any connection. It just so happens that children tend to develop autism about the same time they are scheduled to receive this vaccination.
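A short simulation makes this timing trap concrete. The Python sketch below generates two sets of ages independently of each other, so by construction one event has no influence on the other. The age ranges are hypothetical, chosen only so that one event is typically scheduled earlier than the other; they are not clinical data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable

N = 10_000
# Hypothetical ages in months. Each list is drawn on its own, so by construction
# the timing of vaccination has NO influence on the timing of onset.
vaccination_age = [random.uniform(12, 15) for _ in range(N)]
onset_age = [random.uniform(18, 36) for _ in range(N)]

# Share of children whose onset came after vaccination.
after = sum(o > v for v, o in zip(vaccination_age, onset_age)) / N

# If vaccination timing mattered, children vaccinated earlier would tend to show
# onset earlier. Compare mean onset for the earlier and later halves.
cutoff = statistics.median(vaccination_age)
early = [o for v, o in zip(vaccination_age, onset_age) if v <= cutoff]
late = [o for v, o in zip(vaccination_age, onset_age) if v > cutoff]

print(f"Onset followed vaccination in {after:.0%} of cases")
print(f"Mean onset, earlier-vaccinated half: {statistics.mean(early):.1f} months")
print(f"Mean onset, later-vaccinated half:   {statistics.mean(late):.1f} months")
```

"A came before B" holds in every simulated case, yet the two halves show essentially the same mean onset age. The sequence alone is evidence of scheduling, not of causation.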
Whenever your data science team encounters an alleged cause-effect relationship, it should look for the following:
Uncovering Misleading Statistics
While it's true that "numbers don't lie," people frequently use numbers, specifically statistics, to lie or mislead. A classic example is the advertising claim that 80 percent of dentists recommend a specific toothpaste. The truth is that in many of these studies, dentists were allowed to choose several brands from a list of options, so other brands may have been just as popular as the advertised brand, or even more popular.
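The arithmetic behind the toothpaste claim is easy to reproduce. In this Python sketch, the five survey responses are invented, and each dentist may pick any number of brands. Because the percentages are computed per brand rather than per respondent, every brand can truthfully claim the same 80 percent.

```python
from collections import Counter

# Hypothetical survey: each dentist may pick any number of brands.
responses = [
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "C"},
    {"A", "B", "C"},
    {"A", "C"},
]


def recommendation_shares(responses: list[set[str]]) -> dict[str, float]:
    """Fraction of respondents who listed each brand (shares can sum past 100%)."""
    counts = Counter(brand for picks in responses for brand in picks)
    n = len(responses)
    return {brand: counts[brand] / n for brand in sorted(counts)}


def exclusive_share(responses: list[set[str]], brand: str) -> float:
    """Fraction of respondents who recommended ONLY this brand."""
    return sum(1 for picks in responses if picks == {brand}) / len(responses)


shares = recommendation_shares(responses)
print(shares)                          # every brand reaches 80%
print(sum(shares.values()))            # roughly 2.4, i.e., 240% in total
print(exclusive_share(responses, "A")) # nobody recommended brand A alone
```

Asking "What was the question, and could respondents choose more than one answer?" exposes the trick immediately.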
When your team encounters statistics or a claim based on statistics, it needs to dig into those numbers and identify the source of the information and how the numbers were obtained. Don't accept statistics at face value.
Remember that a data science team can only be as good as the data (evidence) it has. Many teams get caught up in capturing more and more data at the expense of overlooking the data's quality. Teams need to continuously evaluate the evidence. The techniques described in this post are a great start.
Bottom line, the data science team needs to be skeptical. When presented with a claim or evidence to back up a claim, it needs to challenge it. An old Russian proverb advises "Trust but verify." I go a step further to recommend that you not trust at all — be suspicious of all claims and evidence that your data science team encounters.
In a previous post, "Encouraging and Facilitating Data Analytics Questions," I recommend a couple of ways to get the ball rolling when it comes to getting people in your organization to start asking compelling questions. However, getting people to ask great questions is not always as simple as creating the right environment. Even a highly skilled data science team often needs more guidance.
To stimulate questions, it is often helpful to focus on specific areas that are fertile ground for questions. In this post, I highlight three key areas that are not only where you'll find great questions but also good places to start. These are questions that clarify key terms, identify "facts" that are really assumptions, and reveal errors in reasoning.
Note: These three areas are intended to initiate the process or get your team moving if it's stuck. Don't let these areas limit the scope of your exploration. If you address these three areas, you’re bound to come up with at least a few questions to grease the gears. When the team develops some momentum, team members will naturally ask more questions.
Clarify Key Terms
George Carlin once joked that he put a dollar in a change machine and nothing changed. Jokes like this are possible because many words in the English language have different meanings depending on the context in which they're used and on each individual's understanding of them. While such jokes are funny, people often get into heated arguments when they don't share an understanding of what certain words or phrases mean. Just look at how differently people define "success." For some, it's spending time with family; for others, it's financial security; for still others, it's knowledge or power.
The world of business is not immune to the ambiguity inherent in certain terms; for example, ask two people to define "customer satisfaction." Does it simply mean that the person is a return customer? Is a customer who never complains satisfied? Can a customer who returns a product for a refund be satisfied? If a customer never buys another product, can we assume that customer was not satisfied?
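To see why the definition matters, the Python sketch below applies three competing definitions of "satisfied," one for each question above, to the same customer records. The records are invented and the `Customer` fields are hypothetical; the point is only that one dataset can yield very different "satisfaction rates."

```python
from dataclasses import dataclass


@dataclass
class Customer:
    purchases: int
    complaints: int
    refunds: int


# Hypothetical customer records, invented for illustration.
customers = [
    Customer(purchases=3, complaints=0, refunds=0),
    Customer(purchases=1, complaints=0, refunds=1),
    Customer(purchases=4, complaints=2, refunds=1),
    Customer(purchases=2, complaints=0, refunds=1),
    Customer(purchases=1, complaints=0, refunds=0),
]

# Three competing definitions of "satisfied".
definitions = {
    "returns to buy again": lambda c: c.purchases > 1,
    "never complains": lambda c: c.complaints == 0,
    "never requests a refund": lambda c: c.refunds == 0,
}


def satisfaction_rate(customers: list[Customer], is_satisfied) -> float:
    """Fraction of customers who meet the given definition of satisfaction."""
    return sum(1 for c in customers if is_satisfied(c)) / len(customers)


rates = {name: satisfaction_rate(customers, rule) for name, rule in definitions.items()}
for name, rate in rates.items():
    print(f"{name:>24}: {rate:.0%}")
```

The same five customers are 40, 60, or 80 percent "satisfied" depending on which definition the team adopts, which is why the term has to be pinned down before anyone starts measuring.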
Your data science team needs to be sensitive to ambiguous terms and nail down their intended meanings. Here's a short list of ambiguous terms commonly used in various organizations:
Identify "Facts" That Are Really Assumptions
People often accept assumptions as facts. A company's leadership, for example, may believe that the company has such a unique manufacturing process that nobody can compete with it on price or quality, even when that's not true. The truth may be that no competitor has developed something better yet, or that an entirely new product being developed somewhere will make the company's existing product obsolete; leadership just doesn't know about it yet.
In general, assumptions have four characteristics:
Data science teams must remain on the lookout for false or questionable assumptions. Not all assumptions are bad. If the assumption reflects reality and facilitates positive or productive decisions and activity, it can be helpful. However, false assumptions can create blind spots and introduce misinformation into the decision-making process.
Reveal Errors in Reasoning
Data science teams need to be aware of the possibility of errors in data and errors in reasoning, which are even worse. A data error may result in a minor setback or a series of false reports. On the other hand, an error in reasoning can lead the team down the wrong path or result in completely wrong conclusions. Watch out for the following types of logical fallacies (reasoning that results in invalid arguments):
All three of the techniques described in this post boil down to listening and observing closely and being skeptical about what you hear and observe. Whenever you encounter a statement presented as a fact, ask yourself, "Is this really true?" Whenever you encounter someone presenting a position, ask yourself, "Is the conclusion based on sound reasoning?" Questions like this force you to take a closer look and determine for yourself the truth and validity of a statement or conclusion.
The success of any data science initiative hinges on the team's ability to ask interesting questions that are relevant to the organization's success and the team's ability and willingness to challenge assumptions and beliefs. After all, without questions, you can have no answers. However, asking compelling questions and challenging long-held beliefs that have become accepted as facts can be a significant challenge, especially in organizations with strict hierarchies that discourage questioning and the challenging of authority.
If your data science team is struggling to come up with compelling questions and hesitates to challenge assumptions, the suggestions I present in this post can get the ball rolling. Getting started is the most difficult part. As soon as the team gets into the swing of asking questions and questioning beliefs, it will have no shortage of follow-up questions and problems to investigate.
Conduct Question Meetings
One of the best ways to encourage data science team members to ask questions and challenge beliefs is to build an environment that's conducive to the free exchange of ideas. The research lead is ultimately responsible for this environment and can start nurturing it by modeling the desired behavior: listening and learning without judging. Everyone on the team should engage in deep listening, which is focused listening that enables them to hear and understand what others are saying while ignoring any initial impulse to judge what they hear. Team members need to recognize that they will have plenty of time later to analyze what they hear; the first step is to fully understand what the other person is getting at.
A good way to encourage questions and reinforce deep listening is to conduct question meetings. In these meetings, the research lead should encourage participants to ask questions before making statements. This technique is sometimes called a "question first" approach. These meetings are about eliciting the maximum number of questions. They're focused on everyone asking their questions and listening. Ban smartphones, laptops, and other electronic devices from these meetings. Everyone should focus on listening, although you may want to assign one person in the meeting the task of taking notes.
Although question meetings are mostly unstructured, consider starting the meeting like this:
Avoid quick statements that are likely to limit the scope of the discussion, such as "The CEO suspects that we are losing market share due to the recent reorganization of our marketing department." Such statements keep people from coming up with their best ideas. Remember that it’s the discussion that gives your team the greatest value. You want the team to consider all possibilities.
After a question meeting, you should have plenty of questions — far more than you need and some far more valuable than others. Now it's time to pan for gold — to identify the few questions you want your team to explore.
When evaluating questions, it often helps to categorize them as open-ended or closed-ended:
If you're the research lead, make sure that the team is not asking too many of any one type of question. Too many open-ended questions can result in the team spending too much time wondering and not enough time exploring the data. Too many closed-ended questions can result in too much time digging up facts and too little time looking at the big picture.
You can also categorize questions as essential and non-essential:
If you’re a fan of detective shows, you’ve probably seen a crime wall. That’s when a detective tries to figure out all the different pieces of an unsolved mystery. He or she puts up pictures and notes on a wall and tries to connect the different pieces. The board becomes a visual story. That’s why you’ll often see the detective sitting on the floor staring at the board trying to pull together the story from all the little mysteries in the data.
Your data science team will face a similar challenge. Team members will try to tell a story, but they'll have only pieces of the puzzle. Your team can use the same technique to create a question board, a place where everyone can see all the questions and data together. That way, the team can tell a larger story.
Creating a question board is a great way to display ideas and solicit questions from your team and the rest of the organization. At the very top of the board, you should put a simple identifier such as "question board" or "ask a question." The question board is a clear way to communicate questions and organize them in one place.
Your data science team should have dozens or even hundreds of different questions. The question board will likely be a key meeting point for the team as well as a great place for team members and stakeholders to talk about the project.
To start, place your question board next to someone’s desk on the team or in a hallway. Open spaces aren’t good for a question board. You’ll want people to stand next to the board and read the questions. Another suggestion is to put the board next to an area with a lot of traffic. Ideal places are next to the water cooler, snack bar, or bathroom. It should be a place where several team members can meet and not distract other people.
Usually, the best way to organize your board is to use different color sticky notes. You’ll want to organize your board from top to bottom. The sticky notes at the top of the board contain your essential questions. Use red or pink sticky notes for these questions. Below them, you can use yellow sticky notes for nonessential questions. Remember that these are questions that address smaller issues. They are usually closed questions with a correct answer. Finally, you can use white or purple sticky notes for results. These are little data points that the team discovered that might help address the question.
There are five major benefits to having a question board:
Remember that you want your team to have deep discussions. Everyone should be able to question each other’s reasoning. The team should listen to each other’s questions and try to come up with questions of their own. They should be focused on learning and not judging the quality of their questions.
The question board helps with this because it provides a place for people to focus their discussions. It also helps the team stand up and participate physically and come up with new ideas.
Many of your questions will be interconnected. Often, you’ll have essential questions that are connected to several closed, nonessential questions. If it’s on the wall, you can use string to show these connections. If it’s on a whiteboard, you can just draw different colored lines. This will help your team stay organized and even prioritize their highest value questions.
The question board will invite other people outside your team to participate. You might want to leave a stack of green sticky notes next to the board. Leave a marker and a small note that invites other people to add their own questions. Sometimes these questions from outside the team tell the most interesting stories.
Create Question Trees
Your question board will be a key part of communicating your data science story. It should have the questions that your team is working to address. It may also have little bits of data that suggest some answers. A good question board encourages other people to participate and tempts people to be part of your shared story.
One of the challenges of a question board is keeping it well organized as it fills up with questions. Since it's designed for group discussion, you want everyone to be able to share the same information. The board shouldn't become a collection of separate clusters, each holding one person's notes; if a cluster contains only one person's ideas, that person will be the only one who understands what it means.
Instead, all your questions should be organized using the same system. One of the best ways to do this is by creating question trees. A question tree is a group of sticky notes all related to one essential question. You'll want the essential questions to be in the most attention-grabbing color, usually red or pink.
Let's imagine a question board for our running shoe website. One question that your team came up with is, "Can our website help encourage non-runners to become runners?" If you're the research lead for the team, you want to put this essential question on a red sticky note at the very top of the board.
Underneath that essential question, you can start adding other questions. It could be another essential question such as, “What makes people run?” It could also be a nonessential question like, “Do non-runners shop on our site?” Since this is a closed question, you could put a little data sticky next to the yellow question sticky. Maybe something like, “Data suggest that 65% of our customers don’t run in a given week.” You could use a pie chart like the one shown below to illustrate this point.
Assume that this data comes from a survey the company conducted of its customers. The survey asked, “How many times, on average, do you run per week?” When you look at the data, you see that about 65% of the respondents don't run at all, which leaves only about 35% who run at least once per week.
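Because the slices of a pie chart must add up to 100 percent, a quick tally of the raw answers is a useful sanity check before any percentage goes on a sticky note. The Python sketch below buckets invented survey answers (the post does not include the real responses) and confirms that the shares sum to one.

```python
import math
from collections import Counter

# Hypothetical answers to "How many times, on average, do you run per week?"
# Invented for illustration: 20 respondents.
answers = [0] * 13 + [1] * 3 + [2] * 2 + [4] * 2


def bucket(runs_per_week: int) -> str:
    """Map a raw answer to a pie-chart slice."""
    if runs_per_week == 0:
        return "don't run"
    if runs_per_week == 1:
        return "once a week"
    return "more than once a week"


def shares(answers: list[int]) -> dict[str, float]:
    """Fraction of respondents in each slice."""
    counts = Counter(bucket(a) for a in answers)
    total = len(answers)
    return {label: counts[label] / total for label in counts}


slices = shares(answers)
for label, share in slices.items():
    print(f"{label:>22}: {share:.0%}")

# Mutually exclusive slices of one whole must sum to 100%.
assert math.isclose(sum(slices.values()), 1.0)
```

If two figures on the board describe non-overlapping groups and together exceed 100 percent, one of them is wrong or is measured against a different base.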
Someone looking at the question tree should be able to follow the team's thought process. She should see that the branches started from one open-ended essential question (“Can our website help encourage non-runners to become runners?”) and see how the team addressed that question, following it all the way down through the different branches.
Let’s say that the question, “What makes people run?”, branches off in its own direction. Underneath that question is another question that says, “Do they run to relieve stress?” Underneath that is another question that says, “Can non-runners who are stressed see the benefits of running?”
With the question tree, the research lead now has a way to show the team’s progress to the rest of the organization. She could show that the data science team is working on several high-value questions simultaneously. It shouldn’t be too difficult to see how gaining insight into creating new customers might increase revenue.
The question trees help the research lead connect the team’s work to real business value. A question board should have several question trees. At the very top of the board, there should be several red or pink essential questions. Each of these should branch down like an upside-down tree into several other questions. Be sure to use different color sticky notes as discussed previously (red or pink for essential questions and yellow for nonessential questions). Sometimes open questions will branch off into different question trees. Closed questions should end with little sticky notes that show the data.
Like any tree, a question tree needs pruning. This is one of the key responsibilities of the research lead. She needs to make sure that your questions lead to real business value. If she doesn’t think your questions will lead to insights, she might pull them off the question board so the data analyst doesn’t start searching for results.
Note: The research lead usually removes questions as part of the team’s question meetings. You don’t want your research lead pulling questions off the board without communicating the change to the team.
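If it helps to think of a question tree as a data structure, here is a minimal Python sketch. It assumes each sticky note is a node with a color (red for essential, yellow for nonessential), an optional data note for a closed question, and a list of follow-up questions. The class `Question` and its methods `add`, `prune`, and `render` are hypothetical names for illustration, not part of any tool, and which questions count as essential in the example is an assumption based on the running-shoe scenario.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Question:
    """One sticky note on the question board (hypothetical model)."""
    text: str
    essential: bool  # essential questions go on red sticky notes, others on yellow
    data_note: Optional[str] = None  # data sticky attached to a closed question
    children: list[Question] = field(default_factory=list)

    @property
    def color(self) -> str:
        return "red" if self.essential else "yellow"

    def add(self, child: Question) -> Question:
        """Hang a follow-up question beneath this one and return it."""
        self.children.append(child)
        return child

    def prune(self, text: str) -> bool:
        """Remove the first descendant with matching text, along with its whole branch."""
        for i, child in enumerate(self.children):
            if child.text == text:
                del self.children[i]
                return True
            if child.prune(text):
                return True
        return False

    def render(self, depth: int = 0) -> list[str]:
        """Return the tree as indented lines, top of the board first."""
        indent = "  " * depth
        lines = [f"{indent}[{self.color}] {self.text}"]
        if self.data_note:
            lines.append(f"{indent}  (data) {self.data_note}")
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


root = Question("Can our website help encourage non-runners to become runners?", essential=True)
root.add(Question("Do non-runners shop on our site?", essential=False,
                  data_note="Data suggest that 65% of our customers don't run in a given week."))
why_run = root.add(Question("What makes people run?", essential=True))
stress = why_run.add(Question("Do they run to relieve stress?", essential=False))
stress.add(Question("Can non-runners who are stressed see the benefits of running?", essential=False))
print("\n".join(root.render()))

# The research lead prunes a branch during a question meeting.
root.prune("Do they run to relieve stress?")
print("\n".join(root.render()))
```

Pruning a node removes its whole branch, which matches pulling a sticky note, and everything hanging beneath it, off the board.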
One of the key things about question trees is that they mirror how most teams come up with new questions. Remember that data science uses the scientific method to explore your data, which means that most of your data science will be empirical. Your team will ask a few questions, gather the data, and then react to that data by asking a new series of questions. A question tree reflects what the team has learned. At the same time, it shows the rest of the organization your progress.
The success of any data science initiative hinges on the team's ability to ask interesting questions that are relevant to the organization's success, and on its willingness to challenge assumptions and beliefs. After all, without questions, you can have no answers. However, asking compelling questions and challenging long-held beliefs can be difficult, especially in organizations with strict hierarchies that discourage people from questioning and challenging authority.
If your data science team is struggling to come up with compelling questions and hesitates to challenge assumptions, the suggestions I present in this post can get the ball rolling. Getting started is the most difficult part. As soon as the team gets into the swing of asking questions and questioning beliefs, it will have no shortage of follow-up questions.
Conduct Question Meetings
One of the best ways to encourage data science team members to ask questions and challenge beliefs is to build an environment that's conducive to the free exchange of ideas. The research lead is ultimately responsible for this environment and can start to nurture it by modeling the desired behavior: listening and learning without judging. Everyone on the team should engage in deep listening, which is focused listening that enables them to hear and understand what others are saying while ignoring any initial impulse to judge what they hear. Team members need to recognize that they will have plenty of time later to analyze what they hear; the first step is to fully understand what the other people are getting at.
A good way to encourage questions and reinforce deep listening is to conduct question meetings. In these meetings, the research lead should encourage participants to ask questions before making statements. This technique is sometimes called a "question first" approach. These meetings are about eliciting the maximum number of questions. They’re focused on everyone asking their questions and listening. Ban smartphones, laptops, and other electronic devices from these meetings. Everyone should focus on listening, with one person taking notes.
Although question meetings are mostly unstructured, consider starting the meeting like this:
1. Set the tone by starting with a question, such as “Does everybody know why we are having this meeting?” and then wait for a response. A good research lead is not afraid of short periods of silence. Don’t try to answer your own questions. Give everyone in the room time to think about their answer.
2. When you’re satisfied that everybody understands the meeting's purpose, present the challenge. For example, you may say something like, "The CEO wants to know why we're losing market share to XYZ Corporation." Don't share what you think. Leave the topic open for the rest of the team to weigh in on. Sit down and wait to see if anyone starts asking questions.
3. If, after a few minutes, no one says anything, you could ask something like, “Does everyone understand why this is a challenge?” What you’re hoping to get from the team is something like, “How do we know we're losing market share?” or "What is XYZ Corporation doing differently or better than we are?" or "When did this start?" These types of questions can help to guide the team's analysis.
Avoid quick statements that are likely to limit the scope of the discussion, such as "The CEO suspects that we are losing market share due to the recent reorganization of our marketing department." Such statements keep people from coming up with their best ideas. Remember that it’s the discussion that gives your team the greatest value. You want the team to consider all possibilities.
If you’re a fan of detective shows, you’ve probably seen a crime wall plastered with maps, photos, names, clues, sticky notes, and so on. The board functions as a combination collage, storyboard, and puzzle that provides the detective with a clear visualization of the evidence.
Your data science team can create its own "crime wall" by soliciting questions from across the organization through the use of a question board. Here are some suggestions for hosting an effective question board:
A question board delivers the following benefits:
Hosting question meetings and a question board are only two ways to encourage people in the organization to ask compelling questions. You are likely to come up with your own unique ideas. What's important is that you provide the encouragement and means for people to contribute their questions.
In my two previous posts, "Building a Data Science Life Cycle" and "Conducting Data Analytics in Sprints," I present a six-stage framework to structure the work a data science team performs and five techniques for performing the work in intense, two-week cycles called "sprints." These techniques go a long way toward making the data science team productive.
In this post, I call your attention to several pitfalls that commonly undermine the data science team's efforts, and I provide guidance on how to avoid them proactively. Generally, your data science team needs to push back against anything that diverts its mission away from exploration and discovery.
Change the Organization's Mindset
Many organizations create data science teams and then essentially tie their hands, preventing them from truly exploring the data. Much less frequently, organizations provide their data science teams with too much freedom, so the teams end up chasing data and questions that are irrelevant to the organization's success or getting so wrapped up in routine chores, such as managing the data warehouse, that they fail to produce anything of value. In most organizations, though, the problem involves a strict hierarchy that tries to control what the data science team does, and that is a formula for failure.
Before establishing a data science team, an organization often must change its mindset and values. It must embrace a spirit of creativity and innovation, especially with respect to its data science team. When the team is doing what it should be doing, it is learning and helping the organization learn. It is discovering what the organization doesn’t know. Attempts to micromanage the team run counter to its mission.
However, the data science team does need to deliver value. It should serve the needs of the organization. Data science teams can achieve that goal by being highly service-oriented and by collaborating with people across the organization to answer their questions, help them overcome any challenges they face, and inform their decisions.
Work without Objectives
Most organizations still view work as a series of goals and objectives. They invest a great deal of time, money, and effort in planning, management, and compliance. Teams are expected to set goals in advance, formulate plans to meet those goals, execute their plans, and deliver the promised outcomes. While that approach works well for most teams, it is counterproductive for data science teams, whose mission is to explore and innovate. Data science teams need to follow the data and the questions, and they cannot shift direction if their path is carved in stone.
If you're on a data science team, you may feel as though your team is trying to hit a constantly moving target. Every sprint introduces new questions that may lead the team in a different direction. Sometimes, the team may not even know what the moving target is. The team may be looking for patterns in the data that reveal new targets. By working without objectives, the team has the flexibility it needs to let its curiosity and the data determine the outcomes.
Take Advantage of Serendipity
Serendipity is a happy happenstance, such as striking up a conversation with the CEO of Microsoft at a Mariners game and having him offer you a job on the spot. It is an odd concept in the world of business, where strategy, goals, objectives, and planning are enshrined as the essential components of success.
However, more and more evidence points to the advantages of serendipity over goal setting and planning. One of the best books on the topic is Why Greatness Cannot Be Planned: The Myth of the Objective, by Ken Stanley and Joel Lehman. According to the authors, “Objectives actually become obstacles towards more exciting achievements, like those involving discovery, creativity, invention, or innovation.”
Data science teams are wise to capitalize on serendipity. For example, if a team member sees something unexpected and intriguing in the data the team is analyzing, the team needs to follow up on that discovery. You don't want your team focused on objectives at the expense of overlooking a groundbreaking discovery. Professor Stanley calls these “stepping-stones” — interesting things that eventually lead to insights. If you ignore them, you are likely to miss key discoveries.
Deliver Practical Knowledge and Insights
When you're working on a data science team, it's easy to get so caught up in the data, analysis, exploration, and discovery that you lose sight of the organization's needs. Driven by innate curiosity to follow wherever the data leads, the team can forget that others in the organization are relying on it to deliver knowledge and insight that guide strategy and inform decision-making. Every couple of weeks, the team delivers reports or presentations that it finds fascinating but that leave everyone else in the organization wondering "So what?" or "Who cares?"
To avoid this pitfall, the data science team must engage, to some degree, in guided exploration. Three tools in particular are helpful for structuring and guiding the data team's work:
Focus on Exploration over Routine Work
By its very nature, routine is repetitive, and it can become hypnotic, lulling you into a complacency that prevents you from noticing the wonderful world that surrounds you. The same is true for a data science team. It can become so wrapped up in capturing, cleaning, and consolidating data and creating data visualizations that it loses its sense of adventure. It falls into a rut and stops asking interesting questions. When looking at the data, it may not even notice an intriguing fact that's staring right back at it.
To avoid this pitfall, try the following techniques:
Keep in mind that your data science team should be committed to exploration, discovery, and innovation that's relevant to the organization's needs. If the team works toward achieving that mission, it will be less susceptible to the most common pitfalls.
In my previous post, "Building a Data Science Life Cycle (DSLC)," I encourage you to adopt a structure for your data team's activities that is conducive to the type of work it does — exploration. I refer to this structure as the Data Science Life Cycle (DSLC), illustrated below.
At first glance, DSLC appears to be a linear process, starting with identification and ending with learning, but the process is actually cyclical. Learning leads to more questions that return the team to the beginning of the process. In addition, mini-cycles often form within the DSLC as research and analysis results prompt questions that require additional research and analysis to answer, as shown below.
In this post, I drill down to illustrate how data science teams can function more effectively and efficiently within the DSLC framework by employing the following techniques:
Iterating through DSLC Sprints
The DSLC isn’t designed to cycle over a long period of time. Two weeks is sufficient for a cycle (a sprint). That gives the team enough time to prepare and analyze the data and compose a story that reveals the knowledge and insight extracted from the data and its significance to the organization. With short cycles, if a specific line of enquiry proves fruitless, the team can change course and head in a different direction or tackle a new challenge.
You may have heard of sprints in the context of agile software development methodologies, such as Scrum, but the term actually originated in product development. A sprint is a consistent, fixed period of time during which the team runs through an entire life cycle. Each sprint should run through all six stages of the DSLC, as shown below.
Using Question Boards
As I explained in an earlier post, "Building a Top-Notch Data Science Team," teams should be small (four to five individuals) and include a research lead, data analyst, and project manager. Although every member of the team should be asking compelling questions, the research lead is primarily responsible for that task.
One of the most effective ways to inspire and share interesting questions is via a question board: usually a large whiteboard positioned near the data science team, on which team members and others in the organization post questions or challenges. The board should have plenty of open space with a short stack of sticky notes in one of the corners. You may want to include a large arrow pointing down to the stack of sticky notes with the caption, “Ask a question.”
The question board should be open to everyone in the organization, including the research lead, other data science team members, executives, managers, and employees. Try to make your question board look as enticing as possible. Anyone in the organization should be able to walk by, grab a sticky note, and post a quick question.
Conducting Team Meetings
Given only two weeks to complete each sprint, your data science team should limit the amount of time it spends in meetings and keep those meetings focused on a specific purpose. I recommend that teams conduct five meetings over the course of a two-week sprint, each with a specific purpose and a time limit that the team agrees upon in advance:
Breaking Down Your Work
Breaking down your work involves allocating sufficient time to all six stages of the DSLC. What often happens is that data science teams get caught up in the research stage, specifically in the process of capturing, cleaning, and consolidating the data in preparation for analysis. Given only two weeks per sprint to deliver a story, the data science team has little time to prep the data. Like agile software development teams, the data science team should aim to create a minimum viable product (MVP) during its sprint. With respect to data science, this would be a minimally viable data set: just enough data to get the job done.
Remember, at the end of a sprint, stakeholders in the organization will want to know "What do we know now that we didn't know before?" If your team gets caught up in data prep, it won't be able to answer that question.
Telling an Interesting Story
Organizations that make significant investments in any initiative want to see a return on investment (ROI), typically in the form of a deliverable. In the world of data science, the deliverable is typically in the form of an interesting story that reveals both the meaning and the significance of the team's discoveries. Unlike a presentation or data visualization, which merely conveys what the team sees, a story conveys what the team believes. A good story provides context for understanding the data, along with guidance on how that understanding can benefit the organization.
An effective story accomplishes the following goals:
Scottish novelist and folklorist Andrew Lang once wrote, “I shall try not to use statistics as a drunken man uses lamp-posts, for support rather than for illumination.” Unfortunately, many organizations that consider themselves “data-driven” are like drunkards leaning on lamp posts: they use data to support rather than challenge their assumptions and beliefs, and to obscure their ignorance rather than to learn.
An organization that uses data more for support than illumination poses a real challenge to its data science team, because leadership is likely to view anything the team discovers that contradicts long-held beliefs as bad information. Leadership may even discourage any questions that could be seen as a threat to organizational beliefs.
Your data science team needs to use data for discovery, which keeps it from falling into the trap of using data merely to support what’s already known or, worse, to support misconceptions. As a line often attributed to Mark Twain puts it, “What gets us into trouble is not what we don’t know. It’s what we know for sure that just ain’t so.” A major benefit of data science is challenging accepted beliefs, especially misconceptions considered to be established truths.
Keep in mind that if your organization is relying on knowledge that’s not backed up by good data, it’s likely to run into trouble. Garbage in, garbage out; if the organization’s leaders are making decisions based on misconceptions and false assumptions, they’re probably making bad decisions. Imagine trying to navigate your way through New York City with a map of Chicago!
Three Areas of Responsibility
One way to ensure that your data science team remains true to its mission is to maintain some separation between its three areas of responsibility:
Note that the three areas have some overlap. In these areas of overlap, the team engages in a continuous three-step process:
Hypothesis is the process of asking questions and making educated guesses that can be tested through experimentation with the data. This is primarily the role of the research lead. She knows the business, has a skeptical and creative mind, and has a knack for asking compelling questions. Broad business knowledge is key, because it provides sufficient background to feed the research lead’s curiosity. Imagine how difficult it would be to ask questions about scuba diving, for example, if you had never scuba dived. You wouldn’t even know the vocabulary needed to formulate an intelligent question about it.
A skeptical mind is also crucial to performing well in this role. While a research lead is wise to communicate with others across the organization and at all levels, a skeptical mind prevents her from succumbing to groupthink and accepting as fact any deeply ingrained false assumptions or beliefs. The research lead should also be given the freedom to ask questions, regardless of how uncomfortable those questions are for the organization. As the research lead communicates with others in the organization, she needs to stay true to the data and not be swayed by politics, biases, or other pressures.
Research is the foundation for the data science team’s work. It often provides the basis for questions and follow-up questions as well as answering those questions. Research is in the realm of the data analyst, who is the only one on the team who works directly with the data. The data analyst works with the research lead to come up with interesting questions. He then mines the data in various ways to find answers and delivers the results via a report, which is typically illustrated with data visualizations to clearly convey the information and insights.
The data analyst works closely with both the research lead and project manager, but the two relationships are independent and differ significantly. His work with the research lead focuses mainly on exploring the data. He then works with the project manager to pass along knowledge and insights and prepare reports and presentations to share the team’s findings with the rest of the organization.
Implementation is the process of sharing what the data science team discovers with the rest of the organization and enforcing what the team learned. This is the realm of the project manager, who must be sure that the team produces actionable intelligence. She then delivers the team’s discoveries to stakeholders across the organization, typically in the form of reports and presentations.
An Independent Unit
Although I break down the data science team into three areas of responsibility, in practice, the team functions as a unit. Everyone on the team works together to ask and answer questions and share the team’s discoveries with the organization. To a large degree, the team should function as an independent investigation and service agency within the organization. It should serve the business intelligence (BI) needs of various divisions and departments without being influenced by their assumptions or beliefs or any of the pressures under which they function.