Data science is a multi-disciplinary approach to extracting insight from data. The disciplines involved include computer science/information technology, math/statistics, and domain knowledge/expertise (for example, knowledge of a specific industry). The process of extracting insight from data is typically broken down into the following five stages:
- Capture/gather: Organizations capture data from their daily operations and obtain data from other sources that share their data freely (such as certain government organizations) or sell it.
- Store/maintain: Data is typically stored in a data warehouse on premises or in the cloud.
- Process: Data may be processed via data mining, clustering/classification, data modeling, or data summarization to make it more suitable for the next stage in the process — analysis.
- Analyze: Various techniques drawn from computer science, statistics, predictive analytics, and machine learning are used to extract meaning and insight from data.
- Communicate: The information and insights derived from the data must be communicated in the clearest way possible — typically in the form of graphs, tables, maps, and other visual formats.
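To make the five stages concrete, here is a minimal sketch in Python that walks a handful of records through each one. Everything in it is an assumption for illustration: the order records are made up, an in-memory SQLite database stands in for a data warehouse, and the "analysis" is just an average per region. A real project would be far larger, but the shape is similar.

```python
import sqlite3
from statistics import mean

# Capture: hypothetical daily order records (made-up values for illustration).
orders = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 100.0),
    ("south", 60.0),
]

# Store: an in-memory SQLite database stands in for a data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)

# Process: group the raw rows by region so they are ready for analysis.
by_region = {}
for region, amount in conn.execute("SELECT region, amount FROM orders"):
    by_region.setdefault(region, []).append(amount)
conn.close()

# Analyze: compute the average order amount per region.
averages = {region: mean(amounts) for region, amounts in by_region.items()}

# Communicate: render the result as a small plain-text table.
report_lines = [f"{region:<6} {avg:>8.2f}" for region, avg in sorted(averages.items())]
print("\n".join(report_lines))
```

In practice the last step would more likely be a chart or dashboard than printed text, but the hand-off from one stage to the next is the point of the sketch.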
Shifting the Focus from Data to Science
The best way to think about data science is to focus less on the data and more on the science — specifically the scientific method:
- Ask a question.
- Conduct background research.
- Formulate a hypothesis.
- Test with an experiment.
- Analyze the data.
- Draw conclusions.
- Communicate the results.
Instead of merely guessing or theorizing, data science takes an empirical approach: it draws conclusions from observation and experience rather than from theory or logic alone, which are more prone to bias.
A data scientist often begins by asking questions, such as “How is our organization doing?” “What are the key drivers of my business?” “Who are my customers?” and “What can I do to engage my customers more effectively?” Questions vary depending on the organization and the purpose of the analysis. For example, a shipping company may want to know the shortest, fastest, or safest delivery routes. A doctor may want to know which patients are at the highest risk of developing a serious illness. A government agency may want to know the potential impact of raising property taxes in a certain state.
As a coach and trainer, I do a fair amount of traveling, and I’m amazed at the diversity of faucets and plumbing fixtures around the world. When I’m in an unfamiliar hotel, I often struggle to figure out how to use the shower. An empirical approach is often best:
- I start with a question, “How do I turn on the shower?”
- I review the research — information and insight I gathered from past experiences with showers along with my observations of the current shower setup.
- I form a hypothesis — my best guess as to what to do to turn on the water and adjust the temperature.
- I test my hypothesis with an experiment — turning or pulling a certain knob or lever.
- I analyze the data: either the water turned on or it did not, and it was either the temperature I expected or warmer or colder than I had hypothesized.
- I draw my conclusion, deciding whether the control I chose was the right one and whether the action I performed on that control delivered the desired results. This new data adds to the research I review in the second step, and I can repeat the steps, making additional hypotheses and conducting more experiments until I develop the knowledge and insight required to operate the shower correctly.
Data scientists use this same empirical approach all the time. They ask questions, gather the data they think will help them answer their questions, and then choose or develop a mathematical/computer data model that is the best match for the data and the purpose for which it is being used. They test the model and then make adjustments as needed until the model performs as desired.
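The test-and-adjust loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the delivery data is made up, the two candidate models (a constant average and a least-squares straight line) and the error tolerance are hypothetical choices, and a real project would use a proper modeling library and a much larger data set.

```python
from statistics import mean

# Hypothetical data: (distance_km, delivery_minutes) pairs, made up for illustration.
train = [(1, 12), (2, 14), (3, 16), (4, 18)]
test = [(5, 20), (6, 22)]

def fit_constant(data):
    """Baseline model: always predict the average observed outcome."""
    avg = mean(y for _, y in data)
    return lambda x: avg

def fit_line(data):
    """Least-squares straight line through the training points."""
    x_bar = mean(x for x, _ in data)
    y_bar = mean(y for _, y in data)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in data)
    denominator = sum((x - x_bar) ** 2 for x, _ in data)
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def mean_abs_error(model, data):
    """Average absolute gap between predictions and held-out observations."""
    return mean(abs(model(x) - y) for x, y in data)

# Test the simplest model first, then adjust if it misses the tolerance.
tolerance = 1.0  # hypothetical acceptable error, in minutes
model = fit_constant(train)
baseline_error = mean_abs_error(model, test)
error = baseline_error
if error > tolerance:
    model = fit_line(train)
    error = mean_abs_error(model, test)

print(f"baseline error: {baseline_error:.2f}, final error: {error:.2f}")
```

The design mirrors the shower example: start with the simplest guess, check it against data the model has not seen, and change the model only when the evidence says the guess was not good enough.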
Exploratory, Not Objective-Driven
Over the last 20 years, most organizations have focused on increasing their operational efficiency. They asked operational questions such as, “How can we scale in a way that reduces costs?”
Data science is different; it isn’t objective-driven. It’s exploratory. It’s not about how efficiently an organization operates; it’s about gaining useful insight, often insight that is totally unexpected and that no one would have thought to look for. Data science asks different questions, such as:
- What do we know about our customer?
- How can we deliver a better product?
- Why are we better than our competitors?
These are all questions that require a higher level of organizational thinking, and most organizations aren’t ready to ask them. They tend to be more focused on quantitative goals, such as meeting milestones and staying on budget. They lack a culture of inquisitiveness and skepticism, and they rarely consider the qualitative factors that drive business, such as innovation, customer service, and community.
Imagine you’re in a business meeting and someone asks questions like these: “Why are we doing it this way?” “What makes you think this will work?” “Why is this a good idea?” Chances are that the person asking would be treated as a gadfly and be asked something like “Didn’t you read the memo?” However, these are the questions that drive the process of building organizational knowledge. These are the questions you want from your data science team. As an organization, you gain knowledge by asking interesting questions and taking an empirical approach to answering them. That’s what data science is all about.