Data science is a multi-disciplinary approach to extracting insight from data. The disciplines involved include computer science/information technology, math/statistics, and domain knowledge/expertise (for example, knowledge of a specific industry). The process of extracting insight from data is typically broken down into the following five stages:
Shifting the Focus from Data to Science
The best way to think about data science is to focus less on the data and more on the science — specifically the scientific method:
Rather than merely guessing or theorizing, data science takes an empirical approach: it draws conclusions from observation and experience instead of from theory or logic alone, which are more prone to bias.
A data scientist often begins by asking questions, such as “How is our organization doing?” “What are the key drivers of my business?” “Who are my customers?” and “What can I do to engage my customers more effectively?” Questions vary depending on the organization and the purpose of the analysis. For example, a shipping company may want to know the shortest, fastest, or safest delivery routes. A doctor may want to know which patients are at the highest risk of developing a serious illness. A government agency may want to know the potential impact of raising property taxes in a certain state.
As a coach and trainer, I travel a fair amount, and I'm amazed at the diversity of faucets and plumbing fixtures around the world. In an unfamiliar hotel, I often struggle to figure out how to use the shower. An empirical approach is often best:
Data scientists use this same empirical approach all the time. They ask questions, gather the data they think will help them answer their questions, and then choose or develop a mathematical/computer data model that is the best match for the data and the purpose for which it is being used. They test the model and then make adjustments as needed until the model performs as desired.
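The test-and-adjust loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's method. The data are synthetic (produced by a hypothetical `make_data` helper). The model is a straight line, and each "adjustment" is a gradient-descent step. "Performs as desired" is taken to mean a mean squared error at or below an arbitrary `target_mse` of 0.5 on held-out data. All function names and parameter values are hypothetical.

```python
import random


def make_data(n, seed):
    """Hypothetical data for illustration: y is roughly 3*x + 2 plus random noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + 2 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


def mse(w, b, xs, ys):
    """Mean squared error of the straight-line model y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_by_trial_and_error(train_set, holdout, target_mse=0.5, lr=0.01, max_rounds=10_000):
    """Adjust a line on training data until it performs as desired on held-out data."""
    (xs, ys), (hx, hy) = train_set, holdout
    w, b = 0.0, 0.0  # start from a deliberately naive model
    for rounds in range(1, max_rounds + 1):
        # Test: measure performance on data the model was not adjusted on.
        error = mse(w, b, hx, hy)
        if error <= target_mse:
            return w, b, rounds, error
        # Adjust: one gradient-descent step that lowers the training error.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b
    # Give up after max_rounds and report how far the model got.
    return w, b, max_rounds, mse(w, b, hx, hy)


train_set = make_data(100, seed=1)
holdout = make_data(50, seed=2)
w, b, rounds, error = fit_by_trial_and_error(train_set, holdout)
print(f"stopped after {rounds} rounds: y = {w:.2f}*x + {b:.2f}, held-out MSE {error:.3f}")
```

Keeping separate data for the test step is the design choice worth noticing: a model judged only on the data it was adjusted on can look better than it really is.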
Exploratory, Not Objective-Driven
Over the last 20 years, most organizations have focused on increasing their operational efficiency. They asked operational questions such as, “How can we scale in a way that reduces costs?”
Data science is different; it isn't objective-driven. It's exploratory. It's not about how efficiently an organization operates; it's about gaining useful insight, often insight that nobody expected, into something no one would have thought to investigate. Data science asks different questions, such as:
These are all questions that require a higher level of organizational thinking, and most organizations aren't ready to ask them. They tend to focus on quantitative goals, such as meeting milestones and staying on budget. They lack a culture of inquisitiveness and skepticism, and they rarely consider the qualitative factors that drive business, such as innovation, customer service, and community.
Imagine you're in a business meeting and someone asks questions like these: Why are we doing it this way? What makes you think this will work? Why is this a good idea? Chances are that the person asking would be treated as a gadfly and asked something like "Didn't you read the memo?" Yet these are exactly the questions that drive the process of building organizational knowledge, and they are the questions you want from your data science team. As an organization, you gain knowledge by asking interesting questions and taking an empirical approach to answering them. That's what data science is all about.