Whenever an organization is looking to extract meaning from data, its leaders would be wise to consult a data scientist, a person who specializes in mining data for information and insights. A data scientist is trained in various disciplines, including science, programming, data management, statistics, and machine learning, and uses that training to collect, analyze, and interpret data, typically in support of the organization's decision-making process.
Specifically, a data scientist performs the following tasks:
- Asking compelling, relevant questions to identify the problems the organization should be researching.
- Determining the best analytical tools (such as data visualization software or machine learning algorithms) for specific jobs.
- Identifying data sources required to perform the desired analyses.
- Identifying the variables to be examined.
- Capturing and procuring large sets of structured, semi-structured, and unstructured data from disparate sources.
- Cleaning and validating the data to ensure accuracy, completeness, and relevance.
- Developing algorithms to mine the collected data. (An algorithm is a process or set of rules to be followed in calculations or other problem-solving operations.)
- Interpreting the data to identify problems, solutions, and opportunities.
- Designing experiments to confirm or refute assumptions and preliminary conclusions.
- Communicating discoveries to stakeholders, primarily via compelling stories that typically include data visualizations.
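Cleaning and validating data, one of the tasks listed above, is often where much of a project's time goes. The following is a minimal sketch in Python using only the standard library. The records, field names (`customer_id`, `age`, `spend`), and plausibility bounds (age between 0 and 120, non-negative spend) are hypothetical and chosen only for illustration; real validation rules depend on the data and the business.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical raw records, as they might arrive from disparate sources.
RAW_RECORDS = [
    {"customer_id": "C001", "age": "34", "spend": "120.50"},
    {"customer_id": "C002", "age": "", "spend": "80.00"},      # missing age
    {"customer_id": "C001", "age": "34", "spend": "120.50"},   # exact duplicate
    {"customer_id": "C003", "age": "-5", "spend": "45.10"},    # impossible age
    {"customer_id": "C004", "age": "52", "spend": "n/a"},      # unparseable spend
    {"customer_id": "C005", "age": "41", "spend": "300.00"},
]

@dataclass(frozen=True)
class Customer:
    customer_id: str
    age: int
    spend: float

def parse_record(raw: dict) -> Optional[Customer]:
    """Convert one raw record to a Customer, or return None if it fails validation."""
    try:
        age = int(raw["age"])
        spend = float(raw["spend"])
    except (KeyError, ValueError):
        return None  # missing field or non-numeric value
    if not (0 < age < 120) or spend < 0:
        return None  # outside the (assumed) plausible range
    return Customer(raw["customer_id"], age, spend)

def clean(records: list) -> tuple:
    """Drop exact duplicates, then split records into valid Customers and rejects."""
    seen = set()
    valid, rejected = [], []
    for raw in records:
        key = tuple(sorted(raw.items()))
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        customer = parse_record(raw)
        if customer is None:
            rejected.append(raw)
        else:
            valid.append(customer)
    return valid, rejected

if __name__ == "__main__":
    valid, rejected = clean(RAW_RECORDS)
    print(f"{len(valid)} valid, {len(rejected)} rejected")
```

Keeping the rejected records, rather than silently discarding them, makes it possible to spot a source that is systematically malformed.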
Supporting the Data-Driven Decision-Making Process
In the past, many organizations based their decisions on organizational leadership's knowledge and insight. If they were honest, these leaders would have to admit that their decision-making process was more art than science. Decisions were based on historical data at best and pure hunches and conjecture at worst.
With the growing availability of large volumes of diverse data, business intelligence (BI) software, and machine learning, decision-making has become more science than art. Machine learning algorithms trained on sufficient, high-quality data can now make accurate predictions and forecasts to guide the decision-making process. Algorithms can also reveal patterns in consumer behavior that help organizations market products and services to those consumers much more effectively.
Another trend is the democratization of data: making data and analytics available at all levels to enable data-driven decision-making throughout the organization, not just in the upper echelons. We are now seeing everyone in a company, including marketers, sales reps, customer service reps, product development specialists, and manufacturing supervisors, using BI software to inform their decisions.
Supporting this trend toward greater adoption of data-based decision-making is the data scientist, who ensures that everyone in the organization has access to the data and analytical tools they need.
Much of what a data scientist does involves data mining: the process of extracting value from data by using a combination of database management, statistics, mathematics, and machine learning. Although the methods can be complex, data mining relies primarily on old-school logical processes, including the following:
- Descriptive statistics: Analyzing, describing, or summarizing data in a meaningful way to discover patterns in the data.
- Probability: Gauging the likelihood that something will happen.
- Correlation: Measuring the degree to which two things are related.
- Causation: Determining whether one event is actually the result of another event, rather than merely being correlated with it.
- Predictive analytics: Applying statistical analysis to historical data in an attempt to predict future outcomes.
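The techniques listed above can be illustrated on a tiny dataset. This is a sketch under stated assumptions: the ad-spend and sales figures are hypothetical, probability is estimated simply as an observed frequency, and a straight-line least-squares fit stands in for predictive analytics, which in practice usually involves richer models.

```python
import math
import statistics

# Hypothetical data: monthly ad spend and resulting sales (both in thousands).
ad_spend = [10, 20, 30, 40, 50]
sales = [28, 41, 67, 82, 107]

def describe(values):
    """Descriptive statistics: summarize a variable in a few numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def empirical_probability(values, predicate):
    """Probability estimated as the fraction of observations satisfying predicate."""
    return sum(1 for v in values if predicate(v)) / len(values)

def pearson_r(xs, ys):
    """Correlation: Pearson coefficient, ranging from -1 to 1."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Predictive analytics (simplest case): least-squares line y = a + b * x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

if __name__ == "__main__":
    print(describe(sales))
    print("P(sales > 60):", empirical_probability(sales, lambda s: s > 60))
    print("r =", round(pearson_r(ad_spend, sales), 3))
    a, b = fit_line(ad_spend, sales)
    print("forecast for spend = 60:", round(a + b * 60, 1))
```

A correlation close to 1, as in this toy data, shows only that the two variables move together; it does not show that ad spend causes sales. Establishing causation generally requires a controlled experiment.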
Data scientists also play a role in artificial intelligence (AI), supporting the drive toward increased automation with their expertise in machine learning. Automation includes expert systems that perform specific tasks, such as the following:
- Conducting preliminary reviews of job applications to screen out unqualified applicants and identify highly qualified candidates.
- Detecting and preventing fraudulent transactions and other fraudulent activity within the organization.
- Managing the warehouse, including shipping, receiving, and inventory control.
- Optimizing machine maintenance through the use of sensors that capture and report data about each machine's operation.
- Reducing energy costs through energy management systems.
- Optimizing quality control to detect and even prevent manufacturing defects.
Look for a Scientist Who Works with Data
If you are looking to hire a data scientist, put more weight on the "scientist" than on the "data." A good data scientist thinks like a scientist and strictly adheres to the scientific method:
- Identify a problem or question.
- Research the problem or question.
- Develop a hypothesis.
- Design an experiment.
- Collect and analyze the results.
- Formulate a conclusion.
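The steps above can be sketched as a small A/B experiment. Everything here is a hypothetical illustration: the question (does a redesigned checkout page raise average order value?), the order values, and the 0.05 significance level are assumptions, and a permutation test is just one of several reasonable ways to analyze such an experiment.

```python
import random
import statistics

# Hypothetical results: order values from customers randomly assigned
# to the current checkout page (control) or a redesigned one (treatment).
control = [12, 15, 11, 14, 13, 12, 16, 14]
treatment = [16, 18, 15, 17, 19, 16, 18, 17]

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """One-sided permutation test of H0: group labels make no difference.

    Returns the observed difference in means (b minus a) and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = statistics.mean(b) - statistics.mean(a)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Reassign group labels at random, keeping the original group sizes.
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if statistics.mean(perm_b) - statistics.mean(perm_a) >= observed:
            extreme += 1
    # The +1 terms count the observed labeling itself and avoid a p-value of zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

if __name__ == "__main__":
    ALPHA = 0.05  # significance level fixed before looking at the results
    diff, p = permutation_test(control, treatment)
    conclusion = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"difference = {diff:.3f}, p = {p:.4f}: {conclusion}")
```

The permutation test asks how often a difference at least this large would appear if the group labels were meaningless, which keeps the reasoning close to the skeptical "challenge the answer" mindset rather than relying on distributional assumptions.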
Look for a candidate with an inquisitive and skeptical mind who is familiar with business intelligence software as well as statistics, programming, and machine learning. You want someone who is good not only at answering questions but, much more importantly, at asking the right questions and challenging the answers.