“Data scientist” is more difficult to define than terms used to describe other scientists, such as chemist, biologist, geneticist, or meteorologist. Part of the problem may be that “data science” became a commonly used term long before data science became a formal field of study. Even now, people who call themselves data scientists come from diverse fields and industries and interact with data in different ways. Some work more as database administrators (DBAs), others lean more toward statistical analysis, and still others focus most of their efforts on writing algorithms. (An algorithm is a process or set of rules for performing calculations, processing data, or solving problems.)
Simply put, a data scientist is anyone who extracts value from data using a variety of skills, tools, and methods, including human logic, statistical analysis, machine learning, and visualization software. Specifically, data scientists do the following:
If you’re a statistician, a data analyst, or a mathematician who specializes in developing machine learning algorithms, you can probably make a strong case that you’re a data scientist. However, as this field becomes more established, more and more organizations are looking for candidates who have a standardized skill set. Several universities, including Berkeley, Syracuse, and Columbia, are already moving in this direction, offering degree programs in the field of data science. Graduates are expected to have a wide variety of skills in the following areas:
Asking Interesting and Relevant Questions
A large part of what a data scientist does is ask interesting questions that are relevant to furthering (or challenging) the organization’s strategy and objectives.
Over the last 20 years, most organizations focused on increasing their operational efficiency by streamlining their business processes. They asked operational questions such as, “How can we work smarter, instead of harder?” and “How can we implement new technologies to save time and money?”
Data science is different; it isn’t objective-driven. It’s exploratory and applies the scientific method. It’s not about how well an organization operates; it’s about gaining useful business knowledge and insight. Part of the role of a data scientist is to work with leaders and other stakeholders in an organization to ask interesting and relevant questions and mine the data for answers. These questions are less objective-driven and more business-intelligence-driven, such as:
These are all questions that require a higher level of organizational thinking, and most organizations aren’t ready to ask them. They are driven to set milestones and create budgets. They haven’t been rewarded for being skeptical or inquisitive.
Data scientists engage in data mining, the process of extracting value from data by using a combination of database management, statistics, mathematics, and machine learning. Although the methods can be complex, data mining relies primarily on old-school logical processes, including the following:
Delivering the Goods
One of the best ways to understand what any professional does is to look at what they produce or deliver — the fruits of their labor. Deliverables for data scientists include the following:
Data science is all about harnessing the power of data to gain knowledge and insight, solve problems, automate processes, and make better decisions. The data scientist plays a key role in this process.