The key ingredient for a successful data scientist is a curious, skeptical, and innovative mind. However, data scientists also need to be familiar with the tools of the trade, various technologies, which can be broken down into the following categories:
Moving Beyond the Tools
Keep in mind that tools are just that: devices for reducing the amount of work required. Choosing the best tools available is certainly important. For example, many organizations currently struggle to manage “big data” because they made huge investments in on-premises data warehouses built on Hadoop, which requires highly specialized administrators just to keep it up and running. However, choosing the right tools is only a small part of extracting value from data. Organizations also need to come up with creative uses for that data.
Imagine you manage a website for connecting prospective car buyers to dealers. You build hundreds of information tags into the website that indicate when a shopper is hovering over or clicking a link. All the data flows into your data warehouse, adding up to several terabytes per week. You’ve collected historical data spanning several years. Collecting the data was relatively easy. The challenge is figuring out what to do with that data.
This seems like a common challenge for many organizations starting out in data science. They approach it mostly as an operational challenge and focus on collecting the data because it’s relatively cheap and easy to understand. It’s meeting-friendly and everyone can get behind the effort. They’ll even create multiple clusters or data lakes to pool their data from across the organization. But that’s the easy part. What organizations struggle with is the science of data science. They’re unaccustomed to asking and answering interesting questions.
As the data scientist for this imaginary car website, think about the questions you could ask to extract value from the data. You could examine the data to determine whether customers are more likely to click on a car if it is red, blue, or yellow. If the reports show that customers are 2% more likely to click on a car if it’s red, the organization could share that finding with car dealerships to help them increase their sales and revenue. You could also experiment with the number of cars listed on each page to determine the optimum number of listings per page.
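To make the color question concrete, here is a minimal sketch in Python of how you might check whether a difference in click rates is more than noise. Everything in it is an assumption for illustration: the view and click counts are made up, the function names are hypothetical, and "2% more likely" is read as a relative lift (5.1% versus 5.0%). It uses a standard two-proportion z-test, which is one reasonable choice among several.

```python
import math


def click_rate(clicks, views):
    """Fraction of listing views that led to a click."""
    return clicks / views


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the null hypothesis that
    groups A and B share the same underlying click rate."""
    rate_a = click_rate(clicks_a, views_a)
    rate_b = click_rate(clicks_b, views_b)
    # Pooled rate: the best single estimate if the null hypothesis holds.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, invented for this example (not real site data).
listing_stats = {
    "red": {"views": 50_000, "clicks": 2_550},
    "blue": {"views": 50_000, "clicks": 2_500},
}

red, blue = listing_stats["red"], listing_stats["blue"]
relative_lift = (
    click_rate(red["clicks"], red["views"])
    / click_rate(blue["clicks"], blue["views"])
    - 1
)
z, p_value = two_proportion_z_test(
    red["clicks"], red["views"], blue["clicks"], blue["views"]
)

print(f"relative lift for red: {relative_lift:.1%}")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.2f}")
```

With these invented counts, the p-value comes out well above the conventional 0.05 threshold, so a 2% relative lift on 50,000 views per color would not yet be convincing evidence. That is the kind of check worth running before sharing a finding with dealerships. The same test could compare click rates between two page layouts with different numbers of listings.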
As a data scientist, this is the type of empirical research you should be considering. Using the tools at your disposal, you need to ask interesting questions, run experiments, and produce well-designed reports to convey the newfound knowledge and insight to your organization’s decision-makers and stakeholders.
Remember, it’s less about the tools and more about what you can do with those tools. A master carpenter can do wonders with a hammer, a handsaw, and a few other basic tools and materials, whereas someone unskilled in carpentry would have trouble building a simple chair in a fully equipped woodworking shop. As a data scientist, you want it all: the skills and expertise along with the best tools available!