One of the biggest challenges your data science team is likely to encounter is gaining access to all of the organization’s data. Many organizations have data silos: data repositories managed by different departments and isolated from the rest of the organization.

The term “silo” is borrowed from agriculture. Farmers typically store grain in tall, hollow towers called silos, each of which is an independent structure. Silos protect the grain from the weather and isolate different stores of grain, so if one store is contaminated by pests or disease, not all of the grain is lost. Data silos are similar in that each department’s database is separate; data from one department isn’t mixed with data from another.

Data silos develop for various reasons. Often they result from common practice — for example, human resources (HR) creates its own database, because it can’t imagine anyone else in the organization needing its data or because it needs to ensure that sensitive employee data is secure. Data silos may also arise due to office politics — one team doesn’t want to share its data with another team that it perceives to be a threat to its position in the organization.

If your data science team encounters a data silo, it needs to find a way to access that data. Gaining access to data is one of the primary responsibilities of the project manager on the data science team. After the data analyst identifies the data sets necessary for the team to do its job, the project manager needs to figure out how to gain access to that data.

The Problems with Data Silos

Although data silos may be useful for protecting sensitive data from malware and from unauthorized access, they also cause a number of problems, including the following:

I once worked for an organization that was trying to migrate all of its data to a central data warehouse. The organization felt that it wasn’t getting enough insight from its data. It had just gone through a data governance transformation and wanted to set rules for how data was controlled across the company.

When the organization finally examined its data, it realized how much was locked away in silos that no one knew about. Over many years, each department had created its own processes, schemas, and security procedures. The organization wanted to get value from this data, but the data was stored on different servers across the entire company. To compound the problem, the various departments were reluctant to share their data. It was as if the project manager were asking them to share toothbrushes.

Breaking Down Data Silos

One of the first steps toward becoming a data-driven organization is to break down the data silos:

  1. Migrate all of the organization’s data to a secure data warehouse. A cloud data warehouse may be the most economical option, because you can outsource data warehouse management and security to a third-party vendor that has the technology and expertise to provide superior performance and security.
  2. Assign each user a unique username, and require a secure password from each user to log in. Because every user is identified, IT can grant each person access to all the data needed to do the job, while blocking unauthorized access to any sensitive data.
  3. Provide users with the tools and training they need to query and analyze the data.
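
The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not a real warehouse setup: it uses an in-memory SQLite database as a stand-in for the central warehouse, and the table names, sample rows, user names, and the GRANTS mapping are all hypothetical. A production warehouse would rely on its own login and permission system rather than a Python dictionary.

```python
import sqlite3

# Hypothetical rows that would otherwise sit in separate departmental silos.
HR_ROWS = [(1, "Ana", 72000), (2, "Ben", 65000)]
SALES_ROWS = [(1, 120000.0), (2, 95000.0)]

# Hypothetical grants: the tables each named user is allowed to read.
GRANTS = {
    "analyst": {"sales"},
    "hr_manager": {"employees", "sales"},
}


def build_warehouse():
    """Step 1: load the data from every silo into one central store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
    )
    conn.execute("CREATE TABLE sales (employee_id INTEGER, revenue REAL)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", HR_ROWS)
    conn.executemany("INSERT INTO sales VALUES (?, ?)", SALES_ROWS)
    conn.commit()
    return conn


def read_table(conn, user, table):
    """Steps 2 and 3: check the user's grant, then let the user query."""
    if table not in GRANTS.get(user, set()):
        raise PermissionError(f"{user} may not read {table}")
    # Interpolating the name is safe here because it was just matched
    # against the fixed set of table names in GRANTS.
    return conn.execute(f"SELECT * FROM {table}").fetchall()


if __name__ == "__main__":
    warehouse = build_warehouse()
    print(read_table(warehouse, "analyst", "sales"))
    try:
        read_table(warehouse, "analyst", "employees")
    except PermissionError as err:
        print("Blocked:", err)
```

In this sketch the analyst can read the sales table but is refused the employees table, which holds the sensitive salary data, while the HR manager can read both.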

By breaking down data silos, you give everyone in your organization self-serve access to the data they need to do their jobs better.

Words of Advice for Project Managers

If you’re a project manager on a data science team, try to keep the following key points in mind: