One of the biggest challenges your data science team is likely to encounter is gaining access to all of the organization’s data. Many organizations have data silos: data repositories managed by different departments and isolated from the rest of the organization.
The term “silo” is borrowed from agriculture. Farmers typically store grain in tall, hollow towers called silos, each of which is an independent structure. Silos protect the grain from the weather and isolate different stores of grain, so if one store is contaminated by pests or disease, not all of the grain is lost. Data silos are similar in that each department’s database is separate; data from one department isn’t mixed with data from another.
Data silos develop for various reasons. Often they result from common practice. For example, human resources (HR) creates its own database because it can’t imagine anyone else in the organization needing its data or because it needs to ensure that sensitive employee data is secure. Data silos may also arise from office politics, when one team doesn’t want to share its data with another team that it perceives to be a threat to its position in the organization.
If your data science team encounters a data silo, it needs to find a way to access that data. Gaining access to data is one of the primary responsibilities of the project manager on the data science team. After the data analyst identifies the data sets necessary for the team to do its job, the project manager needs to figure out how to gain access to that data.
The Problems with Data Silos
Although data silos may be useful for protecting sensitive data from malware and from unauthorized access, they also cause a number of problems, including the following:
- With data silos, an organization has no single source of truth. Data from various departments must be collected and combined prior to analysis.
- If two or more departments are storing the same data, figuring out which department has the most accurate and current data can be a major challenge.
- The chance of overwriting new data with old data increases.
- Data sharing may be more difficult and less efficient.
- Data security may be more challenging if the organization needs to secure multiple sources of data, as opposed to having data in only one location to secure.
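To make the "which copy is current?" problem concrete, here is a minimal sketch of combining overlapping records from two silos and keeping the most recently updated copy of each. The department names, field names (`customer_id`, `email`, `updated`), and sample values are hypothetical, and a real reconciliation would likely need more rules than a single timestamp comparison.

```python
from datetime import date

# Hypothetical extracts from two departmental silos; the field names
# and values are illustrative assumptions, not real data.
sales_records = [
    {"customer_id": 1, "email": "ann@old.example", "updated": date(2021, 3, 1)},
    {"customer_id": 2, "email": "bob@example.com", "updated": date(2022, 6, 15)},
]
support_records = [
    {"customer_id": 1, "email": "ann@new.example", "updated": date(2023, 1, 10)},
    {"customer_id": 3, "email": "cy@example.com", "updated": date(2020, 9, 30)},
]


def merge_latest(*sources):
    """Combine records from several sources, keeping the newest per customer."""
    merged = {}
    for source in sources:
        for record in source:
            key = record["customer_id"]
            current = merged.get(key)
            # Replace only when the incoming record is strictly newer,
            # so old data never overwrites new data.
            if current is None or record["updated"] > current["updated"]:
                merged[key] = record
    return merged


combined = merge_latest(sales_records, support_records)
for customer_id in sorted(combined):
    print(customer_id, combined[customer_id]["email"])
```

Because the comparison uses the timestamp rather than the order in which the sources are read, the result is the same whichever department's data is loaded first.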
I once worked for an organization that was trying to migrate all its data to a central data warehouse. Its leaders felt that the organization wasn’t getting enough insight from its data. The organization had just gone through a data governance transformation and wanted to formalize how its data was controlled.
When the team finally dug into the data, they realized how much was locked away in silos that no one knew about. Over many years, each department had created its own processes, schemas, and security procedures. The organization wanted to get value from this data, but the data was stored on different servers across the entire company. To compound the problem, the various departments were reluctant to share their data. It was as if the project manager were asking them to share toothbrushes.
Breaking Down Data Silos
One of the first steps toward becoming a data-driven organization is to break down the data silos:
- Migrate all of the organization’s data to a secure data warehouse. A cloud data warehouse may be the most economical option because you can outsource data warehouse management and security to a third-party vendor that has the technology and expertise to handle both.
- Assign each user a unique username, and require each user to log in with a secure password. This enables IT to grant each user access to the data they need to do their job, while restricting unauthorized access to any sensitive data.
- Provide users with the tools and training they need to query and analyze the data.
By breaking down data silos, you give everyone in your organization self-serve access to the data they need to do their jobs better.
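The per-user access idea can be sketched as a simple role lookup. This is only an illustration under assumed names: the roles, usernames, and dataset names below are hypothetical, and a real data warehouse would enforce permissions through its own access-control features rather than application code like this.

```python
# Hypothetical role-to-dataset grants; all names are illustrative assumptions.
ROLE_GRANTS = {
    "hr_analyst": {"employee_records", "headcount_summary"},
    "data_scientist": {"sales", "web_traffic", "headcount_summary"},
}

# Hypothetical mapping from each unique username to a role.
USER_ROLES = {
    "alice": "hr_analyst",
    "bob": "data_scientist",
}


def can_read(username, dataset):
    """Return True if the user's role has been granted the dataset."""
    role = USER_ROLES.get(username)
    # Unknown users have no role, so they fall through to an empty grant set.
    return dataset in ROLE_GRANTS.get(role, set())


print(can_read("bob", "sales"))
print(can_read("bob", "employee_records"))
```

Here the data scientist can query sales data and the non-sensitive headcount summary, while the sensitive employee records stay restricted to HR.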
Words of Advice for Project Managers
If you’re a project manager on a data science team, try to keep the following key points in mind:
- Don’t underestimate the difficulty of gaining access to data stored in silos. It may take a long time, so get started before the team actually needs the data.
- Migrating an organization’s data to a centralized data warehouse requires the entire organization to be on board. You’ll need executive buy-in to make any progress. You can also expect to have to sell each department on the idea. Expect push-back from some departments that are highly protective of their data or that think “if it ain’t broke, why fix it?” You may have to entice them by explaining that with central data storage they’ll be able to create more complex reports or use newer visualization tools.
- Provide access to your team’s reports. You may have an easier time breaking down silos if you can show the value of company-wide reporting and insights. Build interest in your information system by sharing your team’s wins. When others in the company see the value in the data and the business intelligence (BI) built on it, they’ll be more eager to adopt the system.
- Do your best to protect the data science team from any meetings about breaking down the data silos. You want the rest of your team focusing on exploration and discovery, while you focus on getting them access to the data.