Data Modeling Basics

Published January 16, 2017
Doug Rose
Author | Agility | Artificial Intelligence | Data Ethics

Data modeling is the process of defining the structure, flow, and relationships of data stored in a database management system (DBMS). A data model provides the framework for all the data an organization generates and stores, and it facilitates the process of extracting, combining, and analyzing that data later. The goal of the data modeling process is to define the following:

  • The data entries to be stored in the database, such as product numbers, names, prices, and quantities in stock; customer names, IDs, and addresses; supplier names, IDs, and contact information; and so on
  • The relationships between data items; for example, customers buy products
  • Constraints on data items; for example, all dates must be in the MM/DD/YYYY format

A simple example of a data model is a spreadsheet with customer contact information. Row and column headings label every data entry, and data validation can be used to constrain data; for example, you can limit text entries to a certain number of characters. In practice, however, data models are much more complex, typically consisting of hundreds of individual tables that are related to one another in various ways.
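Data validation like this is just a constraint expressed in spreadsheet terms. To make the idea concrete, here's a minimal Python sketch of the MM/DD/YYYY date constraint mentioned above; the function name and pattern are my own illustrations, not part of any particular DBMS:

```python
import re

# Hypothetical constraint: dates must be entered in MM/DD/YYYY format.
DATE_PATTERN = re.compile(r"^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/\d{4}$")

def is_valid_date(entry: str) -> bool:
    """Return True only if the entry satisfies the MM/DD/YYYY constraint."""
    return bool(DATE_PATTERN.match(entry))

print(is_valid_date("01/16/2017"))  # True
print(is_valid_date("2017-01-16"))  # False: violates the format constraint
```

A real DBMS would enforce a rule like this declaratively (for example, with a CHECK constraint) rather than in application code.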

A data model ensures that an organization’s data is structured — meaning that all data entries are clearly labeled and in a consistent format. Structuring data ensures consistency in naming conventions, default values, and data entry formats.

The Data Modeling Process

To build a data model, data architects must work closely with everyone in the organization responsible for entering data into the system, as well as everyone who will be using the information system to extract data, conduct analysis, generate reports, and so on. Only the people who enter data into the system and use that data to develop and execute the organization’s strategy know what the data model must contain.

The process of developing a data model typically involves the following steps:

  1. Identify entities. An entity is a collection of related data, such as data specific to customers, suppliers, shipping addresses, or products.
  2. Identify the attributes of each entity. For example, the customer entity may include each customer’s ID number, first name, last name, phone number, and e-mail address.
  3. Specify data naming conventions. Naming conventions ensure consistency among field labels. For example, the data architect must decide whether to refer to product number as “PROD#,” “#PROD,” “PRDCT#,” or something else.
  4. Define constraints. Constraints validate data entries. For example, a constraint may specify that a phone number entry must be 10 digits.
  5. Identify relationships among entities. For example, a customer places an order that is shipped to one address and billed to another. In this case, the customer entity is related to three other entities: order, billing address, and shipping address.
  6. Assign keys. Each entity has one or more keys that identify each unique record in a table and identify relationships among tables in the DBMS. For example, a customer ID identifies a unique customer and is used in all orders to link customers to order numbers.

Additional steps are required to address access, storage, and performance requirements.
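To see how these steps translate into an actual schema, here's a minimal sketch using Python's built-in sqlite3 module. The entity, field, and naming choices (CUST_ID, PROD_NUM, the 10-digit phone rule) are illustrative assumptions, not a standard:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway, in-memory database

# Entity (step 1) with attributes (step 2), naming conventions (step 3),
# a constraint (step 4), and a primary key (step 6).
conn.execute("""
    CREATE TABLE customers (
        CUST_ID    INTEGER PRIMARY KEY,
        FIRST_NAME TEXT NOT NULL,
        LAST_NAME  TEXT NOT NULL,
        PHONE      TEXT CHECK (length(PHONE) = 10)
    )
""")

# A related entity: the foreign key captures the relationship (step 5).
conn.execute("""
    CREATE TABLE orders (
        ORDER_ID INTEGER PRIMARY KEY,
        CUST_ID  INTEGER NOT NULL REFERENCES customers (CUST_ID),
        PROD_NUM TEXT NOT NULL
    )
""")
```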

The Three Stages of Data Modeling

The data modeling process is broken down into three stages or levels:

  1. Conceptual: At this stage, the data model specifies the data the system must store and how the data are related. Conceptual data models are typically presented in the form of flow charts or relationship diagrams.
  2. Logical: Technical details, such as rules and data structures, are added. The goal at this stage is to describe the implementation independently of which DBMS will be used.
  3. Physical: The physical data model contains additional details on how to implement the system for a specific DBMS. At this stage, software developers are primarily responsible for implementing the data model.

Ideally, a data architect takes a top-down approach to data modeling — gathering input from people in various departments to develop a conceptual model for capturing data and storing it in the organization’s DBMS.

However, a data architect may also take a bottom-up approach, starting with existing forms, fields, software, and reports to reverse-engineer a data model from an organization’s existing system.

Ensuring Data Integrity

One of the key benefits of data modeling is that it helps to ensure data integrity — accuracy and reliability, both in terms of entity integrity and referential integrity:

  • Entity integrity refers to the accuracy and reliability of data stored in a given table. Assigning a unique primary key to every record in a table ensures entity integrity.
  • Referential integrity refers to the accuracy and reliability of the relationships between tables, which is enforced through the use of foreign keys. A foreign key is a primary key in one table that is used in another table; for example, a customer ID that appears in the Orders table.
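Here's a short sketch of referential integrity in action, again using sqlite3. Note that SQLite enforces foreign keys only when the pragma below is enabled; the table and column names reuse the illustrative customers/orders example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this opt-in

conn.execute("CREATE TABLE customers (CUST_ID INTEGER PRIMARY KEY, NAME TEXT)")
conn.execute("""
    CREATE TABLE orders (
        ORDER_ID INTEGER PRIMARY KEY,
        CUST_ID  INTEGER REFERENCES customers (CUST_ID)  -- foreign key
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # OK: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # no customer 99
except sqlite3.IntegrityError as err:
    print("Referential integrity violation:", err)
```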

Normalization

Normalization is the process of systematically breaking down a large, complex table into smaller ones to eliminate discrepancies and redundancy, thus improving data integrity while making the DBMS easier to manage. For example, by creating separate tables for customer data and order data, if a customer moves to a new address, you make that change only in the customer table, and it is automatically reflected in any new orders the customer places. You don’t have to make the change in both the customer table and the order table, which would require more work and more storage and would be more susceptible to introducing data discrepancies into the system.
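Here's a minimal sketch of that payoff, continuing the same illustrative schema: because the address lives only in the customer table, one UPDATE fixes it everywhere it's used:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (CUST_ID INTEGER PRIMARY KEY, ADDRESS TEXT);
    CREATE TABLE orders (ORDER_ID INTEGER PRIMARY KEY, CUST_ID INTEGER);
    INSERT INTO customers VALUES (1, '123 Old Street');
    INSERT INTO orders VALUES (100, 1);
""")

# The customer moves: the change is made in exactly one place.
conn.execute("UPDATE customers SET ADDRESS = '456 New Avenue' WHERE CUST_ID = 1")

# Any order joined to the customer now reflects the new address.
row = conn.execute("""
    SELECT o.ORDER_ID, c.ADDRESS
    FROM orders o JOIN customers c ON o.CUST_ID = c.CUST_ID
""").fetchone()
print(row)  # (100, '456 New Avenue')
```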

Moving on

Of course, data modeling is much more complex and involved than what I describe here, but this post provides a basic understanding to get you started. Keep in mind, however, that not all data an organization needs is as structured as its internal data. For business intelligence, organizations often rely on semi-structured and unstructured data, a topic I will cover in a later post.
