Data visualization is the process of communicating data graphically, in the form of tables, graphs, maps, timelines, matrices, tree diagrams, flow charts, and so on. The purpose of these visuals is to convey relationships, comparisons, distributions, compositions, trends, and workflows more clearly and succinctly than words alone can. You can think of a data science team’s reports as employing two forms of communication: verbal and visual.
When building a report, the data science team combines the two forms of communication to tell the story revealed by the data with maximum clarity and impact. Visuals are often the simplest and most effective means of communicating complex information and insights. In many cases, the audience immediately “gets it” upon viewing a simple graphic that summarizes the data.
Choose the Right Chart Type
When creating data visualizations, a key first step is choosing the chart type that best fits the data and what you’re trying to illustrate. The following table provides general guidance to help you make the right choice.
Keep in mind that content and purpose should drive form. Don’t choose a chart or other visual just because it looks pretty. I’ve seen some beautiful charts that do a poor job of communicating the data, as well as ugly charts that are very informative. Ideally, you want a beautiful chart that’s informative and communicates the point you’re trying to make. However, if you have to make trade-offs, clarity trumps beauty.
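To make the idea of "purpose drives form" concrete, here is a minimal Python sketch of a lookup from what you want to show to a reasonable starting chart type. The mapping reflects common rules of thumb, not the guidance table referenced above, and the names CHART_FOR_PURPOSE and suggest_chart are hypothetical, introduced only for illustration.

```python
# Hypothetical lookup built from common rules of thumb.
# It is an assumption for illustration, not the article's own table.
CHART_FOR_PURPOSE = {
    "relationship": "scatter plot",
    "comparison": "bar chart",
    "distribution": "histogram",
    "composition": "stacked bar chart",
    "trend": "line chart",
    "workflow": "flow chart",
}


def suggest_chart(purpose):
    """Return a starting-point chart type for a stated purpose."""
    key = purpose.strip().lower()
    if key not in CHART_FOR_PURPOSE:
        known = ", ".join(sorted(CHART_FOR_PURPOSE))
        raise ValueError(f"Unknown purpose {purpose!r}; expected one of: {known}")
    return CHART_FOR_PURPOSE[key]
```

Calling suggest_chart("trend") would return "line chart". Treat the result as a first guess to test with your team, since the data itself may call for something different.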
A Team Sport
Creating data visualizations is a team sport. The data analyst should work closely with the other members of the data science team to develop data visualizations that communicate the data most effectively. If the data analyst has to explain the charts to the research lead, the charts are probably too complex for other stakeholders in the organization. The team is a good testing ground for ensuring that the visuals in a report will be effective.
Remember that your team works together to explore the data, which means that the majority of the first round of reports you design will be for each other. The research lead drives interesting questions; the data analyst creates a quick and dirty report to explore possible answers; and then the team might come up with a whole series of new questions. This means that most of your initial data visualizations will be quick exchanges — more like visual chitchat than full data reports.
After the team reaches consensus on the data and the visuals, spend some time polishing the data visualizations to share them with the rest of the organization. Your final data visualizations should be even simpler and easier to understand than the versions you shared with team members.
Work in Cycles
Think of your first round of data visualizations as whiteboard presentations in your data science team meetings. Although you’ll probably do most, if not all, of your data visualizations on a computer, treat them like mock-ups or scribbles on a whiteboard. These data visualizations may be oversimplified. Their purpose is to initiate productive and creative discussions. You may start with a quick and simple scatter plot or linear regression chart and then fine-tune it as you ask more questions and collect and analyze more data. Obtaining and responding to feedback from other team members is the best way to create effective and attractive data visualizations.
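As a rough illustration of such a first-draft chart, the Python sketch below fits a straight line to a set of points and draws a bare-bones scatter plot with the fitted line on top. It assumes matplotlib is installed for the plotting step; the function names fit_line and quick_scatter and the default file name are hypothetical, chosen only for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quick_scatter(xs, ys, path="draft_scatter.png"):
    """Save a rough scatter plot with a fitted line (assumes matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display window needed
    import matplotlib.pyplot as plt

    slope, intercept = fit_line(xs, ys)
    lo, hi = min(xs), max(xs)

    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    # Draw the fitted line across the observed x range.
    ax.plot([lo, hi], [slope * lo + intercept, slope * hi + intercept])
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(path)
    plt.close(fig)
    return path
```

A call such as quick_scatter([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]) would write a draft image you can bring to a team meeting. The chart is deliberately unpolished: no title, no styling, default labels. Those refinements belong in later cycles, once the team agrees the chart asks the right question.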
Your best charts will be the product of an emergent design. Start with simple reports and improve them over time. You’ll produce much better reports by going through several revisions.
Recommended Books on Data Visualization
If you’re interested in discovering more about data visualization, I recommend the following two books:
Note: There’s typically nothing in the training of data analysts that prepares them for producing good visualizations. Most graduate programs are still very much rooted in math and statistics. Good data visualization relies on aesthetics and design. It’s a learned skill and may not come easily.