Getting Up to Speed on Data Visualization

Data visualization is the process of communicating data graphically, in the form of tables, graphs, maps, timelines, matrices, tree diagrams, flow charts, and so on. The purpose of these visuals is to convey relationships, comparisons, distributions, compositions, trends, and workflows more clearly and succinctly than words alone can. You can think of a data science team’s reports as employing two forms of communication:

  • Verbal (words)
  • Visual (pictures)

When building a report, the data science team combines the two forms of communication to tell the story revealed by the data with maximum clarity and impact. Visuals often provide the means of communicating complex information and insights with the greatest simplicity and effectiveness. Often, the audience immediately “gets it” upon viewing a simple graphic that summarizes the data.

Choose the Right Chart Type

When creating data visualizations, a key first step is choosing the chart type that best fits the data and the point you’re trying to illustrate. The following table provides general guidance to help you make the right choice.

Purpose               Chart Types
Compare values        Bar, Column, Line, Pie, Scatter plot, Spider chart
Show composition      Area, Pie, Stacked bar, Stacked column, Waterfall
Show distribution     Bar, Column, Line, Scatter plot
Show trends           Column, Dual-axis line, Line
Show relationships    Bubble, Line, Scatter plot
Show locations        Map
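If your team builds reports in code, the guidance in the table above can be kept close at hand as a small lookup. The following is only a sketch in Python: the names CHART_TYPES_BY_PURPOSE, suggest_chart_types, and purposes_for are made up for illustration, and the entries simply transcribe the table.

```python
# Hypothetical lookup that transcribes the chart-selection table above.
# The names here are illustrative assumptions, not part of any library.
CHART_TYPES_BY_PURPOSE = {
    "compare values": ["bar", "column", "line", "pie", "scatter plot", "spider chart"],
    "show composition": ["area", "pie", "stacked bar", "stacked column", "waterfall"],
    "show distribution": ["bar", "column", "line", "scatter plot"],
    "show trends": ["column", "dual-axis line", "line"],
    "show relationships": ["bubble", "line", "scatter plot"],
    "show locations": ["map"],
}


def suggest_chart_types(purpose):
    """Return the candidate chart types for a purpose (case-insensitive)."""
    key = purpose.strip().lower()
    if key not in CHART_TYPES_BY_PURPOSE:
        known = ", ".join(CHART_TYPES_BY_PURPOSE)
        raise KeyError(f"Unknown purpose {purpose!r}; expected one of: {known}")
    # Return a copy so callers cannot modify the shared table by accident.
    return list(CHART_TYPES_BY_PURPOSE[key])


def purposes_for(chart_type):
    """Return every purpose in the table that lists the given chart type."""
    wanted = chart_type.strip().lower()
    return [
        purpose
        for purpose, charts in CHART_TYPES_BY_PURPOSE.items()
        if wanted in charts
    ]


if __name__ == "__main__":
    print(suggest_chart_types("Show trends"))
    print(purposes_for("pie"))
```

A lookup like this only narrows the options. The team still has to judge which candidate communicates the point most clearly.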

Keep in mind that content and purpose should drive form. Don’t choose a chart or other visual just because it looks pretty. I’ve seen some beautiful charts that do a poor job of communicating the data, as well as ugly charts that are very informative. Ideally, you want a beautiful chart that’s informative and communicates the point you’re trying to make. However, if you have to make trade-offs, clarity trumps beauty.

A Team Sport

Creating data visualizations is a team sport. The data analyst should work closely with the other members of the data science team to develop data visualizations that communicate the data most effectively. If the data analyst has to explain the charts to the research lead, they’re probably too complex for other stakeholders in the organization. The team is a good testing ground for ensuring that the visuals in a report will be effective.

Remember that your team works together to explore the data, which means that the majority of the first round of reports you design will be for each other. The research lead drives interesting questions; the data analyst creates a quick and dirty report to explore possible answers; and then the team might come up with a whole series of new questions. This means that most of your initial data visualizations will be quick exchanges — more like visual chitchat than full data reports.

After the team reaches consensus on the data and the visuals, spend some time polishing the data visualizations to share them with the rest of the organization. Your final data visualizations should be even simpler and easier to understand than the versions you shared with team members.

Work in Cycles

Think of your first round of data visualizations as whiteboard presentations in your data science team meetings. Although you’ll probably do most, if not all, of your data visualizations on a computer, treat them like mock-ups or scribbles on a whiteboard. These data visualizations may be oversimplified. Their purpose is to initiate productive and creative discussions. You may start with a quick and simple scatter plot or linear regression chart and then fine-tune it as you ask more questions and collect and analyze more data. Obtaining and responding to feedback from other team members is the best way to create effective and attractive data visualizations.
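To show how rough a first-round visual can be, here is a minimal sketch of a quick scatter plot with a fitted line, written in plain Python with no charting library. It assumes a handful of numeric data points; the function names fit_line and ascii_scatter are hypothetical, and the fit is an ordinary least-squares line, which is one common way to do a simple linear regression.

```python
# A throwaway "whiteboard" chart: least-squares line plus a text scatter plot.
# fit_line and ascii_scatter are illustrative names, not a library API.

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ascii_scatter(xs, ys, width=40, height=10):
    """Render the points as '*' characters on a width-by-height text grid."""
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each point into grid coordinates; guard against a zero range.
        col = 0 if max_x == min_x else round((x - min_x) / (max_x - min_x) * (width - 1))
        row = 0 if max_y == min_y else round((y - min_y) / (max_y - min_y) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return "\n".join("".join(line).rstrip() for line in grid)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # made-up sample data
    slope, intercept = fit_line(xs, ys)
    print(ascii_scatter(xs, ys))
    print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")
```

Something this crude is enough to start a team discussion. Once the questions settle, redraw the chart in a proper charting tool and polish it.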

Your best charts will be the product of an emergent design. Start with simple reports and improve them over time. You’ll produce much better reports by going through several revisions.

Recommended Books on Data Visualization

If you’re interested in discovering more about data visualization, I recommend the following two books:

  • The Visual Display of Quantitative Information, 2nd Edition, by Edward R. Tufte. In this book, Professor Tufte introduces the idea of the data-ink ratio: the share of a graphic’s ink that is devoted to presenting the data itself. The goal is to communicate the maximum amount of data with the minimum amount of ink. He uses the term “chartjunk” for useless visuals such as 3-D shadows or gradient effects.
  • Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic. This book extends the discussion beyond data visualizations to explain more about using them effectively as part of a report. The author covers key topics, including the necessity of understanding the audience and the context in which the data visualizations are presented.

Note: There’s typically nothing in the training of data analysts that prepares them for producing good visualizations. Most graduate programs are still very much rooted in math and statistics. Good data visualization relies on aesthetics and design. It’s a learned skill and may not come easily.
