Data visualization allows readers to quickly analyze and interpret data. But how can we build visualizations that correctly communicate accurate information about our data? By understanding what people perceive when they observe a visualization, we can design them in a way that leads to truthful analysis while avoiding biases. In this article, we will cover common data visualization mistakes that may hinder successful analysis and how to avoid them.
Keep it Simple
First, it is important to discuss how data visualization errors come to be. In the same way it is important for the visualization creator to keep the reader in mind, it is equally important for the reader to think about the motivation of the person building the visualization. For example, often visualizations are overly flashy and attempt to draw in the reader through engaging imagery and appealing colors. The creator wants readers to be drawn in, which is understandable, but flashy elements can distract from the real insights of the data, and at times, mislead. The following are visualization elements that can make a visualization more eye-catching but are best to avoid.
3D visualizations appear to have depth and look like they are jumping off the page. They can be very appealing to the eye, and they are included in visualization software such as Microsoft Office and Matplotlib for Python. The problem with 3D visualizations, however, is the way they can distort or hide the elements of the chart. For example, the bars in the rear of ‘Chart 1’ are obstructed by the bars in the front. The viewer does not have the ability to rotate the view to see the hidden elements, and even if they did, rotating would hide other elements. Another reason to avoid 3D visualization elements is the way projection alters the sizing of the chart elements. 3D visualizations draw objects larger or smaller to make them appear closer or farther away, thereby achieving the illusion of three dimensions. However, this makes it more difficult for the viewer to compare the sizes of bars or pie slices. For example, in ‘Chart 2’ and ‘Chart 3’ Indiana and Illinois both have $55,000 in sales, but the sizing of the elements distorts this information.
When building a visualization, avoid 3D elements to ensure details are not hidden and components are sized in a way that is obvious to the viewer.
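To make the sizing distortion concrete, here is a minimal sketch that assumes a simple pinhole-perspective model, in which an object's drawn size shrinks as its distance from the viewer grows. The function name, the focal length, and the depth values are hypothetical choices for illustration; real charting software uses its own projection settings.

```python
def apparent_height(true_height, depth, focal_length=10.0):
    """Drawn height of a bar under a simple pinhole-perspective model.

    depth is the distance behind the front row; focal_length is an
    assumed camera parameter, not a setting from any particular tool.
    """
    return true_height * focal_length / (focal_length + depth)


# Two bars with identical values: one in the front row, one in the back row.
front = apparent_height(55_000, depth=0.0)
back = apparent_height(55_000, depth=5.0)

print(front)                   # 55000.0
print(round(back / front, 3))  # 0.667
```

Under these assumed settings, the rear bar is drawn at about two-thirds the height of the front bar even though both represent the same value, which is the kind of mismatch a flat 2D chart avoids.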
Too Much of a Good Thing
It can be tempting to add as much detail as possible to a visualization to make it look exciting and to give the reader all the data in one view. However, overcrowding is a real issue in many analytics dashboards, and it can hinder the quick, accurate analysis that visualizations are meant to support. While the line graph below looks colorful and exciting, it is nearly impossible to drill down to view trends at the state level due to overcrowding. Also, using color to distinguish between 50 states is sure to assign very similar colors to numerous states, making it even harder to tell what you are seeing.
For example, is the spike in early 2018 from Florida or North Carolina? Is the red line that spikes late every year Ohio or New York? These questions are impossible to answer visually. Putting too much information on a view undermines the usefulness of a visualization, which depends on the reader’s ability to quickly and accurately find answers in the underlying data. Instead, think about what questions the reader will want to answer with the visualization and construct the view so that finding those answers is seamless.
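One possible way to reduce overcrowding is to keep only the few categories the reader is likely to care about and fold the rest into a single group. The sketch below is an illustration under stated assumptions, not a prescribed method: the function name, the choice of three categories, and the sales figures (in thousands of dollars) are all hypothetical.

```python
def top_n_with_other(totals, n=5):
    """Keep the n largest categories and fold the rest into one 'Other' bucket."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[:n])
    remainder = sum(value for _, value in ranked[n:])
    if remainder:
        kept["Other"] = remainder
    return kept


# Hypothetical sales totals by state, in thousands of dollars.
sales_by_state = {
    "Illinois": 55,
    "Indiana": 55,
    "Ohio": 40,
    "Michigan": 48,
    "Iowa": 12,
    "Kansas": 9,
}

print(top_n_with_other(sales_by_state, n=3))
# {'Illinois': 55, 'Indiana': 55, 'Michigan': 48, 'Other': 61}
```

Plotting four series instead of six (or fifty) leaves room for distinct colors and direct labels. Whether the largest categories are the right ones to keep depends on the question the reader is trying to answer.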
Purposefully Misleading Visualizations
While focusing too much on stylish design is usually an honest oversight, there are times when the motivation behind a misleading visualization is truly nefarious. Sometimes the creator of a visualization has an agenda and is trying to convince the reader of something that the underlying data does not necessarily support. British economist Ronald H. Coase famously quipped, “if you torture the data long enough, it will confess to anything.” Think about the motivations of the creator and be on the lookout for data that may have been ‘tortured’, through shady data visualization techniques, into admitting things that are not true. Here are common tactics a biased visualization may use to mislead the reader.
When crafting a visualization, the creator may attempt to convince the reader there is a relationship between two or more variables. However, variables that show a strong correlation coefficient are not always meaningfully related. Tyler Vigen publishes charts showing so-called ‘spurious correlations’ to show how erroneous it can be to read a real relationship into a correlation. One example he published at tylervigen.com/spurious-correlations is the relationship between divorce rates in Maine and US per capita margarine consumption from 2000 to 2009.
The graph shows a very strong correlation coefficient for this time period, but it would be foolish to assume these two variables are truly related, as there is an overwhelming likelihood that they have nothing to do with each other. Correlations can also mislead because correlation does not necessarily imply causation. When looking at a chart showing a correlation between two variables, the reader might wrongly assume that a change in one variable causes a change in the other. However, there may be one or more additional variables that are causing the correlation to take place. As the viewer of a visualization that appears to show correlation, be wary of the bias of the source, as they may be erroneously claiming a relationship exists where it does not or leaving out causative variables that explain the correlation in a different way.
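To illustrate how a hidden third variable can produce a strong coefficient, the sketch below computes Pearson's r for two made-up series that both simply decline over ten years. The numbers are hypothetical and are not Vigen's data. Comparing year-over-year changes instead of raw levels is one common sanity check, chosen here as an assumption for illustration, for whether a shared time trend is doing the work.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def yearly_changes(values):
    """Difference between each value and the one before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical, unrelated quantities that both happen to fall over time.
series_a = [10, 9, 9, 8, 7, 7, 6, 5, 5, 4]
series_b = [50, 48, 45, 44, 41, 40, 37, 36, 33, 32]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(yearly_changes(series_a), yearly_changes(series_b))

print(round(r_levels, 2))   # 0.98
print(round(r_changes, 2))  # -0.25
```

The raw levels look almost perfectly correlated, yet the year-to-year movements are not. In this made-up case the passage of time is the additional variable driving both series.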
Tipping the Scales
Many forms of data visualization are presented on a Cartesian plane, with an x-axis and a y-axis. These include, but aren’t limited to, bar charts, line graphs, and scatter plots. Visualizations on a Cartesian plane make it easy to compare the quantitative values between data points. Issues arise, however, when the scale of an axis is truncated, meaning the axis starts at something other than zero. Consider the difference between the bar charts below, both of which display the same information for sales by state.
The chart with the truncated y-axis (left), at first glance, misleads the reader by showing Illinois sales much higher than Michigan and nearly double the sales of Ohio. When looking at the non-truncated chart on the right, it is evident that the sales for each state are much closer to each other. Starting the y-axis at a higher value is so misleading because we naturally judge the magnitude of data in a bar chart by visually measuring the distance between the x-axis and the top of the bar.
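The effect of truncation can be put in numbers by comparing the ratio of the bar heights as drawn with the ratio of the underlying values. The following is a minimal sketch: the function name, the Ohio figure, and the axis starting point are hypothetical values chosen for illustration and are not read from the charts above.

```python
def drawn_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of two bar heights as drawn when the y-axis begins at axis_start."""
    return (value_a - axis_start) / (value_b - axis_start)


# Hypothetical sales figures in dollars.
illinois, ohio = 55_000, 51_500

# Axis starting at zero: drawn heights match the true ratio of the values.
print(round(drawn_ratio(illinois, ohio), 2))                     # 1.07

# Axis starting at $48,000: the same data now looks like double.
print(round(drawn_ratio(illinois, ohio, axis_start=48_000), 2))  # 2.0
```

With these assumed numbers, a difference of about 7 percent is drawn as a bar twice as tall, purely because of where the axis begins.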
Another common method of altering a Cartesian plane is to zoom in on a favorable portion of the x-axis when showing changes over time. For example, in the line graphs below, the left shows the full dataset, but the right zooms in on a specific portion of the graph to show a positive trend.
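The same trick can be checked numerically by fitting a straight-line trend to the full series and to the zoomed-in window. This is a sketch under stated assumptions: the monthly values and the window boundaries are hypothetical, and an ordinary least-squares slope is just one simple way to summarize a trend.

```python
def trend_slope(values):
    """Least-squares slope of values against their positions 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var_x


# Hypothetical monthly values: a steady decline with one brief uptick.
monthly = [100, 96, 93, 89, 86, 88, 91, 85, 80, 76]

full_trend = trend_slope(monthly)
zoomed_trend = trend_slope(monthly[4:7])  # only the months 86, 88, 91

print(round(full_trend, 2))    # -2.18
print(round(zoomed_trend, 2))  # 2.5
```

The full series falls by roughly two units per month, while the cherry-picked window rises by two and a half. A reader shown only the window would come away with the opposite conclusion.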
As a data visualization architect, it is imperative to avoid visualization mistakes that could potentially mislead the end user. There is a balance that must be struck between making your visualization attractive enough to draw in and excite the reader and going overboard with flashy elements that can distract or lead your reader to false conclusions. The most effective visualizations follow the less-is-more principle. Focus on your most useful components so you do not overwhelm the reader. Avoid 3D elements whenever possible, as they typically do more harm than good. And for those consuming data visualizations, keep in mind that a visualization is not always what it seems. Contemplate the biases and motivations of the source of the visualization and look out for the common tactics used to purposely mislead the viewer, such as implying correlations and altering the axes.
At Bear Cognition, our team of analysts is trained in data visualization best practices, and we understand the importance of integrity and honesty when it comes to the analysis and visualization work we do for our clients. In this age of rapid data collection and utilization, it pays to have expert analytics in your corner to put your data to work for you and make sure you maintain your competitive advantage. If you would like to learn more about us, please fill out the form below. We are looking forward to hearing from you!