Review
Circulation. 2019 Oct 29;140(18):1506-1518.
doi: 10.1161/CIRCULATIONAHA.118.037777. Epub 2019 Oct 28.

Reveal, Don't Conceal: Transforming Data Visualization to Improve Transparency

Tracey L Weissgerber et al. Circulation.

Abstract

Reports highlighting the problems with the standard practice of using bar graphs to show continuous data have prompted many journals to adopt new visualization policies. These policies encourage authors to avoid bar graphs and use graphics that show the data distribution; however, they provide little guidance on how to effectively display data. We conducted a systematic review of studies published in top peripheral vascular disease journals to determine what types of figures are used, and to assess the prevalence of suboptimal data visualization practices. Among papers with data figures, 47.7% of papers used bar graphs to present continuous data. This primer provides a detailed overview of strategies for addressing this issue by (1) outlining strategies for selecting the correct type of figure depending on the study design, sample size, and the type of variable; (2) examining techniques for making effective dot plots, box plots, and violin plots; and (3) illustrating how to avoid sending mixed messages by aligning the figure structure with the study design and statistical analysis. We also present solutions to other common problems identified in the systematic review. Resources include a list of free tools and templates that authors can use to create more informative figures and an online simulator that illustrates why summary statistics are meaningful only when there are enough data to summarize. Last, we consider steps that investigators can take to improve figures in the scientific literature.

Keywords: bar graphs; basic science; continuous data; data visualization.

Figures

Figure 1: Why one shouldn’t use a bar graph, even if the data are normally distributed
Bar graphs arbitrarily assign importance to the height of the bar, rather than focusing attention on how the difference between means compares to the range of observed values. Panel A: The bar height represents the mean. Error bars represent one standard error. The y-axis starts at zero and ends just above the highest error bar. Panel B: Adding data points reveals that the bar graph in Panel A includes low values that never occurred in the sample (Zone of Irrelevance) and excludes observed values above the highest error bar (Zone of Invisibility). Panel C: The dot plot emphasizes how the difference between means compares to the range of observed values. The y-axis includes all observed values. Reprinted from Weissgerber et al. under a CC-BY license. Abbreviations: SE, standard error.
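The two zones described in this caption can be computed directly from a sample. The following is a minimal sketch, not the authors' code: the function name `bar_graph_zones` and the sample values are hypothetical, and it assumes positive data plotted as a mean with one standard error on a y-axis that starts at zero.

```python
import math
import statistics


def bar_graph_zones(values):
    """Return the ranges that a mean +/- SE bar graph misrepresents.

    Assumes positive values and a y-axis running from zero to the top of
    the error bar, as described for Panel A.
    """
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))  # standard error
    top_of_error_bar = mean + se
    lowest, highest = min(values), max(values)

    # Zone of Irrelevance: the part of the bar below the smallest observation.
    irrelevance = (0.0, lowest) if lowest > 0 else None
    # Zone of Invisibility: observed values above the top of the error bar.
    invisibility = (top_of_error_bar, highest) if highest > top_of_error_bar else None

    return {
        "mean": mean,
        "se": se,
        "irrelevance": irrelevance,
        "invisibility": invisibility,
    }


# Hypothetical sample, for illustration only.
sample = [12.0, 14.0, 15.0, 15.0, 16.0, 18.0]
zones = bar_graph_zones(sample)
```

For this hypothetical sample, the bar would cover values from 0 to 12 that never occurred, while the observations between the top of the error bar and 18 would not be visible.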
Figure 2: Figures for comparing groups in cross-sectional or experimental studies
When choosing among different types of graphs, it is important to consider the study design, sample size and data distribution. This figure provides a detailed overview of different types of graphs, describes when to use each graph and lists best practices for clear data presentation.
Figure 3: Strategies for making effective dot plots
The initial graph is hard to interpret because it has many overlapping data points (A). Strategies for making all points visible include decreasing the size of the data points (B), making the data points semi-transparent (C), and using random (D) or symmetric (E) jittering. The bottom row illustrates how to clearly show the main finding, while allowing readers to critically evaluate the data. Increasing the white space between groups (G) and emphasizing the summary statistics (H) makes the graph (F) much easier to interpret.
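Symmetric jittering (E) can be sketched as follows. This is one simple way to do it, not necessarily the method used to draw the figure: the function name `symmetric_jitter` and the `bin_width` and `spacing` parameters are assumptions made for illustration. Points whose values fall in the same bin are spread evenly around the group's centre line, so no random noise is involved.

```python
import math
from collections import defaultdict


def symmetric_jitter(values, bin_width, spacing):
    """Return one horizontal offset per value.

    Values are grouped into bins of height `bin_width`; the points in each
    bin are spaced `spacing` apart and centred on zero.
    """
    bins = defaultdict(list)
    for index, value in enumerate(values):
        bins[math.floor(value / bin_width)].append(index)

    offsets = [0.0] * len(values)
    for indices in bins.values():
        centre = (len(indices) - 1) / 2  # midpoint of the ranks in this bin
        for rank, index in enumerate(indices):
            offsets[index] = (rank - centre) * spacing
    return offsets


# Hypothetical measurements for one group.
group = [5.0, 5.1, 5.2, 7.0, 9.0, 9.3]
x_offsets = symmetric_jitter(group, bin_width=0.5, spacing=0.1)
```

Adding each offset to the group's x-position gives the plotting coordinates. A lone point stays on the centre line, and the offsets within each bin sum to zero.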
Figure 4: Combining dot plots with box or violin plots
Panel A shows the data distribution, but provides no information about sample size. The overlapping points in Panel B offer little additional insight. Panels C and D allow readers to evaluate the data by including symmetrically jittered data points, making the box plot width proportional to the sample size and listing the sample size on the x-axis. Only the dot plot is shown for the last group, as the sample size was too small for a box plot. This dataset includes small groups (n = 10–15); therefore it would be better to emphasize the dot plot (C). If all groups have larger sample sizes, investigators can choose whether to emphasize the dot plot (C) or the box plot (D).
Figure 5: Select Color Blind Safe Color Maps
This figure illustrates how heat maps created using different color palettes would appear to someone with normal color vision (top row) vs. someone with the most common form of color blindness (bottom row). Color blindness was simulated using Color Oracle.
Figure 6: Summary statistics are only meaningful when there are enough data to summarize
Panel A illustrates how the sample mean (red/blue dots) and standard deviation (red/blue error bars) might change if we repeated the same experiment 100 times, with n = 5 or n = 20. The black line and gray shaded region show the population mean and standard deviation. If all samples gave precise estimates, each sample mean would be on the black line and the error bars would fill the gray region. Panel B illustrates how cumulative means change with increasing sample size. The sample mean is calculated for three participants. New participants are added one at a time. The colored lines illustrate how the sample mean changes as each new participant is added. The experiment is repeated 100 times. The black line shows the true population mean. When n is small, the sample means are often quite different from the population mean (Seas of Uncertainty). As n increases, the sample means converge on the population mean (Corridor of Stability). Interactive version: https://rtools.mayo.edu/size_matters/. The terminology of Seas of Chaos/Uncertainty and Corridor of Stability was used in papers examining the effects of sample size on correlation coefficients and effect sizes. Abbreviations: SD, standard deviation.
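The procedure described for Panel B can be approximated with a short simulation. This is a sketch under stated assumptions rather than the code behind the online simulator: the population is assumed to be normal, and the mean of 100, the standard deviation of 15, the maximum sample size of 100, and the function name `cumulative_means` are all hypothetical choices.

```python
import random
import statistics


def cumulative_means(population_mean, population_sd, max_n, rng, start_n=3):
    """Return the running sample mean for n = start_n, ..., max_n.

    Participants are drawn one at a time from a normal population
    (an assumption made for this sketch).
    """
    draws = [rng.gauss(population_mean, population_sd) for _ in range(max_n)]
    total = sum(draws[:start_n])
    means = [total / start_n]  # mean of the first three participants
    for n in range(start_n + 1, max_n + 1):
        total += draws[n - 1]  # add one new participant
        means.append(total / n)
    return means


rng = random.Random(0)  # fixed seed so the sketch is repeatable
runs = [cumulative_means(100.0, 15.0, 100, rng) for _ in range(100)]

# Spread of the 100 sample means at the smallest and largest sample sizes.
spread_at_n3 = statistics.stdev([run[0] for run in runs])
spread_at_n100 = statistics.stdev([run[-1] for run in runs])
```

Each list in `runs` corresponds to one colored line in Panel B. Comparing `spread_at_n3` with `spread_at_n100` shows how much more the sample means vary when n is small than when n is large.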
Figure 7: Small samples do not contain enough information to determine the distribution
Random samples of different sizes (n = 100, n = 20 and n = 5) were drawn from populations with a normal, skewed or bimodal distribution (red violin plots). One can clearly identify the different data distributions when n = 100. Determining the data distribution becomes more difficult when n = 20 and is impossible when n = 5. Interactive version: https://rtools.mayo.edu/size_matters/.
Figure 8: Avoid sending mixed messages – Why the figure structure should match the study design and statistical analysis
The experiment was designed to compare normotensive and hypertensive patients. Separate analyses were performed for each dependent variable (biomarkers A, B and C). Panel A illustrates a common strategy for presenting this type of data. Including all dependent variables on the same graph erroneously suggests that the authors intended to compare biomarkers A, B and C. Panel B avoids confusion by presenting each biomarker separately. Abbreviations: HTN, hypertensive; NT, normotensive.
Figure 9: How to structure figures for common analyses
This figure illustrates how to structure figure panels and groups for common types of analyses, including comparing groups, repeating the same analysis on different dependent variables, comparing groups with pooled subgroups, stratified analyses, and testing for an interaction. Abbreviations: HTN, hypertensive; NT, normotensive.

References

    1. Weissgerber T, Milic N, Winham S and Garovic VD. Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol. 2015;13:e1002128.
    2. Alsheikh-Ali AA, Qureshi W, Al-Mallah MH and Ioannidis JP. Public availability of published research data in high-impact journals. PLoS One. 2011;6:e24357.
    3. Hardwicke TE, Mathur MB, MacDonald K, Nilsonne G, Banks GC, Kidwell MC, Hofelich Mohr A, Clayton E, Yoon EJ, Henry Tessler M, Lenne RL, Altman S, Long B and Frank MC. Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition. R Soc Open Sci. 2018;5:180448.
    4. Vines TH, Albert AY, Andrew RL, Debarre F, Bock DG, Franklin MT, Gilbert KJ, Moore JS, Renaut S and Rennison DJ. The availability of research data declines rapidly with article age. Curr Biol. 2014;24:94–7.
    5. PLOS Biology. Submission Guidelines: Data Presentation in Graphs. 2016. http://journals.plos.org/plosbiology/s/submission-guidelines-loc-data-pr.... Accessed July 1, 2019.
