Medicine & Science in Sports & Exercise

Statistical Reporting Recommendations 


Transparency and Reproducibility 

  1. The Statistical Analysis section of the Methods should provide sufficient detail for others to replicate the work. For more complex analyses, authors may provide further details in supplementary documents and/or code files that give an exact account of the analyses performed. 

  1. Authors are required to include a data availability statement that indicates whether the data and code are publicly available and, if so, where and how to access them. When data and code are not available, authors should include a reason why they cannot be shared. 

  1. We strongly encourage authors to make code and data publicly available by publishing them in a data repository or by making them available in the paper’s supplementary materials. If authors do not provide data, Medicine & Science in Sports & Exercise® Editors may ask for access to de-identified data during peer review. Further resources for code and data sharing are provided below. 

  1. We strongly recommend that authors follow existing reporting guidelines for different study designs, such as CONSORT for randomized trials, STROBE for observational studies, and PRISMA for systematic reviews and meta-analyses. Authors may download these guidelines at the Equator Network: https://www.equator-network.org/. We also recommend that authors refer to the CHAMP checklist for assessing the statistics in sports science papers: https://pubmed.ncbi.nlm.nih.gov/33514558/ 

  1. Authors should follow best practices for handling and reporting missing data (https://pubmed.ncbi.nlm.nih.gov/35475743/). 

  1. Authors should clearly distinguish between prespecified analyses and exploratory/post-hoc analyses. Exploratory/post-hoc analyses have an increased chance of producing spurious findings; thus, conclusions based on these analyses should be appropriately cautious. We encourage authors to preregister studies, as one of the benefits of preregistration is that it allows stakeholders to clearly distinguish between prespecified and post-hoc analyses (https://pubmed.ncbi.nlm.nih.gov/32020542/). 

  1. For prespecified analyses, authors should justify the adequacy of their sample size in the Methods section by reporting the results of a priori sample size calculation(s) for the main statistical test(s) (https://doi.org/10.1525/collabra.33267,  https://pubmed.ncbi.nlm.nih.gov/27640735/):  

a. Sample size calculations may be based either on ensuring sufficient power for a specific hypothesis test or on ensuring sufficient precision for estimating an effect (an illustrative power-based calculation is sketched below). 

b. Include sufficient information to replicate the calculation, including the alpha level, the assumed effect size, the assumed variability, and the null value (for hypothesis testing). 

c. If an a priori sample size calculation was not performed (as in the case of a secondary data analysis) and authors would like to demonstrate that their sample size was adequate for their analytic aims, they may provide a power calculation based on a minimum effect size of interest (https://doi.org/10.1525/collabra.33267), not on the observed effect size. Post hoc power calculations using the observed effect size are merely a transformation of the study's observed p-value; they are not meaningful and should not be reported (https://pubmed.ncbi.nlm.nih.gov/32814615/, https://pubmed.ncbi.nlm.nih.gov/32844536/). 
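
As an illustration only, the sketch below performs an a priori sample size calculation in Python with the statsmodels package (one of many suitable tools); the effect size, alpha level, and target power are assumed values chosen for the example, not recommendations.

```python
# Illustrative a priori sample size calculation for a two-group comparison.
# The smallest effect size of interest (d = 0.5), alpha (0.05), and target
# power (0.80) are assumptions for this example only.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,          # standardized minimum effect size of interest
    alpha=0.05,               # two-sided alpha level
    power=0.80,               # desired power
    alternative="two-sided",
)
print(f"Required sample size: {n_per_group:.0f} per group")  # about 64 per group
```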

 

Inferences and P-values 

  1. If reporting p-values, authors should follow the guidance of the American Statistical Association’s statement on p-values: https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108. 

  1. Do not base conclusions solely on whether p-values meet a specific threshold (e.g., p<.05). 

  1. Report precise p-values, e.g., p=.016 rather than ranges such as p<.05 or p<.01. (Note that values under .001 may be reported as p<.001.) 

  1. Do not report p-values in isolation, particularly in the abstract of the paper. For example, “group A recovered faster than group B (p=.04)” is insufficient information for an abstract. Include effect size information, such as: “group A recovered 5 days faster on average than group B (50 ± 4 vs. 55 ± 4 days, p=.04).” 

  1. P-values should not be interpreted as giving the probability of a hypothesis, e.g., p=.05 does not indicate that there is a 5% chance that the null hypothesis is true, nor does it indicate that there is a 5% chance that the results were just due to random chance.  

  1. P-values greater than the specified alpha, e.g., p>.05, should not be interpreted as evidence of “no effect” or “no difference.” To make claims about equivalence or non-inferiority, researchers must run equivalence or non-inferiority studies.  
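
For illustration, the sketch below runs two one-sided tests (TOST), one common equivalence procedure, using Python's statsmodels; the simulated data and the ±2-unit equivalence margin are assumptions made for the example and would need to be justified in a real study.

```python
# Minimal equivalence-test sketch (two one-sided tests, TOST).
# Data are simulated; the +/- 2-unit equivalence margin is illustrative only.
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(1)
group_a = rng.normal(50, 4, size=30)     # hypothetical recovery times (days)
group_b = rng.normal(50.5, 4, size=30)

p_value, lower_test, upper_test = ttost_ind(group_a, group_b, low=-2.0, upp=2.0)
print(f"TOST p-value: {p_value:.3f}")    # small p supports equivalence within the margin
```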

  1. Authors should not compare p-values or statistical significance between groups. For example, “the improvement was significant in the treatment group (p=.02) but not in the control group (p=.43)” is a meaningless comparison that does not address the question of whether the groups differed in their improvements (https://pubmed.ncbi.nlm.nih.gov/20630442/). Similarly, conclusions about subgroup differences require a formal test of interaction (https://pubmed.ncbi.nlm.nih.gov/21878926/). 

  1. Authors should consider the issue of multiple testing (https://journals.physiology.org/doi/full/10.1152/ajpregu.2000.279.1.R1). Multiple testing occurs when authors test multiple independent or dependent variables, subgroups, or time points. Multiple testing increases the likelihood of generating spurious findings (https://pubmed.ncbi.nlm.nih.gov/20006317/). For prespecified analyses, authors are encouraged to identify a primary outcome and time point a priori or to use formal statistical methods that account for multiple testing (https://pubmed.ncbi.nlm.nih.gov/20010596/). For exploratory/post-hoc analyses, accounting for multiple testing may be more informal. For example, researchers may simply put the results in context, as in: “We ran over 100 tests comparing the groups and thus would expect to find 5 significant differences just by chance if there were actually no differences between the groups and the tests were independent.” 
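
As a sketch only, the snippet below applies a Holm adjustment to a set of made-up p-values using Python's statsmodels; the Holm method is one of several acceptable corrections.

```python
# Illustrative correction of several p-values for multiple testing (Holm method).
# The raw p-values below are invented for the example.
from statsmodels.stats.multitest import multipletests

raw_p = [0.012, 0.034, 0.210, 0.048, 0.003]
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
for raw, adj, sig in zip(raw_p, adj_p, reject):
    print(f"raw p = {raw:.3f}  Holm-adjusted p = {adj:.3f}  reject H0: {sig}")
```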

  1. If reporting results from Bayesian analyses, authors should specify and justify the choice of prior distribution; present numeric or graphical summaries of both the prior and posterior distributions; report the sampler used and indicate the number of chains and iterations specified, including details of burn-in and thinning; and report the sensitivity of the results to different choices of prior distribution. Authors should indicate whether posterior predictive checks were performed and how convergence was assessed. Posterior probabilities should be labeled as such. More information about Bayesian workflows can be found here: https://arxiv.org/abs/2011.01808, https://arxiv.org/abs/1709.01449 
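
A minimal sketch of such a workflow, assuming the PyMC and ArviZ packages, is shown below; the model, prior values, and data are illustrative only, and a real analysis would additionally report prior-sensitivity analyses and posterior predictive checks.

```python
# Minimal Bayesian workflow sketch (PyMC/ArviZ assumed); data and priors are illustrative.
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(0)
y = rng.normal(5.0, 2.0, size=40)                  # hypothetical outcome data

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)       # prior: specify and justify in the paper
    sigma = pm.HalfNormal("sigma", sigma=5.0)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample(draws=2000, tune=1000, chains=4, random_seed=0)  # report these settings

# Posterior summaries, r_hat, and effective sample sizes support convergence reporting.
print(az.summary(idata, var_names=["mu", "sigma"]))
```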

  1. When reporting main effects of interest, authors should quantify the uncertainty of these effects by providing either confidence intervals (if using a frequentist approach) or credible intervals (if using a Bayesian approach). For example, the odds ratio was 1.50 with a 95% confidence interval of 0.73 to 3.07. The 95% level of confidence/credibility is standard, though authors may use other levels if justified. 
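
As an illustration, the odds ratio and its 95% confidence interval could be computed as in the sketch below (Python/statsmodels assumed; the 2×2 counts are invented).

```python
# Illustrative odds ratio with a 95% confidence interval from a 2x2 table.
# The counts are made up for the example.
import numpy as np
from statsmodels.stats.contingency_tables import Table2x2

table = Table2x2(np.array([[15, 35],     # group 1: events, non-events
                           [10, 40]]))   # group 2: events, non-events
low, high = table.oddsratio_confint(alpha=0.05)
print(f"OR = {table.oddsratio:.2f}, 95% CI {low:.2f} to {high:.2f}")
```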

  1. The magnitude-based inference approach [https://pubmed.ncbi.nlm.nih.gov/19092709/] is not an acceptable method of statistical analysis in Medicine & Science in Sports & Exercise®, based on two articles and two commentaries published in the journal [https://pubmed.ncbi.nlm.nih.gov/29683920/, https://pubmed.ncbi.nlm.nih.gov/25051387/, https://pubmed.ncbi.nlm.nih.gov/30216266/, https://pubmed.ncbi.nlm.nih.gov/25783665/]. It is also inappropriate to use the magnitude-based inference approach and label it as a Bayesian approach since magnitude-based inference uses frequentist, not Bayesian, statistics (https://pubmed.ncbi.nlm.nih.gov/29683920/, https://pubmed.ncbi.nlm.nih.gov/25051387/, https://pubmed.ncbi.nlm.nih.gov/31149752/).  

Data Presentation and Visualization 

  1. Authors should provide sufficient data visualizations to allow readers to judge the appropriateness and robustness of their analyses. 

a. Whenever possible, show individual data points in addition to summary measures. For example, superimpose individual data points on top of box plots or violin plots (https://pubmed.ncbi.nlm.nih.gov/26892802/). 

b. For repeated measures, authors are encouraged to show line plots that display trajectories of individual participants over time (https://journals.sagepub.com/doi/full/10.1177/25152459211047228). 

c. When reporting correlation coefficients or linear regressions, authors should provide accompanying scatter plot(s) either in the main paper or in the supplementary materials. Scatter plots can reveal when relationships are non-robust (as when an outlier is driving the correlation) or non-linear, or when nonparametric statistics are appropriate (https://pubmed.ncbi.nlm.nih.gov/27989418/). An illustrative plotting sketch is shown below. 
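
The sketch below (matplotlib/seaborn assumed, simulated data) illustrates points a and c: individual data points superimposed on box plots, and a scatter plot accompanying a reported correlation.

```python
# Illustrative figures: individual points over box plots (left) and a scatter
# plot for a reported correlation (right). All data are simulated.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "group": np.repeat(["A", "B"], 25),
    "vo2max": np.concatenate([rng.normal(45, 5, 25), rng.normal(48, 5, 25)]),
})
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(scale=0.8, size=50)

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
sns.boxplot(data=df, x="group", y="vo2max", showfliers=False, ax=axes[0])
sns.stripplot(data=df, x="group", y="vo2max", color="black", size=4, ax=axes[0])  # raw points
axes[1].scatter(x, y, s=15)                     # show the raw relationship, not just r
axes[1].set_xlabel("predictor")
axes[1].set_ylabel("outcome")
plt.tight_layout()
plt.show()
```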

  1. Bar graphs are inappropriate for continuous data; they should only be used for count or categorical data. For continuous data, alternatives include histograms, dot plots, box plots, and violin plots (https://pubmed.ncbi.nlm.nih.gov/31657957/, https://pubmed.ncbi.nlm.nih.gov/26892802/). 

  1. To describe the variability in traits, use standard deviation or interquartile range, not standard error. The standard error quantifies the uncertainty in a statistic (e.g., mean height) and not the variability in the trait itself (e.g., height).   

  1. When reporting descriptive statistics for variables with highly skewed distributions, authors should report the median and interquartile range instead of (or in addition to) the mean and standard deviation. 
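
The two preceding points are illustrated in the short sketch below (Python, simulated data): the standard deviation describes the spread of the trait, the standard error describes the precision of the estimated mean, and the median/IQR better summarize a skewed variable.

```python
# Simulated illustration: SD vs. SEM, and median/IQR for a skewed variable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
height_cm = rng.normal(175, 7, size=100)             # roughly symmetric trait
weekly_hours = rng.lognormal(1.0, 0.8, size=100)     # right-skewed variable

sd = height_cm.std(ddof=1)
sem = stats.sem(height_cm)                           # = sd / sqrt(n): precision of the mean
print(f"height: SD = {sd:.1f} cm (variability), SEM = {sem:.1f} cm (uncertainty of mean)")

q1, med, q3 = np.percentile(weekly_hours, [25, 50, 75])
print(f"training: median = {med:.1f} h/wk, IQR = {q1:.1f} to {q3:.1f} h/wk")
```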

  1. Tables and figures should prominently display sample size information for each group presented. When presenting results from multivariable models, always indicate the sample size that was included in the final model, as some observations may have been excluded due to missing data. 

  1. Use only an appropriate number of significant figures. For example, if weight was measured to the nearest tenth of a kilogram, then summary statistics for weight should include only a single decimal place. Similarly, heart rate should be reported as a whole number of beats per minute. Odds ratios, rate ratios, risk ratios, and hazard ratios should generally be displayed to two decimal places. Summary statistics about the same variable should have the same number of decimal places; for example, if the mean weight is reported to 1 decimal place, then the standard deviation for weight should also be reported to 1 decimal place. 

  1. When reporting summary statistics and effect sizes, always state the units. For example, was exercise measured in days/week, hours/day, etc.? When presenting regression results, make sure that the units for both the independent and dependent variables are clearly displayed. SI units should be used, e.g., kilograms instead of pounds. 

  1. In general, effect sizes should be presented in original units (e.g., seconds, meters, kilograms) rather than standardized units. Unstandardized effect sizes are more directly interpretable, provide important context, and are less likely to obscure anomalies in the data. Standardized effect sizes may provide useful complementary information, particularly when variables are measured in arbitrary units (https://peerj.com/articles/10314/). Standardized effect sizes are also appropriate when combining or comparing effects measured on different scales, as in meta-analysis. Authors should avoid interpreting standardized effect sizes based on arbitrary scales (e.g., do not interpret a Cohen’s d value of 0.8 as “large”; https://peerj.com/articles/10314/). 
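
As a simple illustration (Python, simulated data), the sketch below reports the raw mean difference in original units alongside a standardized effect size; the pooled-SD formula used for Cohen's d is one common choice for equal group sizes.

```python
# Illustrative effect sizes: raw mean difference (original units) and Cohen's d.
import numpy as np

rng = np.random.default_rng(4)
group_a = rng.normal(50, 4, size=30)    # e.g., recovery time in days
group_b = rng.normal(55, 4, size=30)

raw_diff = group_b.mean() - group_a.mean()                     # effect in original units (days)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = raw_diff / pooled_sd                                # standardized effect size

print(f"mean difference = {raw_diff:.1f} days; Cohen's d = {cohens_d:.2f}")
```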

Other Statistical Considerations 

  1. Correlated observations (e.g., “paired” or “repeated” measurements) should be handled with appropriate statistical methods. Examples of correlated data include the same person measured over time, two knees from the same person, or two individuals from the same cluster in a cluster randomized trial. Failure to account for correlated observations in the statistical analyses can lead to incorrect inferences (https://pubmed.ncbi.nlm.nih.gov/20869686/, https://pubmed.ncbi.nlm.nih.gov/25245249/). 
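
One appropriate approach for such data is a mixed-effects model with a random intercept per participant; the sketch below assumes Python's statsmodels and an illustrative long-format dataset with hypothetical column names (outcome, time, subject).

```python
# Minimal sketch of a linear mixed model for repeated measures
# (random intercept per subject). Data and column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
subjects = np.repeat(np.arange(20), 3)                  # 20 participants, 3 time points each
time = np.tile([0, 1, 2], 20)
subject_effect = np.repeat(rng.normal(0, 3, 20), 3)     # between-subject variability
outcome = 50 + 2 * time + subject_effect + rng.normal(0, 1, 60)
df = pd.DataFrame({"subject": subjects, "time": time, "outcome": outcome})

model = smf.mixedlm("outcome ~ time", data=df, groups=df["subject"])
print(model.fit().summary())
```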

  1. While we encourage authors to use methods that are less commonly published in the exercise science field but have been developed and validated in other analytical fields, authors should be aware that it can be challenging to find reviewers adequately trained to review manuscripts using those methods. As such, a manuscript may be rejected if the Editorial or Reviewer pool lacks the expertise to make a sufficient judgment on the quality of the analytical work. 

 Resources for Data and Code Sharing 

To improve transparency and reproducibility, authors are strongly encouraged to make their data and statistical code publicly available. Authors are also required to provide a data availability statement that gives details on the openness of their data and code. Sharing data and code promotes trust in study results and supports theory generation. These practices also benefit authors, as data and code sharing increases citations (https://doi.org/10.1371/journal.pone.0225883). 

In cases where identifiability is a concern, it may be possible to mitigate this risk by releasing only a subset of the data or by randomly perturbing values (such as dates) by a small amount. In cases where other practical considerations prohibit the release of data, we recommend that authors release their statistical code and/or provide supplemental documents that explain the analyses in detail and provide model summaries, visualizations, and summary statistics beyond those presented in the main paper. 

If code and data cannot be made available, authors should provide a reasonable explanation (e.g., ethical concerns) as to why this information cannot be openly shared. Statements such as “data are available upon reasonable request” do not constitute data sharing. If such a statement is provided, the authors must state exactly what constitutes a “reasonable request.” 

Preparation for sharing: 

● Project: Include a README with a high-level description of the project, an overview of the code, and an overview of the variables in the dataset, including their measurement time points, if applicable. 

● Code: Add sufficient comments to code files to make them readable, and include enough code to reproduce all analyses and graphs in the paper. 

● Data: De-identify data by excluding variables such as name, address, telephone number, social security number, GPS data, and postcode/ZIP code. The software AMNESIA can help de-identify sensitive data: https://www.openaire.eu/item/amnesia-data-anonymization-made-easy. A minimal de-identification sketch is shown after this list. 

● Organize your data to comply with the FAIR Data Principles (https://www.go-fair.org/fair-principles/). This makes the data easier for others to access and reuse. 
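
As a simple illustration only (Python/pandas, hypothetical file and column names), direct identifiers can be dropped and exact dates coarsened before sharing; dedicated tools such as AMNESIA support more rigorous anonymization.

```python
# Illustrative de-identification before sharing (hypothetical file and column names).
import pandas as pd

df = pd.read_csv("study_data.csv")                     # hypothetical raw dataset

direct_identifiers = ["name", "address", "telephone", "ssn", "zipcode"]
shared = df.drop(columns=[c for c in direct_identifiers if c in df.columns])

# Coarsen exact dates to reduce re-identification risk (keep month/year only).
if "visit_date" in shared.columns:
    shared["visit_month"] = pd.to_datetime(shared["visit_date"]).dt.to_period("M").astype(str)
    shared = shared.drop(columns=["visit_date"])

shared.to_csv("study_data_deidentified.csv", index=False)
```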

We recommend archiving data/code in a version-controlled repository that provides a permanent digital object identifier (DOI) for code/data, making them citable: 

● Dryad https://datadryad.org/stash 

● Open Science Framework https://osf.io/ 

● Zenodo https://zenodo.org/ 

● Figshare https://figshare.com/ 

We recommend liaising with librarians at your institution to discuss long-term data storage with DOIs/accession numbers. 
