That Pesky P Value: Considering study interpretation during the pandemic

Multiple clinical studies related to COVID-19 are published daily, so it is important for us to carefully consider the conclusions of such studies, particularly those with an overreliance on statistical tests that ultimately provide a P value.

For many basic research experiments, as well as clinical trials, we often accept that a P value of < 0.05 means the difference between two groups is “significant.” But P = 0.048 (for example) for a typical t-test really just means that, if the means of the two groups were in fact identical, there would be a 4.8% chance of observing a difference at least as large as the one you observed. Statistically speaking, the P value is calculated on the assumption that the null hypothesis (no difference between groups or no treatment effect on some measured outcome) is true; it is not the probability that the null hypothesis is true. (Put another way, random sampling from otherwise identical populations would lead to a difference smaller than you observed in 95.2% of experiments and larger than you observed in 4.8% of experiments.) Even so, when P is even marginally greater than 0.05 the tendency is to label the results as “nonsignificant.” A corollary is the unabashed support a reader of a paper may give to an intervention deemed “significant” when the P is minimally lower than 0.05.
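
The random-sampling reading of the P value can be made concrete with a small permutation test. What follows is a minimal sketch, not the t-test mentioned above: it assumes that, if the treatment did nothing, the group labels are interchangeable, and it estimates how often a random relabeling produces a difference in means at least as large as the observed one. The function name, its parameters, and the measurements are hypothetical and invented for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided P value for a difference in group means.

    Under the null hypothesis the group labels carry no information, so we
    reshuffle the pooled values and count how often the shuffled difference
    is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = abs_mean_diff(pooled)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(pooled) >= observed - 1e-12:
            at_least_as_extreme += 1

    # The add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Invented measurements, for illustration only.
treated = [5.1, 6.3, 5.8, 7.0, 6.1, 6.6]
control = [4.9, 5.2, 5.5, 6.0, 5.1, 5.7]
print(permutation_p_value(treated, control))
```

The printed value is the share of relabelings that look at least as extreme as the data. It says nothing directly about the probability that the treatment works.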

In the era of “big data,” the simple P value has been refined as the defining statistical test. The refinements include correction for multiple comparisons (which often means the raw P value must be quite small) and even the use of P = 0.10 or 0.20 to build initial lists of genes to consider for further study. We have also seen greater use of the false discovery rate as a way to slim down very large datasets. Nevertheless, most papers we read may still rely too heavily on the P value (1, 2).
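
To show why correction for multiple comparisons pushes the required raw P value down, here is a minimal sketch of two common approaches: the Bonferroni correction and the Benjamini-Hochberg procedure for controlling the false discovery rate. Neither is named in the studies discussed here, so treat both as assumed examples of the general idea. The function names and the P values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag each test as significant only if its P is at or below alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests while controlling the false discovery rate at alpha.

    Sort the P values, find the largest rank k with p_(k) <= (k / m) * alpha,
    and flag every test whose rank is k or lower.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            cutoff_rank = rank
    flagged = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            flagged[index] = True
    return flagged


# Invented raw P values from five hypothetical comparisons.
raw = [0.20, 0.001, 0.045, 0.012, 0.025]
print(bonferroni(raw))          # only the smallest P value survives
print(benjamini_hochberg(raw))  # the three smallest P values survive
```

Note that the raw P = 0.045 would pass an uncorrected 0.05 cutoff but fails both corrections, which is the sense in which the raw value may need to be quite small.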

In the context of an appropriately designed and powered clinical trial, it is important to think about the arbitrary nature of “P < 0.05.” Do we really think that P = 0.049 and P = 0.051 can adequately help us decide if a treatment is effective or not? To my mind, such a heavy reliance on an arbitrary cutoff is inappropriate and can lead to poor decisions during the pandemic. It can even promote the “targeted reading” effect, in which the reader looks at the abstract for a P value. If it is not < 0.05, they dismiss a therapy or intervention as insignificant and move on. I suggest that values approaching the prespecified cutoff should be interpreted, at a minimum, as “worthy of further studies.” And of course, P values minimally below the cutoff should be suspect, with the study needing replication.
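
One way to see how little separates P = 0.049 from P = 0.051 is to convert each back to the test statistic that would produce it. The sketch below assumes a two-sided test whose statistic is approximately standard normal under the null hypothesis; that assumption, and the helper name, are mine and not taken from any particular trial.

```python
from statistics import NormalDist


def z_for_two_sided_p(p):
    """Return the |z| statistic that yields a given two-sided P value,
    assuming a standard normal test statistic under the null hypothesis."""
    return NormalDist().inv_cdf(1 - p / 2)


for p in (0.049, 0.050, 0.051):
    print(p, round(z_for_two_sided_p(p), 3))
```

Under this assumption the two statistics come out near 1.97 and 1.95, a gap far smaller than the ordinary noise between one study and its replication, yet one lands on the “significant” side of the cutoff and the other does not.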

There also appears to be publication bias against papers with P values marginally above 0.05 (or whatever value was prespecified), which may relegate important papers to lower-tier journals. As reported in a statement from the Statistical Association, a petition signed by 800 statisticians proposes the elimination of inferring statistical significance based on P values (3). A P value above the cutoff is frequently interpreted as “proof” that the null hypothesis is true. But it really conveys only that the current study provides insufficient evidence, from a probabilistic standpoint, to reject the null hypothesis.

The use of an adaptive clinical trial design has helped mitigate some of these concerns (4). These trials incorporate interim data analyses used to modify the ongoing trial, without undercutting its validity. The actions that might occur are typically prespecified, but they need not be when faced with a rapidly unfolding life-threatening public health crisis such as the COVID-19 pandemic. Indeed, most COVID-19 trials that we are conducting with Tampa General Hospital are adaptive trials.

When it comes to scientific literature, then, we have no choice but to read the full paper. This includes looking at how the study was designed, and at other outcome indices such as confidence intervals, to understand the results. In this pandemic era in particular, an overreliance on P < 0.05 could lead us away from effective solutions for COVID-19.

Stephen Liggett, MD
Associate Vice President for Research, USF Health
Vice Dean for Research, USF Health Morsani College of Medicine
Professor of Medicine, Molecular Pharmacology and Physiology

References:

1. Singh AK, Kelley K, Agarwal R. Interpreting results of clinical trials: a conceptual framework. Clin J Am Soc Nephrol 2008;3:1246-52.
2. Meisner R, Woywod B. Are you overvaluing your clinical trial p values? Clinical Leader (guest column), Feb. 6, 2020.
3. Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature (Comment), March 20, 2019.
4. Pallmann P, Bedding AW, Choodari-Oskooei B, et al. Adaptive designs in clinical trials: why use them, and how to run and report them. BMC Medicine, Feb. 28, 2018.
