P-Hacking

Does it hurt to take a peek? Or just leave “unimportant” findings unreported?

For the first post on StatisticalBullshit.com, it seems appropriate to discuss one of the most common instances of Statistical Bullshit: p-hacking!

What is p-hacking?  Well, let’s first talk about p-values.

A p-value is the probability of obtaining results at least as extreme as those observed if random chance alone were at work. For instance, when performing a t-test that compares two groups of data, such as performance scores for two work units, the p-value indicates how likely a difference as large as the one observed would be if only random chance were operating. If the p-value is 0.05, for example, there is a five percent chance of seeing such a difference when random chance alone is at work.  So, if the p-value is reasonably small (most often < .05), then we infer that something other than random chance alone caused the observed relationships – and we most often assume that our effect of interest was indeed the cause.  In these instances, we say that the result is statistically significant.
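
To make this concrete, here is a minimal sketch (my own illustration, not from the original post) of a two-sample t-test in Python. The work-unit names and scores are invented; the point is simply that the test returns a t statistic along with the p-value described above.

```python
# Minimal illustration with hypothetical data: comparing performance scores
# for two invented work units using an independent-samples t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Invented performance scores for two hypothetical work units
unit_a = rng.normal(loc=100, scale=10, size=30)
unit_b = rng.normal(loc=105, scale=10, size=30)

t_stat, p_value = stats.ttest_ind(unit_a, unit_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A small p-value (most often < .05) is treated as statistically significant
if p_value < 0.05:
    print("Statistically significant at the .05 level")
```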

For more information on p-values, visit my p-value page at MattCHoward.com.

So, what is p-hacking?  P-hacking is when a researcher or practitioner tests many relationships, looking for a statistically significant result (p < .05), and then only reports the significant findings.  For instance, a researcher or practitioner may collect data on seven different variables and then calculate correlations between each pair of them.  This would result in a total of 21 different correlations.  They could find one or two significant relationships (p < .05), rejoice, and write up a report about the significant finding(s).  But is this a good practice?  Definitely not.

Given 21 different correlations, we would expect about one of them to be significant even if all of the variables were completely randomly generated (and hence should not be truly related).  This is because, when random chance alone is at work, each test still has a five percent chance of crossing the significance cutoff.  If we expect this to happen five percent of the time, then 21 correlations would produce about one significant result on average (21 * .05 = 1.05).  So, although a result may be statistically significant, it does not always mean that a meaningful effect caused the finding.
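
If you want to see this for yourself, here is a minimal simulation sketch (mine, not part of the original post): it generates seven purely random variables, computes all 21 pairwise correlations, and counts how many come out "significant" at p < .05 by chance alone.

```python
# Illustrative simulation: purely random variables still produce "significant"
# correlations at roughly the rate the p-value cutoff implies.
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_people, n_vars = 100, 7

# Seven variables of pure noise -- no true relationships exist
data = rng.normal(size=(n_people, n_vars))

significant = 0
for i, j in combinations(range(n_vars), 2):  # 7 choose 2 = 21 pairs
    r, p = stats.pearsonr(data[:, i], data[:, j])
    if p < 0.05:
        significant += 1
        print(f"Variables {i} and {j}: r = {r:.2f}, p = {p:.3f}  <- chance finding")

print(f"{significant} of 21 correlations were significant by chance alone")
# Averaged over many runs, the count should land near 21 * .05 = 1.05.
```

Re-running the loop with different seeds, the average number of chance "findings" should settle close to the 1.05 figure calculated above.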

Some readers may still be skeptical that random variables could produce significant findings.  Let me give you another example.  I recently completed two studies in which I had participants predict a completely random future event.  As you probably assumed, no one was able to predict the future better than random chance, which supports the conclusion that any correct guesses were achieved by random chance alone.  So, no predictor variable should be significantly related to the number of correct guesses.  However, I found a statistically significant relationship between gender and the number of correct guesses (r = .22, p < .01) – women are able to predict the future better than men!  Right?

Well, let’s look at the results of the two studies:

Correlations with Number of Correct Guesses

Predictor                                       Study 1    Study 2
1.) Perceived Ability to Predict the Future       -.08       -.02
2.) Self-Esteem                                   -.07        .09
3.) Openness                                       .02        .04
4.) Conscientiousness                             -.01        .01
5.) Extraversion                                  -.07       -.13
6.) Agreeableness                                  .03        .04
7.) Neuroticism                                    .04       -.01
8.) Age                                            .10        .07
9.) Gender                                        -.04        .22**

** = p < .01

As you guessed, we cannot claim that women predict the future better than men based on these results.  Given that 18 correlations were calculated across the two studies, we would expect about one to be significant due to random chance alone (18 * .05 = 0.9) – and that one was likely the relationship between gender and the number of correct guesses.

So, what do we do about p-hacking?

Authors have presented a wide range of possible solutions, but five appear to be the most popular:

  • Do away with p-values altogether. A small number of academic journals have been receptive to this issue, and they often request that submitted papers include confidence intervals and discuss effect sizes instead.
  • Report all measured variables. Many journals now require submissions to include a supplemental table that notes all measured variables not reported in the manuscript.  A growing number of journals have even started to request that submissions include the full dataset(s) with all measured variables.  A growing concern has also been expressed regarding authors who do not report entire studies because the results did not support their hypotheses (this will likely be a future StatisticalBullshit.com topic).
  • Only test relationships specified prior to collecting data. Databases have been created in which researchers can publicly identify the relationships they intend to test before the data have been collected, and then test only those relationships once the data are in hand (a practice often called preregistration).
  • Adjust p-value cutoffs. Many corrections can be made to account for statistical significance due to random chance alone.  Perhaps the most popular is the Bonferroni correction, in which the p-value cutoff is divided by the number of comparisons made.  For instance, if you performed 10 correlation analyses, you would divide the cutoff of .05 by 10, resulting in a new cutoff of .005.  Many researchers and practitioners view this as too restrictive, however.  A short worked example follows this list.
  • Replicate your results. Replication can help ensure that a result was not due to random chance alone.  Lightning rarely strikes twice, and the same completely random relationship rarely occurs twice.
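
For the Bonferroni bullet above, here is a short sketch (my own example, with made-up p-values) of the adjustment done by hand and, equivalently, with the multipletests helper from statsmodels.

```python
# Illustrative Bonferroni correction; the p-values below are invented.
from statsmodels.stats.multitest import multipletests

p_values = [0.004, 0.010, 0.045, 0.060, 0.130, 0.250, 0.350, 0.490, 0.700, 0.920]

# By hand: divide the .05 cutoff by the number of tests (10), giving .005
cutoff = 0.05 / len(p_values)
print(f"Adjusted cutoff: {cutoff}")
print("Significant after correction:", [p for p in p_values if p < cutoff])

# The same decision via statsmodels, which scales the p-values instead
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")
print("Reject flags:", list(reject))
```

Both approaches flag only the smallest p-value (.004) as significant, illustrating why many researchers find the correction restrictive.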

While these solutions were developed in academia, they can also be applied to practice.  For instance, if a work report includes statistical results, you should always ask (1) whether the document or presentation includes statistics other than p-values, such as correlations or t-values, (2) whether other analyses were conducted but not reported, (3) whether the reported relationships were intended to be tested, (4) whether a p-value adjustment is needed, and (5) whether the findings have been replicated using a new scenario or sample.  Only after obtaining answers for these questions should you feel confident in the results!

Of course, there is still a lot more that could be said about p-hacking, but I believe that is a good introduction.  If I missed your favorite method to address p-hacking, or anything else, please let me know by emailing MHoward@SouthAlabama.edu.  Likewise, feel free to email about your own Statistical Bullshit stories or questions.  P-values are one of my favorite (and most popular) Statistical Bullshit topics, so be ready for more posts about p-values in the future.

Until next time, watch out for Statistical Bullshit!

Statistical Bullshit

Statistical Bullshit is everywhere. We have all experienced it.

You’re drifting in and out of a work meeting while the presenter drones on and on. They finally get to the big PowerPoint slide – the one with the numbers that “support their claim.” You study the figures and look for hidden issues, but the presenter skips to the next slide before you can even digest their main points…let alone the things they were trying to hide.

Or, you’re reading an article about a new research study. Some tables and figures are included, but you are left with the feeling that certain key information is missing. How can you know whether their findings are really true? Or even somewhat true?

Maybe it’s election season. Without fail, both candidates claim that they have the popular support, and both claim that statistics show their policies are the best. Of course, you know that they cannot both be correct, but it is difficult to know who is right (and who is lying!).

Or maybe you have heard someone say, “studies have shown.” It could be a family member, possibly a friend…or even your doctor. Were those studies legitimate? Did they really support the claims being made?

Each of these instances could be Statistical Bullshit – that is, statistics that are manipulated, doctored, or sometimes even ignored to provide a desirable result.

The purpose of this website is to educate about Statistical Bullshit, with the goal of reducing Bullshit practices in society. No longer should people be able to make numerical claims without sufficient justification, and this website can help achieve that goal. It may not change all of society, but it can certainly help you – the reader. So, I hope that after reading this website, you can sit in that work meeting and confidently shout BULLSHIT when the presenter breezes past those misleading numbers.


Statistical Bullshit is owned and operated by Dr. Matt C. Howard. Dr. Howard is currently an assistant professor of Marketing and Quantitative Methods in the Mitchell College of Business at the University of South Alabama. His personal academic website can be found at MattCHoward.com.