The Power of the Letter “P”

There are few letters that signify as much as the letter “p” does in the world of statistics.  A p-value can be the difference between publication and rejection in academia, the deciding factor in a multimillion-dollar lawsuit in industry or even the numerical decision used to approve the most recent cancer treatment.

Sadly, p-values are often abused and misunderstood (http://retractionwatch.com/2016/03/07/were-using-a-common-statistical-test-all-wrong-statisticians-want-to-fix-that/).

The purpose of this post is to explain what a p-value is in layman’s terms, in the hope that you can all use p-values more effectively in your lives and careers.

Before I start, I want to say this was an incredibly difficult post for me to write.  I challenge you all to take something you see as “common knowledge” and attempt to explain it to someone who has (potentially) never encountered the term before.  The exercise is highly useful, but highly challenging (hence the nearly two weeks it took me to post).

Coin Flip:

Consider flipping a coin; we all know the chance of heads is 1/2 and the chance of tails is 1/2.  Now, let’s imagine that you flip a coin and get heads 100 times in a row.  Is such a scenario possible?  Absolutely, but highly unlikely.  Actually, such a result is so unlikely that you will probably assume the coin is unfair! But is it?
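To put a number on “highly unlikely,” here is a minimal sketch of the arithmetic. It assumes the flips are independent and the coin is exactly fair; the variable names are mine, chosen only for illustration.

```python
from fractions import Fraction

# Assumption: each flip is independent and lands heads with probability 1/2.
p_heads = Fraction(1, 2)
n_flips = 100

# Chance that every single one of the 100 flips comes up heads.
p_all_heads = p_heads ** n_flips

print(float(p_all_heads))
```

The result is roughly 8 in 10^31, so the streak is possible but far rarer than anything you would expect to see in practice.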

How do you test whether the coin is unfair?

There are two ways one can go about determining unfairness.  The first is by testing if the coin is fair.  The other is testing whether the coin is unfair.  One might think that the two previous sentences are identical, but they are not.  To say a coin is fair, one must be able to prove the coin returns heads and tails with exactly a 50/50 chance.  A coin can be unfair, however, in unlimited ways.  For example, the coin could have a 30% chance of heads, 31%, 32%, 32.05% etc.  So rather than try to prove the single case of fairness, why not start from fairness and ask whether the data point to one of the unlimited cases of unfairness instead?  To simplify matters, statisticians use a two-step process:

Step 1: Assume the coin is fair.

Step 2: Test whether our assumption is wrong (the coin is one of the unlimited unfair results) using a p-value.

P-value:

I am going to explain the p-value with an example: we flip a coin 100 times, assuming it is fair, and record 90 heads and 10 tails.

The purpose of calculating a p-value is to ask the question: given a belief (the coin is fair), is the belief wrong according to the data we have collected (90H, 10T)?

The calculation of the p-value will tell us the probability that a fair coin produces at least 90 heads in 100 flips (90 heads, 91, 92 … 100 heads). In other words, just how often will such a result occur?
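For readers who want to see that calculation, here is a minimal sketch. It assumes 100 independent flips of an exactly fair coin and counts only the “too many heads” direction, as in the example; the function name `p_value_at_least` is my own, not a standard one.

```python
from math import comb

def p_value_at_least(heads, flips):
    """Probability that a fair coin shows `heads` or more heads in `flips` tosses."""
    # Count the head/tail sequences that contain at least `heads` heads...
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    # ...and divide by the total number of equally likely sequences.
    return favorable / 2 ** flips

p = p_value_at_least(90, 100)
print(f"{p:.3g}")
```

The number that comes out is on the order of 10^-17, far below any cutoff anyone would use.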

What does the actual p-value number mean?

Suppose I have the example of 90 heads above and, purely for illustration, I calculate my p-value to be .01 (the true value for 90 heads in 100 flips is far smaller).  The .01 indicates there is a 1% chance a fair coin will get 90, 91, … 99 or 100 heads (refer to the sentence above about “at least 90 heads”).

On the other hand, suppose the p-value had come out to 0.3. One would then say there is a 30% chance a fair coin would produce a result at least as lopsided as the one we observed. And what are we doing?  We are measuring how surprising our data would be if the coin were fair.

In the world of statistics, .05 is the magical number. Thus, if there is a 5% chance or less that a fair coin produces 90 to 100 heads, statisticians would conclude the coin is unfair. Why?  Because it is tradition/ a habit/ a convention; I am sorry, but there is no other reason. 5% is not the same thing as 0%, but at the same time, it does enable us to be somewhat sure that a coin with those results is not a fair coin.

Therefore, a p-value=.3 (30% chance) is too high to call the fairness of the coin into question.  In this instance we would NOT conclude the coin is fair; we would conclude only that we cannot justify calling it unfair given the information. Remember that last sentence, because researchers often forget it in their enthusiasm for their “findings.”

Returning to that sentence about not being able to call the coin unfair, I want to make special note that the goal of the statistics is not to justify the initial assumption (fair coin), but to determine whether the initial assumption is implausible given the data.  This is the point that leads so many researchers astray: many conclude that when they calculate p-value=.3, they are proving the coin is fair. Sadly, the researchers are not proving fairness; they are only failing to find enough evidence of unfairness in the data.
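To make the .05 convention concrete, here is a minimal sketch of the decision rule for 100 flips. The names `ALPHA`, `p_value_at_least` and `verdict` are my own, and the sketch assumes an exactly fair coin under Step 1 and looks only at the “too many heads” direction.

```python
from math import comb

ALPHA = 0.05   # the conventional cutoff; a habit, not a law of nature
FLIPS = 100

def p_value_at_least(heads, flips=FLIPS):
    """Probability that a fair coin shows `heads` or more heads in `flips` tosses."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

def verdict(heads):
    """Return the p-value and the conclusion the .05 convention allows."""
    p = p_value_at_least(heads)
    if p <= ALPHA:
        return p, "reject the fair-coin assumption"
    # A large p-value never proves fairness; it only fails to show unfairness.
    return p, "cannot call the coin unfair (which is not proof that it is fair)"

for heads in (55, 58, 59, 90):
    p, decision = verdict(heads)
    print(f"{heads} heads: p = {p:.3g} -> {decision}")
```

Under these assumptions the cutoff falls between 58 and 59 heads: 58 heads is not enough to call the coin unfair, while 59 is. Notice that neither branch ever returns “the coin is fair.”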

One Additional Interpretation of p-value:

A p-value can also be used to help determine when an input is useful to predict something. For example, is a baby’s height (the input/independent variable) useful in predicting the height of the child as an adult (the “something” predicted/dependent variable)?

This question is the beginning of a subject called regression analysis, which is highly important in research.  The p-values calculated during regression analysis do not concern fair or unfair like the coin; instead they estimate how likely it would be to see a relationship at least as strong as the one in the data if newborn height did not predict adult height at all. Again, notice the logical pattern. You show that X might be correlated (or even causal) by demonstrating how unlikely your results would be if the input or independent variable had no link to the outcome.

Step 1: Assume newborn height does not help predict adult height.  Why? Easier to disprove!!

Step 2: Test whether our assumption is wrong using a p-value.

(Use the same method as above.)
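The two steps above can be sketched in code. To be clear about my assumptions: the heights below are made-up numbers for illustration only, and instead of the formula regression software normally uses, this sketch swaps in a simpler technique called a permutation test (shuffle the adult heights many times and see how often chance alone produces a link as strong as the real one). All function names are my own.

```python
import random

# Made-up heights in centimetres, for illustration only (not real measurements).
newborn_cm = [47, 48, 49, 50, 50, 51, 52, 53, 54, 55]
adult_cm = [160, 163, 165, 168, 166, 172, 171, 176, 178, 181]

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(xs, ys, shuffles=5000, seed=0):
    """Share of random re-pairings with a correlation at least as strong as observed."""
    rng = random.Random(seed)
    observed = abs(correlation(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(shuffles):
        # Step 1: pretend newborn height tells us nothing about adult height.
        rng.shuffle(shuffled)
        # Step 2: check whether chance alone matched the real data.
        if abs(correlation(xs, shuffled)) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate from ever being exactly zero.
    return (hits + 1) / (shuffles + 1)

p = permutation_p_value(newborn_cm, adult_cm)
print(f"p-value is roughly {p:.4f}")
```

A small p-value here says only that the “no link” assumption looks implausible for these numbers; it does not tell us how good the prediction is, and it does not establish cause.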

A p-value is not the:

Probability something is true

  • P-values eliminate potential untruths, but do not conclude truths.

Probability repeated experiments would reach similar results

Additional Limitations/Abuses of the p-value:

Business decisions and policies should NOT be based solely on p-values

Are p-values useful?  ABSOLUTELY, but only somewhat; therefore, p-values should be used in addition to other factors such as logic, beliefs and intuition when making decisions and policies.

Two Quick Ideas to Take Away: 

A p-value is a calculation of the chance that a given result, or a more outlandish one, will occur if our starting assumption is true (the chance of a fair coin flipping 90 heads or more out of 100 tosses); therefore, p-values do not determine truth, but debunk assumptions based on data.

.05 is the magical number; thus, a 5% chance or less of 90 or more heads makes me believe the coin is not fair.

I sincerely hope you did not get lost in the jargon and you all have a better sense for what a p-value is.  Grapple with this post.  This concept is HIGHLY important.

 
