Misinterpretations of 'p' and 'sig'
Oakes (1986) carried out a study on common misinterpretations of the logic of tests of significance among British psychology academics. Most of these misinterpretations confuse p-values (ie, the probability of the observed data, or more extreme data, assuming that the null hypothesis is true) and, especially, statistical significance with the probability of proving or disproving a hypothesis (be it the null hypothesis or an alternative hypothesis). Another misinterpretation is the so-called "replication fallacy", which occurs when the p-value is assumed to represent the probability of finding similar results if the research were to be repeated.
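The difference between the probability of the data given the null hypothesis and the probability of the null hypothesis given the data can be illustrated with a small simulation. This is a minimal sketch, not part of Oakes's study: the z-test with known variance, the sample size (`n=20`), the effect size (`effect=0.5`) and the prior share of true null hypotheses (`prior_null`) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def two_sided_p(sample_mean, n, sigma=1.0):
    """Two-sided p-value of a z-test for H0: true mean = 0 (sigma assumed known)."""
    z = sample_mean / (sigma / n ** 0.5)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def share_null_among_significant(prior_null=0.5, n_studies=20000, n=20,
                                 effect=0.5, alpha=0.05, seed=1):
    """Share of 'significant' studies in which H0 was actually true.

    All parameter values are illustrative assumptions, not data from Oakes.
    """
    rng = random.Random(seed)
    significant = 0
    significant_and_null = 0
    for _ in range(n_studies):
        null_is_true = rng.random() < prior_null
        true_mean = 0.0 if null_is_true else effect
        # The mean of n draws from N(true_mean, 1) is distributed N(true_mean, 1/sqrt(n)).
        sample_mean = rng.gauss(true_mean, 1.0 / n ** 0.5)
        if two_sided_p(sample_mean, n) < alpha:
            significant += 1
            significant_and_null += int(null_is_true)
    return significant_and_null / significant

for prior in (0.5, 0.9):
    share = share_null_among_significant(prior_null=prior)
    print(f"prior P(H0) = {prior}: share of significant results with H0 true = {share:.2f}")
```

Under these assumptions the expected share is alpha × prior / (alpha × prior + power × (1 − prior)), which works out to roughly 0.08 when half of the tested null hypotheses are true and roughly 0.4 when 90% are. The significance threshold is 0.05 in both runs, so neither the p-value nor the level of significance can, on its own, be the probability that the null hypothesis is true or that rejecting it is a wrong decision.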
Oakes found that 96% of the participants held at least one misinterpretation (mean = 2.5) out of the six given to them (see table 1).
|Table 1. Frequencies and percentages of misinterpretations regarding tests of significance|
|Misinterpretation||Frequency||Percentage|
|Significance disproves the null hypothesis||1||1.4%|
|The p-value informs of the probability of the null hypothesis||25||35.7%|
|Significance proves the alternative hypothesis||4||5.7%|
|The p-value informs of the probability of the alternative hypothesis||46||65.7%|
|The p-value informs of the probability of a wrong decision when rejecting the null||60||85.7%|
|The p-value informs of the probability of the results if replicated||42||60.0%|
|(Participants who answered that all of the above were false)||3||4.3%|
Table 1 reflects potential misinterpretations: potential because participants arrived at them through conscious reflection prompted by the impromptu research. However, Oakes also asked the participants to indicate which of those interpretations, or which alternative interpretations, they typically held before the research. He found that most participants typically interpreted p-values and the level of significance somewhat differently from what table 1 shows (ie, they had preferred interpretations rather than exhausting all possible known or "logical" interpretations) (see table 2).
|Table 2. Frequencies and percentages of typical interpretations|
|Interpretation||Typical (frequency)||Typical (percentage)||Upon reflection (change in frequency)||Upon reflection (change in percentage)|
|Significance disproves the null hypothesis||1||1.4%||0||0.0%|
|The p-value informs of the probability of the null hypothesis||32||45.7%||-7||-10.0%|
|Significance proves the alternative hypothesis||2||2.9%||+2||+2.8%|
|The p-value informs of the probability of the alternative hypothesis||30||42.9%||+16||+22.8%|
|The p-value informs of the probability of a wrong decision if rejecting the null||48||68.6%||+12||+17.1%|
|The p-value informs of the probability of the results if replicated||24||34.3%||+18||+25.7%|
|The p-value is the probability of the data given that the null hypothesis is true*||8||11.3%||n/a||n/a|
|(* Correct interpretation freely provided by some participants)|
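The changes between the typical interpretations (table 2) and the interpretations held as true upon reflection (table 1) can be recomputed from the frequencies in the two tables. The sketch below assumes a sample of 70 participants, a figure inferred from the reported frequencies and percentages (eg, 3 participants = 4.3%) rather than stated in the text; the labels are shortened and the function name is hypothetical.

```python
N = 70  # assumed sample size, inferred from the reported counts and percentages

# label: (typical frequency from table 2, frequency upon reflection from table 1)
counts = {
    "Significance disproves the null hypothesis": (1, 1),
    "p-value is the probability of the null hypothesis": (32, 25),
    "Significance proves the alternative hypothesis": (2, 4),
    "p-value is the probability of the alternative hypothesis": (30, 46),
    "p-value is the probability of a wrong decision when rejecting the null": (48, 60),
    "p-value is the probability of the results if replicated": (24, 42),
}

def changes(counts, n=N):
    """Return {label: (change in frequency, change in percentage points)}."""
    return {
        label: (after - before, round(100 * (after - before) / n, 1))
        for label, (before, after) in counts.items()
    }

for label, (d_count, d_pct) in changes(counts).items():
    print(f"{d_count:+3d}  {d_pct:+5.1f}%  {label}")
```

Percentages computed directly from the frequencies may differ by 0.1 from the tabulated changes, which appear to be differences between already rounded percentages.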
Most interesting of all, though, is that the ad hoc research prompted them to reflect on other meanings beyond their preferred interpretations. In table 2, the columns under "upon reflection" show changes in interpretation which were prompted by the research (ie, the difference between typical interpretations held before doing the research and interpretations considered 'true', upon reflection, while doing the research). These changes were the following:
- Given the opportunity to freely add other interpretations, 11.3% of participants claimed to typically use the (correct) interpretation of the p-value as the probability of the data given that the null hypothesis is true. Yet only 3% (2 participants) held it as the only correct interpretation throughout the research, while 1.4% (1 participant) came to hold it as the only correct one upon reflection (these were the 3 participants who thought all the interpretations offered in the research were false).
- A positive change was that 10% of participants (7), who typically interpreted the p-value as the probability of the null hypothesis being true, thought, upon reflection, that this interpretation was wrong.
- Most other changes were negative:
  - 2.8% of the participants, who typically did not interpret significance as (absolutely) proving the alternative hypothesis, thought, upon reflection, that it could indeed support such an interpretation.
  - 22.8% of the participants, who typically did not interpret the p-value as the probability of the alternative hypothesis being true, thought, upon reflection, that it could support such an interpretation.
  - 17.1% of the participants, who typically did not interpret the p-value as the probability of making a wrong decision when rejecting the null hypothesis, thought, upon reflection, that it could support such an interpretation.
  - 25.7% of the participants, who typically did not interpret the p-value as the probability of replicating results, thought, upon reflection, that it could support such an interpretation.
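The replication interpretation in the list above can be checked with a short simulation. The fallacy is often expressed as reading 1 − p as the chance that a repeat of the study will again be significant. The sketch below is an illustration under stated assumptions, not a reconstruction of Oakes's material: it uses a z-test with known variance, a hypothetical true effect of 0.5 standard deviations and 20 observations per study, and the function names are invented for the example.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def p_value(z):
    """Two-sided p-value for a z statistic under H0: true mean = 0."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def replication_rate(n_pairs=20000, n=20, effect=0.5, alpha=0.05, seed=7):
    """Share of significant original studies whose exact replication is also significant.

    The effect size and sample size are illustrative assumptions.
    """
    rng = random.Random(seed)
    shift = effect * n ** 0.5  # mean of the z statistic when the effect is real
    originals = 0
    replicated = 0
    for _ in range(n_pairs):
        z_original = rng.gauss(shift, 1.0)
        z_replication = rng.gauss(shift, 1.0)  # independent repeat, same design
        if p_value(z_original) < alpha:
            originals += 1
            if p_value(z_replication) < alpha:
                replicated += 1
    return replicated / originals

print(f"real effect: {replication_rate():.2f} of replications reach p < 0.05")
print(f"no effect:   {replication_rate(effect=0.0):.2f} of replications reach p < 0.05")
```

Because the replication is independent of the original study, the share of significant replications approaches the statistical power of the design (about 0.6 under these assumptions) when the effect is real, and the significance level (0.05) when there is no effect. In neither case does it equal 1 − p from the original study.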
Jose D PEREZGONZALEZ (2012). Massey University, Turitea Campus, Private Bag 11-222, Palmerston North 4442, New Zealand. (JDPerezgonzalez).
Want to know more?
- WikiofScience - Hypotheses testing (disambiguation)
  - This WikiofScience page lists alternative methods for testing the probability of data or hypotheses.
- WikiofScience - Null hypothesis significance testing
  - This WikiofScience page reflects on the pseudoscientific bases of the null hypothesis significance testing (NHST) procedure typically used in the social sciences and medicine.
- WikiofScience - Studies which replicate Oakes's
  - WikiofScience has more information on two studies that partially replicated Oakes's study: one by Falk and Greenbaum (1995) and one by Haller and Krauss (2000).