Misinterpretation of 'p' (1986) (2e)

[PEREZGONZALEZ Jose D [ed] (2012). Misinterpretation of 'p' (1986) (2e). Journal of Knowledge Advancement & Integration (ISSN 1177-4576), 2012, pages 140-143.]

Misinterpretations of 'p' and 'sig'

Oakes (1986) carried out a study on common misinterpretations of the logic of tests of significance among British psychology academics. Most of these misinterpretations confuse p-values (ie, the probability of the observed data, or of more extreme data, when assuming that the null hypothesis is true) and, especially, statistical significance, with the probability of proving or disproving hypotheses (be this the null hypothesis or an alternative hypothesis). Another misinterpretation is the so-called "replication fallacy", which occurs when the p-value is assumed to represent the probability of finding similar results if the research were to be repeated.
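A small simulation may help to show why the p-value (or the level of significance) cannot be read as the probability of a wrong decision when rejecting the null hypothesis. The sketch below is not part of Oakes's study: the z-test with known standard deviation, the effect size (0.5), the sample size (20), the number of simulated studies and the share of true null hypotheses are all illustrative assumptions, and the function names are hypothetical.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value of a z-test with known sigma (illustrative choice of test)."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_of_true_nulls_among_significant(prior_null, n_studies=4000, n=20,
                                          effect=0.5, alpha=0.05, seed=1):
    """Simulate many studies and return the proportion of significant
    results for which the null hypothesis was actually true."""
    rng = random.Random(seed)
    significant = 0
    significant_and_null = 0
    for _ in range(n_studies):
        null_true = rng.random() < prior_null      # is the null true in this study?
        mu = 0.0 if null_true else effect          # assumed true population mean
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            significant += 1
            significant_and_null += null_true
    return significant_and_null / significant


# Same test and same alpha, but different shares of true nulls among the studies.
share_half = share_of_true_nulls_among_significant(prior_null=0.5)
share_most = share_of_true_nulls_among_significant(prior_null=0.9)
print(f"true nulls among significant results (50% nulls): {share_half:.2f}")
print(f"true nulls among significant results (90% nulls): {share_most:.2f}")
```

Under these assumptions, the proportion of wrong rejections changes with how often the null hypothesis is true and with the power of the test, even though the significance level stays at 0.05. The p-value alone, therefore, does not provide that probability.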

Oakes found that 96% of the participants held at least one misinterpretation (mean = 2.5) out of the six given to them (see table 1).

Table 1. Frequencies and percentages of misinterpretations regarding tests of significance
Common misinterpretations                                                     f       %
Significance disproves the null hypothesis                                    1    1.4%
The p-value informs of the probability of the null hypothesis                25   35.7%
Significance proves the alternative hypothesis                                4    5.7%
The p-value informs of the probability of the alternative hypothesis         46   65.7%
'P' informs of the probability of a wrong decision when rejecting the null   60   85.7%
The p-value informs of the probability of the results if replicated          42   60.0%
(Participants who answered that all of the above were false)                  3    4.3%

Table 1 reflects potential misinterpretations: potential because they were elicited by the research itself (ie, they were a product of conscious reflection on the statements presented). However, Oakes also asked the participants to point out which of those, or which alternative interpretations, they typically held prior to the research. He found that most participants typically interpreted p-values and the level of significance somewhat differently than table 1 shows (ie, they had preferred interpretations rather than exhausting all possible known or "logical" interpretations) (see table 2).

Table 2. Frequencies and percentages of typical interpretations
Interpretations                                                                    typical   upon reflection (change)
                                                                                 f       %     f        %
Significance disproves the null hypothesis                                       1    1.4%     0     0.0%
The p-value informs of the probability of the null hypothesis                   32   45.7%    -7   -10.0%
Significance proves the alternative hypothesis                                   2    2.9%    +2    +2.8%
The p-value informs of the probability of the alternative hypothesis            30   42.9%   +16   +22.8%
'P' informs of the probability of a wrong decision if rejecting the null        48   68.6%   +12   +17.1%
The p-value informs of the probability of the results if replicated             24   34.3%   +18   +25.7%
The p-value is the probability of the data given the null hypothesis is true*    8   11.3%   n/a      n/a
(* Correct interpretation freely provided by some participants)

Most interesting of all, though, is that the ad hoc research prompted them to reflect on other meanings beyond their preferred interpretations. In table 2, the columns under "upon reflection" show changes in interpretation which were prompted by the research (ie, the difference between typical interpretations held before doing the research and interpretations considered 'true', upon reflection, while doing the research). These changes were the following:

  • Given the opportunity to freely include any other interpretations, 11.3% of participants claimed to typically use the (correct) interpretation of the p-value as the probability of the data given the null hypothesis is true. Yet, only 3% (2 participants) held it as the only correct interpretation throughout the research while 1.4% (1 participant) came to hold this interpretation as the only correct one upon reflection (these were the 3 participants who thought all research interpretations were false).
  • A positive change was that 10% of participants, who typically interpreted the p-value as the probability of the null hypothesis being true, thought, upon reflection, that they were wrong in this interpretation.
  • Most other changes were negative:
    • 2.8% of the participants who typically did not interpret significance as (absolutely) proving the alternative hypothesis thought, upon reflection, that it could, indeed, support such interpretation.
    • 22.8% of the participants who typically did not interpret the p-value as the probability of the alternative hypothesis being true thought, upon reflection, that it could support such interpretation.
    • 17.1% of the participants who typically did not interpret the p-value as the probability of making a wrong decision when rejecting the null hypothesis thought, upon reflection, that it could support such interpretation.
    • 25.7% of the participants who typically did not interpret the p-value as the probability of replicating results thought, upon reflection, that it could support such interpretation.
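
The changes listed above are simple differences between the frequencies obtained upon reflection (table 1) and the typical frequencies (table 2). The following sketch reproduces that arithmetic. The sample size of 70 is an assumption inferred from the tables (eg, 60 participants = 85.7%), as the text does not state it, and the short labels and the function name are hypothetical.

```python
# Sample size inferred from the tables (eg, 60 participants = 85.7%);
# an assumption, since the text does not state it.
N = 70

# label: (typical f from table 2, f upon reflection from table 1)
counts = {
    "significance disproves H0": (1, 1),
    "p = P(H0)": (32, 25),
    "significance proves H1": (2, 4),
    "p = P(H1)": (30, 46),
    "p = P(wrong decision when rejecting H0)": (48, 60),
    "p = P(results if replicated)": (24, 42),
}


def change(typical, reflection, n=N):
    """Return the change in frequency and in percentage points of all participants."""
    diff = reflection - typical
    return diff, round(100 * diff / n, 1)


changes = {label: change(*pair) for label, pair in counts.items()}

for label, (diff, pct) in changes.items():
    print(f"{label}: {diff:+d} participants ({pct:+.1f} percentage points)")
```

Percentages computed directly from the frequencies may differ in the last decimal from those in table 2 (eg, 2 of 70 participants rounds to 2.9% rather than 2.8%), presumably because the table subtracts already rounded percentages.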

Editor

Jose D PEREZGONZALEZ (2012). Massey University, Turitea Campus, Private Bag 11-222, Palmerston North 4442, New Zealand. (JDPerezgonzalez).

Want to know more?

WikiofScience - Hypotheses testing (disambiguation)
This WikiofScience page lists alternative methods for testing the probability of data or hypotheses.
WikiofScience - Null hypothesis significance testing
This WikiofScience page reflects on the pseudoscientific bases of the null hypothesis significance testing (NHST) procedure typically used in the social sciences and medicine.
WikiofScience - Studies which replicate Oakes's
You can find more information on WikiofScience about two studies that partially replicated Oakes's study. One study was done by Falk and Greenbaum in 1995; the other study was done by Haller and Krauss in 2000.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License