20110905 - Misinterpretation of 'p' (1986)
(Note: A more conceptually sound edition of this article exists [4].)
PEREZGONZALEZ Jose D [ed] (2011). Misinterpretation of 'p' (1986). Journal of Knowledge Advancement & Integration (ISSN 1177-4576), 2011, pages 99-102.

Misinterpretation of 'p'

Oakes (1986 [3]) carried out a study on common misinterpretations of the level of significance (p) among British psychology academics. Most of these misinterpretations confuse the level of significance (ie, the probability of the data assuming that the null hypothesis is true) with the probability of proving or disproving a hypothesis (be this the null hypothesis or an alternative hypothesis). Another misinterpretation is the so-called "replication fallacy", which occurs when the probability of the data is taken to represent the probability of finding similar results if the research were repeated, failing to notice that this could only hold if the null hypothesis were, indeed, correct.
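
The difference between the probability of the data given the null hypothesis and the probability of the null hypothesis given the data can be illustrated with Bayes' theorem. The following is a minimal sketch and not part of Oakes's study: the function name posterior_null, the prior of 0.5 and the values used for the probability of the data under the alternative hypothesis are illustrative assumptions, and p is treated as a simple conditional probability for the sake of the example.

```python
def posterior_null(p_data_given_h0, p_data_given_h1, prior_h0=0.5):
    """Return P(H0 | data) for two exhaustive hypotheses, via Bayes' theorem."""
    prior_h1 = 1.0 - prior_h0
    evidence = p_data_given_h0 * prior_h0 + p_data_given_h1 * prior_h1
    return p_data_given_h0 * prior_h0 / evidence


# P(data | H0) = 0.01 mirrors the scenario used in the questionnaire.
# The prior and the values of P(data | H1) below are assumptions
# chosen only for illustration.
for p_data_given_h1 in (0.50, 0.05, 0.01):
    result = posterior_null(0.01, p_data_given_h1)
    print(f"P(data | H1) = {p_data_given_h1:.2f} -> P(H0 | data) = {result:.3f}")
```

Under these assumptions, the same probability of the data (0.01) is compatible with quite different probabilities for the null hypothesis, which is why the former cannot be read as the latter.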

Oakes found that 96% of the participants held at least one misinterpretation (mean = 2.5) out of the six given to them (see table 1).


Table 1. Frequencies and percentages of misinterpretations of statistical significance (p)

Common misinterpretations [5]                                             f      %
Significance disproves the null hypothesis                                1   1.4%
Significance informs of the probability of the null hypothesis           25  35.7%
Significance proves the alternative hypothesis                            4   5.7%
Significance informs of the probability of the alternative hypothesis    46  65.7%
Significance informs of the probability of making a type I error         60  85.7%
Significance informs of the probability of the results if replicated     42  60.0%
(Participants who answered that all of above were false)                  3   4.3%
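
The percentages and the mean reported with table 1 can be re-derived from the frequencies and the sample size of 70 participants. The snippet below is only a sketch of that arithmetic; the variable names and the shortened labels are made up for this example.

```python
N_PARTICIPANTS = 70  # sample size reported under Methods

# Frequencies from table 1: participants who marked each (false) statement as true.
# The dictionary keys are shortened labels, not the original wording.
table1 = {
    "disproves H0": 1,
    "probability of H0": 25,
    "proves H1": 4,
    "probability of H1": 46,
    "probability of type I error": 60,
    "probability of replication": 42,
}
none_endorsed = 3  # participants who judged all six statements false

percentages = {label: round(100 * f / N_PARTICIPANTS, 1) for label, f in table1.items()}
mean_per_participant = sum(table1.values()) / N_PARTICIPANTS
at_least_one = 100 * (N_PARTICIPANTS - none_endorsed) / N_PARTICIPANTS

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
print(f"mean misinterpretations per participant: {mean_per_participant:.1f}")
print(f"participants with at least one misinterpretation: {at_least_one:.0f}%")
```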



Table 1 reflects potential misinterpretations; they are only potential because they were the product of conscious effort (ie, reflection) prompted by the impromptu research. However, Oakes also asked the participants to point out which of those, or of any alternative interpretations, they typically held prior to the research. He found that most participants typically interpreted the level of significance somewhat differently from what table 1 shows (ie, they had preferred interpretations rather than exhausting all possible known or "logical" interpretations) (see table 2).


Table 2. Frequencies and percentages of typical interpretations

Interpretations [5]                                                                  Typical     Upon reflection (change)
                                                                                   f       %       f        %
Significance disproves the null hypothesis                                         1    1.4%       0     0.0%
Significance informs of the probability of the null hypothesis                    32   45.7%      -7   -10.0%
Significance proves the alternative hypothesis                                     2    2.9%      +2    +2.8%
Significance informs of the probability of the alternative hypothesis             30   42.9%     +16   +22.8%
Significance informs of the probability of making a type I error                  48   68.6%     +12   +17.1%
Significance informs of the probability of the results if replicated              24   34.3%     +18   +25.7%
*Significance is the probability of the data given the null hypothesis is true     8   11.3%     n/a      n/a
(* Correct interpretation freely provided by some participants)

Most interesting of all, though, is that the ad hoc research prompted them to reflect on other meanings beyond their preferred interpretations. In table 2, the columns under "upon reflection" show changes in interpretation which were prompted by the research (ie, the difference between typical interpretations held before doing the research and interpretations considered 'true', upon reflection, while doing the research). These changes were the following:

  • Given the opportunity to freely include any other interpretation, 11.3% of participants claimed to typically use the (correct) interpretation of the level of significance as the probability of the data given that the null hypothesis is true. Yet only about 3% (2 participants) held it as the only correct interpretation during the research, while 1.4% (1 participant) came to hold it as the only correct one upon reflection (together, these were the 3 participants who thought all six research statements were false).
  • A positive change was that 10% of participants, who typically interpreted significance as the probability of the null hypothesis being true, thought, on reflection, they were wrong in their interpretation.
  • Most other changes were negative [6]:
    • 2.8% of participants, who typically did not interpret significance as (absolutely) proving the alternative hypothesis, thought, on reflection, that it could indeed support such an interpretation.
    • 22.8% of participants, who typically did not interpret significance as the probability of the alternative hypothesis being true, thought, on reflection, that it could support such an interpretation.
    • 17.1% of participants, who typically did not interpret significance as the probability of making a type I error, thought, on reflection, that it could support such an interpretation.
    • 25.7% of participants, who typically did not interpret significance as the probability of replicating the results, thought, on reflection, that it could support such an interpretation.
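
The changes listed above follow from comparing the frequencies of the typical interpretations (table 2) with the frequencies obtained during the research (table 1). The snippet below sketches that comparison; it assumes that the percentage changes are differences between percentages already rounded to one decimal, and all names are made up for this example.

```python
N_PARTICIPANTS = 70  # sample size reported under Methods

# (label, frequency typically held, frequency during the research)
# Labels are shortened; frequencies come from tables 2 and 1 respectively.
rows = [
    ("disproves H0", 1, 1),
    ("probability of H0", 32, 25),
    ("proves H1", 2, 4),
    ("probability of H1", 30, 46),
    ("probability of type I error", 48, 60),
    ("probability of replication", 24, 42),
]


def pct(frequency):
    """Percentage of the sample, rounded to one decimal."""
    return round(100 * frequency / N_PARTICIPANTS, 1)


count_change = [research - typical for _, typical, research in rows]
# Assumption: changes are differences between the rounded percentages.
pct_change = [round(pct(research) - pct(typical), 1) for _, typical, research in rows]

for (label, _, _), dc, dp in zip(rows, count_change, pct_change):
    print(f"{label}: {dc:+d} participants ({dp:+.1f} percentage points)")
```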


Methods


Research approach

The source gives little detail about the research approach. It appears to have been an ad hoc, exploratory study.

Sample

  • A convenience sample of 70 participants from, probably, a psychology department at a British university. The participants were university lecturers, research fellows and postgraduate students with two or more years' experience doing research.

Materials

  • A questionnaire consisting of a short scenario and six statements. The scenario described a small study with two independent samples, and provided the relevant results: a t-test which achieved 'p = 0.01'.
    • The six statements asked for a true / false decision regarding whether each particular statement reflected a logical interpretation of the results. Unknown to the participants, all statements were false, representing six common misinterpretations of statistical significance.
    • Either the researcher or the questionnaire itself also provided the 'hint' that "no particular ratio of true to false statements should be anticipated".
  • Following the questionnaire, participants were asked to pinpoint those interpretations they typically held before the research, including any other interpretations not provided during the research.
  • Finally, the participants were debriefed about misinterpretations and the correct interpretation of the level of significance.
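
The correct interpretation (the probability of the data given that the null hypothesis is true) can be made concrete with a small computation on two independent samples. The sketch below uses an exact permutation test rather than the t-test of the original scenario, because it needs only the Python standard library; the data and the function name are hypothetical and do not come from Oakes's questionnaire.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference between means.

    Returns the proportion of all possible regroupings of the pooled data
    (ie, what the null hypothesis of 'no group difference' allows) whose
    absolute mean difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    count = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating point
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical scores for two independent samples of five participants each.
sample_a = [14, 15, 16, 17, 18]
sample_b = [8, 9, 10, 11, 13]
p = permutation_p_value(sample_a, sample_b)
print(f"p = {p:.4f}")
```

The value obtained is a statement about the data under the null hypothesis only; by itself it says nothing about the probability of either hypothesis, of a type I error, or of replication.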

Analysis

  • Descriptive results

Generalization potential

This particular research appears to be severely limited to an unknown population of psychology academics (probably Sussex University, in the UK). Yet the results might, at least, serve as a working hypothesis for generalizing to other populations such as the following (in order of decreasing generalizability):

  • British psychology academics and researchers (including students).
  • British academics and researchers (including students) who also use NHST (such as academics from other social sciences, medicine, biology, etc).
  • Professionals trained in British universities (especially psychologists, social scientists, etc).
  • (See also partial replication studies by Haller and Krauss (2000 [2]) in Germany, and by Falk and Greenbaum (1995 [1]) in Israel, for potential generalizability beyond the UK).

References
1. FALK Ruma & Charles W GREENBAUM (1995). Significance tests die hard: the amazing persistence of a probabilistic misconception. Theory & Psychology, 1995, volume 5, number 1, pages 75-98. DOI 10.1177/0959354395051004.
2. HALLER Heiko & Stefan KRAUSS (2000). Misinterpretations of significance. A problem students share with their teachers. Methods of Psychological Research Online, 2002, volume 7, number 1, pages 1-20.
3. OAKES Michael (1986). Statistical inference: a commentary for the social and behavioral sciences. John Wiley & Sons (Chichester, UK), 1986.
Notes
4. The newer edition makes a clearer distinction between p-values and statistical significance.
5. The original research statements have been rephrased here.
6. Notwithstanding this, all participants were debriefed about the correct and incorrect interpretations after the research.

Want to know more?

Wiki of Science - Hypotheses testing (disambiguation)
This Wiki of Science page lists alternative methods for testing the probability of data or hypotheses.
Wiki of Science - Null hypothesis significance testing
This Wiki of Science page reflects on the pseudoscientific bases of the null hypothesis significance testing (NHST) procedure typically used in the social sciences and medicine.
Wiki of Science - Studies which replicate Oakes's
Wiki of Science has more information on two studies that partially replicated Oakes's study. One was done by Falk and Greenbaum in 1995; the other was done by Haller and Krauss in 2000.

Editor

Jose D PEREZGONZALEZ (2011). Massey University, Turitea Campus, Private Bag 11-222, Palmerston North 4442, New Zealand. (JDPerezgonzalez).



Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License