20110908 - Misinterpretation of 'p' (2000)


(Note: A more conceptually sound edition of this article exists⁴)
[PEREZGONZALEZ Jose D [ed] (2011). Misinterpretation of 'p' (2000). Journal of Knowledge Advancement & Integration (ISSN 1177-4576), 2011, pages 110-112.]

Table of Contents

Misinterpretation of 'p'

Generalization potential

Misinterpretation of 'p'

Haller and Krauss (2000²) carried out a study on common misinterpretations of the level of significance (p) among German psychology students and academics, which partly replicates one done by Oakes (1986³). Typically, most of these misinterpretations confuse the level of significance (i.e., the probability of the data assuming that the null hypothesis is correct) with the probability of proving or disproving hypotheses (be it the null hypothesis or an alternative hypothesis). Another misinterpretation is the so-called "replication fallacy", which occurs when the probability of the data is assumed to represent the probability of finding similar results if the research were to be repeated, failing to notice that this could only happen if the null hypothesis were, indeed, true.
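The replication fallacy can be illustrated with a small simulation. The sketch below is not part of Haller and Krauss's study: it assumes normally distributed scores with a known standard deviation of 1, uses a two-sample z-test in place of a t-test so that only the Python standard library is needed, and the names `two_sample_z_p` and `share_significant`, the group size and the effect sizes are all hypothetical choices made for illustration. It repeats the same study many times and counts how often the result is significant.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_sample_z_p(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known sigma."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def share_significant(true_diff, n=20, alpha=0.01, studies=4000, seed=1):
    """Share of simulated studies with p <= alpha for a given true difference.

    All parameter values are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        a = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sample_z_p(a, b) <= alpha:
            hits += 1
    return hits / studies


if __name__ == "__main__":
    for diff in (0.0, 0.5, 1.0):
        print(f"true difference {diff}: share significant = {share_significant(diff):.3f}")
```

Under these assumptions, the share of significant results stays close to alpha when the null hypothesis is true, and otherwise depends on the true difference and the sample size. In neither case can it be read off a single observed p.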

Haller and Krauss found that most participants endorsed at least one of the six misinterpretations presented (see table 1). Overall, 100% of psychology students held one or more misinterpretations (mean = 2.5), as did almost 90% of psychology researchers (mean = 2) and 80% of instructors of statistics in psychology (mean = 1.9). The authors found the high percentage of instructors with misinterpretations worrisome, as instructors may pass those misinterpretations on to their students. Another interesting result, though not one highlighted by the authors, is the high percentage of researchers (including instructors when they carry out and publish research) with misinterpretations, as they would perpetuate them when publishing, peer-reviewing others' publications, and making research-informed decisions (such as chairing committees, granting funding, etc).

Table 1. Percentage of participants endorsing each misinterpretation of statistical significance
Common misinterpretations⁵	Stat. instructors	Researchers	Students
Significance disproves the null hypothesis	10%	15%	34%
Significance informs of the probability of the null hypothesis	17%	26%	32%
Significance proves the alternative hypothesis	10%	13%	20%
Significance informs of the probability of the alternative hypothesis	33%	33%	59%
Significance informs of the probability of making a type I error	73%	67%	68%
Significance informs of the probability of the results if replicated	37%	49%	41%

(Participants who answered that all of above were false)	20%	10%	0%
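The percentages of participants holding one or more misinterpretations follow from the last row of table 1 as complements. The sketch below shows that arithmetic and, as an approximation only, pools the three groups using the group sizes given in the Sample section. The pooled figure is not reported in the article and is inexact because the table percentages are rounded; the function names are hypothetical.

```python
# Percentage of each group who judged all six statements false (last row of table 1).
all_false_pct = {"instructors": 20, "researchers": 10, "students": 0}

# Group sizes as given in the Sample section.
group_size = {"instructors": 30, "researchers": 39, "students": 44}


def at_least_one_pct(all_false):
    """Percentage per group holding one or more misinterpretations."""
    return {group: 100 - pct for group, pct in all_false.items()}


def pooled_at_least_one_pct(all_false, sizes):
    """Approximate pooled percentage, weighting each group by its size."""
    total = sum(sizes.values())
    holders = sum(sizes[g] * (100 - all_false[g]) / 100 for g in sizes)
    return 100 * holders / total


if __name__ == "__main__":
    print(at_least_one_pct(all_false_pct))
    print(round(pooled_at_least_one_pct(all_false_pct, group_size), 1))
```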

Methods

Research approach

Replication study using a German sample. The original study had been carried out by Oakes (1986³) with a British sample of psychology academics 15 years earlier.

Sample

A convenience sample of 113 participants from departments of psychology in 6 German universities. 44 participants were psychology students, 39 participants were research psychologists not involved with teaching statistics, and 30 participants were instructors of statistics in psychology (including lecturers and tutors).

Materials & analysis

Oakes's (1986³) questionnaire translated into German:
- The questionnaire consisted of a short scenario and six statements. The scenario described a small study with two independent samples, and provided the relevant results: a t-test which achieved 'p = 0.01'.
- The six statements asked for a true / false decision regarding whether each particular statement reflected a logical interpretation of the results. Unknown to the participants, all statements were false, representing six common misinterpretations of statistical significance.
- The study also provided the 'hint' that "several or none of the statements may be correct".
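What a result such as 'p = 0.01' does say can be sketched with a permutation test, used here in place of the t-test of the scenario so that only the Python standard library is needed. The function name `permutation_p_value` and the scores are hypothetical and not taken from the questionnaire. The p-value is estimated as the share of random relabelings of the pooled scores that produce a difference in means at least as large as the one observed, that is, a statement about the data if the null hypothesis of no group difference holds.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, resamples=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffling the pooled scores simulates "no difference".
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (resamples + 1)


if __name__ == "__main__":
    # Hypothetical scores for two independent groups.
    treatment = [14, 17, 15, 19, 16, 18, 20, 15]
    control = [12, 13, 15, 11, 14, 12, 16, 13]
    print(permutation_p_value(treatment, control))
```

Nothing in this calculation yields the probability that either hypothesis is true, which is the quantity that most of the six statements wrongly claim to read from p.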

Generalization potential

This particular research was done with a sample of psychology academics and students from different universities in Germany, and its design appears to be more valid than that of previous studies. It also found trends similar to those reported by Oakes (1986³) in the U.K. and by Falk and Greenbaum (1995¹) in Israel. Thus, these results may be generalizable to the following populations (in order of decreasing generalizability):

- German, British and Israeli psychology academics and researchers (including students).
- Psychology professionals trained in German, British and Israeli universities.
- Psychology professionals and academics elsewhere.
- Other scientists (especially from the social sciences, medicine and business) who rely on the NHST procedure.

References

1. FALK Ruma & Charles W GREENBAUM (1995). Significance tests die hard: the amazing persistence of a probabilistic misconception. Theory & Psychology, 1995, volume 5, number 1, pages 75-98. DOI 10.1177/0959354395051004.

2. HALLER Heiko & Stefan KRAUSS (2000). Misinterpretations of significance. A problem students share with their teachers. Methods of Psychological Research Online, 2002, volume 7, number 1, pages 1-20.

3. OAKES Michael (1986). Statistical inference: a commentary for the social and behavioral sciences. John Wiley & Sons (Chichester, UK), 1986.

+++ Notes +++

4. The newer edition makes a clearer distinction between p-values and statistical significance.

5. The original research statements have been rephrased here.

Want to know more?

Haller & Krauss's article: The original article also offers explanations of why each statement is false, and recommendations for teaching statistics in order to prevent those misinterpretations. You can access the original research article as: HALLER Heiko & Stefan KRAUSS (2000). Misinterpretations of significance. A problem students share with their teachers. Methods of Psychological Research Online, 2002, volume 7, number 1, pages 1-20. (Note: the article mistakenly places Oakes's research in the US).
Wiki of Science - Hypotheses testing (disambiguation): This Wiki of Science page lists alternative methods for testing the probability of data or hypotheses.
Wiki of Science - Null hypothesis significance testing: This Wiki of Science page reflects on the pseudoscientific bases of the null hypothesis significance testing (NHST) procedure typically used in the social sciences and medicine.
Wiki of Science - Related studies: You can find more information on two related studies in Wiki of Science. One was the original study done by Oakes in 1986; the other study was a replication done by Falk and Greenbaum in 1995.