Null hypothesis significance testing


Null Hypothesis Significance Testing (NHST), also called the Null Hypothesis Significance Testing Procedure (NHSTP), refers both to a widely misunderstood procedure of hypothesis testing, used especially in the social sciences, and to a tongue-in-cheek label for that procedure used by authors such as Cohen (1994) [1] and Gigerenzer et al (2004) [2].


The basic procedure runs roughly as follows:

  • a) Set up a null hypothesis such as "There is no mean difference between groups" or "There is no correlation between variables".
  • b) Optionally, set up a non-directional alternative hypothesis such as "There is a (significant) mean difference between groups" or "There is a (significant) correlation between variables". Alternatively, set up a directional hypothesis such as "The experimental group will perform significantly better than the control group".
  • c) Optionally, if you know how to do it, estimate power and sample size.
  • d) Using a conventional level of 5% (or 10%, or 1%) and an appropriate two-tailed or one-tailed test, reject the null hypothesis in favor of the alternative one if 'p' is equal to or lower than the selected conventional level; otherwise do not reject it (you may report that there is not enough evidence to reject it, though). That is, you cannot accept or prove the null hypothesis, but you can reject or disprove it; on the other hand, you can accept or prove the alternative hypothesis, but you cannot reject or disprove it.
  • e) Interpret your results as the probability that the retained hypothesis is correct, especially if 'p' is not too close to the conventional level. If it is too close to it, and humility is within you, interpret correctness together with a modest chance of having made a particular error (either alpha or beta) in your decision.
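
Step (d) can be sketched in code. The example below is only an illustration under stated assumptions: it uses an exact permutation test for a difference in means, which is one of several tests that could play this role, and the function names and data are hypothetical. Note that it returns the probability of data at least as extreme as those observed if the null hypothesis were true; it does not return the probability that either hypothesis is correct, which is what step (e) pretends to read off.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Enumerates every way of relabelling the pooled observations into two
    groups of the original sizes and counts how often the absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count


def fisher_style_decision(p_value, level=0.05):
    """Apply the conventional cut-off described in step (d)."""
    if p_value <= level:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


if __name__ == "__main__":
    # Hypothetical scores, invented purely for illustration.
    control = [1, 2, 3, 4, 5]
    treated = [6, 7, 8, 9, 10]
    p = exact_permutation_p(control, treated)
    print(f"p = {p:.4f}: {fisher_style_decision(p)}")
```

With these made-up, fully separated groups only 2 of the 252 possible relabellings are as extreme as the observed one, so p is about 0.008 and the null hypothesis would be rejected at the 5% level.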

NHSTP explained

The above procedure is a mix of three different statistical approaches to hypothesis testing:

  • (a) and (d) are steps within Fisher's theory of significance testing. This theory gives the NHST procedure its name and sometimes acts as the default hypothesis-testing model: a null hypothesis is normally made explicit; conventional probability levels are used to decide whether or not to reject the null hypothesis; and either those conventional levels or the exact probability is reported in support of that decision.
  • (b) and (c) are steps within the Neyman-Pearson approach to hypothesis testing, which complements Fisher's approach and sometimes acts as the default hypothesis-testing model. This approach provides an alternative hypothesis to accompany the null (this alternative hypothesis is sometimes made explicit, other times simply assumed); two errors (alpha and beta), which can be determined before the experiment; and, for the most statistically aware, the opportunity to estimate "power" (= 1 - beta) and sample size from each other (ie, given a sample size and effect size, power can be estimated; alternatively, given a desired power and effect size, the appropriate sample size can be estimated).
  • (e) is an example of a posterior probability in the sense of Bayes's theorem, which complements the above two by helping interpret significance levels (p) or error probabilities (typically alpha, α), or both, as the probability that either hypothesis is correct or true: the alternative hypothesis when the research succeeds in finding an effect, or the null hypothesis when it does not.
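
The Neyman-Pearson trade-off between power and sample size mentioned under (b) and (c) can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes a two-sided comparison of two equally sized groups, a standardized effect size (difference in means divided by a common standard deviation), and a normal approximation in place of the t distribution. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-sample z-test."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = abs(effect_size) * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for_power(effect_size, power=0.80, alpha=0.05):
    """Approximate group size needed to reach the requested power.

    Uses the usual closed-form approximation, which ignores the
    negligible chance of rejecting in the wrong tail.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / abs(effect_size)) ** 2)


if __name__ == "__main__":
    n = n_per_group_for_power(0.5)  # effect size 0.5 is an assumed value
    print(n, round(power_two_sample(0.5, n), 3))
```

Both alpha and beta are fixed before any data are collected, which is what distinguishes this approach from reading an exact 'p' value as a graded measure of evidence.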

In action

This video describes the 'technology' for calculating 'p' values and assessing 'levels of statistical significance'. Yet it also exemplifies the pseudoscientific approach to NHST, mixing Fisher's, Neyman-Pearson's and Bayes's theories. For example:

  • The video is titled "hypothesis testing" (a Bayesian approach), but it is only about testing data and making inferences about the underlying hypothesis (a Fisherian approach).
  • The statement of the problem already invites trouble. Firstly, it ignores all information about the reliability of the data, which is what may actually support an inference from data to hypothesis (eg, whether the rats were randomly selected, which experimental controls were put in place, etc.). Secondly, it only provides statistics (means and standard deviation) about the observed data, but calls for an inference on the effect of the drug on response time, unwittingly suggesting that it is 'safe' to infer from raw data to theoretical assumptions (when we do not even know the quality of the data). Thus, this statement of the problem misses an important step in Fisher's approach: information about the experimental design, which would warrant the appropriateness of the statistical apparatus and the inference from the sample to the population.
  • The setting of the hypotheses is also problematic. Firstly, it reinforces the idea that testing the data actually tests the hypotheses (a pseudo-Bayesian approach). The 'null hypothesis', for example, should be that there is no difference between the means of both groups, rather than that the drug has no effect (as no information is given regarding experimental control, any hypothetical change in response times may be due to things other than the drug). Secondly, there is no need for the alternative hypothesis, as it is simply the negation of the 'null hypothesis' (ie, it is redundant unless it is set up within a Neyman-Pearson approach, which is not the one followed here).
  • The rest of the video, explaining the technique of calculating the probability of the data and assessing its significance, follows Fisher's approach to significance testing.
  • The conclusion of the video, however, is unwarranted. The results may be used as evidence against the null hypothesis (of the group means being equal, and the inference that the drug has no effect) (Fisherian approach), but they are not a "strong indicator […] that the drug definitely has some effect" (a Bayesian approach, and a decision made as if under a Neyman-Pearson approach). In any case, not even the rejection of the null hypothesis is warranted because, as noted earlier, we lack knowledge of the quality of the data (ie, of how the experiment was designed and run).
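
The gap between "evidence against the null hypothesis" and "the drug definitely has some effect" can be illustrated with Bayes's theorem. The sketch below is a simplification under explicit assumptions: it treats the null and the alternative as the only two possibilities, and it needs a prior probability for the alternative and a power value, neither of which the video provides. The function name and all numbers are hypothetical.

```python
def posterior_h1_given_significant(prior_h1, power, alpha=0.05):
    """Probability that the alternative is true given a significant result.

    Applies Bayes's theorem to two exhaustive hypotheses:
      P(significant | H1) = power
      P(significant | H0) = alpha
    """
    if not 0 <= prior_h1 <= 1:
        raise ValueError("prior_h1 must be a probability between 0 and 1")
    true_positive = power * prior_h1
    false_positive = alpha * (1 - prior_h1)
    if true_positive + false_positive == 0:
        raise ValueError("a significant result is impossible with these inputs")
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    # Assumed priors, chosen only to show how much the answer depends on them.
    for prior in (0.1, 0.5):
        print(prior, posterior_h1_given_significant(prior, power=0.80))
```

With an assumed power of 0.80 and alpha of 0.05, a significant result leaves the alternative at 0.64 when its prior was 0.1 and at about 0.94 when its prior was 0.5. Without a prior, the 'p' value alone cannot deliver the probability that the drug has an effect.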

Pseudoscientific bases

The NHST procedure is a technology based on a pseudoscience, one born from a confusion between theories (their philosophical and logical bases as well as their relatively similar concepts), together with a good dose of what Gigerenzer et al (2004) [2] called "wishful thinking". In a nutshell, the above theories have some commonalities but are, for the most part, philosophically and logically incompatible. Below is a quick overview of some of the commonalities and incompatibilities between them:

Author         | Fisher                    | Neyman-Pearson       | Bayes
Philosophy     | inductive                 | deductive            | deductive
Approach       | inferential               | confirmatory         | confirmatory
Hypotheses     | 1 (null)                  | 2 or more            | 2 or more
Test against   | hypothetical distribution | competing hypotheses | competing hypotheses
Logic of test  | probability of data       | estimation of error  | prior probability of hypotheses
Probability is | ad hoc evidence           | long-run probability | posterior probability
Objective      | new knowledge             | correct prediction   | correct prediction

Scientific alternatives

1. COHEN Jacob (1994). The Earth is round (p < .05). American Psychologist (ISSN 0003-066X), 1994, volume 49, number 12, pages 997-1003.
2. GIGERENZER Gerd, Stefan KRAUSS & Oliver VITOUCH (2004). The null ritual: what you always wanted to know about significance testing but were afraid to ask. Chapter 21 in David KAPLAN [ed] (2004). The SAGE handbook of quantitative methodology for the social sciences. SAGE (California, USA), 2004. ISBN 9780761923596.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License