Significance level: disambiguation
"Level of significance" has several meanings, commonly confused and used interchangeably (Gigerenzer et al, 2004):
- A cutoff point: a conventional level of significance (p < 0.05, or p < 0.01) used for rejecting a "null hypothesis".
- An exact level of significance: an exact probability value (p) is used for reporting information about data but not for making decisions regarding whether a "null hypothesis" gets rejected or not.
- The "alpha level": a conventional significance level (eg, α = 0.05, α = 0.01) used for rejecting a hypothesis in favor of a second one (the "alpha level" represents the probability of rejecting the hypothesis in error, that is, when it should not be rejected). This meaning also implies setting a "beta level", the probability of failing to reject a hypothesis that should have been rejected.
- A posterior probability: the probability that a hypothesis is correct or truthful.
- A hybrid confusion: a fifth meaning is the common conflation of all of the above, as if they meant the same thing and were, thus, interchangeable.
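The trade-off between the "alpha level" and the "beta level" in the third meaning can be made concrete with a small calculation. The sketch below assumes a two-sided one-sample z-test with a known standard deviation of 1, so the true effect is expressed in standard-deviation units; the function name `beta_level` and the example numbers are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def beta_level(effect: float, n: int, alpha: float = 0.05) -> float:
    """Type II error probability for a two-sided one-sample z-test.

    Assumes a known sigma of 1, so `effect` is the true mean shift in SD units.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided cut-off for alpha
    shift = effect * n ** 0.5  # where the test statistic is centred when H0 is false
    # The test fails to reject when the statistic falls between -z_crit and +z_crit.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


# Hypothetical scenario: a true effect of 0.5 SD measured on 30 observations.
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: beta = {beta_level(0.5, 30, alpha):.3f}")
```

With the sample size and effect held fixed, lowering alpha makes the cut-off stricter and therefore raises beta: fewer false rejections are bought at the price of more missed effects.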
What the significance level is not
A typical introduction to the significance level, especially in the social sciences, is an amalgamation of three meanings associated with three different approaches to data analysis. "The result is not [alpha], nor an exact level of significance, nor a conventional level. It is an emotional and intellectual confusion" (Gigerenzer et al, 2004). Yet the confusion is perpetuated in research publications, and being "too picky" about it may see one's articles rejected by peer reviewers who are more familiar with this hybrid meaning than with the other (more appropriate) ones. Thus, it is worth at least introducing this hybrid meaning below, while interested readers can find the other meanings on their own pages.
The significance level is a value associated with some statistical tests, which indicates the probability of obtaining those results or more extreme ones. This value can be interpreted as the probability of obtaining those results if the null hypothesis were true (when sampling is random) or as the probability of obtaining those results by chance alone (when sampling is less than random).
The value of this probability (also known as "p", "p-value", "alpha" and "type I error") runs from 0 to 1. The closer it is to 0, the lower the probability of the results being found if the null hypothesis were true, or the lower the probability of the results being due to chance.
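The idea of "the probability of obtaining those results, or more extreme ones, if the null hypothesis were true" can be computed exactly for a simple case. The sketch below assumes a hypothetical one-sided test of a coin: the null hypothesis is that the coin is fair, and the observed result is 60 heads in 100 tosses. The function name `binomial_upper_tail_p` is illustrative, not taken from any library.

```python
from math import comb


def binomial_upper_tail_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= successes) when X ~ Binomial(trials, p_null)."""
    # Sum the probabilities of the observed count and every more extreme count.
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 60 heads in 100 tosses of a coin assumed fair under the null.
p = binomial_upper_tail_p(60, 100)
print(f"p = {p:.4f}")
```

A p-value of roughly 0.028 says that a fair coin would produce 60 or more heads in about 3 out of every 100 runs of 100 tosses.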
Significance levels are used to reject the null hypothesis that, for example, there is no correlation between variables, no difference between groups, or no change between treatments. A significance level of '0.05' is conventionally used in the social sciences, although levels as high as '0.10', as well as lower levels, may also be used. Levels greater than '0.10' are rarely used. A significance level of '0.05', for example, indicates that there is a 5% probability that the results are due to chance; a significance level of '0.10' indicates a 10% probability. Thus, using significance levels above '0.10' is rather "risky", while using lower significance levels is "safer".
Test results with an associated probability equal to or less than the significance level are said to be "significant", meaning that the null hypothesis can be rejected in favor of the alternative hypothesis. Thus, the significance level is used as a cut-off point to reject the null hypothesis (or accept the alternative one), while at the same time indicating the chance of being wrong in doing so (ie, the chance of rejecting the null hypothesis when it is, in fact, correct).
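Read as a decision rule, the cut-off can be sketched in a few lines of Python. The example assumes a hypothetical one-sided test of whether a coin is biased towards heads (100 tosses; null hypothesis: P(heads) = 0.5). It finds the smallest number of heads that would count as "significant" at 0.05, then computes how often that rule would reject the null hypothesis when the coin is in fact fair. All function names are illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    # Probability of exactly k successes in n trials.
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def upper_tail(k: int, n: int, p: float) -> float:
    # One-sided p-value: probability of k or more successes.
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    # The cut-off rule: reject the null hypothesis when p <= alpha.
    return p_value <= alpha


n, p_null, alpha = 100, 0.5, 0.05
# Smallest head count whose one-sided p-value is at or below alpha.
critical = min(k for k in range(n + 1) if is_significant(upper_tail(k, n, p_null), alpha))
# Probability of rejecting a true null hypothesis with this rule (type I error rate).
type_i_rate = upper_tail(critical, n, p_null)
print(critical, round(type_i_rate, 4))
```

Because the binomial distribution is discrete, the actual rate of wrongly rejecting the null hypothesis comes out somewhat below 0.05, which is why the significance level is best read as an upper bound on that error.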
The significance level is the maximum probability of error that a researcher is willing to accept when interpreting results. If a 'two-tailed' test is used with a significance level of '0.05', this level is, in effect, split into '0.025' for each tail. A common "trick" when doing statistical tests is to use 'one-tailed' tests, which place the whole significance level in a single tail and so make a result in the predicted direction easier to call significant. This is only appropriate with directional hypotheses (ie, when you hypothesise that one group is going to be better than the other, but not vice versa).
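The effect of splitting alpha between two tails can be seen by comparing critical values of the standard normal distribution. The sketch below assumes a z-test and a hypothetical test statistic of z = 1.8 in the predicted direction; it uses only `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
alpha = 0.05

two_tailed_cut = std_normal.inv_cdf(1 - alpha / 2)  # 0.025 in each tail
one_tailed_cut = std_normal.inv_cdf(1 - alpha)      # all 0.05 in one tail

z = 1.8  # hypothetical test statistic, in the hypothesised direction
p_two = 2 * (1 - std_normal.cdf(abs(z)))
p_one = 1 - std_normal.cdf(z)

print(f"cut-offs: two-tailed {two_tailed_cut:.3f}, one-tailed {one_tailed_cut:.3f}")
print(f"z = {z}: two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

The same statistic falls short of significance two-tailed (p of about .07) but passes one-tailed (p of about .036), which is exactly why the one-tailed choice must be justified by a directional hypothesis made before seeing the data.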
Another way of putting it is the following: for every 100 tests done using a significance level of '0.05', you may expect about 5 of them to be 'significant' merely by chance when the null hypothesis is true (ie, 1 in 20 tests, or 2 or 3 in every 50). If you do lots of tests but only obtain a small number of significant results, then you should be cautious when reporting your results, as some of them, although you do not know which, may be significant merely by chance. (See, for example, some of the footnotes in this article.)
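The "5 in 100 by chance" claim can be checked by simulation. The sketch below assumes a two-sided one-sample z-test with known standard deviation and draws every sample from a normal distribution with mean 0, so the null hypothesis (mean = 0) is true in every test; the sample sizes, seed, and function name are hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample: list[float], sigma: float = 1.0) -> float:
    """Two-sided one-sample z-test of H0: mean == 0, assuming known sigma."""
    n = len(sample)
    z = fmean(sample) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(42)  # fixed seed so the run is reproducible
alpha, n_tests, n_obs = 0.05, 10_000, 30

# Every sample comes from N(0, 1), so any 'significant' result is a false positive.
p_values = [z_test_p_value([rng.gauss(0, 1) for _ in range(n_obs)]) for _ in range(n_tests)]
false_positive_rate = sum(p <= alpha for p in p_values) / n_tests
print(f"share 'significant' by chance: {false_positive_rate:.3f}")

# Probability of at least one false positive among 100 independent true-null tests.
print(f"P(at least one in 100): {1 - (1 - alpha) ** 100:.3f}")
```

The share of chance "significant" results should land close to 0.05, while the chance of getting at least one such result in 100 independent tests is above 99%.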
In order to avoid accepting too many 'significant' results when multiple tests are done, corrections such as the Bonferroni correction can be used. These corrections reduce that risk by setting a lower cut-off "p-value" for rejecting the null hypothesis. However, instead of settling for another standard "p-value" (say, '0.01' instead of '0.05'), the corrected cut-off is a function of the number of tests intended and the "initial" p-value; in the Bonferroni correction, for instance, it is the initial value divided by the number of tests.
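A minimal sketch of the Bonferroni correction follows, using four hypothetical p-values. It shows the two equivalent ways of applying it: dividing the initial significance level by the number of tests, or multiplying each p-value by the number of tests (capped at 1) and comparing it with the initial level. The function names are illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    # Per-test cut-off that keeps the chance of any false positive at or below alpha.
    return alpha / n_tests


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    # Equivalent form: scale each p-value by the number of tests, capped at 1,
    # then compare the adjusted values with the initial alpha.
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


p_values = [0.001, 0.012, 0.030, 0.200]  # hypothetical results of four tests
alpha = 0.05

threshold = bonferroni_threshold(alpha, len(p_values))  # 0.05 / 4 = 0.0125
significant = [p <= threshold for p in p_values]
adjusted = bonferroni_adjust(p_values)

print(f"per-test cut-off: {threshold}")
print("significant after correction:", significant)
```

Note that the third result (p = .030) would have been "significant" at 0.05 on its own but no longer is once the number of tests is taken into account.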
- The significance level is a probability figure between 0 and 1 associated with most statistical tests.
- This figure is used as a cut-off point to reject the null hypothesis (eg, that there is no difference between groups) and accept the alternative hypothesis (ie, that there is a 'significant' difference between groups).
- Cut-off points are set up by convention. In the social sciences, conventional cut-off points are '0.05' or '0.01', representing a 5% and 1% chance of being wrong when claiming that the results are significant, respectively.
- All things being equal, standard errors will be larger in smaller data sets, so it may make sense to choose '0.1' for alpha in a smaller data set. Similarly, in large data sets (hundreds of thousands of observations or more), it is not uncommon for nearly every test to be significant at the alpha '0.05' level; therefore the more stringent level of '0.01' is often used (or even '0.001' in some instances) (Noymer, undated1).
- A result is said to be significant when its "p value" is equal to or lower than the cut-off point. You may see it expressed as, for example, p = .05, p < .05, or p = .049.
- The significance level is also known as "alpha", "type I error" and "p-level". It normally appears and is reported as "p" in output sheets and scientific papers (eg, p = .05).
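The point about data-set size can be illustrated by holding a tiny effect fixed and letting the number of observations grow. The sketch below assumes a two-sided z-test with a known standard deviation of 1 and a hypothetical, practically negligible mean difference of 0.02 standard deviations; the function name `two_sided_p` is illustrative.

```python
from statistics import NormalDist


def two_sided_p(effect: float, n: int, sigma: float = 1.0) -> float:
    """Two-sided z-test p-value for an observed mean difference `effect`."""
    standard_error = sigma / n ** 0.5  # shrinks as the data set grows
    z = effect / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


effect = 0.02  # hypothetical, practically negligible difference (in SD units)
for n in (100, 1_000, 10_000, 100_000):
    p = two_sided_p(effect, n)
    print(f"n = {n:>7}: p = {p:.4g}, significant at .05: {p <= 0.05}, at .001: {p <= 0.001}")
```

The same negligible difference goes from clearly non-significant at n = 100 to significant at 0.05 around n = 10,000 and at 0.001 by n = 100,000, because the standard error falls with the square root of the sample size.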