Mann-Whitney U test for equality of distributions
The Mann-Whitney U test is a nonparametric statistical test most frequently used to assess whether two independent groups differ significantly from each other. Thus, it is often portrayed as the nonparametric equivalent of Student's t-test.
Eg, a typical SPSS output when computing a Mann-Whitney U test is a table like the following:

Mann-Whitney U            497.500
Wilcoxon W               2398.000
Z                          -2.025
Asymp. Sig. (2-tailed)       .043
Any of the three test statistics provided (Mann-Whitney U, Wilcoxon W, or Z) can be used for reporting results, as they are equivalent to each other and the associated probability (here labelled 'Sig.') is common to all three. The test statistics themselves, other than the associated probability, are not overly useful, as they only inform about which group's distribution is lowest (notice the negative sign of the 'Z' output). Other descriptives provided with the test, such as size, mean rank and sum of ranks per group, are not very useful either. Thus, in order to interpret these results sensibly, central tendency statistics such as means or medians, as well as dispersion statistics such as the standard deviation or the interquartile range, may need to be calculated separately.
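As a minimal sketch of calculating those descriptives separately (here using only Python's standard library, and the survey scores from the worked example further down this page):

```python
import statistics

def descriptives(scores):
    """Median and interquartile range (IQR) for one group.

    The IQR is computed as Q3 - Q1 using statistics.quantiles,
    which splits the data into four equal parts.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return statistics.median(scores), q3 - q1

# Survey scores for two independent groups (from the example below)
group_a = [8, 9, 6, 7, 4]
group_b = [2, 5, 3, 6, 4, 7]

for name, scores in [("A", group_a), ("B", group_b)]:
    median, iqr = descriptives(scores)
    print(f"Group {name}: median={median}, IQR={iqr}")
```

Note that statistics.quantiles uses the 'exclusive' method by default, so other software (eg, SPSS) may report slightly different quartiles for the same data.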
The above result may be reported as "a statistically significant group difference (Mann-Whitney U = 497.5, p = .043, sig ≤ .05, 2-tailed)".
More literally, it could be reported as "a probability which would normally occur about 1 time in 25 under the null hypothesis (Mann-Whitney U = 497.5, p = .043), rare enough to suggest a statistically significant difference in ranked distributions between groups (given sig ≤ .05, 2-tailed)". It can thus be inferred that both groups are different.
Properly speaking, Mann-Whitney U is a rank-order (or nonparametric) test for assessing not differences of means or medians but the distributions of two independent groups when combined into a single sample (ie, whether the scores of two independent groups have a similar ranked distribution). Thus, the test assesses the location and range of the lowest group's distribution within the overall sample range, and contrasts this against a theoretical ranked distribution approaching normal (the 'U' or 'z' distribution, depending on sample size). Because of this, however, the test is also powerful in detecting differences between group means (Sawilowsky, 2007), and is commonly portrayed as the nonparametric substitute for Student's t-test when samples are not normally distributed.
How Mann-Whitney U works
The Mann-Whitney U test works by bringing the data of two independent samples into a single space (ie, by combining the data from two groups into a single sample). The data are then ranked within the overall sample irrespective of the group to which they pertain (although 'remembering' it for later use, of course) (see example below).
The sizes of the two groups, when multiplied by each other (nA*nB), provide the total range of the test. The 'U' values are calculated using the appropriate procedures, and the probability of the lowest 'U' is then obtained from the corresponding 'U' tables (when N < 41) or 'z' tables (when N > 41, after approximating U to z).
Two independent groups (A & B) are brought together, ranking their scores into a single sample space. Group A occupies ranks 3 and 5 to 8, while group B occupies ranks 1 to 6 (both groups tie at ranks 3, 5 and 6). Visually, both groups share some ranks, but they seem to sit at opposite extremes of this sample space. The question of interest is whether they are extreme enough for us to infer that they are, in fact, independent groups.
Only one of the groups (conventionally the group with the smallest 'U', usually the one with the smaller ranks) needs to be tested. As can be observed, only the distribution of ranks enters the 'U' equation, not the mean or median of the ranks.
The resulting U (= 5.5) is then checked against the appropriate table, which tests the U value against a theoretical distribution similar to the normal distribution. For N < 41, each group's size is taken as degrees of freedom for identifying the correct table and probability. For N > 41, U can be approximated to 'z', and a normal distribution with mean = (nA*nB)/2 is used instead. In this case, the table provides a value of p = .10 for the distribution (the other group's U, if calculated, would be 24.5 (= 30 - 5.5, or UA = [nA*nB] - UB), resulting in a similar p = .10).
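The U-to-z approximation can be sketched in Python. The mean (nA*nB)/2 is given above; the standard deviation sqrt(nA*nB*(nA+nB+1)/12) is the standard large-sample formula for U (it is not stated in the text above, and it ignores the small correction for tied ranks):

```python
import math

def u_to_z(u, n_a, n_b):
    """Approximate a Mann-Whitney U statistic by a z score.

    Under the null hypothesis, U is roughly normal with
    mean nA*nB/2 and variance nA*nB*(nA+nB+1)/12
    (ignoring the correction for tied ranks).
    """
    mean = n_a * n_b / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (u - mean) / sd

# U = 5.5 from the worked example (nA = 5, nB = 6)
z = u_to_z(5.5, 5, 6)
print(round(z, 2))  # about -1.73
```

With groups this small the exact table (p = .10) is preferable to the z approximation, which is why tables rather than 'z' are used for N < 41.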
The above probability thus indicates that these results would occur 1 time in 10 by chance alone, not rarely enough if assuming a conventional significance level (sig < .05).
Group      Ranks in total sample space
A ranked           3        5     6     7     8
B ranked   1   2   3   4    5     6
Test       1   2   3   4    5     6
U          0   0   .5  1    1.5   2.5   = 5.5
p = .10
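The direct (counting) method behind that table can be written as a short Python function: for each score in one group, it counts the scores of the other group that rank below it, adding .5 for each tie:

```python
def mann_whitney_u(group, other):
    """Direct-method U for `group`: for each of its scores,
    count the scores in `other` that are smaller (ties count .5)."""
    u = 0.0
    for x in group:
        for y in other:
            if y < x:
                u += 1.0
            elif y == x:
                u += 0.5
    return u

# Scores behind the ranks in the table above
a = [8, 9, 6, 7, 4]     # group A, ranks 3 and 5-8
b = [2, 5, 3, 6, 4, 7]  # group B, ranks 1-6

u_b = mann_whitney_u(b, a)  # U for the lower-ranked group
print(u_b)  # 5.5
```

Swapping the arguments gives the other group's U (24.5), and the two always sum to nA*nB (30 here).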
Eg, let's use the information in the video below as guidance. A survey is carried out on two independent groups, A and B. Group A is composed of 5 students (n=5) and group B is composed of 6 students (n=6). Students' survey scores in group A are: 8, 9, 6, 7, 4. Students' survey scores in group B are: 2, 5, 3, 6, 4, 7.
These data are ranked into a single sample, irrespective of the group to which they pertain (although keeping tabs on group provenance).
From the size of the groups, we can ascertain that the total range for the test is 30 (5*6), which means UA+UB = 30. This also allows us to set the null hypothesis that both groups will perform equally on those comparisons, at a rate of 15/15 (ie, group A will be better than group B 15 times, and group B will be better than group A the remaining 15 times, or mean = 15). In fact, with information about group size and level of significance, the effect size required for a statistically significant 'U' can be worked out beforehand, as well.
In any case, using the direct method, the value of 'U' for one or both groups can be calculated. The smaller 'U' is then used for finding its probability in the corresponding U table (given the group sizes).
score     2    3    4    4    5    6    6    7    7    8    9
group     B    B    B    A    B    B    A    B    A    A    A
U for B   0    0    0.5       1         1.5       2.5             = 5.5
U for A                  2.5            4.5       5.5  6    6     = 24.5
In this example, U = 5.5 for group sizes n1 = 5 and n2 = 6 yields p = .10. Thus, we conclude that the locations of both distributions are not extreme enough to suspect low likelihood (ie, the difference in distribution between the two groups is not large enough to be statistically significant).
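Assuming SciPy is available, the same result can be checked with scipy.stats.mannwhitneyu, which returns the U statistic for the first sample passed; the smaller U is then recovered by subtraction:

```python
from scipy.stats import mannwhitneyu

a = [8, 9, 6, 7, 4]     # group A (n1 = 5)
b = [2, 5, 3, 6, 4, 7]  # group B (n2 = 6)

res = mannwhitneyu(a, b, alternative="two-sided")
u_a = res.statistic  # U for group A
u_b = 5 * 6 - u_a    # the two U values sum to n1*n2 = 30
print(u_a, u_b)      # 24.5 5.5
```

Note that, in current SciPy versions, tied scores preclude the exact method, so the reported p-value comes from a normal approximation and will be close to, but not exactly, the tabled p = .10.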
Mann-Whitney U tests how much one of the groups deviates from the 'U' expected for the common median if both groups pertained to the same population. This expectation is a 50/50 split of the total number of possible comparisons between the scores of each group, easily obtained with the formula (n1*n2)/2 (Ho = a 50/50 split of the total range of potential comparisons; ie, 50% of the time one group will be better than the other, and 50% of the time it will be worse). Each distribution's deviation from the 50/50 split is a mirror of the other, so only one 'U' needs to be estimated (the other 'U' can be obtained by subtraction). Also, for convenience, only the smaller of the two 'Us' is tested, and it is interpreted as the probability of obtaining a result as extreme as (or more extreme, ie, smaller than) the one obtained, by chance alone. (The other 'U', if contrasted, would yield a similar probability, but would be interpreted as the probability of obtaining a result as extreme as (or more extreme, ie, higher than) the one obtained, by chance alone.)
Eg, SPSS provides the group sizes and the smaller 'U' as outputs. From there, 'Ho' can be calculated as (n1*n2)/2 and, as each group's deviation is symmetrical, the greater 'U' can be calculated, as well.
n1    U1        p      n2    U2        p      Ho
65    (494.0)   (.00)  14    416.0     .00    (455.0)
39    405.0     .02    40    (1155.0)  (.02)  (780.0)
75    (238.5)   (.05)  4     61.5      .05    (150.0)

(Values in parentheses are calculated from the others rather than given by SPSS.)
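A minimal sketch of that reconstruction, checked against the first row of the table above:

```python
def reconstruct(n1, n2, u_given):
    """From the group sizes and one U, recover Ho (the expected U
    under the null hypothesis) and the other U.

    Ho = n1*n2/2, and the two U values always sum to n1*n2.
    """
    ho = n1 * n2 / 2
    u_other = n1 * n2 - u_given
    return ho, u_other

# First row of the table: n1 = 65, n2 = 14, given U = 416.0
ho, u1 = reconstruct(65, 14, 416.0)
print(ho, u1)  # 455.0 494.0
```

The same function reproduces the bracketed values in the other two rows as well.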
Mann-Whitney U and medians
Mann-Whitney U does not assess differences between means or medians, as these statistics never enter the formulation or procedure for obtaining and testing 'U'. However, it is a powerful tool for assessing, indirectly, mean differences and, if coherent, median differences.
The coherence of the latter is important because, when using nonparametric approaches, it is more coherent to report medians than means. Yet, if we do not pay attention to the fact that 'U' tests neither medians nor means, we may report silly statistics. The illustration below, adapted from Wikipedia (2012), exemplifies this:
We want to ascertain whether 'hares' and 'tortoises' are equally fast on a racetrack or not. We obtained a random sample of 11 hares and 11 tortoises and raced them on a racetrack. The illustration below identifies each animal (H = hare, T = tortoise) ranked by the number of minutes it took to complete the racetrack. A typical SPSS output follows.
Tortoises obtained a smaller median (= 2.1 minutes) than hares (= 2.2 minutes). Researchers who pair the 'U' result with the medians may thus report a difference between both groups in favor of the tortoises. Such a conclusion would be completely erroneous. Indeed, hares are the faster animal (sum of ranks = 102 versus 151), something that can also be observed in the illustration of the ranks.
Rank    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
Animal  H  H  H  H  H  T  T  T  T  T  T  H  H  H  H  H  H  T  T  T  T  T
Group       Mean   Std.dev.   Median   IQR
Hares       1.93   .62        2.20     .15
Tortoises   2.37   .62        2.10     1.20
Group       Mean rank   Sum of ranks   U       (p)
Hares       9.27        102.00         36.00   (.108)
Tortoises   13.73       151.00
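The SPSS figures above can be reproduced from the rank sequence alone; here is a sketch using the rank-sum formula U = R - n(n+1)/2 (an equivalence to the direct method not spelled out in the text above):

```python
# Finishing order from the illustration: rank 1 is the fastest animal
order = list("HHHHHTTTTTTHHHHHHTTTTT")

# Sum of ranks per group (ranks are 1-based positions in the order)
rank_sums = {"H": 0, "T": 0}
for rank, animal in enumerate(order, start=1):
    rank_sums[animal] += rank

n_h = order.count("H")  # 11 hares
n_t = order.count("T")  # 11 tortoises

# U for the hares via the rank-sum formula
u_hares = rank_sums["H"] - n_h * (n_h + 1) / 2

print(rank_sums["H"], rank_sums["T"])  # 102 151
print(round(rank_sums["H"] / n_h, 2))  # mean rank 9.27
print(u_hares)                         # 36.0
```

The hares' lower rank sum (102 versus 151) is what marks them as the faster group, regardless of which group has the smaller median.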
Want to know more?