Dispersion of the mode
The modal dispersion (or dispersion of the mode, MD) is a measure of statistical dispersion for nominal distributions. I introduce it here as a modified version of the variation ratio. The variation ratio assesses dispersion as the proportion of scores not accounted for by the mode. However, it indicates the 'peak' of the mode more than it indicates real dispersion. For example, two variables with the same number of scores (N) and the same number of scores in the mode (Mo), one of which "uses" more values than the other, are equally dispersed according to the variation ratio (even though it is reasonable to expect the variable using fewer values to be less dispersed).
The modal dispersion takes into account the total number of values actually "used" in the distribution, so that the spread of the distribution is correspondingly affected. This makes the modal dispersion a more dependable measure of dispersion.
The formula for calculating the modal dispersion is the following (compare the results against those given by the variation ratio):
|modal dispersion = (total number of cases - cases in the mode) / (total number of cases / total values used)|
Eg, three groups (n=100 each), each made up of more women than men:
Mode = women (n=90); modal dispersion = (100-90) / (100/2) = 10/50 = 0.2
Mode = women (n=80); modal dispersion = (100-80) / (100/2) = 20/50 = 0.4
Mode = women (n=60); modal dispersion = (100-60) / (100/2) = 40/50 = 0.8
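The three calculations above can be reproduced with a minimal Python sketch that sets the modal dispersion next to the variation ratio. The function and parameter names (`variation_ratio`, `modal_dispersion`, `total_cases`, `mode_cases`, `values_used`) are illustrative assumptions, not part of any library.

```python
def variation_ratio(total_cases, mode_cases):
    """Proportion of cases that are not in the mode."""
    return 1 - mode_cases / total_cases


def modal_dispersion(total_cases, mode_cases, values_used):
    """(cases outside the mode) / (total cases / values actually used)."""
    return (total_cases - mode_cases) / (total_cases / values_used)


# Three groups of 100 people using two values (women, men); mode = women.
for mode_cases in (90, 80, 60):
    vr = variation_ratio(100, mode_cases)
    md = modal_dispersion(100, mode_cases, 2)
    print(f"n(mode)={mode_cases}: variation ratio={vr:.2f}, modal dispersion={md:.2f}")
```

With only two values in use, the modal dispersion is simply twice the variation ratio; the two measures diverge more visibly once the number of values used differs between distributions.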
The modal dispersion does not simply ascertain the proportion of cases not in the mode, as the variation ratio does, but is a true measure of dispersion.
Eg, assume two independent surveys carried out at random on a city street, asking people where they come from relative to the location of the survey. The results are the following:
Provenance  North  East  West  South  Mode   M.D.
Survey A    3      2     5     7      South  2.4
Survey B    0      4     6     7      South  1.8
Although both surveys yield the same mode, survey A is expected to be more dispersed, which is what the modal dispersion shows:
Survey A, MD = (17-7) / (17/4) = 10 / 4.25 = 2.4 (where '4' is the count of values 'North', 'East', 'West' and 'South')
Survey B, MD = (17-7) / (17/3) = 10 / 5.67 = 1.8 (where '3' is the count of values 'East', 'West' and 'South')
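As a rough sketch, the same figures can be obtained directly from the frequency counts, treating a value as "used" only when its count is greater than zero. The function name `modal_dispersion_from_counts` and the dictionary layout are assumptions made for illustration; the sketch also assumes at least one non-zero count.

```python
def modal_dispersion_from_counts(counts):
    """Modal dispersion from a mapping of value -> frequency.

    Values with a frequency of zero are not counted as "used".
    Assumes at least one frequency is greater than zero.
    """
    frequencies = list(counts.values())
    total_cases = sum(frequencies)
    mode_cases = max(frequencies)
    values_used = sum(1 for f in frequencies if f > 0)
    return (total_cases - mode_cases) / (total_cases / values_used)


survey_a = {"North": 3, "East": 2, "West": 5, "South": 7}
survey_b = {"North": 0, "East": 4, "West": 6, "South": 7}

print(round(modal_dispersion_from_counts(survey_a), 1))  # 2.4
print(round(modal_dispersion_from_counts(survey_b), 1))  # 1.8
```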
The modal dispersion is also relatively easy to interpret, as in practice it runs between '0' and the number of values used in the distribution minus 1:
- Modal dispersion is practically '0' when all but one score in the distribution take one value and the remaining score takes a second value (this is the lowest possible dispersion, short of the variable becoming a 'constant').
- Modal dispersion is practically the number of values minus one when all values in the distribution have the same frequency except for one, which has one more score (this is the highest possible dispersion, short of the distribution being completely rectangular).
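These two limits follow from the formula itself. The short derivation below is a sketch using symbols that are not in the original text: N is the total number of cases, k the number of values used, f_Mo the number of cases in the mode, and m the common frequency of the non-modal values.

```latex
\begin{align*}
\mathrm{MD} &= \frac{N - f_{Mo}}{N/k} = \frac{k\,(N - f_{Mo})}{N} \\[4pt]
\text{Lowest case } (k = 2,\ f_{Mo} = N - 1):\quad
\mathrm{MD} &= \frac{2}{N} \;\to\; 0 \text{ as } N \text{ grows} \\[4pt]
\text{Highest case } (N = km + 1,\ f_{Mo} = m + 1):\quad
\mathrm{MD} &= \frac{k\,(k-1)\,m}{km + 1} \;<\; k - 1 \\[4pt]
\text{Rectangular case } (N = km,\ f_{Mo} = m):\quad
\mathrm{MD} &= \frac{k\,(k-1)\,m}{km} = k - 1
\end{align*}
```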
Eg, in the gender-group cases above, the minimum and maximum dispersions when n=100 are:
Mode = women (n=99); modal dispersion = (100-99)/(100/2)= 1/50= 0.02 (≃0)
Mode = women (n=51); modal dispersion = (100-51)/(100/2)= 49/50= 0.98 (≃2-1)
And in the case of surveys regarding geographical provenance, the minimum and maximum dispersions when n=17 are:
Provenance  North  East  West  South  Mode   M.D.
Survey A    1      0     0     16     South  0.12
Survey B    4      4     4     5      South  2.82
Mode = South (n=16); modal dispersion = (17-16) / (17/2) = 1/8.5 = 0.12 (≃0)
Mode = South (n=5); modal dispersion = (17-5) / (17/4) = 12/4.25 = 2.82 (≃4-1)
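The extreme cases above, together with the two boundary cases they stop short of (a constant and a completely rectangular distribution), can be checked with a small sketch. The function name `modal_dispersion` and the list-of-frequencies input are illustrative assumptions; zero frequencies are treated as unused values.

```python
def modal_dispersion(frequencies):
    """Modal dispersion from a list of frequencies (zeros are ignored)."""
    used = [f for f in frequencies if f > 0]
    total = sum(used)
    return (total - max(used)) / (total / len(used))


lowest = modal_dispersion([1, 0, 0, 16])      # all but one score in the mode
highest = modal_dispersion([4, 4, 4, 5])      # mode leads by a single score
constant = modal_dispersion([17, 0, 0, 0])    # only one value used
rectangular = modal_dispersion([4, 4, 4, 4])  # every value equally frequent

print(round(lowest, 2), round(highest, 2))  # 0.12 2.82
print(constant, rectangular)                # 0.0 3.0
```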
Interpreting the modal dispersion
A modal dispersion accompanies its corresponding mode, thus providing more information about the underlying distribution of scores. The mode announces the most frequent value (or values) in the distribution, while the modal dispersion provides a relative measure of how spread out that distribution is, especially if there is some indication of the number of values used. If there is no such indication, that number may be approximated by counting the mode as '1' and adding it to the modal dispersion (eg, [mode] 1+0.12=1.12, indicative of most scores falling on one value; [mode] 1+2.82=3.82, indicative of scores occupying 4 values in almost equal proportion).
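The "1 + modal dispersion" approximation can be sketched as follows. Rounding the result up to a whole number is an assumption added here, not part of the original text: because the modal dispersion never exceeds the number of values used minus one, the rounded-up figure is only a lower bound on the values used, and the true count may be higher. The function names are hypothetical.

```python
import math


def approximate_values_used(md):
    """Count the mode as 1 and add the modal dispersion to it."""
    return 1 + md


def minimum_values_used(md):
    """Lower bound on the number of values used (assumed rounding-up step)."""
    return math.ceil(approximate_values_used(md))


for md in (0.12, 2.82):
    approx = approximate_values_used(md)
    print(f"MD={md}: 1 + MD = {approx:.2f}, at least {minimum_values_used(md)} values used")
```

A figure close to its lower bound (3.82 against 4) suggests the values are used almost equally, whereas a figure barely above 1 (1.12) suggests most scores sit on the mode.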
For example, in the case of the surveys above regarding geographical provenance, and given the following information:
Provenance  Mode   M.D.
Survey A    South  0.12
Survey B    South  2.82
Such information can be interpreted as "Both surveys A and B had 'South' as the typical geographical provenance, although they differed greatly in their modal dispersion. Considering a maximum of four cardinal points as possible geographical provenance, survey A's dispersion is almost 'zero', indicative of most respondents being from the South, while survey B's dispersion is quite large, indicative of respondents coming from all four cardinal points in almost the same proportion."