Standard deviation

The standard deviation (σ) is a measure of the average variability in a dataset, that is, of how close to or far from the mean the data lie. It approximates the mean distance (in absolute value) between each datum and the arithmetic mean.

For example, given the values 1, 2, 3 (mean = 2), we can estimate the average deviation of each datum from the mean in three steps: first, subtract the mean from each datum; second, add up the resulting individual deviations as absolute values (otherwise they would sum to 0); third, divide that sum by the number of data in the set. Thus:

mean deviation = [|(1-2)| + |(2-2)| + |(3-2)|] / 3 = (|-1| + |0| + |1|) / 3 = 2/3 ≈ 0.67²

For the set of values 1, 2, 9 (mean = 4), the estimated average deviation would be:

mean deviation = [|(1-4)| + |(2-4)| + |(9-4)|] / 3 = (|-3| + |-2| + |5|) / 3 = 10/3 = 3.33³
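
The two calculations above can be reproduced with a short Python sketch. The function name `mean_absolute_deviation` and its `sample` flag are illustrative choices made here, not part of any library; the `sample=True` branch divides by n-1, as the footnotes do for samples.

```python
def mean_absolute_deviation(values, sample=False):
    """Average absolute distance of each value from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    # Absolute values stop positive and negative deviations cancelling out.
    total = sum(abs(x - mean) for x in values)
    # Population: divide by N. Sample: divide by n - 1.
    return total / (n - 1 if sample else n)

print(round(mean_absolute_deviation([1, 2, 3]), 2))  # 0.67
print(round(mean_absolute_deviation([1, 2, 9]), 2))  # 3.33
```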

The most interesting property of the standard deviation is that it gives you a glimpse of how far from the mean the bulk of your data is. If the distribution is normal, the majority of the data (about 68.27%) lies within one standard deviation above and below the mean, the vast majority (about 95.45%) lies within two standard deviations, and almost all of it (about 99.73%) lies within three standard deviations. Therefore, a rather small standard deviation indicates that the data tends to cluster close to the mean, without 'deviating' much from it, whereas a large standard deviation suggests a spread-out distribution, with data 'deviating' far from the mean.
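
The percentages quoted for a normal distribution can be checked numerically. The sketch below assumes an exactly normal distribution and uses the error function from Python's standard `math` module; the helper name `coverage_within` is made up for this illustration.

```python
import math

def coverage_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{coverage_within(k):.2%}")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```

Real data is rarely exactly normal, so these figures are a guide rather than a guarantee.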

Furthermore, you can use the standard deviation to assess the relative position of particular data. Data located three standard deviations above or below the mean are 'sensibly' less representative than data within two standard deviations of the mean, and these in turn are less representative than data within one standard deviation of the mean. In fact, data located beyond three standard deviations above or below the mean can be considered extreme cases, probably even statistically significantly so.
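
One common way to express this relative position is to divide each datum's distance from the mean by the standard deviation, a quantity usually called a z-score (a term not used elsewhere in this article). The following is a minimal sketch that treats the data as a sample; `z_scores` is an illustrative helper, not a library function.

```python
import statistics

def z_scores(values):
    """Distance of each value from the mean, in units of the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(x - mean) / sd for x in values]

data = [1, 2, 9]
for x, z in zip(data, z_scores(data)):
    print(x, round(z, 2))
# 1 -0.69
# 2 -0.46
# 9 1.15
```

Here 9 sits a little over one standard deviation above the mean, so it would not count as an extreme case by the three-standard-deviation rule of thumb.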

Properties

The standard deviation is a measure of the dispersion of the data (i.e., of how close to or far from the mean the data is).
The standard deviation can be used as a measure of uncertainty.
The standard deviation can be used as a measure of significance.
The standard deviation for a population is represented as 'σ'. When calculating 'σ', you divide by the total population size ('N').
The standard deviation for a sample is represented as 's' (also written 'sd'). When estimating 's', you divide by the total sample size minus 1 ('n-1').
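
The difference between the population and the sample versions can be seen with Python's standard `statistics` module, where `pstdev` divides by N and `stdev` divides by n-1. The two datasets are the ones used in the examples earlier in this article.

```python
import statistics

for data in ([1, 2, 3], [1, 2, 9]):
    sigma = statistics.pstdev(data)  # population: divides by N
    s = statistics.stdev(data)       # sample: divides by n - 1
    print(data, round(sigma, 2), round(s, 2))
# [1, 2, 3] 0.82 1.0
# [1, 2, 9] 3.56 4.36
```

The sample value is always the larger of the two, and the gap narrows as the number of data grows.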

Calculation

In Excel: The function STDEV calculates the sample standard deviation (using 'n-1') of the selected values; the function STDEVP calculates the population standard deviation (using 'N').
By hand (from scratch): First, subtract the mean from each datum; second, square the resulting individual distances; third, calculate the arithmetic mean of the squared distances (dividing by 'n-1' for samples and by 'N' for populations); finally, take the square root of the resulting mean.
By hand (from the variance): Take the square root of the variance.
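
The by-hand steps can be written out as a short Python sketch. The function name `standard_deviation` and its `sample` flag are illustrative; the intermediate `variance` is the quantity whose square root gives the result in the "from the variance" route.

```python
import math

def standard_deviation(values, sample=True):
    """Standard deviation computed step by step, as in the by-hand recipe."""
    n = len(values)
    mean = sum(values) / n
    # Steps 1 and 2: subtract the mean from each datum, then square.
    squared_distances = [(x - mean) ** 2 for x in values]
    # Step 3: average the squared distances (n - 1 for samples, N for populations).
    variance = sum(squared_distances) / (n - 1 if sample else n)
    # Step 4: take the square root of the variance.
    return math.sqrt(variance)

print(round(standard_deviation([1, 2, 9]), 2))                # 4.36
print(round(standard_deviation([1, 2, 9], sample=False), 2))  # 3.56
```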

Footnotes

2. This value is for a population (dividing by N); the mean deviation for a sample (dividing by n-1) would be 1. The actual standard deviations are σ = 0.82 and sd = 1, respectively.

3. This value is for a population (dividing by N); the mean deviation for a sample (dividing by n-1) would be 5. The actual standard deviations are σ = 3.56 and sd = 4.36, respectively.