Tuesday 27 November 2012

Statistics - Measures of Dispersion

The first measure of dispersion we'll mention is the range. The range is the difference between the highest value and the lowest value in a set of data. Only these two values are used in the calculation, so it is very easy to compute. The drawback is that a single extreme value can change the result dramatically.
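
As a rough illustration of how sensitive the range is to extreme values, here is a minimal Python sketch. The function name `value_range` and the numbers are made up for the example.

```python
def value_range(data):
    """Return the highest value minus the lowest value."""
    return max(data) - min(data)

# Hypothetical data set, then the same data with one extreme value added.
scores = [4, 7, 8, 10, 11]
with_outlier = scores + [95]

print(value_range(scores))        # 7
print(value_range(with_outlier))  # 91
```

One extra observation moves the range from 7 to 91, even though the other five values have not changed.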

A step on from the range is the interquartile range. This is the difference between the third quartile and the first quartile, which gives us the spread of the middle 50% of observations. To work out the first quartile, we sort the observations, take N (the number of observations) and divide it by 4. This gives us the position of the observation at which the first quartile sits. To work out the third quartile, we divide N by 4 and then multiply the result by 3. This gives us the position of the observation at which the third quartile sits. From there, we just subtract the first quartile value from the third quartile value.
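
The positional rule described above can be sketched in Python as follows. This is only one convention: the function names are made up for the example, and rounding the position up when N is not divisible by 4 is an assumption of this sketch (textbooks and software packages often interpolate instead, so their answers can differ slightly).

```python
import math

def quartile_position(n, k):
    """1-based position of the k-th quartile using the k * N / 4 rule.

    Rounding up when the result is not a whole number is an assumption
    of this sketch, not something the rule itself specifies.
    """
    return math.ceil(k * n / 4)

def interquartile_range(data):
    ordered = sorted(data)  # positions only make sense on sorted data
    n = len(ordered)
    q1 = ordered[quartile_position(n, 1) - 1]  # convert to 0-based index
    q3 = ordered[quartile_position(n, 3) - 1]
    return q3 - q1

# Hypothetical data: 8 observations, so Q1 is the 2nd and Q3 the 6th value.
observations = [13, 2, 8, 5, 15, 4, 10, 7]
print(interquartile_range(observations))  # 6
```

Sorted, the data are 2, 4, 5, 7, 8, 10, 13, 15, so the first quartile is 4, the third quartile is 10 and the interquartile range is 6.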

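The quartiles are then displayed on a box plot diagram. In general, a box plot draws a box from the first quartile to the third quartile, a line inside the box at the median, and whiskers running out to the lowest and highest values.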

Mean deviation is another measure of dispersion. It is the mean of the absolute values of the deviations from the mean. It is similar to the standard deviation, but it uses absolute values rather than squares. The formula is as follows:

  • Mean Deviation = (Σ|Xi - x̄|) / n
  • Xi = each observation.
  • x̄ = the sample mean.
  • n = number of observations.
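
The formula above translates almost directly into Python. This is a minimal sketch with a made-up data set, and the function name `mean_deviation` is just for illustration.

```python
def mean_deviation(data):
    """Mean of the absolute deviations from the sample mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n

# Hypothetical data: the mean is 6 and the absolute deviations are 4, 2, 0, 2, 4.
data = [2, 4, 6, 8, 10]
print(mean_deviation(data))  # 2.4
```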

We take the absolute values here for a very specific reason: it stops the negative and positive deviations from cancelling each other out. Without the absolute values, the deviations from the mean always sum to exactly 0, which would give us a mean deviation of 0. Very unhelpful! Dispersion is very important: key statistical methods such as regression rely heavily on measures of dispersion.
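
To see the cancelling in action, here is a short Python check on a made-up data set. The variable names are only for illustration.

```python
# Hypothetical data with a mean of 6.
data = [2, 4, 6, 8, 10]
mean = sum(data) / len(data)

signed = [x - mean for x in data]         # -4, -2, 0, 2, 4
absolute = [abs(x - mean) for x in data]  #  4,  2, 0, 2, 4

print(sum(signed))    # 0.0
print(sum(absolute))  # 12.0
```

The signed deviations wipe each other out, while the absolute deviations keep the information about how spread out the data are.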

The population variance and sample variance are two more concepts I'm going to introduce now. The population variance is the arithmetic mean of the squared deviations from the population mean. The sample variance does essentially the same for a sample, except that it divides by n - 1 rather than n. The formulas for both are as follows:

  • Population variance = Σ(Xi - μ)^2 / N
  • Sample variance = Σ(Xi - x̄)^2 / (n - 1)
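
Here is a minimal Python sketch of the two variances, using a made-up data set and illustrative function names. Each deviation is squared first and the squares are then added up. The only difference between the two functions is the divisor.

```python
def population_variance(data):
    """Mean of the squared deviations from the population mean (divide by N)."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

# Hypothetical data: squared deviations are 16, 4, 0, 4, 16, which sum to 40.
data = [2, 4, 6, 8, 10]
print(population_variance(data))  # 8.0
print(sample_variance(data))      # 10.0
```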

These variances can easily be turned into standard deviations, a very important concept for statisticians. To do this, we just take the square root of the variance. We denote the standard deviation of a population and of a sample differently: the population standard deviation is written σ and the sample standard deviation is written s. The standard deviation will be key in the next few posts when we begin to introduce confidence intervals, so learn it!
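
As a quick sketch of the square-root step, the example below uses Python's standard `statistics` module for the variances. The data set is made up, chosen so that the population variance comes out as exactly 4.

```python
import math
import statistics

# Hypothetical data with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: square root of the population variance.
sigma = math.sqrt(statistics.pvariance(data))

# Sample standard deviation: square root of the sample variance.
s = math.sqrt(statistics.variance(data))

print(sigma)        # 2.0
print(round(s, 3))  # 2.138
```

The same module also offers `statistics.pstdev` and `statistics.stdev`, which do the square root for you.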

Thanks for reading, have a good day.
Sam.

