Monday 26 November 2012

Statistics - Measures of Central Tendency

By the end of this post I hope you'll be able to characterise a data-set with a single piece of information and, more importantly, briefly describe complex data in simple terms.

We'll start with the mean. The mean is computed by adding up all the values and dividing by the number of observations (N for a population, n for a sample). Two symbols appear for means: x̄ (sample) and μ (population). To work out the population mean, μ, all the data in that population must be added up and divided by the population size, N. For the sample mean, x̄, all the sample data must be added together and divided by the number of observations in the sample, n. If you take each score in a distribution, subtract the mean from it, and add up all these differences, the sum will always be 0. The mean does have some disadvantages, such as extreme scores pulling it one way or another. This issue doesn't occur with the median.
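To make the arithmetic concrete, here is a minimal Python sketch of the mean. The scores are made-up numbers chosen purely for illustration. It shows that the deviations from the mean sum to 0, and how one extreme score drags the mean upwards.

```python
# Hypothetical scores, invented for this illustration.
scores = [2, 4, 4, 5, 10]

# Mean = sum of the values / number of observations.
x_bar = sum(scores) / len(scores)

# Subtract the mean from every score and add up the differences.
deviations = [x - x_bar for x in scores]
sum_of_deviations = sum(deviations)

# Add one extreme score and recompute the mean.
with_outlier = scores + [95]
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print(x_bar)              # 5.0
print(sum_of_deviations)  # 0.0
print(mean_with_outlier)  # 20.0
```

A single score of 95 moves the mean from 5 to 20, even though five of the six scores are 10 or below.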

A different type of mean to the arithmetic mean above is the weighted mean. This allows us to calculate an accurate mean even when not all the information is known, for instance when we only have group averages and the size of each group. It's fairly straightforward, like above. The formula for the weighted mean is as follows: x̄ = (w1X1 + w2X2 + ... + wnXn) / (w1 + w2 + ... + wn). Here 'w' denotes the weight given to the value 'X'; the higher the weight, the more influence that value has on the mean. If the weights are all 1, you essentially have the same formula as the arithmetic mean in the previous paragraph. The problem with the weighted mean is that sometimes the weights aren't known, and it is still pulled around by outliers, which are very common in economics.
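The weighted mean formula can be sketched in a few lines of Python. The function name `weighted_mean`, the marks and the weights are all hypothetical, picked only to show the calculation.

```python
def weighted_mean(values, weights):
    """Return (w1*X1 + ... + wn*Xn) / (w1 + ... + wn)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical marks, where the last one counts double.
marks = [60, 70, 80]
weights = [1, 1, 2]

print(weighted_mean(marks, weights))    # 72.5
# With every weight equal to 1 we get the ordinary arithmetic mean.
print(weighted_mean(marks, [1, 1, 1]))  # 70.0
```

Doubling the weight on the mark of 80 pulls the result from 70 up to 72.5.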

Next, we move on to the median. This, as many already know, is the middle score when the observations are arranged in order. If there is an even number of scores, it's the mean of the middle two values. There is a unique median for each set of data. It is barely affected by extreme values, which makes it a very good measure of central tendency when the data contain outliers.
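As a rough sketch, the median rule can be written out by hand in Python. The function name `median` and the sample numbers are illustrative assumptions. Python's built-in `statistics.median` follows the same rule if you'd rather not write your own.

```python
def median(values):
    """Middle value of the sorted data; mean of the middle two if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data-set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))               # 3
print(median([7, 1, 3, 5]))            # 4.0
# An extreme score of 95 barely moves the median.
print(median([2, 4, 4, 5, 10, 95]))    # 4.5
```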

The logical step now is to introduce the mode. The mode is the most frequently appearing value in a data-set. It is of limited use because values that appear only once have no influence on it. Other problems arise: some data-sets have no mode and some have more than one. On its own it rarely gives a good measure of central tendency for a set of data.
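The three cases (one mode, several modes, no mode) can be seen in a short Python sketch. The function name `modes` is hypothetical. Returning an empty list when no value repeats is a convention assumed here to match the "no mode" case, not the only possible one.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, in sorted order.

    Assumed convention: if there are several distinct values and none
    of them repeats, the data-set is treated as having no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(modes([1, 2, 3]))        # []
```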

Skewness! A few graphs will crop up in this section. Skewness tells us in which direction the data leans, and it is normally represented graphically. We start off with the bell curve. This is when the mode = median = mean: we have no skew and the data fits nicely into the 'bell' shape shown below. The distribution is symmetrical.



A positively skewed distribution occurs when the mode < median < mean. Most of the data is bunched on the left, below the mean, and then it tails off to the right. It looks like the graph below.


Finally, the negatively skewed distribution. It's the stark opposite of the above: the mean < median < mode, with the data tailing off to the left. Here it is.


You might have worked out by now that skewness measures the lack of symmetry of the distribution. This lack of symmetry can be given a numerical value ranging from -3.00 to 3.00. A value of 0 indicates a symmetric distribution, a positive value a positive skew, and a negative value a negative skew. It is worked out as follows: sk = 3(x̄ - Median) / s. 's' stands for the standard deviation, which will be coming up in the next blog post, so stick around for that.
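Here is a small Python sketch of the skewness formula, which is often called Pearson's coefficient of skewness. Since 's' hasn't been defined in this post yet, the sketch assumes it is the sample standard deviation (the n - 1 version that `statistics.stdev` returns). The function name `pearson_skewness` and the data are made up for illustration.

```python
from statistics import mean, median, stdev


def pearson_skewness(values):
    """sk = 3 * (mean - median) / s, with s assumed to be the sample standard deviation."""
    s = stdev(values)  # assumption: sample (n - 1) standard deviation
    if s == 0:
        raise ValueError("skewness is undefined when every value is identical")
    return 3 * (mean(values) - median(values)) / s


# Hypothetical data with a tail to the right: mean 5, median 4, s = 3.
skewed = [2, 4, 4, 5, 10]
# Hypothetical symmetric data: mean and median are both 3.
symmetric = [1, 2, 3, 4, 5]

print(pearson_skewness(skewed))     # 1.0, a positive skew
print(pearson_skewness(symmetric))  # 0.0, no skew
```

For the skewed data the mean (5) sits above the median (4), so the coefficient comes out positive, matching the mode < median < mean pattern described earlier.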

That's all for this post. I hope you're now able to describe your data set fairly easily: tell us the middle value, the average, in which direction it's skewed, and so on. It's a good tool to have, yet still very basic. We'll be stepping up a gear next post as range and deviations are introduced. Keep checking back, and thank you guys!

Sam. 

