## Friday, 23 November 2012

### Statistics - Introduction

*Disclaimer: I have a statistics test coming up soon, so expect the next few blog posts to be all statistics-related. I'll be getting back to normal economics after the 8th of December. Statistics, however, is useful to people outside economics as well as to those studying econometrics.*

I'll start with the very basics of statistics. This post will contain a few definitions and a few formulae, nothing too taxing as we're just setting the scene, so to speak. In statistics, we have two types of variable: qualitative and quantitative. The features of these are as follows:

• Qualitative: Non-numeric. For example: gender, religion.
• Quantitative: Numeric. For example: bank balance, age. Can be either discrete or continuous.
  • Discrete - Can only take particular values. Example: number of family members (only whole numbers).
  • Continuous - Can take any value within a range. Example: weight, height.

Data can be defined at different levels as well. We can have nominal, ordinal, interval or ratio data. Again, these all have different characteristics:

• Nominal Data - Categorised data that cannot be arranged into an order. Eye colour, for example. The categories can be mutually exclusive, exhaustive, or both. It's usually qualitative.
  • Mutually exclusive - Each observation can only be included in one category. Only one eye colour, for example.
  • Exhaustive - Every observation must appear in at least one category.
  • Gender is mutually exclusive and exhaustive.
• Ordinal Data - Data that can be arranged into some sort of order. Individuals can be compared with one another here because of rankings; however, the distances between pieces of data have no meaning.
• Interval Data - The differences between values are meaningful, yet there is no true zero point, so we still aren't able to say that 100 is twice 50. Temperature in degrees Celsius is an example.
• Ratio Data - Has a true zero point, e.g. number of family members. Now we can say that a family of 4 members is twice as big as a family of 2.

I'll now move on to looking at frequency distributions and how we can present these graphically. A frequency distribution is a grouping of data into mutually exclusive categories, showing how many observations are in each class. A class is a subset of the whole range of the data. Each class will be the same size, and together the classes will cover at least the full range of the data.

The class midpoint is the average of the upper and lower limits of the class. The class frequency is how many observations fall into that class. The class interval is the difference between the upper and lower limit of the class.
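To make these three terms concrete, here's a rough Python sketch. The class limits and scores are made-up numbers, the function names are just my own, and I'm assuming each class includes its lower limit but not its upper limit.

```python
# Hypothetical class limits (lower, upper) and made-up observations.
classes = [(0, 10), (10, 20), (20, 30)]
scores = [3, 7, 12, 15, 18, 21, 29]

def class_midpoint(lower, upper):
    # Average of the lower and upper limits of the class.
    return (lower + upper) / 2

def class_interval(lower, upper):
    # Difference between the upper and lower limits of the class.
    return upper - lower

def class_frequency(data, lower, upper):
    # Number of observations with lower <= x < upper (an assumed convention).
    return sum(1 for x in data if lower <= x < upper)

for lower, upper in classes:
    print(
        f"{lower}-{upper}:",
        "midpoint", class_midpoint(lower, upper),
        "interval", class_interval(lower, upper),
        "frequency", class_frequency(scores, lower, upper),
    )
```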

There is a rule for deciding how many classes to divide your data into. It's named the '2 to the k' rule. Here 'N' is the number of observations. The rule is that you choose the smallest k such that 2 to the power of k is greater than N (2^k > N). This value of k will be the number of classes. Choosing the size of these classes is fairly straightforward as well. Subtract the lowest score from the highest score and then divide this value by the k we just got from the rule. A decimal value may well be given; in that case round UP to the nearest sensible number. The data can now be displayed clearly in a frequency distribution diagram/table.
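The two steps of the rule can be sketched in Python as follows. The scores are made up, the function names are my own, and I'm assuming "nearest sensible number" simply means the next whole number up, which won't suit every data set.

```python
import math

def number_of_classes(n):
    # Smallest k such that 2**k > n, where n is the number of observations.
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_width(data, k):
    # (highest score - lowest score) / k, rounded UP.
    # Rounding to a whole number is an assumption made for this sketch.
    return math.ceil((max(data) - min(data)) / k)

# Made-up scores for illustration (10 observations).
scores = [12, 15, 19, 22, 26, 30, 33, 38, 41, 47]

k = number_of_classes(len(scores))   # 2**3 = 8 is not > 10, 2**4 = 16 is, so k = 4
width = class_width(scores, k)       # (47 - 12) / 4 = 8.75, rounded up to 9
print(k, width)
```

With 4 classes of width 9, the classes span 36 units, which covers the 35-unit range of the data.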

A relative frequency distribution is essentially the same as above, except with one more column added to the table. This new column shows the percentage of observations in each class. Divide the number of observations in that class by the total number of observations and multiply by 100 to get the percentage. Simplez.
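As a quick sketch of that extra column in Python (the class counts here are made up, and the function name is my own):

```python
def relative_frequencies(class_counts):
    # Percentage of all observations that fall into each class.
    total = sum(class_counts)
    return [count * 100 / total for count in class_counts]

# Made-up class frequencies for three classes.
counts = [2, 3, 5]
percentages = relative_frequencies(counts)
print(percentages)
```

The percentages across all classes should add up to 100, which is a handy check that no observation has been missed or counted twice.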

The data can be displayed graphically in many forms, none of them being 'wrong' per se. Histograms, frequency polygons and cumulative frequency distributions are examples of these graphical representations. These are all for examining data and trends in that data.
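For the cumulative frequency distribution, the values you plot are just a running total of the class frequencies. A minimal Python sketch, with made-up class counts:

```python
from itertools import accumulate

# Made-up class frequencies, listed from the lowest class to the highest.
class_counts = [2, 3, 5]

# Running total: each entry counts the observations in that class and all classes below it.
cumulative = list(accumulate(class_counts))
print(cumulative)
```

The last cumulative value always equals the total number of observations.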

You may be wondering about line graphs here: when do we use these? In general, line graphs are used when time is involved. They can show the change over a period of time. Bar charts are for showing different categories that have no clear link, or to measure frequencies. Pie charts are simply for showing proportions. Scatter graphs show us co-variation of two variables, but NOT time. Any other diagrams are subjective and not advised.

Introduction to statistics complete! I hope that makes enough sense to you all; if not, comment with your problems and I'll get back to you ASAP. Thanks for reading, good luck!

Sam.