## Saturday, 1 December 2012

### Statistics - Sampling Methods and Estimation

In statistics we have to use samples because it's normally near impossible to get data for the entire population. As long as the sampling is done well, the results will usually be good enough. Logic would tell you that the larger the sample, the better, and this is true. There are two concepts we need to understand here: random sampling and the sampling distribution.

• Random sampling - The goal of this is representativeness: we aim to give every member of the population a known (ideally equal) probability of selection. There are a few methods:
• Simple random sampling - A sample chosen so that every item or person in the population has the same chance of being included.
• Systematic random sampling - Items or individuals are arranged in some sort of order (alphabetical order, for example). A random starting point is selected and then every nth member is selected.
• Stratified random sampling - A population is divided into sub groups (strata) and a sample is selected from each stratum.
• Cluster sampling - A population is divided up into primary units and then samples are selected from the primary units.
• Non-probability sampling - The odd one out, as it isn't random at all: inclusion in the sample is based on the judgement of the person selecting the sample. (Eeek!)
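
To make the four probability methods above a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the population of 100 numbered people, the two age strata, the five "towns" and the function names are all assumptions, not part of any library. The cluster version shows one common reading (pick a few primary units at random, then sample within them).

```python
import random

def simple_random_sample(population, k, rng):
    """Every member has the same chance of being included."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Random starting point, then every step-th member of the ordered list."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, k_per_stratum, rng):
    """Draw a simple random sample from each stratum (sub group)."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, k_per_cluster, rng):
    """Pick some primary units at random, then sample within each chosen unit."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], k_per_cluster) for name in chosen}

rng = random.Random(42)           # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical population: people numbered 1..100

# Made-up strata and clusters, purely for illustration
strata = {"under_40": population[:60], "40_plus": population[60:]}
clusters = {f"town_{i}": population[i * 20:(i + 1) * 20] for i in range(5)}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 5, rng)
clustered = cluster_sample(clusters, 2, 5, rng)

print("Simple random:", srs)
print("Systematic:   ", systematic)
print("Stratified:   ", stratified)
print("Cluster:      ", clustered)
```

Notice that the stratified sample always contains people from both age groups, whereas a simple random sample could miss a small group entirely just by chance.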

• Sampling Distribution - This is the theoretical distribution of a statistic over all possible samples of a certain sample size, N. It's a device to link the sample's characteristics to the population.
• If repeated samples of size N are drawn from a normal population with a mean of μ (mu) and a standard deviation of σ, then the sampling distribution of sample means will be normal with a mean of μ and a standard deviation of σ / √N.
• The 'Central Limit Theorem' states that if repeated samples of size N are drawn from a population, whatever its shape, then as N becomes large the sampling distribution of sample means will approach normality.
• Or, in easier terms: Large samples are more reliable!
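
If you'd like to see this for yourself, the sketch below simulates it in Python. The population here is an assumption for illustration (20,000 values from a skewed exponential distribution, so definitely not normal), as are the sample sizes and the number of repeats. For each N it compares the spread of the sample means with σ / √N.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical, deliberately skewed (non-normal) population
population = [rng.expovariate(1.0) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

def sample_means(n, repeats=2000):
    """Draw `repeats` samples of size n and return the mean of each one."""
    return [statistics.fmean(rng.choices(population, k=n))
            for _ in range(repeats)]

results = {}
for n in (5, 30, 200):
    means = sample_means(n)
    results[n] = {
        "mean_of_means": statistics.fmean(means),
        "sd_of_means": statistics.stdev(means),
        "sigma_over_root_n": sigma / n ** 0.5,
    }
    print(n, {name: round(value, 3) for name, value in results[n].items()})
```

The mean of the sample means should sit close to μ for every N, while the standard deviation of the sample means should track σ / √N and shrink as N grows. Plotting a histogram of `means` for each N would also show the shape getting closer to a bell curve.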

The most basic method of estimation is the confidence interval. From a sample we don't know the population mean, but we would like to estimate it as precisely as we can. To do this we use a range, and say how certain we are that this range includes the population mean. We give the confidence level as a percentage; for example, we could say that at a 99% confidence level, between 33% and 39% of adults will vote for Labour in the next election (Made up!). For the same sample, a higher confidence level gives a wider interval, which is more likely to contain the true population mean.
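
As a rough sketch of how such an interval could be worked out, the Python below uses the usual normal approximation for a proportion (a proportion is just the mean of a yes/no variable). The poll numbers are invented to fit the made-up example: assume 612 of 1,700 adults surveyed say they will vote Labour. The function name `proportion_ci` is hypothetical too.

```python
from statistics import NormalDist

def proportion_ci(successes, n, level):
    """Normal-approximation confidence interval for a population proportion."""
    p = successes / n                          # sample proportion
    z = NormalDist().inv_cdf(0.5 + level / 2)  # z-value for the chosen level
    margin = z * (p * (1 - p) / n) ** 0.5      # z times the standard error
    return p - margin, p + margin

# Invented poll: 612 of 1,700 adults say they will vote Labour
low95, high95 = proportion_ci(612, 1700, 0.95)
low99, high99 = proportion_ci(612, 1700, 0.99)

print(f"95% interval: {low95:.1%} to {high95:.1%}")
print(f"99% interval: {low99:.1%} to {high99:.1%}")
```

With these invented numbers the 99% interval comes out at roughly 33% to 39%, and the 95% interval is narrower. That is the trade-off: the more confident we want to be, the wider the range we have to quote.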

The next post will go further into the concept of confidence intervals and we will introduce such things as error margins. Stay tuned, thanks guys!

Sam.