A follow-up to a previous post, owing to requests for further clarity. For those who requested it, sorry for the delay in posting. I've sacrificed some technical accuracy (no pun intended) to make the post a little more readable. However, it is not a very quick read.
In everyday parlance, the two terms - Precision and Accuracy - are often used interchangeably. In statistics, however, they denote separate ideas.
I. The 'thought-survey' (term thoughtlessly borrowed from its experimental counterpart)
Let's say you undertake a survey where the outcome of interest is the 'Average annual income of undergraduate students in the University of Michigan, Ann Arbor', and you get $2,000 as an estimate.
However, like a good statistician, you doubt whether this was a 'good' estimate. So you repeat the survey using the same survey design. This time you get $3,000. Hmmm... quite a variance, i.e. this estimate is quite different from the earlier one.
So, to be really convinced, you do the survey again using the same survey design and end up with $1,000. Wow! These numbers are really 'fluctuating'. You are now rapidly running out of time and money, so you promise yourself that you'll repeat the survey just one last time. This time you get an estimate of $2,800. What's happening? Why are these numbers so different? You've got four estimates: $2,000, $3,000, $1,000 and $2,800.
You decide to discuss this with your friend, Peter, who, coincidentally, was conducting a similar project at the time. Peter says, "That's funny. I got the following estimates: $7,500, $8,000, $7,200 and $7,700."
Now both of you are confused. We have two issues on hand:
1) Why is it that your estimates are all over the place?
2) Why are your estimates so different from Peter's?
Fear not. Fortune favors the intelligently persevering (and now I'm sounding like Aesop). You stumble upon a previous study that said that the 'Average annual income of undergraduate students in the University of Michigan, Ann Arbor' is $2,200 for the current year. The difference between this and Peter's/your study is that it is based on a Census, in other words, on complete coverage of all students.
Your numbers, however, are estimates of this true number, since you have sampled only a part of the student population. But guess what? You find - surprise, surprise - that your estimates on average equal the true value. That is, the average of your four sample estimates, ($2,000 + $3,000 + $1,000 + $2,800) / 4 = $2,200, is equal to the true population mean.
II. Relation with Bias and Variance
1) Your estimate is therefore an Accurate estimate. UNBIASED. But you also note that your estimates vary ('fluctuate'). They are therefore not very Precise estimates.
2) Peter's estimates, on the other hand, were NOT ACCURATE. They were on the higher side (they average $7,600 against a true value of $2,200). There was a BIAS (a positive bias). But to be fair, they were PRECISE estimates. They had relatively LOW VARIANCE, i.e. the estimates were close together.
Thus:
HIGH Accuracy => LOW Bias
HIGH Precision => LOW Variance
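To put numbers on the two ideas, here is a minimal sketch that computes the bias and the spread for both sets of estimates from the thought-survey. The function name `summarize` is my own, and I'm using the sample standard deviation as the measure of spread, which is one reasonable choice rather than the only one.

```python
from statistics import mean, stdev

# Figures taken from the thought-survey above.
census_value = 2200
surveys = {
    "You": [2000, 3000, 1000, 2800],
    "Peter": [7500, 8000, 7200, 7700],
}

def summarize(estimates, true_value):
    """Return (bias, spread) for a set of repeated estimates.

    bias   = average estimate minus the true value (accuracy)
    spread = sample standard deviation of the estimates (precision)
    """
    bias = mean(estimates) - true_value
    spread = stdev(estimates)
    return bias, spread

for name, estimates in surveys.items():
    bias, spread = summarize(estimates, census_value)
    print(f"{name}: bias = {bias:+.0f}, standard deviation = {spread:.0f}")
```

Your estimates come out with zero bias but a standard deviation of about $909, while Peter's have a bias of +$5,400 and a standard deviation of about $337.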
III. Visualizing Accuracy and Precision concepts
The best way to do this is to imagine that you are playing a game of darts. You then have four scenarios depending on whether the Accuracy and Precision are 'Low' or 'High'. See the following diagram. See how your estimates and Peter's fall between the two extremes of Ideal vs Sinful (!) estimates. Just for ease of visualization, this has been represented as a 2-dimensional picture. With reference to this example, just look along the horizontal line - the dotted line that I've drawn.
IV. But do we really conduct repeat surveys?
No! In the practical world, we almost never have the luxury of time or money to conduct repeat surveys to see how good our estimates are. In practice, the survey statistician would first ensure that the design is UNBIASED. Along with ensuring unbiasedness, a survey statistician would ensure that the variance of the estimate is low. In your case, you had an unbiased sample. The reason for the high variance was probably a small sample size.
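Since nobody can afford repeat surveys in real life, a computer can run them for us. The sketch below is purely illustrative: the population is made up (20,000 hypothetical incomes with a mean near $2,200), and `repeated_estimates` is a helper name I've invented. It repeats the same simple random sample design 500 times, once with 10 students per survey and once with 400.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 20,000 made-up student incomes, mean near $2,200.
population = [rng.expovariate(1 / 2200) for _ in range(20_000)]

def repeated_estimates(sample_size, repeats=500):
    """Run the same simple-random-sample survey `repeats` times
    and return the estimated mean income from each run."""
    return [mean(rng.sample(population, sample_size)) for _ in range(repeats)]

small = repeated_estimates(sample_size=10)
large = repeated_estimates(sample_size=400)

print(f"true population mean:  {mean(population):.0f}")
print(f"n = 10:  average estimate {mean(small):.0f}, spread {stdev(small):.0f}")
print(f"n = 400: average estimate {mean(large):.0f}, spread {stdev(large):.0f}")
```

Both designs should land close to the true mean on average, but the estimates from the larger samples cluster far more tightly, which is the sense in which a bigger sample buys precision.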
In Peter's case, he probably had a 'helpful' friend living in an off-campus condo who offered to introduce Peter to his friends in the condos around. Obviously, these were all people who worked in places offering higher wages (how else a condo? :) ), which is why his estimates were biased. The sample, of course, is not a random sample and is unscientifically drawn. So it matters little that his estimates were precise, since they were quite biased.
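Here is a rough sketch of how a referral sample like Peter's can be precise and still badly off. Everything in it is an assumption for illustration: a made-up population in which 5% of students live in condos and earn between $7,000 and $8,200, while the rest earn between $0 and $3,800. The names `condo_network` and `repeated_estimates` are hypothetical.

```python
import random
from statistics import mean, stdev

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population (made-up numbers, not real data).
other_students = [rng.uniform(0, 3800) for _ in range(19_000)]
condo_network = [rng.uniform(7000, 8200) for _ in range(1_000)]
population = other_students + condo_network
true_mean = mean(population)

def repeated_estimates(frame, sample_size=50, repeats=500):
    """Repeat a survey of `sample_size` students drawn at random from `frame`."""
    return [mean(rng.sample(frame, sample_size)) for _ in range(repeats)]

random_design = repeated_estimates(population)       # every student can be picked
referral_design = repeated_estimates(condo_network)  # only condo friends can be picked

for label, estimates in [("random", random_design), ("referral", referral_design)]:
    bias = mean(estimates) - true_mean
    print(f"{label:>8}: bias = {bias:+.0f}, spread = {stdev(estimates):.0f}")
```

The referral design gives tightly clustered estimates because everyone in the condo network earns about the same, but those estimates sit thousands of dollars above the true mean, and no amount of repetition fixes that.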
Also, please remember that the fact that the average of your four estimates is exactly equal to the population value is just for illustration. Unbiasedness (or the lack of it) is a property of the average across many, many repeated estimates, not of any particular four.
V. Two basic reporting components
To enable the research user to assess precision, we always report the Margin of Error along with the Estimate in research reports/presentations, i.e. Estimate +/- Margin of Error, which gets expressed as a Confidence Interval. The estimate, of course, is designed to be unbiased. That is, the design is such that the estimate would equal the true population value on average over repeated surveys; any single estimate will still differ from it by sampling error.
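As a rough illustration of Estimate +/- Margin of Error, the sketch below uses the common normal approximation for a mean from a simple random sample: margin of error = z * s / sqrt(n), with z about 1.96 for 95% confidence. The post doesn't spell out a formula, so treat this as one standard textbook choice. The 100 responses are made up, and `margin_of_error` is a hypothetical helper.

```python
import math
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical data: 100 made-up annual incomes from a single survey.
responses = [rng.uniform(0, 4400) for _ in range(100)]

def margin_of_error(sample, confidence=0.95):
    """Normal-approximation margin of error for a sample mean.

    Assumes a simple random sample that is large enough for the
    normal approximation to be reasonable.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    standard_error = stdev(sample) / math.sqrt(len(sample))
    return z * standard_error

estimate = mean(responses)
moe = margin_of_error(responses)

print(f"Estimate: {estimate:.0f} +/- {moe:.0f}")
print(f"95% confidence interval: ({estimate - moe:.0f}, {estimate + moe:.0f})")
```

Note that the margin of error only speaks to precision. It would look just as tidy for a biased sample like Peter's, which is why the design has to take care of bias first.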