Friday, November 24, 2006

Two Great TED-talk videos

As many of you might know, TED stands for Technology, Entertainment, Design. The TED conference is held annually in Monterey, California. As the introduction video says, "TED is a preview of heaven".

These are truly inspiring talks from a range of disciplines: science, music, sport, etc. Starting with the first session in 1984, which included the "public unveiling of the Macintosh and the Sony Compact disk", TED truly continues to inspire.

Below are two videos about statistics. I showed the first five minutes of the following video to my class this week and they liked it. Even the ones who hate statistics liked it :). The talk is by Prof. Hans Rosling of the Karolinska Institute, who "brings global data to life". IF YOU ARE GOING TO WATCH THIS, DO WATCH IT TILL THE 5:15 MARK AT LEAST.



The second one came courtesy of Sandeep, who pointed me to it. Another very nice one (what else from TED?). This is by Prof. Peter Donnelly of Oxford, who "explores the common mistakes humans make in interpreting statistics, and the devastating impact these errors can have on the outcome of criminal trials".



Other videos are available on the blog. Enjoy!

Wednesday, November 15, 2006

Precision and Accuracy: Different Words, Same Thing?

A follow-up to a previous post, in response to requests for more clarity. For those who asked, sorry for the delay in posting. I've sacrificed some technical accuracy (no pun intended) to make the post a little more readable. Even so, it is not a quick read.

In everyday parlance, the two terms, Precision and Accuracy, are often used interchangeably. In statistics, however, they denote separate ideas.

I. The 'thought-survey' (term thoughtlessly borrowed from its experimental counterpart)

Let's say you undertake a survey where the outcome of interest is the 'Average annual income of undergraduate students in the University of Michigan, Ann Arbor', and you get $2,000 as an estimate.

However, like a good statistician, you doubt whether this is a 'good' estimate. So you repeat the survey using the same survey design. This time you get $3,000. Hmmm... quite a variation, i.e. this estimate is quite different from the earlier one.

To be really convinced, you do the survey again using the same design and end up with $1,000. Wow! These numbers are really 'fluctuating'. You are now rapidly running out of time and money, so you promise yourself that you'll repeat the survey just one last time. This time you get an estimate of $2,800. What's happening? Why are these numbers so different? You've got four estimates: $2,000, $3,000, $1,000 and $2,800.

You decide to discuss this with your friend Peter, who coincidentally was conducting a similar project at the same time. Peter says, "That's funny. I got the following estimates: $7,500, $8,000, $7,200 and $7,700."

Now both of you are confused. There are two issues at hand:
1) Why is it that your estimates are all over the place?
2) Why are your estimates so different from Peter's?

Fear not. Fortune favors the intelligently persevering (and now I'm sounding like Aesop). You stumble upon a previous study which found that the 'Average annual income of undergraduate students in the University of Michigan, Ann Arbor' is $2,200 for the current year. The difference between this study and yours or Peter's is that it is based on a census, in other words, on complete coverage of all students.

Your numbers, however, are estimates of this true number, since you sampled only part of the student population. But guess what? You find, surprise, surprise, that your estimate on average is equal to the true value. That is, the average of your four sample estimates is equal to the true population mean.
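If you want to check that arithmetic yourself, here is a tiny Python sketch. The only inputs are the four estimates and the census figure from the story above.

```python
# The four estimates from your repeated surveys, and the census value (the true mean).
your_estimates = [2000, 3000, 1000, 2800]
census_value = 2200

# Average of the repeated estimates.
average_estimate = sum(your_estimates) / len(your_estimates)

print(average_estimate)                  # 2200.0
print(average_estimate == census_value)  # True
```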

II. Relation to Bias and Variance
1) Your estimate is therefore an Accurate estimate: it is UNBIASED. But you also note that your estimates vary ('fluctuate'), so they are not very Precise estimates.

2) Peter's estimates, on the other hand, were NOT ACCURATE. They were on the higher side: there was a BIAS (a positive bias). But to be fair, they were PRECISE estimates. They had relatively LOW VARIANCE, i.e. the estimates were close together.

Thus:

HIGH Accuracy => LOW Bias
HIGH Precision => LOW Variance
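The two rules above can be put into numbers with a short Python sketch. It uses only the eight estimates from the story and the census value of $2,200; the helper name bias_and_spread is made up for this illustration. Bias is the average estimate minus the true value, and for the spread it reports the standard deviation (the square root of the variance), which is in dollars and so easier to read.

```python
from statistics import mean, stdev

TRUE_VALUE = 2200  # census value from the example

surveys = {
    "you": [2000, 3000, 1000, 2800],
    "Peter": [7500, 8000, 7200, 7700],
}

def bias_and_spread(estimates, true_value):
    """Return (bias, sample standard deviation) of a set of repeated estimates."""
    bias = mean(estimates) - true_value  # accuracy: how far the centre is from the truth
    spread = stdev(estimates)            # precision: how much the estimates scatter
    return bias, spread

for name, estimates in surveys.items():
    bias, spread = bias_and_spread(estimates, TRUE_VALUE)
    print(f"{name}: bias = {bias:+.0f}, standard deviation = {spread:.0f}")

# you: bias = +0, standard deviation = 909
# Peter: bias = +5400, standard deviation = 337
```

Zero bias with a large spread is "accurate but imprecise"; a large bias with a small spread is "precise but inaccurate". With only four estimates each, these numbers are of course just for illustration.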

III. Visualizing Accuracy and Precision concepts
The best way to do this is to imagine that you are playing a game of darts. There are four scenarios, depending on whether Accuracy and Precision are 'Low' or 'High'. See the following diagram. Note how your estimates and Peter's fall between the two extremes of Ideal and Sinful (!) estimates. For ease of visualization, this has been drawn as a two-dimensional picture. For this example, just look along the horizontal dotted line that I've drawn.

IV. But do we really conduct repeat surveys?
No! In the practical world, we never have the luxury of time or money to conduct repeat surveys to see how good our estimates are. In practice, the survey statistician first ensures that the design is UNBIASED. Along with ensuring unbiasedness, the statistician also tries to keep the variance of the estimate low. In your case, you had an unbiased sample; the high variance was probably due to a small sample size.
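To see why a small sample gives 'fluctuating' estimates even under an unbiased design, here is a rough Python simulation. The population is entirely made up (20,000 incomes drawn from an exponential distribution with a mean of about $2,200, an arbitrary assumption), and spread_of_estimates is a hypothetical helper: it repeats a simple random survey many times and measures how much the resulting estimates scatter.

```python
import random
from statistics import fmean, stdev

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: 20,000 student incomes, mean around $2,200 (made-up numbers).
population = [rng.expovariate(1 / 2200) for _ in range(20_000)]

def spread_of_estimates(sample_size, repeats=500):
    """Repeat an unbiased survey (simple random sample) and return the
    standard deviation of the resulting estimates of the mean."""
    estimates = [fmean(rng.sample(population, sample_size)) for _ in range(repeats)]
    return stdev(estimates)

for n in (10, 100, 1000):
    print(f"sample size {n:>4}: spread of estimates ~ ${spread_of_estimates(n):,.0f}")
```

The spread should shrink roughly like one over the square root of the sample size, so ten times the sample gives about a third of the spread. The design stays unbiased throughout; only the precision changes.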

In Peter's case, he probably had a 'helpful' friend living in an off-campus condo who offered to introduce Peter to his friends in nearby condos. Naturally, these friends all worked at places offering higher wages (how else a condo? :), which is why his estimates were biased. The sample, of course, was not a random sample and was unscientifically drawn. So it matters little that his estimates were precise, since they were quite biased.
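Peter's situation can be mimicked in the same spirit. In this sketch the population is again invented: 18,000 students with incomes around $1,600 and 2,000 condo-dwellers with incomes around $7,600, numbers picked only so that the overall mean comes out near $2,200. One design samples everyone at random; the other samples only from the condo group, like Peter's friend-of-a-friend approach. The helper survey is hypothetical.

```python
import random
from statistics import fmean, stdev

rng = random.Random(7)

# Hypothetical population (made-up numbers): most students earn around $1,600,
# a small condo-dwelling group earns around $7,600.
others = [rng.gauss(1600, 300) for _ in range(18_000)]
condo = [rng.gauss(7600, 300) for _ in range(2_000)]
population = others + condo
true_mean = fmean(population)  # close to $2,200

def survey(frame, n=50):
    """One survey: the mean of a simple random sample of size n drawn from `frame`."""
    return fmean(rng.sample(frame, n))

random_design = [survey(population) for _ in range(1000)]  # everyone can be picked
condo_design = [survey(condo) for _ in range(1000)]        # only condo-dwellers can be picked

for label, estimates in (("random sample", random_design), ("condo sample", condo_design)):
    bias = fmean(estimates) - true_mean
    print(f"{label}: bias ~ {bias:+,.0f}, spread ~ {stdev(estimates):,.0f}")
```

The condo design should show a large positive bias but a much smaller spread: precise, yet not accurate. The random design should show close to zero bias with a wider spread.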


Also, please remember that the average of your four estimates being exactly equal to the population value is just for illustration. Unbiasedness (or the lack of it) is a property of the average across many, many repeated estimates, not of any small handful of them.
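A quick way to see that unbiasedness is about the long run: simulate the same unbiased survey many times and watch the average of the estimates settle on the true value. Again, the population, the helper one_survey and the sample size of 30 are assumptions made only for illustration.

```python
import random
from statistics import fmean

rng = random.Random(2006)

# Hypothetical population of 20,000 incomes (made-up, mean around $2,200).
population = [rng.expovariate(1 / 2200) for _ in range(20_000)]
true_mean = fmean(population)

def one_survey(n=30):
    """Estimate the mean from one simple random sample of size n."""
    return fmean(rng.sample(population, n))

for repeats in (4, 100, 10_000):
    estimates = [one_survey() for _ in range(repeats)]
    gap = fmean(estimates) - true_mean
    print(f"{repeats:>6} surveys: average estimate minus true mean = {gap:+,.0f}")
```

With only four surveys the average can be noticeably off; with thousands of surveys it sits very close to the true mean.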

V. Two basic reporting components

To enable the research user to assess precision, we always report the Margin of Error along with the Estimate in research reports and presentations, i.e. Estimate +/- Margin of Error, which is expressed as a Confidence Interval. The estimate, of course, is designed to be unbiased ('tending to' the true population value over many runs).
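Here is a minimal sketch of that "Estimate +/- Margin of Error" reporting for a mean, assuming a simple random sample and a normal approximation (z of about 1.96 for 95% confidence). The eight incomes are made up, and estimate_with_margin is a hypothetical helper. With samples this small, a t multiplier would give a somewhat wider, more honest interval.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def estimate_with_margin(sample, confidence=0.95):
    """Return (estimate, margin of error) for a mean, using a normal approximation."""
    n = len(sample)
    estimate = fmean(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return estimate, z * standard_error

# Hypothetical sample of 8 reported incomes (made-up numbers).
sample = [1500, 2600, 900, 3100, 2200, 1800, 2700, 2400]
est, moe = estimate_with_margin(sample)
print(f"Estimate: ${est:,.0f} +/- ${moe:,.0f}")
print(f"95% CI: (${est - moe:,.0f}, ${est + moe:,.0f})")
```

A larger sample shrinks the standard error, and with it the margin of error, which is exactly the "low variance" half of the story.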

Saturday, November 04, 2006

Pirating films? You Goonda!

Yes, that's right. If you are in the state of Karnataka (in India), you had better not burn a CD of some movie and pass it to your friend Rajesh. If you do, you are a goonda. This was the announcement made by the Chief Minister of Karnataka.
With his film background in mind, Kumaraswamy announced that the Goonda Act would be imposed very strictly on those who pirate movies in the state.
So, what is a 'goonda'? Loosely, it corresponds to 'goon'. Apparently the word goon came from goonda! So a goonda is like a dreadful fellow you chance to meet on a lonely, dark street, who grabs you by the collar and politely requests that you part with your wallet, your chain, your laptop and some other sundry things you might not have any use for.

And guess what: there is also a 'Goonda Act'! I mean, I thought that was really lazy. Couldn't they come up with a nicer name, like, say, "Prevention of unlawful blah-blah act"? I would hate to ever get arrested under the Goonda Act.

Friend: So, I heard you did time?
Me: Yeah, that's right.
Friend: What did you get booked under?
Me: What a nice day outside.
Friend: Don't change the topic. Don't tell me you were booked under the 'goonda act'. harhahahrha (roar of laughter)

But now:
Friend: So, I heard you did time?
Me: Yeah, that's right.
Friend: What did you get booked under?
Me: Well (ahem, ahem), the 'Prevention of unjustified use of muscle power act'
Friend: Wow. Pretty Cool. Say, my sister wanted to meet up with you one of these days.

Anyhow, this 'Goonda Act' is a wide-ranging act. It covers everything from land grabbing and prostitution rackets to the narcotics trade and illegal lotteries. AND here's the news:

Though land grabbing, prostitution rackets, narcotic trade and illegal lotteries are flourishing in the city, the Bangalore police have not invoked the stringent Goonda Act against those indulging in these activities.

Obviously. Narcotics and land grabbing are pretty lame compared to video piracy.

Police Inspector : What's he here for?
Constable : He forcibly evicted these people from their land and took it over.
Police Inspector : Oh, well, ok. And what the hell were these people doing when their land was being grabbed?
Constable : They were watching a pirated version of Mithun Chakravarti's 'Gunda'
Inspector : What!!! Shameless idiots. Put them behind bars. Forget their land, they don't deserve freedom. And beat them up so they never watch any of Mithun's movies again in their lifetime.
Constable: And this guy who grabbed land?
Inspector: Aww, let him go...we have to deal with these pirated video guys first.

[Note: Do watch Mithun-da's Gunda if you get a chance. Your life will change, guaranteed. For a flavour, go to the link I've indicated above. Phenomenal movie.]

For a hilarious article on the term Goonda see here.

Of course, the other brilliant announcement was to rename our beloved Bangalore as Bengaluru. All this reminds me of what Claude Arpi once wrote:

One day an Indian friend of mine was visiting Israel. His guest asked him: 'How does India work?' My friend was a bit surprised by the question, but before he could answer, his Israeli colleague told him: 'Here we work with our guts.' My friend's answer came at once: "In India, it is the Grace which sustains us".

I quite agree.

P.S. I am not at all suggesting that people illegally download or distribute movies, but... goonda?