Friday, January 28, 2011

Myths and Misconceptions in Survey Sampling - 2 (SRS is the "safest" sampling scheme)

In the first of this series, we looked at what SRS is. It certainly is the simplest sampling scheme. But not the safest.

Think of a population of 100 people made up of 50 girls and 50 boys. Your task is to sample 10 persons. You go ahead and take an SRS, expecting to get a "representative sample" of 5 girls and 5 boys. But do not be very surprised if you get 3 girls and 7 boys. Why? Chance. It is the same reason that you do not always get exactly one head and one tail when you toss a coin twice. The best way to understand this is to conduct a small simulation in R:

# create a population of 50 boys and 50 girls, then combine them
pb <- rep("B", 50)
pg <- rep("G", 50)
pt <- c(pg, pb)
# create an empty list to store the results of sampling
s1 <- vector("list", 1000)
# take 1000 independent samples of 10 persons each
# (sample() draws without replacement within each sample, as in an SRS)
for (i in 1:1000) {
  s1[[i]] <- sample(pt, 10)
}
# calculate the % of boys selected in each sample of 10 and plot the results
s2 <- sapply(s1, function(x) sum(x == "B") * 10)
hist(s2, xlab = "% of boys in the sample", main = "1000 simple random samples of 10")

Of course, in practice we would do this exercise just once and not 1000 times. But the above histogram illustrates that there is a chance (though small) that you could end up with as few as 2 boys in a sample of 10 individuals. A better approach in this case is to stratify the population into girls and boys and choose 5 out of 50 from each.
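
The simulation only approximates these chances. As a rough cross-check (my own sketch, not part of the original exercise), the number of boys in an SRS of 10 drawn without replacement from 50 boys and 50 girls follows a hypergeometric distribution, so base R's dhyper() and phyper() give the exact probabilities. The variable names are only illustrative.

```r
# Exact distribution of the number of boys in an SRS of 10
# drawn without replacement from 50 boys and 50 girls.
n_boys <- 0:10
p_exact <- dhyper(n_boys, m = 50, n = 50, k = 10)
names(p_exact) <- n_boys
print(round(p_exact, 4))

# Chance of the "representative" split of exactly 5 boys and 5 girls
p_balanced <- unname(p_exact["5"])

# Chance of ending up with 2 or fewer boys
p_two_or_fewer <- phyper(2, m = 50, n = 50, k = 10)

print(p_balanced)
print(p_two_or_fewer)
```

The perfectly balanced sample turns up in only about a quarter of the draws, which is the whole point of the example.
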
Bottom line: SRS is not the safest sampling scheme. Stratification is insurance against chance.
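
To see the insurance at work, here is a minimal sketch of the stratified alternative, assuming the same population of 50 girls and 50 boys. The helper draw_stratified() and the seed are my own illustrative choices, not a standard function. It takes an SRS of 5 within each sex and compares the spread in the number of boys with that of a plain SRS of 10.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# Population frame with one row per person
pop <- data.frame(id = 1:100, sex = rep(c("G", "B"), each = 50))
strata <- split(pop, pop$sex)

# Hypothetical helper: SRS of n_per_stratum persons within each stratum
draw_stratified <- function(strata, n_per_stratum) {
  do.call(rbind, lapply(strata, function(s) s[sample(nrow(s), n_per_stratum), ]))
}

strat_sample <- draw_stratified(strata, 5)
print(table(strat_sample$sex))

# Number of boys over 1000 repetitions of each design
boys_strat <- replicate(1000, sum(draw_stratified(strata, 5)$sex == "B"))
boys_srs <- replicate(1000, sum(sample(pop$sex, 10) == "B"))

print(range(boys_strat))  # fixed by design
print(range(boys_srs))    # varies by chance
```

The stratified design fixes the split by construction, while the SRS leaves it to chance.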

Yet, there are times when you might need to use SRS:
1) You have nothing but a list of elements, with no extra information on them, e.g. drawing a sample from a list of voters in an area
2) When being correct might weaken your case (such is life). Sharon Lohr in her book gives the example of a legal case where a complicated sampling scheme might seem like "you are making the number up".

Wednesday, January 12, 2011

Myths and Misconceptions in Survey Sampling - 1 (SRS, EPSEM and Self-weighting samples)

Misconception 1: SRS, EPSEM and self-weighting samples are interchangeable terms
Many use these terms interchangeably in conversation or writing. But they are not the same thing. For example:
"Similarly we must avoid the common confusion of epsem with simple random sampling (srs). Probably most survey samples are epsem but very few are srs (outside academic writing)."

In fact, even Wikipedia makes that mistake!
"A self-weighting sample, also known as an EPSEM (Equal Probability of Selection Method) sample..."

Let's see why these terms are not interchangeable.
1) Simple Random Sampling (SRS)
The basic form of sampling, similar to a draw of balls from an urn (aren't you tired of urn examples?).
"Simple Random Sampling is a method of selecting n units out of the N such that every one of the NCn distinct samples has an equal chance of being drawn. In practice a simple random sample is drawn unit by unit."

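For a population small enough to list out, the definition can be checked by brute force. This is only a sketch with a toy population of N = 5 units and samples of size n = 2 (sizes chosen by me for illustration): it enumerates every possible sample with combn() and counts how often each unit appears.

```r
N <- 5  # toy population size (illustrative)
n <- 2  # sample size (illustrative)

# Each column of all_samples is one of the possible distinct samples
all_samples <- combn(N, n)
n_samples <- ncol(all_samples)

# Under SRS every distinct sample is equally likely
p_each_sample <- 1 / n_samples

# Inclusion probability of each unit: the share of samples that contain it
incl_prob <- sapply(1:N, function(u) {
  mean(apply(all_samples, 2, function(s) u %in% s))
})

print(n_samples)      # equals choose(N, n)
print(p_each_sample)
print(incl_prob)      # every unit has inclusion probability n / N
```
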
So, each population frame element has an equal chance of being selected irrespective of how many at a time the sampling is done - individually, pair-wise, three-at-a-time etc. Also,
"We shall restrict the term simple random sampling to situations where the elements are selected individually, hence the elements are also the sampling units. This differs from cluster sampling where the sampling units are clusters containing several elements."

2) EPSEM
"Sample Designs assigning equal probabilities to all individual elements of a frame population are called "epsem" for Equal Probability Selection Method."
1) EPSEM is not one specific sampling method but a family of designs that all result in (known) equal selection probabilities
2) SRS is a type of EPSEM but not every EPSEM is (and most are not) an SRS. For example, a two-stage cluster sample in which clusters are selected with Probability Proportionate to Size (PPS) and the same number of elements is then drawn from each selected cluster is an EPSEM design, but it is a type of cluster sampling (see above quote from Kish), i.e. not SRS.
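
Why does a two-stage PPS design come out as EPSEM? A small worked sketch makes the arithmetic visible. The cluster sizes and the values of m and k are made up for illustration, and I am assuming a PPS scheme without replacement whose first-stage inclusion probability is m * N_i / N (this needs m * N_i / N to be at most 1 for every cluster).

```r
# Hypothetical clusters (say, villages) and their sizes N_i
cluster_size <- c(A = 200, B = 500, C = 300, D = 600, E = 400)
N <- sum(cluster_size)  # total number of elements in the population

m <- 2   # number of clusters selected at stage 1 (assumed)
k <- 20  # number of elements drawn by SRS within each selected cluster (assumed)

# Stage 1: a cluster's chance of selection is proportional to its size
p_cluster <- m * cluster_size / N

# Stage 2: an element's chance of selection, given its cluster was selected
p_within <- k / cluster_size

# Overall selection probability of an element: the sizes cancel out
p_overall <- p_cluster * p_within

print(p_cluster)
print(p_within)
print(p_overall)  # m * k / N for every element, whatever its cluster
```

Big clusters are more likely to be picked, but an element inside a big cluster is less likely to be picked once its cluster is in. The two effects cancel, so every element ends up with the same overall probability even though this is clearly not an SRS.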

3) Self-weighting samples:
We collect data from samples not as an end in itself but to learn something about the population from which the samples were drawn. In this process, a weight is attached to each sample element. Why? To correct any possible imbalances that might (will) crop up in the process of implementing the design. For example, say you draw a simple random sample from a list of people. At this stage every person has an equal chance of being selected and thus the same weight.  

Now if each person that was contacted responded to the survey and there were no other survey problems (like the filled-in questionnaire getting lost!), each person would still have the same weight (which would actually be the inverse of his selection probability). This kind of sample would be called a self-weighting sample.

But in practice, you find that the response rate to your survey among the upscale people is lower than among the rest, and you've ended up getting more responses from the non-upscale folk. This would make your results biased. To compensate for this we would calculate and apply a weight that will upweight an upscale person (and therefore downweight the rest). So what started off as an EPSEM design is no longer a self-weighting sample. In fact, this is actually what one finds in practice and so:
"Others have used the phrase self-weighting sample, although some eschew this term, given that weighting typically involves nonresponse adjustment and some form of calibration such as ratio adjustment or raking, and these lead to unequal weights even when all elements of the sample have been selected with equal probability."
Encyclopedia of Survey Research Methods, Ed. Paul Lavrakas. The entry "EQUAL PROBABILITY OF SELECTION".
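
Here is a rough sketch of how such a nonresponse adjustment breaks self-weighting. All the counts are made up for illustration, and the adjustment used is a simple weighting-class one (base weight divided by the response rate in each class), which is just one of several approaches.

```r
# Hypothetical survey: SRS of n = 500 from a population of N = 10000
N <- 10000
n <- 500
base_weight <- N / n  # same for everyone: inverse of the selection probability

# Hypothetical numbers sampled and responding in each class
sampled <- c(upscale = 100, other = 400)
responded <- c(upscale = 40, other = 320)

# Response rate within each class
response_rate <- responded / sampled

# Nonresponse-adjusted weight: respondents also stand in for nonrespondents
final_weight <- base_weight / response_rate

print(response_rate)
print(final_weight)                   # no longer equal across classes
print(sum(final_weight * responded))  # weights still add up to N
```

Every sampled person started with the same base weight, but the respondents do not finish with the same final weight. The design was EPSEM, yet the sample is not self-weighting.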

1. SRS is an EPSEM method but not every EPSEM is SRS
2. Every SRS is self-weighting (in principle) but every self-weighting sample need not be SRS
3. EPSEM samples are self-weighting in principle but not in practice

Tuesday, January 11, 2011

This is our culture - Murder a classic!

I actually had tears in my eyes when I saw this video. After watching this video (not the entire length, for God's sake, but just the first 30 seconds) you may well ask why. "Isn't this a regular movie sequence? Idiotic maybe, but tears in your eyes? C'mon, you're overdoing it!"

Here's the reason: The song (kirtana) is originally a divine, soul-touching classical composition. Please read the entire article about the kirtana's saint-composer, Sri Bhadrachala Ramadasu, at the wiki-link here.

Here are the lyrics:
O, Rama! Is it that a word from your mouth is so precious and rare?
Why don’t you respond when I call you? I who have never forgotten your thoughts even in my dreams.
The kriti evokes devotional thoughts in anyone who has at least a tinge of sattva. How could they even do this? Don't they have a sense of shame? A sense of heritage, culture, pride, dignity? The scary thing for me is that I feel the overwhelming majority of people will end up liking the 'song' in this fashion. Now we know why the country is in this state.

Now after all this, if you are wondering what the original is like, please listen to the video below. Note that this is not really classically rendered (I for one find some of the added elements rather corny) but I've included it to demonstrate that while one may take some liberty, one should at least preserve the basic ethos of the song (this is difficult to define but, I think, easy to recognize).

And this is one way it is traditionally rendered. I've taken this video since it shows you the kind of absorption and involvement that goes into the kirtana unlike that crass video right at the top.

Sunday, January 09, 2011

Truck Wisdom

This is the guy who coined "With friends like these who needs enemies". I have a sneaky feeling he has a lot of friends. All locked in there.