“DEWEY DEFEATS TRUMAN” – A case study in trusting the untrustworthy

Last week, I posted a commentary on the dangers of trusting polls and research derived from samples of convenience.  In it, I referenced the infamous 1948 Dewey-Truman election, in which the Chicago Tribune announced that New York Governor and Republican challenger Thomas E. Dewey had defeated incumbent President Harry S. Truman.  The headline's letters were simple, and they were big:  “DEWEY DEFEATS TRUMAN.”

Now, we all know that President Dewey went on to accomplish great and terrible things during his reign as commander in chief of the United States of America.  During his term, he helped establish NATO, fought the communist accusations and Red Scare of Senator McCarthy, sent troops to help with the Korean War, fired beloved war hero General MacArthur, and renovated the White House, ending his term with a dismal 22 percent approval rating.  Yes, President Dewey was indeed a controversial president, both beloved and hated.  He is the talk of history classrooms throughout the nation!

Pardon the sarcasm.  In fact, Dewey never won the election.  Despite the Tribune’s headline, Truman went on to win the electoral vote 303-189, and Democrats regained the House and Senate.  After I posted this reference in last week’s post, I slowly began to realize, with the help of a colleague, that perhaps not everyone remembers or knows about this infamous blunder.  And lest we forget, as history is forgotten, so it repeats itself.  So I am doing my part to help keep such disasters (comedic though they are) from repeating themselves.

Some have claimed that this blunder was a result of conservative bias within the Tribune, but the deeper cause was trusting inaccurate polling and data sources.  Such controversies have occurred since (namely the Bush-Gore election of 2000), but whereas the media were blamed for lax voting in those instances, this one was a product of trusting untrustworthy data.

A cautionary tale, to be sure, and one we can still learn from.  I can assure you that newspaper editors joke about this incident to others, but deep down, behind locked doors, they fear that their own paper may fall victim to such missteps.  Organizations and businesses should take heed as well, because trusting data that was not gathered properly can lead to decisions that are not in the organization's best interest.

And for posterity’s sake, let us once again be reminded of this infamous photograph (Truman taunting the media the day after his victory as he boards a train in St. Louis). 

Polling: A double-edged sword

May 26, 2011

Let us pretend for a moment that we all understand the foundations of probability theory – because this is a necessity for the purposes of this post.  Even the most seasoned of researchers and statisticians cannot possibly fully grasp something as ethereal as probability.  This is because, in a sense, the probability of occurrences is somewhat akin to gravity – we know it exists because it works.  So long as we don’t go spinning off into space, we know that gravity is indeed doing its job well enough.  Probability is the same way.  We know that if we flip a fair coin 1 million times, roughly 500,000 of those flips will come up heads, and the proportion of heads settles closer and closer to one half as the flips pile up.  (Of course, if gravity were to fail then so would the laws of probability, because once we flip the coin into the air, it would float out into the great unknown reaches of space!)
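For anyone who would rather see the coin-flip claim than take it on faith, here is a minimal simulation sketch.  The function name `heads_fraction` and the fixed seed are my own choices for illustration; the point is only that the share of heads drifts toward one half as the number of flips grows, without ever being guaranteed to hit it exactly.

```python
import random


def heads_fraction(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times and return the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


# The share of heads wanders for small runs and tightens around 0.5 for large ones.
for n in (10, 1_000, 100_000):
    print(n, heads_fraction(n))
```

Change the seed and the small runs will jump around quite a bit, while the large run barely moves.  That stability is what sampling leans on.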

So, why am I saying this?  Surely it’s not because I have given up on trying to understand why I can do what I do as a researcher without question (though some still question it).  My previous post talked a bit about the power of random sampling.  Similar to gravity and coin flipping, we know that if we randomly choose enough people out of a particular population, then those people will, within a known margin of error, be representative of that population.

Which brings me to this post – the second in a series on the power of sampling, if you will.  Many times, businesses and organizations will throw a short survey up on their website for any “passerby” to take.  These are called polls, and usually consist of a few quick questions aimed at taking the pulse of a certain group of people.  They have their uses, but they should never be confused with scientific research.  In order for survey research to be scientific, a sample must be collected at random.  Non-random sampling is indeed sampling, but it leads to results that cannot be claimed as representative.

Now, we are all familiar with political polling, and some of these polls are indeed scientifically gathered.  However, because of the changing nature of political attitudes, political polling is often accurate only at a particular point in time.  Non-random polling (appropriately referred to as convenience sampling), however, is only accurate for the people who participate in the poll to begin with.  One of the first things you’ll learn (or at least should) in any statistics course is that people who take the time to fill out a poll of convenience (what you typically find in pop-up windows when you visit a website) are impassioned to do so.  In other words, they have had either great or terrible experiences with a particular item.  Such polls rarely capture apathetic viewpoints – and let’s face it, most people are indifferent to most things.
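To see how badly a poll of convenience can miss the indifferent majority, consider this rough simulation sketch.  Every number in it is invented for illustration: the mix of opinions in the population, the chance that someone with each opinion answers a pop-up poll (`respond_prob`), and the sample size of 500.  Only the pattern matters, not the specific figures.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: opinions on a 1-5 scale, with most people indifferent (3).
population = rng.choices([1, 2, 3, 4, 5], weights=[5, 10, 70, 10, 5], k=100_000)

# Assumed chance that a person with each opinion bothers to answer a pop-up poll.
# People with strong feelings (1 or 5) are far more likely to respond.
respond_prob = {1: 0.30, 2: 0.05, 3: 0.01, 4: 0.05, 5: 0.30}


def share_indifferent(sample):
    """Return the fraction of a sample that holds the indifferent opinion (3)."""
    return sum(1 for x in sample if x == 3) / len(sample)


# Convenience poll: whoever chooses to respond ends up in the results.
convenience = [x for x in population if rng.random() < respond_prob[x]]

# Random sample: 500 people drawn without regard to how strongly they feel.
random_sample = rng.sample(population, 500)

print("population :", share_indifferent(population))
print("convenience:", share_indifferent(convenience))
print("random     :", share_indifferent(random_sample))
```

Under these made-up response rates, the convenience poll collects thousands of responses and still badly understates the indifferent group, while the much smaller random sample tracks the population closely.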

But some may argue: “What polls lack in representation, they certainly make up for in convenience.”  And when organizations are concerned about quick answers to their questions, then perhaps that argument makes sense.  But when scrutinized sufficiently, such an argument shatters as quickly as a glass house when the ground starts shaking.  Yes, convenience sampling, by its very nature and name, is designed to give quick and cheap estimates.  However, when the questions are intricate and the answers matter, decisions should not be made from such unrepresentative findings.  (Hence the “double-edgedness” of polling.)

Good research demands the appropriate and arduous steps to ensure that what you are basing decisions on is accurate and representative, whether the decision is how to bolster sales and tackle a new market or what to print in tomorrow’s headline on who won the presidency (Dewey ring a bell?).  Again, convenience polling and sampling have their purposes (umm…I guess), but they only tell one side of an infinite-sided die.  Such is bliss, but randomness is science!

What about you out there?  Have you stumbled across examples of poorly conducted research (namely from the perspective of sampling issues)?  We would like to hear some of your experiences – and they don’t have to be as mind-blowing and historically significant as the Dewey-Truman headline.

The Power of a Sample – Voodoo or Science?

May 18, 2011

A recent study carried out by our company and The Civil Rights Project for Jefferson County Public Schools came under fire because of a common misconception among those who don’t fully understand the power of random sampling. Without going into a long, drawn-out discussion of what the study entailed, the project aimed to gain an understanding of the Louisville community’s perceptions of the student assignment plan and the diversity goals it seeks to accomplish. Perhaps the methods would not have come under such scrutiny had the findings been less controversial, but regardless, the methods did indeed come under attack.

But if we take a moment to understand the science behind sampling methods, and realize that it is not voodoo magic, then I think the community can begin to focus on the real issues the study uncovered. To put it simply, sampling is indeed science. Without going into the theory of probability and the numerous mathematical assessments used to test the validity of a sample, we can say that a properly drawn random sample of sufficient size, so long as the laws of probability and nature hold true and some tear in the fabric of the universe has not occurred, is representative, within a known margin of error, of the population it attempts to embody.

Let us first begin to understand why this is so. When I taught statistics and probability to undergrads during my days as an instructor, I found I needed to keep this explanation simple – not because my students lacked the intelligence to fully understand it, but more so because probability theory can get a little sticky, and keeping the examples simple seemed to work best. Imagine we have a coin – a fair coin that is not weighted in any way (aside from a screw-up from the Treasury, in which case your coin could be worth a bundle of cash). We all know this example. If you flip it, you have a 50-50 chance of getting a particular side of that coin. In essence, that is the law of probability (the simplest of many).

Random sampling is the same way. While there are various methods to go about sampling a population randomly, simple random sampling is the easiest and most commonly used. To put it simply, each member of a population is assigned a unique value, and a random generator picks values within a defined range (say 1 to 1,000,000). Each member of that population has an equal chance of being selected. These chosen members become the lucky ones who represent the population. They are not “chosen” in the sense that they get to drink the Kool-Aid and ascend beyond, but they are chosen to speak on behalf of an entire population. Pretty cool, huh?!
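The procedure described above can be sketched in a few lines. This is only an illustration, not the code used in any actual study: the function name `simple_random_sample`, the population of 1,000,000 IDs, and the sample of 500 are all assumptions made for the example.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct member IDs from 1..population_size.

    Every ID has the same chance of being chosen, and no ID is chosen twice.
    """
    rng = random.Random(seed)
    # random.Random.sample draws without replacement from the given range.
    return rng.sample(range(1, population_size + 1), sample_size)


# Hypothetical example: choose 500 members out of a population of 1,000,000.
chosen = simple_random_sample(1_000_000, 500, seed=1)
print(len(chosen), "members selected; first few IDs:", sorted(chosen)[:5])
```

In practice the hard part is not the random draw but having a complete list of the population to assign IDs to in the first place.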

These samples are representative because, well, probability tells us that they are. I could spend pages and pages of your precious, valuable time discussing why this is the case, but that discussion would undoubtedly put you to sleep. However, this is why not every person in a population needs to be surveyed. And it is a great cost-saving measure when you only have to sample, say, 500 people to represent a much larger population. Here I could bore you again with monotonic relationships and exponential sampling benefits, but I will not do that. (You can thank me later.)

Now for the real bang! Say you want to measure satisfaction with city services within a small city of 50,000 people. In order to have a representative sample, all you need is a sample of 382 people (with a 5% margin of error at the usual 95% confidence level). Now, say that you want to do the same study, only on the entire city of Louisville, with a population of nearly 1.5 million. What size sample do you think you need? Are you ready for this? The number is 385! Wow. Only 3 more randomly selected residents are needed for a population 30 times greater. The beauty of sampling, and the wonders of monotonic relationships! More on that later. You can play around with all sorts of sample size calculators (or do it by longhand, if you dare). I suggest this site.
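If you would rather not take a calculator's word for it, the figures above can be reproduced with a short sketch. I am assuming the standard sample size formula for a proportion with a finite population correction, a 95% confidence level (z = 1.96), and the most conservative split of opinion (p = 0.5). The post does not say which formula the linked calculator uses, so treat this as one common approach rather than the definitive one.

```python
import math


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Estimate the sample size needed for a given margin of error.

    Assumes simple random sampling, a z-score of 1.96 (95% confidence),
    and p = 0.5, which gives the largest (safest) sample size.
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction shrinks it slightly for smaller populations.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)  # round up, since you cannot survey part of a person


print(sample_size(50_000))     # small city
print(sample_size(1_500_000))  # population 30 times larger
```

Under these assumptions the function returns 382 for the small city and 385 for the larger one. Some calculators round to the nearest whole number instead of rounding up, which can shift the answer by one.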

Of course, if you want a smaller margin of error (in essence, if you want your sample's results to fall closer to the true values for your population), you need a larger sample. But I’ll post a discussion on margins of error and confidence levels another day. I leave you now to ponder the brilliance of statistics!!
