Statistics – When science goes awry due to the lies of men

David J. Hand recently authored a short book about statistics, aptly named Statistics, in which he argues that statistics is a fascinating and very applicable science.  I don’t argue with Hand in the slightest – I do find statistics very interesting, mainly because I recognize its everyday uses.  The book is interesting enough, and if you have time and are keen on this sort of subject matter, then by all means go ahead and pick yourself up a copy.

My personal interest in the book aside, I would like to focus on a line of text at the beginning of Chapter 1.  I’m sure most of us are familiar with the infamous Twain quote, “There are lies, damned lies, and statistics.”  The musings of Twain that led to these words implied that statistics can be twisted, turned, mutilated, cooked, and subjected to other forms of mutation to get them to say what you want them to say.  In short, many people have manipulated statistics to support a lie.

But perhaps we are less familiar with Frederick Mosteller, who once said “it is easy to lie with statistics, but easier to lie without them.”  Mosteller was one of the most recognized statisticians of the 20th century – he helped found the statistics department at Harvard, was president of numerous professional organizations dealing with statistics, and was possibly one of the most dedicated teachers of statistics in the United States.

Being such an expert in the field, of course Mosteller recognized that statistics could be manipulated to get them to say what one wants them to say.  However, a statistic isn’t just a number we use to describe things; it is a representation of the world we live in.  Underneath the surface of percentages or coefficients of determination is an entire world, built upon solid science and mathematical certainties.  In this sense, statistics are a beautiful thing.  We can use them to understand an otherwise complex, seemingly chaotic world – condense it down to numbers and figures that help to explain our surroundings.

Of course, there are those who have been burned by statistics.  Most people, I’m sure, unknowingly.  But let us not blame the statistic.  After all, it’s hard to place blame on an inanimate representation such as a number.  It doesn’t mean us any harm.  The real fault should be placed on those people who use statistics for ill, those who manipulate the numbers for their own gain.  Statistics in their purity are not contradictory – it’s those who use them that sometimes are.  Of course, that can be applied to any discipline, not just statistics.

Undertaking research (and creating the subsequent statistics of findings) is certainly something not to be taken lightly.  It requires dedication, motivation, and a clear goal to obtain truth.  There will always be those who bend the numbers to either create a false truth or hide the real truth, but statistics should not be slighted or disregarded because of that.  What the science should provoke is conversation and understanding in an effort to come to rational conclusions on how to move forward.  As my own boss and mentor often says, statistics is perhaps one of the only sciences in which two different individuals studying them can come up with two different conclusions, and so long as they are not truly contradictory, both of them can be right.  And that is the quintessential model of decision-making, using grounded results and findings to move an initiative forward.

Maybe you don’t agree with Hand’s (and to some degree my own) proposition that statistics is fascinating and cool and the most exciting of disciplines.  But at the very least I hope you agree that statistics make the world around us better known, and thus more real.

If you would like to read Hand’s Statistics, here is the information:
Hand, David J.
Statistics, Sterling Publishing Co., New York, 2010

The Power of a Sample – Voodoo or Science?

May 18, 2011

A recent study carried out by our company and The Civil Rights Project for Jefferson County Public Schools came under fire because of a common misconception about the power of random sampling. Without going into a long, drawn-out discussion of what the study entailed, the project aimed to gain an understanding of the Louisville community’s perceptions of the student assignment plan and the diversity goals it seeks to accomplish. Perhaps the methods would not have come under such scrutiny had the findings been less controversial, but regardless, the methods did indeed come under attack.

But if we take a moment to understand the science behind sampling methods, and realize that it is not voodoo magic, then I think the community can begin to focus on the real issues the study uncovered. To put it simply, sampling is indeed science. Without going into the theory of probability and the numerous mathematical assessments used to test the validity of a sample, we can say that a random sample, so long as the laws of probability and nature hold true and some tear in the fabric of the universe has not occurred, is representative, within a quantifiable margin of error, of the population it attempts to embody.

Let us first begin to understand why this is so. When I taught statistics and probability to undergrads during my days as an instructor, I found I needed to keep this explanation simple – not because my students lacked the intelligence to fully understand this, but more so because probability theory can get a little sticky, and keeping the examples simple seemed to work best. Imagine we have a coin – a fair coin that is not weighted in any way (aside from a screw-up from the Treasury, in which case your coin could be worth a bundle of cash). We all know this example. If you flip it, you have a 50-50 chance of getting a particular side of that coin. In essence, that is the law of probability (the simplest of many).
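For readers who would rather see the coin than take my word for it, here is a minimal Python sketch. The function name `proportion_heads` and the seed are my own choices for illustration, not anything from the study. It flips a simulated fair coin and reports the share of heads, which should drift toward one half as the number of flips grows.

```python
import random

def proportion_heads(num_flips, seed=None):
    """Flip a simulated fair coin num_flips times and return the share of heads."""
    rng = random.Random(seed)  # a seed makes the run repeatable
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips

# More flips should bring the observed share closer to 0.5.
for flips in (10, 1_000, 100_000):
    print(flips, round(proportion_heads(flips, seed=1), 3))
```

A handful of flips can land well away from 50-50; it is the long run that settles down.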

Random sampling is the same way. While there are various methods to go about sampling a population randomly, Simple Random Sampling is the easiest and most commonly used. To put it simply, each member of a population is assigned a unique value, and a random number generator picks values within a defined range (say 1 to 1,000,000). Each member of that population has an equal chance of being selected. These chosen members become the lucky ones to be a true representation of a population. They are not “chosen” in the sense that they get to drink the Kool-Aid and ascend beyond, but they are chosen to speak on behalf of an entire population. Pretty cool, huh?!
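The procedure just described can be sketched in a few lines of Python. This is only an illustration under my own assumptions: members are numbered 1 through the population size, and the helper name `simple_random_sample` is hypothetical. The standard library's `random.sample` draws without replacement, so nobody is picked twice and every member has the same chance of selection.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct member IDs drawn from 1..population_size."""
    rng = random.Random(seed)
    # range() stands in for the list of unique values assigned to each member.
    return rng.sample(range(1, population_size + 1), sample_size)

sample_ids = simple_random_sample(1_000_000, 500, seed=42)
print(len(sample_ids), len(set(sample_ids)))  # 500 500: no member is chosen twice
```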

These samples are representative because, well, probability tells us that they are. I can spend pages and pages of your precious, valuable time discussing why this is the case, but that discussion will undoubtedly put you to sleep. However, this is why not every person in a population needs to be surveyed. And it is a great cost-saving measure when you only have to sample, say, 500 people to represent a much larger population. Here I can bore you again with monotonic relationships and exponential sampling benefits, but I will not do that. (You can thank me later.)
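If you would rather see it than read pages of theory, here is a small simulation sketch. Everything in it is made up for illustration: a hypothetical population of 1,000,000 in which exactly 62% are "satisfied", and a helper named `estimate_satisfaction`. It surveys only 500 randomly chosen members and compares their answer with the truth.

```python
import random

POPULATION_SIZE = 1_000_000
SATISFIED_COUNT = 620_000  # hypothetical: 62% of the population is satisfied

def is_satisfied(member_id):
    # Which IDs are "satisfied" is arbitrary; random selection ignores the ordering.
    return member_id <= SATISFIED_COUNT

def estimate_satisfaction(sample_size, seed=None):
    """Survey a simple random sample and return the share who are satisfied."""
    rng = random.Random(seed)
    sample_ids = rng.sample(range(1, POPULATION_SIZE + 1), sample_size)
    return sum(is_satisfied(i) for i in sample_ids) / sample_size

print("true share:     ", SATISFIED_COUNT / POPULATION_SIZE)
print("sample estimate:", estimate_satisfaction(500, seed=7))
```

Change the seed and the estimate will wobble a little, but it should stay within a few percentage points of the true figure.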

Now for the real bang! Say you want to measure satisfaction with city services within a small city of 50,000 people. In order to have a representative sample, all you need is a sample of 382 people (with a 5% margin of error at the customary 95% confidence level). Now, say that you want to do the same study, only on the entire city of Louisville, with a population of nearly 1.5 million. What size sample do you think you need? Are you ready for this? The number is 385! Wow. Only 3 more randomly selected residents are needed for a population 30 times greater. The beauty of sampling, and wonders of monotonic relationships! More on that later. You can play around with all sorts of sample size calculators (or do it by longhand, if you dare). I suggest this site.
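For those who do dare, here is a rough sketch of the longhand version. I have not said which formula the calculators use, so treat this as an assumption: it is the standard sample size formula for a proportion with a finite population correction, at a 95% confidence level (z = 1.96) and the most conservative split (p = 0.5). The function name `required_sample_size` is my own. Under those assumptions it reproduces the two figures above.

```python
import math

def required_sample_size(population, margin_of_error=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    Assumed defaults: z=1.96 (95% confidence) and p=0.5 (worst-case variability).
    """
    # Size needed for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Shrink it slightly to account for the finite population.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(required_sample_size(50_000))     # 382
print(required_sample_size(1_500_000))  # 385
```

The uncorrected figure is already only about 384, which is why the answer barely moves once the population is large.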

Of course, if you want a smaller margin of error (in essence, if you want your sample’s estimate to sit closer to the true value for your population), you need a larger sample. But I’ll post a discussion on margins of error and confidence levels another day. I leave you now to ponder the brilliance of statistics!!
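As a parting sketch for the impatient, the trade-off between sample size and margin of error can be seen with the usual large-population approximation. The assumptions here are mine (95% confidence, z = 1.96, and p = 0.5), as is the function name `margin_of_error`.

```python
import math

def margin_of_error(sample_size, z=1.96, p=0.5):
    """Approximate margin of error for a proportion from a large population."""
    return z * math.sqrt(p * (1 - p) / sample_size)

# Bigger samples buy a smaller margin of error, but with diminishing returns.
for n in (100, 385, 1_068, 5_000):
    print(n, f"{margin_of_error(n):.1%}")
```

Roughly speaking, you need about four times the sample to cut the margin of error in half.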
