Sunday, July 18, 2010

Taking the Fear Out of Statistics

To revive the passion for a subject from my student days, I began teaching a course on statistics at a college in Northern California. The experience has taught me quite a bit about why students think that the subject is a refined form of torture. There are too many formulas, and the concepts are hard to grasp for those whose facility with algebra and even basic mathematical operations is shaky. There were students in my class who were attempting to pass the course for the fourth or fifth time. It was the one thing preventing them from graduating.

With this as background, I decided that the only way students were going to pass the course was if I could make statistics come alive for them, if somehow I could connect it to events from everyday life.

I was lucky. The gubernatorial race in California was heating up. Papers were full of election predictions. Major magazines had stories like "Anti-depressants don't work," with the proof coming from treatment and placebo groups. There was plenty of material that let me convince students that statistical literacy would not only help them become better citizens of a democracy, it would also help them with their careers, no matter what they chose to specialize in.

Once this psychological barrier was broken, suddenly the subject became relevant and even enjoyable!

Statistics has two major goals, I told students at the beginning of the semester. First, we must learn to draw meaning from data when all the data are known. That means organizing, describing and summarizing data. Second, we must learn to draw conclusions (inferences) about the whole population when we have only sample data.

This put the syllabus for the course into perspective. Descriptive statistics included measures of central tendency, variance and standard deviation. We then moved on to probability, the foundation of inferential statistics, and its application to medicine, insurance, economics, the social and biological sciences and so on. This naturally led to a detailed description of the binomial and normal distributions and the famous bell curve.

For probability, I found it instructive to demonstrate the ideas with a quarter, a die and a bell. I was able to take much of the fear out of the fearsome formula for the normal distribution by actually "ringing" my bell and emphasizing that the formula simply described the symmetric shape of a bell.
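
For readers who would rather see the bell than hear it, here is a minimal Python sketch of the same point. It is not something from the class itself; the function name normal_pdf and the choice of the standard normal curve (mean 0, standard deviation 1) are assumptions made only for illustration. It evaluates the normal density at equal distances to the left and right of the mean and shows that the two heights match.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal (bell) curve at x for mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Compare the curve's height at the same distance on either side of the mean.
# (Standard normal assumed: mean 0, standard deviation 1.)
for distance in (0.5, 1.0, 2.0):
    left = normal_pdf(-distance)
    right = normal_pdf(distance)
    print(f"distance {distance}: left={left:.4f}  right={right:.4f}")
```

The heights agree at every distance, and they shrink as we move away from the mean, which is all the "fearsome" formula is saying.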

As students learned to look up binomial probability tables, z-scores and t-values, their confidence soared. They struggled at first to understand what the values and the scores actually meant, but once they had mastered the idea, they were able to solve some fairly complicated problems.

From there, I went to the fundamental ideas of estimation and hypothesis testing. The null hypothesis, the p-value and the idea of what is "statistically significant" caused a lot of problems, particularly because of the double negatives inherent in the concepts. I suspect this is where many statistics students are ready to throw in the towel. I persisted and eventually made some headway, but not before students told me decisively that statistics has a strange way of testing whether a medicine works or not! I had to agree.

The final part was regression and correlation. Here, I had to spend a full lecture reviewing algebra and the equation of a straight line. From there, predicting variables with the regression line became more straightforward than it would have been otherwise.

My best moment from this demanding course came at the end when students told me that they had indeed developed an appreciation of statistics, that they would look at poll predictions with new and understanding eyes. Example: Candidate A is expected to get 60% of the votes with a margin of error of ±4%. "That implies that the confidence level is 95%," they told me. "Which means what?" I asked. "If pollsters had 100 simple random voter samples to work with, each sample consisting of the same number of voters, 95 of those samples would contain in the confidence interval the actual percentage of votes that candidate A would get. 5 of those samples would not. That's 95% confidence level."
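
The students' explanation can be checked with a small simulation. What follows is only a sketch under stated assumptions, not a description of how any real poll is run: the true support of 60%, the sample size of 576 voters (chosen because it gives a margin of error of about 4 points at the conventional 95% level), the normal-approximation interval, and the function names are all mine. Two caveats are worth keeping in mind: 95% is the usual convention in polling rather than something the margin of error alone implies, and the "95 out of 100" is a long-run average, so any particular batch of 100 samples may land a little above or below it.

```python
import math
import random


def margin_of_error(p_hat, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion.

    z=1.96 corresponds to the conventional 95% confidence level.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def coverage(true_p, n, num_polls, seed=0):
    """Fraction of simulated polls whose interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_polls):
        # One poll: ask n random voters, each supporting A with chance true_p.
        votes = sum(1 for _ in range(n) if rng.random() < true_p)
        p_hat = votes / n
        moe = margin_of_error(p_hat, n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / num_polls


# Assumed values for illustration: 60% true support, 576 voters per poll.
print(f"margin of error: {margin_of_error(0.60, 576):.3f}")  # about 0.040
print(f"share of 2000 polls that captured the truth: {coverage(0.60, 576, 2000):.3f}")
```

The second number should come out close to 0.95, which is the students' "95 of those samples" stated as a proportion.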

The response certainly gave a boost to my confidence in teaching statistics!
