Regression is used to predict one or more dependent variables, which are unknown or hard to measure directly, from one or more independent variables that can be measured. The regression equation is *any* function of the independent variables that yields a predicted value for the dependent variable. The functions are kept as simple as possible, following the principle of *Occam's Razor*: if adding a term doesn't improve the prediction enough, it is not added. Choosing functions requires experience and practical knowledge of the variables involved. No regression equation is perfect, but we can get sensibly close. In George Box's words, 'all models are wrong, but some are useful' (1987).
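As a rough illustration of the Occam's Razor point, the sketch below fits a straight line and then a quadratic to the same data and compares how well each predicts. The data are synthetic and invented purely for this example, and `fit_and_score` is a hypothetical helper name, not part of any library; only NumPy's `polyfit` and `polyval` are real calls.

```python
import numpy as np

# Synthetic data, invented for illustration: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)


def fit_and_score(x, y, degree):
    """Least-squares polynomial fit; returns (coefficients, R^2).

    Hypothetical helper for this sketch. Coefficients are ordered
    from the highest power down, as np.polyfit returns them.
    """
    coeffs = np.polyfit(x, y, degree)
    predicted = np.polyval(coeffs, x)
    ss_res = np.sum((y - predicted) ** 2)   # unexplained variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total variation
    return coeffs, 1.0 - ss_res / ss_tot


line_coeffs, r2_line = fit_and_score(x, y, 1)  # y = a*x + b
quad_coeffs, r2_quad = fit_and_score(x, y, 2)  # adds an x^2 term

print(f"straight line: R^2 = {r2_line:.3f}")
print(f"quadratic:     R^2 = {r2_quad:.3f}")
print(f"gain from the extra term: {r2_quad - r2_line:.4f}")
```

Adding a term can never make the fit to the same data worse, so the question is whether the gain is big enough to justify the extra complexity. When it is small, the simpler straight line is kept.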

A real-world example is the effect that the number of cigarettes smoked has on life expectancy. A recent study in the BMJ suggested that smoking one extra cigarette reduces life expectancy by 11 minutes. The calculation used the regression model originally worked out by Richard Doll et al (1994) from data on 34,000 male doctors (doctors are usually well-off, so in theory they should be healthy!). Doll and his colleagues predicted the life expectancy of smokers compared with non-smokers by fitting a regression model to the observed data: how long did the doctors live, and did they smoke?
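To see how a yes/no variable such as "did they smoke?" can enter a regression, here is a minimal sketch. The ages below are hypothetical numbers made up for this example and are not taken from the Doll study or the BMJ article; the point is only that, with a 0/1 predictor, the fitted slope is the difference between the two group averages.

```python
import numpy as np

# Hypothetical data, invented for illustration only (NOT from the Doll study).
# smoker: 0 = non-smoker, 1 = smoker; age_at_death in years.
smoker = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
age_at_death = np.array([82.0, 78.0, 85.0, 79.0, 74.0, 70.0, 77.0, 71.0])

# Design matrix: a column of ones (intercept) and the 0/1 smoking indicator.
X = np.column_stack([np.ones_like(smoker), smoker])

# Ordinary least squares: age_at_death ~ intercept + slope * smoker
coeffs = np.linalg.lstsq(X, age_at_death, rcond=None)[0]
intercept, slope = coeffs

print(f"predicted age at death, non-smoker: {intercept:.1f}")
print(f"predicted age at death, smoker:     {intercept + slope:.1f}")
print(f"estimated difference (slope):       {slope:.1f} years")
```

In this toy data set the intercept is the non-smokers' average and the slope is the gap between the two groups. A real analysis would use far more data and account for other factors, such as age and how much each person smoked.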

Of course, the lost time isn't the only consideration: it could simply be a choice between spending 11 minutes smoking a cigarette and spending them on something more productive. Still, the figure is a way to get people thinking about the harmful effects of smoking and the power one cigarette has over your life.
