One-Way Analysis of Variance (ANOVA) is a technique for studying the relationship between a quantitative dependent variable and a single qualitative independent variable. Usually we are interested in whether the level of the dependent variable differs for different values of the qualitative variable. We will use as an example real data from a study reported in 1935 by B. Lowe of the Iowa Agricultural Experiment Station.* Perhaps this originated at coffee break one morning. Donuts are traditionally a fried food and as such absorb some of the fat they are fried in. The amount and type of fat absorbed has implications for the healthfulness of the donuts. This study investigated whether there was any relationship between the quantitative variable "amount of fat absorbed" and a qualitative variable "type of fat". (Unfortunately we do not know just what the fats were. You could think of them as corn oil, soybean oil, lard, and Quaker State.) You can find the data at our site as a plain text file and as an Excel spreadsheet.

ANOVA is commonly used with experimental studies, and that is the case here. The experiment consisted of frying donuts in each of four fats. Twenty-four batches of donuts were prepared, and six were randomly assigned to each of the four fats. The results, in grams of fat absorbed per batch and laid out as they might commonly appear on a page, were:

| Fat1 | Fat2 | Fat3 | Fat4 |
|------|------|------|------|
| 164 | 178 | 175 | 155 |
| 172 | 191 | 193 | 166 |
| 168 | 197 | 178 | 149 |
| 177 | 182 | 171 | 164 |
| 156 | 185 | 163 | 170 |
| 195 | 177 | 176 | 168 |

We can compare the four fats by performing an analysis of variance or by constructing four parallel boxplots.

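To construct boxplots for the four types of fat, we first need to create the following table. The rows are deliberately ordered with Q1 first and Q3 last, because Excel draws up-down bars between the first and last series of a line chart: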

| | Fat1 | Fat2 | Fat3 | Fat4 |
|---------|--------|-------|-------|--------|
| Q1 | 165 | 179 | 172 | 157.25 |
| Minimum | 156 | 177 | 163 | 149 |
| Q2 | 170 | 183.5 | 175.5 | 165 |
| Maximum | 195 | 197 | 193 | 170 |
| Q3 | 175.75 | 189.5 | 177.5 | 167.5 |
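
For readers who want to check the table outside Excel, here is a minimal Python sketch that recomputes it from the raw data. The names `fats` and `five_number_summary` are made up for this illustration. It assumes that the quartiles were found by interpolating between the sorted values, which is what `method="inclusive"` does in the standard library's `statistics.quantiles`.

```python
from statistics import quantiles

# Grams of fat absorbed per batch, copied from the data table above.
fats = {
    "Fat1": [164, 172, 168, 177, 156, 195],
    "Fat2": [178, 191, 197, 182, 185, 177],
    "Fat3": [175, 193, 178, 171, 163, 176],
    "Fat4": [155, 166, 149, 164, 170, 168],
}

def five_number_summary(values):
    """Return the five rows of the boxplot table for one fat."""
    # Assumption: quartiles interpolate between the sorted values
    # (the "inclusive" method), which matches the table above.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"Q1": q1, "Minimum": min(values), "Q2": q2,
            "Maximum": max(values), "Q3": q3}

summary = {name: five_number_summary(vals) for name, vals in fats.items()}

# Print the rows in the same order as the table: Q1, Minimum, Q2, Maximum, Q3.
for row in ["Q1", "Minimum", "Q2", "Maximum", "Q3"]:
    cells = " | ".join(f"{summary[name][row]:7.2f}" for name in fats)
    print(f"{row:8s}| {cells}")
```

Other quartile conventions exist and can give slightly different values on samples this small, so a mismatch with other software is not necessarily an error.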

To create boxplots, select **Chart Wizard**, then under chart type select **line**, click on
**Next**, choose the entire table above as the **Data Range**, click on **Rows**, and finally
press **Finish**. Next, right-click on each line of the graph, select **format series**, and choose the **none** option for line.
Also clear the gridlines by right-clicking on one of the gridlines and clicking **clear**. Next, right-click on any point on the graph,
choose **format data series**, then select **options**, and click on
**high-low lines** and **Up-down bars** to get the following (simplified) boxplots:
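
If Excel is not at hand, a rough text version of the same four parallel boxplots can be sketched in Python. This is only an illustration: the axis limits, the plot width, and the characters used (`-` for whiskers, `=` for the box, `M` for the median) are arbitrary choices, and the quartiles are assumed to be interpolated as in the table above.

```python
from statistics import quantiles

# Grams of fat absorbed per batch, copied from the data table.
fats = {
    "Fat1": [164, 172, 168, 177, 156, 195],
    "Fat2": [178, 191, 197, 182, 185, 177],
    "Fat3": [175, 193, 178, 171, 163, 176],
    "Fat4": [155, 166, 149, 164, 170, 168],
}

# Arbitrary axis: 145 g to 200 g over 56 characters, i.e. one column per gram.
LOW, HIGH, WIDTH = 145, 200, 56

def column(value):
    """Map a value in grams to a character column on the text axis."""
    return round((value - LOW) / (HIGH - LOW) * (WIDTH - 1))

def text_boxplot(values):
    """Draw one boxplot: '-' whiskers, '=' box from Q1 to Q3, 'M' at the median."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    line = [" "] * WIDTH
    for c in range(column(min(values)), column(max(values)) + 1):
        line[c] = "-"          # whiskers span minimum to maximum
    for c in range(column(q1), column(q3) + 1):
        line[c] = "="          # box spans Q1 to Q3
    line[column(q2)] = "M"     # median marker
    return "".join(line)

plots = {name: text_boxplot(vals) for name, vals in fats.items()}
for name, plot in plots.items():
    print(f"{name} {plot}")
```

Because all four rows share one axis, boxes that sit further to the right correspond to fats that were absorbed more.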

It certainly looks as if more of Fat 2 gets absorbed, while Fat 4 seems to be absorbed least. But wait a minute! If we repeated the experiment, we would most likely get different numbers. Could this change the rankings of the fats? Is it possible that all four fats are absorbed to about the same degree and we are just seeing random fluctuations from one assignment of batches to fats to another? To see whether that is likely, we do a hypothesis test. The null hypothesis, as usual, is backwards: we hypothesize no difference among the fats. As always, the null provides a specific model with which we can play "what if". If the null were true, would such differences be ordinary or extraordinary?
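
The "what if" game can be played quite literally. The sketch below is a randomization test, which is not the F test that Excel runs, but it follows the reasoning above: if the fats made no difference, any assignment of the 24 batches to the four fats would be as plausible as the one observed. As a measure of difference it uses the gap between the largest and smallest group means. That statistic, the number of shuffles, and the random seed are all choices made for this illustration.

```python
import random

# Grams of fat absorbed per batch, copied from the data table.
fats = {
    "Fat1": [164, 172, 168, 177, 156, 195],
    "Fat2": [178, 191, 197, 182, 185, 177],
    "Fat3": [175, 193, 178, 171, 163, 176],
    "Fat4": [155, 166, 149, 164, 170, 168],
}

def mean_range(groups):
    """Gap between the largest and smallest group means."""
    means = [sum(g) / len(g) for g in groups]
    return max(means) - min(means)

groups = list(fats.values())
observed = mean_range(groups)      # 185 - 162 = 23 grams

rng = random.Random(1935)          # fixed seed so the run is repeatable
pooled = [v for g in groups for v in g]
n_shuffles = 10_000
at_least_as_large = 0
for _ in range(n_shuffles):
    rng.shuffle(pooled)
    # Deal the shuffled batches back out, six to each "fat".
    shuffled = [pooled[i:i + 6] for i in range(0, len(pooled), 6)]
    if mean_range(shuffled) >= observed:
        at_least_as_large += 1

# Share of random assignments with a gap at least as large as the observed one.
p_value = at_least_as_large / n_shuffles
print(f"observed gap = {observed:.1f} g, approximate p-value = {p_value:.4f}")
```

A small share means that gaps as large as the observed one would be extraordinary if the fats were truly interchangeable.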

Now we need to perform a one-way ANOVA using Excel.
Select **Tools**, then **Data Analysis**, choose **Anova: Single Factor**, and
press **ok**. Then select as the **Input range** the whole table given on the Excel
spreadsheet, excluding any column of row labels, tick **labels**, and
press **ok** to get the following output:

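Anova: Single Factor

SUMMARY

| Groups | Count | Sum | Average | Variance |
|--------|-------|------|---------|----------|
| Fat1 | 6 | 1032 | 172 | 178 |
| Fat2 | 6 | 1110 | 185 | 60.4 |
| Fat3 | 6 | 1056 | 176 | 97.6 |
| Fat4 | 6 | 972 | 162 | 67.6 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---------------------|--------|----|-------|-------------|-------------|-------------|
| Between Groups | 1636.5 | 3 | 545.5 | 5.406342914 | 0.006875948 | 3.098392654 |
| Within Groups | 2018 | 20 | 100.9 | | | |
| Total | 3654.5 | 23 | | | | |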
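
To see where the numbers in the ANOVA table come from, the sums of squares, mean squares, and F ratio can be recomputed from the raw data. The following is a minimal sketch in plain Python with variable names chosen for this illustration. It does not compute the P-value, and the `F_CRIT` value is simply copied from the Excel output rather than derived.

```python
# Grams of fat absorbed per batch, copied from the data table.
fats = {
    "Fat1": [164, 172, 168, 177, 156, 195],
    "Fat2": [178, 191, 197, 182, 185, 177],
    "Fat3": [175, 193, 178, 171, 163, 176],
    "Fat4": [155, 166, 149, 164, 170, 168],
}

groups = list(fats.values())
k = len(groups)                                   # number of fats
n_total = sum(len(g) for g in groups)             # number of batches
grand_mean = sum(sum(g) for g in groups) / n_total

# Between groups: how far each group mean lies from the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within groups: how far each batch lies from its own group mean.
ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

df_between = k - 1
df_within = n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

F_CRIT = 3.098392654  # copied from the Excel output, not recomputed here

print(f"Between Groups: SS={ss_between:.1f} df={df_between} MS={ms_between:.1f}")
print(f"Within Groups:  SS={ss_within:.1f} df={df_within} MS={ms_within:.1f}")
print(f"F = {f_ratio:.4f} (exceeds F crit: {f_ratio > F_CRIT})")
```

If SciPy happens to be installed, `scipy.stats.f_oneway(*fats.values())` should report the same F ratio together with the P-value.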

The P-value of **0.0069** is for a test of the hypothesis that the mean amount of fat absorbed
is the same for all four types of fat. Because it is so small (equivalently, because F = 5.41 exceeds the critical value of 3.10), we reject the hypothesis of equal
absorption.

Like any statistical test, this one is based on some assumptions. We will mention only the ones we can check with software. There are two: that the numbers for each fat are normally distributed, and that they share a common variance. We can check these roughly from the boxplots. There we see roughly similar spreads and no serious departures from normality.
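
The common-variance assumption can also be looked at numerically by comparing the sample standard deviations of the four groups. This is a rough sketch only. The data give no formal cutoff; the informal guideline that the largest standard deviation should be less than about twice the smallest is an assumption brought in for this illustration, not something stated in the source.

```python
from statistics import stdev, variance

# Grams of fat absorbed per batch, copied from the data table.
fats = {
    "Fat1": [164, 172, 168, 177, 156, 195],
    "Fat2": [178, 191, 197, 182, 185, 177],
    "Fat3": [175, 193, 178, 171, 163, 176],
    "Fat4": [155, 166, 149, 164, 170, 168],
}

# Sample variances (n - 1 in the denominator), as in Excel's SUMMARY table.
variances = {name: variance(vals) for name, vals in fats.items()}
sds = {name: stdev(vals) for name, vals in fats.items()}

for name in fats:
    print(f"{name}: variance = {variances[name]:.1f}, sd = {sds[name]:.2f}")

# Informal guideline (an assumption here): largest sd under twice the smallest.
sd_ratio = max(sds.values()) / min(sds.values())
print(f"largest sd / smallest sd = {sd_ratio:.2f}")
```

With only six batches per fat, a check like this is a sanity check at best and cannot confirm normality.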

\*Our source is Chapter 12 of Snedecor and Cochran, *Statistical Methods* (7th ed.), 1980, Iowa State University Press, Ames, IA.

© 2008 statistics.com, portions © 2007 Robert W. Hayden and used by permission