One-Way Analysis of Variance with Excel

One-Way Analysis of Variance (ANOVA) is a technique for studying the relationship between a quantitative dependent variable and a single qualitative independent variable. Usually we are interested in whether the level of the dependent variable differs for different values of the qualitative variable. We will use as an example real data from a study reported in 1935 by B. Lowe of the Iowa Agricultural Experiment Station.* Perhaps this originated at coffee break one morning. Donuts are traditionally a fried food and as such absorb some of the fat they are fried in. The amount and type of fat absorbed has implications for the healthfulness of the donuts. This study investigated whether there was any relationship between the quantitative variable "amount of fat absorbed" and a qualitative variable "type of fat". (Unfortunately we do not know just what the fats were. You could think of them as corn oil, soybean oil, lard, and Quaker State.) You can find the data at our site as a plain text file and as an Excel spreadsheet.

ANOVA is commonly used with experimental studies and that is the case here. The experiment consists of frying some donuts in each of four fats. Twenty-four batches of donuts were prepared and six randomly assigned to each of the four fats. The results, in grams of fat absorbed for each batch, and as they might commonly be laid out on a page were:

Fat1 Fat2 Fat3 Fat4
164 178 175 155
172 191 193 166
168 197 178 149
177 182 171 164
156 185 163 170
195 177 176 168

We can compare the four fats by performing an analysis of variance or by constructing four parallel boxplots.

To construct boxplots for the four types of fats we need to first create the following table:

  Fat1 Fat2 Fat3 Fat4
Q1 165 179 172 157.25
Minimum 156 177 163 149
Q2 170 183.5 175.5 165
Maximum 195 197 193 170
Q3 175.75 189.5 177.5 167.5

To create boxplots, select Chart Wizard, then in chart type select line, click on Next, choose Data Range as the above entire table, click on Rows, finally press Finish. Next right click on each line of the graph, select format series and choose none option for line. Also clear the gridlines by rightclicking on one of the gridlines and click on clear. Next right click on any point on the graph, choose format data series, then select options, click on high-low lines,and Up-down bars to get the following (simplified) boxplots:

{short description of image}

It certainly looks like more of Fat 2 gets absorbed while Fat 4 seems least absorbed. But wait a minute! If we repeat the experiment we would most likely get different numbers. Could this change the rankings of the fats? Is it possible that all four fats are absorbed to about the same degree and we are just seeing random fluctuations from one assignment of batches to fats to another? To see if that is likely we do a hypothesis test. The null as usual is backwards: we hypothesize no difference among the fats. As always, the null provides a specific model with which we can play "what if". If the null were true, would such differences be ordinary or extra ordinary?

Now we need to perform one way ANOVA using Excel. Select Tools, then Data Analysis, choose Anova:Single Factor, press ok, then select Input range as the whole table given on Excel spreadsheet excluding the column of rows, next tick mark on labels and then press ok to get the following output:

Anova: Single Factor						

SUMMARY						

Groups	Count	Sum	Average	Variance		
Fat1	6	1032	172	178		
Fat2	6	1110	185	60.4		
Fat3	6	1056	176	97.6		
Fat4	6	972	162	67.6		


ANOVA						

Source of Variation	SS	  df	MS	F	       P-value	      F crit
Between Groups		1636.5	  3	545.5	5.406342914    0.006875948    3.098392654
Within Groups		2018	  20	100.9			
						
Total			3654.5	  23				

The P-value of 0.0068 is for a test of the hypothesis which says that the mean amount of fat absorbed is the same for all four types of fat. Because it is so small, we reject the hypothesis of equal absorption.

Like any statistical test, this one is based on some assumptions. We will only mention the ones we can check with software. These are two: that the numbers for each fat are normally distributed and that they share a common variance. We can check these roughly from the boxplots. There we see roughly similar spreads and no serious departures from normality.


*Our source is Chapter 12 of Snedecor and Cochran, Statistical Methods (7th. ed.), 1980, Iowa State University Press, Ames, IA.

© 2008 statistics.com, portions © 2007 Robert W. Hayden and used by permission