Two-Way Analysis of Variance with Excel

Two-Way Analysis of Variance (ANOVA) is a technique for studying the relationship between a quantitative dependent variable and two qualitative independent variables. Usually we are interested in whether the level of the dependent variable differs for different values of the qualitative variables. We will use as an example data from a student project reported in Stats: Data and Models (2nd ed.), by De Veaux, Velleman and Bock, Addison-Wesley, 2008, Chapter 29, Exercise 11. The student was interested in her success at basketball free throws. This study investigated whether there was any relationship between the quantitative variable "number of shots 'made' (i.e., successfully completed out of 50 tries)" and two qualitative variables "Time of Day" and "Shoes Worn". ANOVA is commonly used with experimental studies and that is the case here. You can find the data at our site as a plain text file, as an Excel spreadsheet.

The data look like as below:

  A B C
1 Time Shoes Made
2 Morning Others 25
3 Morning Others 26
4 Night Others 27
5 Night Others 27
6 Morning Favorite 32
7 Morning Favorite 22
8 Night Favorite 30
9 Night Favorite 34
10 Morning Others 35
11 Morning Others 34
12 Night Others 33
13 Night Others 30
14 Morning Favorite 33
15 Morning Favorite 37
16 Night Favorite 36
17 Night Favorite 38

We can compare the two times or the two shoes by looking at summary statistics or at parallel boxplots. Now write the data in following format.

  D E
1 Morning Night
2 25 27
3 26 27
4 32 30
5 22 34
6 35 33
7 34 30
8 33 36
9 37 38

To find descriptive statistics in Excel 2003, first select Tools > Data Analysis. In Analysis Tools select Descriptive Statistics and click OK. In Input Range select the range (for Morning: from cell D2 to D9 and for Night: from cell E2 to E9) for which you want to find descriptive statistics. In Output options tick on the circle of New Sheet and then tick on check box of Summary statistics and click OK. In 2007 first we have to go to Data, select Data Analysis then we have same procedure as in 2003. We will get the summary statistics as below,

	Morning					Night	
Mean			30.5		Mean			31.8750
Standard Error		1.9180		Standard Error		1.4322
Median			32.5		Median			31.5
Mode			#N/A		Mode			27
Standard Deviation	5.4248		Standard Deviation	4.0510
Sample Variance		29.4286		Sample Variance		16.4107
Kurtosis		-1.3629		Kurtosis		-1.2421
Skewness		-0.5297		Skewness		0.1894
Range			15		Range			11
Minimum			22		Minimum			27
Maximum			37		Maximum			38
Sum			244		Sum			255
Count			8		Count			8

It looks like she does better at night. But that could just be due to natural variability. We can check with ANOVA.

To perform Two-Way ANOVA we have to arrange the data in the following format.

  G H I
1   Others Favorite
2 Morning 25 32
3   26 22
4   35 33
5   34 37
6 Night 27 30
7   27 34
8   33 36
9   30 38

(In this format it is also easy to highlight all the "Favorite" data or all the "MOrning" data to get summary statistics for subgroups.) To find the Two-Way ANOVA, first select Tool > Data Analysis >. In Analysis Tools select ANOVA: Two-Factor With Replication. Select the range (select the above table from cell G1 to I9) for which you want to find Two-Way ANOVA. Write 4 in Rows per sample option, since we have four replication per treatment. In Output options tick on the circle of New Sheet and click OK. We get output as follows:

Anova: Two-Factor With Replication			

SUMMARY		Others		Favorite	Total

Count		4		4		8
Sum		120		124		244
Average		30		31		30.5
Variance	27.3333		40.6667		29.4286
Count		4		4		8
Sum		117		138		255
Average		29.25		34.5		31.875
Variance	8.25		11.6667		16.4107
Count		8		8				
Sum		237		262				
Average		29.6250		32.75				
Variance	15.4107		25.9286				

Source of Variation	SS	    df	    MS		F	   P-value	F crit
Sample			7.5625	    1	    7.5625	0.3441	   0.5684	4.7472
Columns			39.0625	    1	    39.0625	1.7773	   0.2072	4.7472
Interaction		18.0625	    1	    18.0625	0.8218	   0.3825	4.7472
Within			263.75	    12	    21.9792			
Total			328.4375    15				

("Columns" is the two shoes and "Sample" the two times of day.) Unfortunately the p-values for all variables are quite large, suggesting that any effect we saw could well have been due to chance. However, there is an alternative interpretation: with just 16 observations, we will only be able to detect a fairly large difference. It appeared the shoes made a difference of about 3 successes in 50 tries. If that is enough of a difference to matter in practice, we might repeat the experiment with more trials. Before we do that, though, we might make some displays to see if data of this sort matches the assumptions of ANOVA. No sense gathering more if it does not;-)


{short description of image}

{short description of image}

We can't expect perfection with only eight numbers in each group (How could eight numbers look bell-shaped?) but there are no signs here of serious skewness or outliers. All four groups have similar variabilities.

If we see signs the assumptions are not met then the remedies are similar to what they are in the univariate case. For example, outliers or bimodality must be investigated as to their cause. A transformation of the dependent variable may help just as it can in the univariate case. However, it is most likely to be effective if all the groups are skewed, and in the same direction, or if there is a systematic change in variability as group means increase.

© 2007 Robert W. Hayden