Two-Way Analysis of Variance (ANOVA) is a technique for studying the relationship between a quantitative dependent variable and two qualitative independent variables. Usually we are interested in whether the level of the dependent variable differs for different values of the qualitative variables. We will use as an example data from a student project reported in Stats: Data and Models (2nd ed.), by De Veaux, Velleman and Bock, Addison-Wesley, 2008, Chapter 29, Exercise 11. The student was interested in her success at basketball free throws. This study investigated whether there was any relationship between the quantitative variable "number of shots 'made' (i.e., successfully completed out of 50 tries)" and two qualitative variables "Time of Day" and "Shoes Worn". ANOVA is commonly used with experimental studies and that is the case here. You can find the data at our site as a plain text file, as an Excel spreadsheet.
The data look like as below:A | B | C | |
1 | Time | Shoes | Made |
2 | Morning | Others | 25 |
3 | Morning | Others | 26 |
4 | Night | Others | 27 |
5 | Night | Others | 27 |
6 | Morning | Favorite | 32 |
7 | Morning | Favorite | 22 |
8 | Night | Favorite | 30 |
9 | Night | Favorite | 34 |
10 | Morning | Others | 35 |
11 | Morning | Others | 34 |
12 | Night | Others | 33 |
13 | Night | Others | 30 |
14 | Morning | Favorite | 33 |
15 | Morning | Favorite | 37 |
16 | Night | Favorite | 36 |
17 | Night | Favorite | 38 |
We can compare the two times or the two shoes by looking at summary statistics or at parallel boxplots. Now write the data in following format.
D | E | |
1 | Morning | Night |
2 | 25 | 27 |
3 | 26 | 27 |
4 | 32 | 30 |
5 | 22 | 34 |
6 | 35 | 33 |
7 | 34 | 30 |
8 | 33 | 36 |
9 | 37 | 38 |
To find descriptive statistics in Excel 2003, first select Tools > Data Analysis. In Analysis Tools select Descriptive Statistics and click OK. In Input Range select the range (for Morning: from cell D2 to D9 and for Night: from cell E2 to E9) for which you want to find descriptive statistics. In Output options tick on the circle of New Sheet and then tick on check box of Summary statistics and click OK. In 2007 first we have to go to Data, select Data Analysis then we have same procedure as in 2003. We will get the summary statistics as below,
Morning Night Mean 30.5 Mean 31.8750 Standard Error 1.9180 Standard Error 1.4322 Median 32.5 Median 31.5 Mode #N/A Mode 27 Standard Deviation 5.4248 Standard Deviation 4.0510 Sample Variance 29.4286 Sample Variance 16.4107 Kurtosis -1.3629 Kurtosis -1.2421 Skewness -0.5297 Skewness 0.1894 Range 15 Range 11 Minimum 22 Minimum 27 Maximum 37 Maximum 38 Sum 244 Sum 255 Count 8 Count 8
It looks like she does better at night. But that could just be due to natural variability. We can check with ANOVA.
To perform Two-Way ANOVA we have to arrange the data in the following format.
G | H | I | |
1 | Others | Favorite | |
2 | Morning | 25 | 32 |
3 | 26 | 22 | |
4 | 35 | 33 | |
5 | 34 | 37 | |
6 | Night | 27 | 30 |
7 | 27 | 34 | |
8 | 33 | 36 | |
9 | 30 | 38 |
(In this format it is also easy to highlight all the "Favorite" data or all the "MOrning" data to get summary statistics for subgroups.) To find the Two-Way ANOVA, first select Tool > Data Analysis >. In Analysis Tools select ANOVA: Two-Factor With Replication. Select the range (select the above table from cell G1 to I9) for which you want to find Two-Way ANOVA. Write 4 in Rows per sample option, since we have four replication per treatment. In Output options tick on the circle of New Sheet and click OK. We get output as follows:
Anova: Two-Factor With Replication SUMMARY Others Favorite Total Morning Count 4 4 8 Sum 120 124 244 Average 30 31 30.5 Variance 27.3333 40.6667 29.4286 Night Count 4 4 8 Sum 117 138 255 Average 29.25 34.5 31.875 Variance 8.25 11.6667 16.4107 Total Count 8 8 Sum 237 262 Average 29.6250 32.75 Variance 15.4107 25.9286 ANOVA Source of Variation SS df MS F P-value F crit Sample 7.5625 1 7.5625 0.3441 0.5684 4.7472 Columns 39.0625 1 39.0625 1.7773 0.2072 4.7472 Interaction 18.0625 1 18.0625 0.8218 0.3825 4.7472 Within 263.75 12 21.9792 Total 328.4375 15
("Columns" is the two shoes and "Sample" the two times of day.) Unfortunately the p-values for all variables are quite large, suggesting that any effect we saw could well have been due to chance. However, there is an alternative interpretation: with just 16 observations, we will only be able to detect a fairly large difference. It appeared the shoes made a difference of about 3 successes in 50 tries. If that is enough of a difference to matter in practice, we might repeat the experiment with more trials. Before we do that, though, we might make some displays to see if data of this sort matches the assumptions of ANOVA. No sense gathering more if it does not;-)
We can't expect perfection with only eight numbers in each group (How could eight numbers look bell-shaped?) but there are no signs here of serious skewness or outliers. All four groups have similar variabilities.
If we see signs the assumptions are not met then the remedies are similar to what they are in the univariate case. For example, outliers or bimodality must be investigated as to their cause. A transformation of the dependent variable may help just as it can in the univariate case. However, it is most likely to be effective if all the groups are skewed, and in the same direction, or if there is a systematic change in variability as group means increase.
© 2007 Robert W. Hayden