
This article is part of Quantide’s web book “Raccoon – Statistical Models with R”. Raccoon is Quantide’s third web book, after “Rabbit – Introduction to R” and “Ramarro – R for Developers”.

The second chapter of Raccoon focuses on the t-test and Anova. Through worked examples, it presents both the theory and the R code for these techniques.

This post is the fourth section of the chapter, about 3-way Anova.

Throughout the web book we make extensive use of the qdata package, which contains about 80 datasets. It is available at https://github.com/quantide/qdata.


Example: Braking distances (3-way ANOVA)

Data description

The distance data contain braking-distance measurements taken on the same car equipped with different configurations of:

  • Tire: Factor with 3 levels GT, LS, MX
  • Tread: Factor with 2 levels 1.5, 10
  • ABS: Factor with 2 levels disabled, enabled

For each combination of the levels of these three factors, 2 braking-distance measurements were recorded, for a total of 3 × 2 × 2 × 2 = 24 observations.

The objective of the experiment is to find which factor(s) influence the braking distance, and in which direction.

Data loading
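The loading code is not shown in this version of the post. The real measurements ship with the qdata package linked above, but since the dataset's object name is not given here, the sketch below builds a simulated stand-in instead: the factor names and levels come from the description, while the Distance values (and the effects used to generate them) are invented. It also checks that the design is balanced.

```r
# Simulated stand-in for the braking-distance data: the Distance values are
# invented; only the factor names, levels and the 3 x 2 x 2 design with
# 2 replicates per cell follow the description above.
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)  # expand.grid turns the strings into factors
# Assumed effects: tire type and ABS shift the mean, Tread does not
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

str(d)

# A balanced design has exactly 2 runs in every Tire x Tread x ABS cell
cell_counts <- xtabs(~ Tire + Tread + ABS, data = d)
print(ftable(cell_counts))
```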

Descriptives

Let us first gain some graphical insight into the data. We want to explore how the average braking distance changes across the levels of each factor, so we start by plotting the univariate (main) effects.
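A minimal base-R sketch of this step, run on a simulated stand-in for the data (invented values, same 3 × 2 × 2 layout with 2 replicates): it computes the marginal mean of Distance for each factor level and draws them with `plot.design()`. This is not necessarily how the original figure was produced.

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

# Marginal (one-factor-at-a-time) mean of Distance for every level
factors <- c("Tire", "Tread", "ABS")
marginal_means <- lapply(factors, function(f) tapply(d$Distance, d[[f]], mean))
names(marginal_means) <- factors
print(marginal_means)

# Design plot: one column per factor with its level means marked along it,
# plus a horizontal reference line at the grand mean
plot.design(Distance ~ Tire + Tread + ABS, data = d)
```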

Plot of univariate factor effects on the Distance response variable

The mean braking distance appears to change mainly with Tire type and ABS setting, while the levels of Tread do not seem to affect it.

However, it may be interesting to check whether particular combinations of factor levels affect the average braking distance differently. For example, a specific tire type combined with a specific tread level might change the braking distance in a way that neither factor does on its own. Let us therefore plot the two-way interactions.
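The two-way cell means behind such plots can be sketched with `tapply()` and `interaction.plot()` from base R. As before, the data frame `d` is a simulated stand-in, not the qdata measurements.

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

# Two-way tables of cell means (each cell averages over the third factor
# and the replicates)
tire_abs   <- with(d, tapply(Distance, list(Tire, ABS), mean))
tread_abs  <- with(d, tapply(Distance, list(Tread, ABS), mean))
tire_tread <- with(d, tapply(Distance, list(Tire, Tread), mean))
print(tire_abs); print(tread_abs); print(tire_tread)

# Roughly parallel lines suggest no interaction; crossing or diverging
# lines suggest one (to be confirmed by a formal test)
op <- par(mfrow = c(1, 3))
with(d, {
  interaction.plot(Tire, ABS, Distance)
  interaction.plot(Tread, ABS, Distance)
  interaction.plot(Tire, Tread, Distance)
})
par(op)
```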

Plots of two-way interaction effects of factors on the Distance response variable

The braking distance seems to decrease when ABS is enabled compared to when it is disabled, regardless of the tire type (no interaction). However, there may be an interaction between Tread and ABS, which would mean that ABS affects the braking distance differently depending on the tread level.

So far we have only commented on plots of descriptive statistics. To tell whether the differences among the plotted means are statistically significant, we need to include these factors in a model and test each factor and interaction in it.

Inference and models of 3-way Anova

Let’s build the full model, that is, the model with the three main effects, all two-way interactions and the three-way interaction:
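A sketch of the full model with `aov()`, where `*` in the formula expands to all main effects and interactions. The data frame `d` is a hypothetical simulated stand-in for the real braking-distance data, so the resulting p-values will not match the article's.

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

# Full model: Tire * Tread * ABS expands to the 3 main effects,
# the 3 two-way interactions and the three-way interaction
fm_full <- aov(Distance ~ Tire * Tread * ABS, data = d)
print(summary(fm_full))

# ANOVA table as a data frame: 7 model terms plus the Residuals row
full_table <- summary(fm_full)[[1]]
```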

Notice that this is equivalent to writing:
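The written-out form lists every term explicitly. The sketch below (again on simulated stand-in data) fits both versions and checks that they contain the same terms and produce the same fitted values.

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

# Compact and fully written-out versions of the same full model
fm_star <- aov(Distance ~ Tire * Tread * ABS, data = d)
fm_long <- aov(Distance ~ Tire + Tread + ABS +
                 Tire:Tread + Tire:ABS + Tread:ABS +
                 Tire:Tread:ABS, data = d)

# Same term list and same fitted values
print(attr(terms(fm_long), "term.labels"))
print(all.equal(fitted(fm_star), fitted(fm_long)))
```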

Only the main effects of ABS and Tire seem to influence the mean braking distance (as we had guessed from the plots!). The interactions do not appear significant: the differences we had noticed in the two-way interaction plots are too small, relative to the residual variability, to be statistically significant. We can now start dropping interactions, beginning with the three-way interaction.

In general, we look for the most parsimonious model that adequately explains the variability in the data. Once the full model has been fitted, we start by dropping the highest-order interactions, because we work within the hierarchical model paradigm: a lower-order term is not removed while a higher-order interaction containing it is still in the model.

To obtain the model without the three-way interaction, we can use update():
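A sketch of this step with `update()`, where the formula `. ~ . - term` keeps the current response and terms except the one being removed. The data are simulated, and model names such as `fm_2way` are illustrative.

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

fm_full <- aov(Distance ~ Tire * Tread * ABS, data = d)
# Same response, same terms, minus the three-way interaction
fm_2way <- update(fm_full, . ~ . - Tire:Tread:ABS)
summary(fm_2way)
```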

Since the two-way interactions do not seem to affect the mean braking distance, we can now try to remove all of them, again using update():
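On simulated stand-in data, the three two-way interactions can be dropped in a single `update()` call. The names `fm_2way` and `fm_main` are illustrative, not the article's.

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

# Model with main effects and all two-way interactions (no three-way term)
fm_2way <- aov(Distance ~ Tire * Tread * ABS - Tire:Tread:ABS, data = d)
# Drop all three two-way interactions at once, keeping the main effects
fm_main <- update(fm_2way, . ~ . - Tire:Tread - Tire:ABS - Tread:ABS)
summary(fm_main)
```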

The result is not surprising, as it is very similar to those of the previous models. Still, it is good practice to check with anova() whether the three removed effects are jointly non-significant:
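Passing two nested fits to `anova()` gives an F test of the terms that are in the larger model but missing from the smaller one, here the three two-way interactions (2 + 2 + 1 = 5 degrees of freedom). A sketch on simulated data, where `(Tire + Tread + ABS)^2` is R shorthand for the main effects plus all two-way interactions:

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

fm_2way <- aov(Distance ~ (Tire + Tread + ABS)^2, data = d)  # mains + 2-way
fm_main <- aov(Distance ~ Tire + Tread + ABS, data = d)      # mains only

# Joint F test of the 3 removed two-way interactions (5 df in total)
cmp <- anova(fm_main, fm_2way)
print(cmp)
```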

The combined effect of the three two-way interactions is not significant, so we can continue with the main-effects-only model.
The tables of effects from this model follow.
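For `aov` fits, base R's `model.tables()` returns such tables; a sketch on simulated stand-in data follows. In a balanced design each effect is the level mean minus the grand mean, so the effects of each factor sum to zero.

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

fm_main <- aov(Distance ~ Tire + Tread + ABS, data = d)
# Effects: level mean minus grand mean (this design is balanced)
effects_tab <- model.tables(fm_main, type = "effects")
print(effects_tab)
```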

And here are the tables of means from the same model:
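The same function with `type = "means"` returns the grand mean and the mean of each factor level. A sketch on simulated stand-in data; because the design is balanced, these coincide with the raw sample means per level.

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

fm_main <- aov(Distance ~ Tire + Tread + ABS, data = d)
# Grand mean plus the mean of Distance at each level of each factor
means_tab <- model.tables(fm_main, type = "means")
print(means_tab)
```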

Finally, we remove the Tread effect to obtain the final model with significant effects only:
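A sketch of the final step on simulated stand-in data, again with `update()`:

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

fm_main  <- aov(Distance ~ Tire + Tread + ABS, data = d)
fm_final <- update(fm_main, . ~ . - Tread)  # keep only Tire and ABS
summary(fm_final)
```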

Residual analysis of 3-way Anova

Compound residual plot for the final braking-distance model

Since the leverages are constant (as is typically the case with a balanced aov design), the fourth plot uses factor-level combinations on the x-axis instead of the leverages. Notice that the factor levels are ordered by mean fitted value.
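A sketch on simulated stand-in data: `hatvalues()` shows that every observation has the same leverage, p/n = 4/24 (p = 4 coefficients in the Tire + ABS model, n = 24 runs), and `plot()` on the fit draws the four standard diagnostic panels, the last one against factor levels.

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

fm_final <- aov(Distance ~ Tire + ABS, data = d)

# Leverages: all equal to p / n = 4 / 24 in this balanced additive model
h <- hatvalues(fm_final)
print(range(h))

# Standard 2 x 2 panel of residual diagnostics; with constant leverage the
# last panel plots residuals against factor-level combinations
op <- par(mfrow = c(2, 2))
plot(fm_final)
par(op)
```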

The sample means of Distance for each combination of ABS and Tire follow:
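A sketch of these sample (cell) means with `tapply()`, computed on simulated stand-in data:

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

# Sample mean of Distance in each ABS x Tire cell (averaging over Tread)
cell_means <- with(d, tapply(Distance, list(ABS = ABS, Tire = Tire), mean))
print(round(cell_means, 2))
```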

The tables of means estimated by the model can be obtained as follows:
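A sketch with `model.tables()` applied to the final additive model, again on simulated stand-in data:

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

fm_final <- aov(Distance ~ Tire + ABS, data = d)
# Grand mean, then one table of means for each term in the model
final_means <- model.tables(fm_final, type = "means")
print(final_means)
```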

Lastly, we can obtain some predicted values from the ANOVA model:
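A sketch with `predict()` on simulated stand-in data; the grid `new_cars`, holding every Tire and ABS combination, is a hypothetical example of new data.

```r
# Simulated stand-in data (invented values, same 3 x 2 x 2 x 2-replicate layout)
set.seed(123)
d <- expand.grid(Tire  = c("GT", "LS", "MX"),
                 Tread = c("1.5", "10"),
                 ABS   = c("disabled", "enabled"),
                 Rep   = 1:2)
d$Distance <- 300 + c(0, 15, 10)[as.integer(d$Tire)] -
  25 * (d$ABS == "enabled") + rnorm(nrow(d), sd = 5)

fm_final <- aov(Distance ~ Tire + ABS, data = d)

# Hypothetical new configurations: every Tire x ABS combination
new_cars <- expand.grid(Tire = levels(d$Tire), ABS = levels(d$ABS))
new_cars$predicted <- predict(fm_final, newdata = new_cars)
print(new_cars)
```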

Question: the predicted values differ from the sample averages calculated above. Why?