This article is part of Quantide’s web book “Raccoon – Statistical Models with R“. Raccoon is Quantide’s third web book after “Rabbit – Introduction to R” and “Ramarro – R for Developers“. See the full project here.

The second chapter of Racooon is focused on T-test and Anova. Through example it shows theory and R code of:

This post is the second section of the chapter, about 2-sample t-test and paired t. 

Throughout the web-book we will widely use the package qdata, containing about 80 datasets. You may find it here:

Example: Chicken hormones (2 -sample \(t\)-test)

Data description

To compare two growth hormones, 18 chicken were tested: 10 being assigned to hormone A and 8 to hormone B. The gains in weights (grams) over the period of experiment were measured.

Data loading

Let us load and have a look at the data


We begin by calculating some descriptive statistics:

And now we draw a boxplot and a stripchart (in this case in vertical) of the data:


Box-and-whiskers plot of gain by hormone


Stripchart with connect line of gain by hormone

The geom_point() function and geom_line() functions add the red points of two group averages to second graph and then link them with a segment.
group_by() and summarise() functions applies separately the mean() function to values of gain corresponding to hormone variable levels. In this case, if the segment is near to parallel to the x-axis, then the two means are probably equal. Anyway, both graphs and statistics show that hormone B tends to generate heavier chickens.

Inference and models

Let us check the normality of gain in the two groups through q-q plots


Normal probability plot of gain by hormone

From graph, no evidence of non-normality is evident. Now let us perform Anderson-Darling’s test for gain in the two groups, exploiting the function tapply:

The last line of above code, returns a 2-element list, with AD test for each sub-group. There is no evidence to reject the normality hypothesis within the two groups.
The next line of code tests the homoscedasticity assumption, that is if the variances within the two groups are equal:

All three previous tests accept \(H_0\), i.e., there is no evidence to say that the variability changes across levels.

Now the \(t\)-test compares the two group means.  

The null hypothesis is \(H_0: beta_1 = beta_2\), where \(beta_1\) and \(beta_2\) are the expectations of dependent variable for hormones A and B, respectively.

The \(t\)-test rejects the null hypothesis.

Now the same test may be conducted using a model approach.
The following lines of code perform the model fitting and print the main results.

The parameters shown are the model intercept and the “slope” coefficient for hormone B. The intercept in this case is the estimate of mean growth for A hormone, whereas the slope in this case is the estimate of difference between the mean growth obtained with hormon B and the mean growth obtained with hormone A.
In other words, the mean growth estimate for hormoneA is 619.1, while the mean growth estimate for hormoneB is 619.1 + 203.5 = 822.6. We will return to parameterization issue later.

Using summary the significancy of parameters is also shown:

The test results are equal to the ones of \(t\)-test, and the \(t\)-test value is equal to minus \(t\)-test value of hormoneB coefficient.

Why these results? “Behind the scene”, the model has been built using the following **model matrix** (\(underline{X}\))

What this matrix means?

For modeling purposes, the independent variable hormone has been “split” in two dummy variables: hormoneA and hormoneB:

  • hormoneA is 1 when data contain measures relative to hormone A, and is 0 when data contain measures relative to hormone B
  • hormoneB is 1 when data contain measures relative to hormone B, and is 0 when data contain measures relative to hormone A

Obviously, when hormoneA is 0, hormoneB is 1 and vice versa. Then, one dummy variable (in this case, hormoneA), can be removed from the model, because it does not add any useful information, and makes the model mathematically non-manageable. Consequently, the model matrix will contains only the “all 1s” intercept column (that refers to the case when hormone A is used) and the “displacement” hormoneBcolumn which represents the constant added to mean growth when using hormone B instead of hormone A.

In other words, the model actually estimated is the following:

\( growth_i = beta_0 + beta_1 cdot delta_B(hormone_i) + varepsilon_i \)

Where \(delta_B(hormone_i)\) represents the function that returns 1 only when the hormone of \(i\)-th unit is B, otherwise, returns 0.  

\(beta_0\) in this manner is the value of mean growth when using hormone A (i.e., when \(delta_B(hormone_i)==0\)), while \(beta_1\) is the “jump” in mean of growth when using hormone B instead of hormone A.

Another approach available to manage such type of test is through the analysis of variance model, i.e. by using the aov() function, that directly performs one analysis of variance:

Printing a summary method for analysis of variance objects reflects the different approach to what is essentially the same information as returned by lm(). While a printed linear-model object shows individual coefficients, printed aov objects show terms, which may correspond to several coefficients. However, methods for linear models can always be applied explicitly on aov objects, such as those for the coefficients or residuals.

Above output shows that hormone, with all its levels, is globally significant (p=0.012) in explaining the dependent variable variability.

In this case, since the independent variable (factor) contains two levels only, the resulting p-value for hormone obtained from aov is identical to the p-value of hormone B obtained using lm() (actually, in this example they are the same test).

If the independent variable (factor) contains more than two levels, then aov() returns the global significancy (not the “by level significancy”) of independent variable effect on dependent variable.

A few more concise code to obtain the same results as above is

Next lines of code show that actually aov objects are lm objects with also analysis of variance results added.

The following commands are useful to read/define the type of contrasts (i.e., the type of coding of regressors matrix \(underline{X}\)) used when unordered or ordered factors are used.

The first line of code simply shows the current predefined type of contrasts for unordered and ordered factors, while the second line of code sets the predefined contrasts.

To extract coefficients from an already calculated (lm or aov) model, one must use the coefficients method:

Next line of code shows that the hormoneB coefficient is simply the difference between average growths for hormoneA and hormoneB chickens.

R can use several types of contrasts: contr.treatment sets to zero the coefficient of first factor level, that becomes the intercept, and consequently all the means of other levels are relative to the first level. Another option is contr.sum, that sets to zero the sum of coefficients, and then the reference value of coefficients becomes the gran mean, calculated on all factor levels. R uses contr.treatment as default.

Let’s try to change type of contrasts:

The aov() results are the same:

but, on the whole, the results are changed:

The values of the coefficients are not the same because with contr.treatment the intercept represents the mean in the first group (hormone A) and the “slope” represents the difference between the mean in the second group (hormone B) and the mean in the first group; conversely, with contr.sum the intercept represents the grand mean, and the slope represents the difference between the mean in the first group and the grand mean.
The differences in coefficients are perhaps more perceptible looking at the model matrix:

Now, firstly we will restore predefined contrasts

and then we will fit a model without intercept to data. In general, removing the intercept is a discouraged practice, but sometimes no intercept models can help to enhance coefficients meaning:

The above coefficients represent, respectively, the mean in the group hormoneA and in the group hormoneB.

The model matrix can be useful to understand the meaning of the coefficients:

Next code shows what can happen when one uses a no-intercept model and does not address for that in analysis of variance:

The significancy of factor is much more strong than in previous calculations, but this is wrong, because this result actually compares the model with 0 intercept with the model that uses one mean for hormoneAand one mean for hormoneB.

Using the following code gives results equivalent to t-test:

anova() performs an analysis of variance to compare two nested models (see the paragraph on Comparing two nested models in Some Theory on Linear Models chapter).
In this case we are comparing a model with the group variable hormone against the model with grand mean only.
The result is the same as seen individually in lm() and aov() outputs.

Model diagnostics

Since the assumpions on linear models, the residuals should arise from a gaussian distribution; then, a Normal Probability Plot of residuals will be plotted to check this assumption.


Overlaid Normal probability plots of residuals

The two colors correspond to two groups.
Residuals seem gaussian, irrespective of group.

The Anderson-Darling (AD) test confirms that:

This confirms that the analysis and its results may be considered correct.

Example: Park assistant (paired \(t\))

Data description

In an experiment to investigate if an optional equipment of cars may help to park the cars themselves, 20 drivers were asked to park the car with and without the equipment. The parking time (secs) for each driver and equipment is recorded.

Data loading


A simple boxplot:


Box-and-whiskers plot of parking times for the two cars

A stripchart with mean points and mean line:


Stripchart with connect line of parking times for the two cars

And now the \(t\)-test.

Before testing the means, a check for normality is made by using qq-plot and Anderson-Darling test


Normal probability plot of parking times of two cars

There is not evidence of non-normality within groups, and the AD tests confirm above results:

and finally the two independent samples \(t\)-test:

Following this result, the null hypothesis is not rejected (at a 5% level).

But if we produce a scatterplot of Car_A Vs Car_B,


Scattrplot of parking times of Car_A Vs. Car_B

we can see that data of two cars are clearly NOT independent!!
Measures on Car_A are strongly related to measures on Car_B, because each driver drove either cars.

Then the correct test for such as data is:

or, equivalently, a paired \(t\)-test(x=carctl\(Car_A, y=carctl\)Car_B, paired=TRUE)

In either cases the results are the same: a difference statistically significant exists between mean parking times of two cars.

In this case the Normality test must be performed on differences between measures within same driver.

Check of normality by Normal probability plot:


Normal probability plot of difference of parking times of two cars

And the Anderson-Darling test:

For either tests, the Normality assumption is not rejected.

Now we try the model approach to the same test. The model approach in this case follows the same procedure seen for the one-sample t-test, but using the differences between times as dependent variable.

Fitting of model:

or, equivalently one may use also:

q-q plot (or Normal probability plot) of residuals shall be identical to previous one.

The Anderson-Darling test too is identical to previuosly made test:


Note that for repeated measures experiments, several analytical options are also available: for example through the Error() function in formula specification, or using mixed-model methods (lme4 or nlmepackages are the most used and known).
In this manual we won’t deep these arguments, because they would require a distinct section for each of them.