| manufacturer | model | displ | year | cyl | mpg | fl | class | trans |
|---|---|---|---|---|---|---|---|---|
| chevrolet | malibu | 2.4 | 2008 | 4 | 22 | r | midsize | auto |
| volkswagen | jetta | 2.0 | 2008 | 4 | 22 | p | compact | auto |
| hyundai | tiburon | 2.7 | 2008 | 6 | 16 | r | subcompact | manual |
| hyundai | tiburon | 2.7 | 2008 | 6 | 17 | r | subcompact | manual |
| honda | civic | 1.6 | 1999 | 4 | 28 | r | subcompact | manual |
| nissan | maxima | 3.0 | 1999 | 6 | 18 | r | midsize | auto |
| toyota | camry | 2.4 | 2008 | 4 | 21 | r | midsize | auto |
| honda | civic | 1.6 | 1999 | 4 | 24 | r | subcompact | auto |
| pontiac | grand prix | 3.1 | 1999 | 6 | 18 | r | midsize | auto |
| toyota | corolla | 1.8 | 2008 | 4 | 26 | r | compact | auto |
Introduction
This post compares EPA city miles per gallon (mpg) across manufacturers using linear mixed‑effects models. The goal is to assess whether manufacturers differ in city mpg after accounting for vehicle attributes such as engine displacement, transmission type, model, and year. The intended audience is a general technical reader with basic familiarity with regression models.
Data
The data come from the mpg dataset (ggplot2 Development Team, n.d.). We focus on front‑wheel‑drive petrol cars in the compact, midsize, and subcompact classes to keep the comparison balanced. Because models were selected for having new releases across 1999–2008 (used as a proxy for popularity), manufacturers appear in the sample according to availability rather than a strict experimental design. This affects whether manufacturers are best treated as random or fixed effects.
On one hand, manufacturers could be treated as random effects because they were not experimentally selected; on the other hand, we are interested in manufacturer‑specific differences, so treating manufacturers as fixed effects would be useful if each manufacturer is well represented. After filtering the data some manufacturers are missing or sparsely represented, so the scope of inference is limited — we can test whether manufacturers differ, but precise fixed‑effect estimates for every manufacturer would be unreliable.
Models are treated as random and are nested inside manufacturers because the sample contains multiple models per manufacturer and the particular models present are effectively a random sample from all possible models.
Because of the way the data were collected (only models with a new release every year between 1999 and 2008 were selected, as a proxy for popularity), we need to be careful about how manufacturers are treated in the analysis. On one hand, it seems fair to treat manufacturers as random, since they were not specifically selected in the design of the experiment; like car models, they enter the data through the popularity criterion (Faraway 2016). On the other hand, we are interested in the actual differences between manufacturers, so treating them as fixed effects is desirable.
To warrant fixed effects for manufacturers, however, we need sufficient data for each manufacturer, and the filtered data should contain all the manufacturers of interest. This is not the case.

Several key manufacturers are missing from the final dataset. Consequently, we may only be able to answer questions like "Is mpg significantly different among manufacturers?" rather than "What is the average mpg of each manufacturer?", which would allow a more direct comparison.
The car models are random in the study design, so we treat them as random effects, nested inside manufacturers.
EDA
The filtered data are summarized below, first overall and then per model.
| total_rows | total_manufacturers | total_models | min_mpg | max_mpg | mean_mpg | total_years |
|---|---|---|---|---|---|---|
| 87 | 8 | 15 | 16 | 28 | 20 | 2 |

Per-model row counts and mean city mpg:
| manufacturer | model | counts | mean_mpg |
|---|---|---|---|
| audi | a4 | 7 | 18.9 |
| chevrolet | malibu | 5 | 18.8 |
| honda | civic | 8 | 24.5 |
| hyundai | sonata | 7 | 19.0 |
| hyundai | tiburon | 7 | 18.3 |
| nissan | altima | 6 | 20.7 |
| nissan | maxima | 3 | 18.7 |
| pontiac | grand prix | 5 | 17.0 |
| toyota | camry | 7 | 19.9 |
| toyota | camry solara | 7 | 19.9 |
| toyota | corolla | 5 | 25.6 |
| volkswagen | gti | 5 | 20.0 |
| volkswagen | jetta | 6 | 19.3 |
| volkswagen | new beetle | 2 | 20.0 |
| volkswagen | passat | 7 | 18.6 |
There are exactly two years of data (1999 and 2008) covering 15 models, so we treat year as a categorical variable rather than a continuous one. A short sample for one model is shown below.
| manufacturer | model | displ | year | cyl | mpg | fl | class | trans |
|---|---|---|---|---|---|---|---|---|
| audi | a4 | 1.8 | 1999 | 4 | 18 | p | compact | auto |
| audi | a4 | 1.8 | 1999 | 4 | 21 | p | compact | manual |
| audi | a4 | 2.8 | 1999 | 6 | 16 | p | compact | auto |
| audi | a4 | 2.8 | 1999 | 6 | 18 | p | compact | manual |
| audi | a4 | 2.0 | 2008 | 4 | 20 | p | compact | manual |
| audi | a4 | 2.0 | 2008 | 4 | 21 | p | compact | auto |
| audi | a4 | 3.1 | 2008 | 6 | 18 | p | compact | auto |
We observe multiple rows per model and year because of different displacements and transmission types. The data contain only three distinct cylinder counts (4, 6, 8), and this information is largely captured by displacement, so we remove cyl to simplify the analysis.
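The per-model summary can be checked by hand against the audi a4 sample rows. The sketch below is a minimal illustration, not the original analysis code: it groups rows by manufacturer and model, then computes the row count and mean mpg. The `summarize` helper and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# The seven audi a4 rows from the sample table: (manufacturer, model, mpg).
rows = [
    ("audi", "a4", 18),
    ("audi", "a4", 21),
    ("audi", "a4", 16),
    ("audi", "a4", 18),
    ("audi", "a4", 20),
    ("audi", "a4", 21),
    ("audi", "a4", 18),
]


def summarize(rows):
    """Group rows by (manufacturer, model); return {key: (count, mean mpg)}."""
    groups = defaultdict(list)
    for manufacturer, model, mpg in rows:
        groups[(manufacturer, model)].append(mpg)
    # Round the mean to one decimal, as in the per-model summary table.
    return {key: (len(values), round(mean(values), 1)) for key, values in groups.items()}


summary = summarize(rows)
print(summary)  # {('audi', 'a4'): (7, 18.9)}
```

The result matches the audi a4 row of the per-model table (7 rows, mean mpg 18.9).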

Fuel type (regular vs premium) is not central to this study and shows little correlation with city mpg, so we drop the fuel type column as well.

We visualize how mpg varies across manufacturers and by other attributes to get an initial sense of the patterns.

There is quite a lot of variability in city mpg among manufacturers.
It is entirely possible, however, that this is driven more by the different models and their attributes than by the manufacturers themselves.
As intuition suggests, the plot of mpg against displacement shows mpg falling as displacement increases. There is a hint of nonlinearity in the trend, but we ignore it initially.
Midsize cars are slightly less efficient (lower mpg) than compact and subcompact cars. This is expected, since they are heavier and have larger-displacement engines.
The effect of transmission is hard to read from the plots, so we leave it to the model.

The mpg distribution has a right skew, so we take the logarithm of mpg to stabilize variance and to make regression coefficients interpretable as approximate percentage changes.

Analysis
Baseline model
We start with a simple linear model that includes manufacturer as a fixed effect along with displacement, class, transmission, and year. This fixed‑effects model provides a baseline and helps identify which predictors explain the most variation in log(mpg). Transmission and class become less important after accounting for displacement and manufacturer, so we simplify the model accordingly.
Analysis of Variance Table (response: log(mpg))
| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) | Signif. |
|---|---|---|---|---|---|---|
| manufacturer | 7 | 0.66676 | 0.09525 | 25.9752 | < 2.2e-16 | *** |
| displ | 1 | 0.43577 | 0.43577 | 118.8345 | < 2.2e-16 | *** |
| class | 2 | 0.01534 | 0.00767 | 2.0922 | 0.1307 | |
| trans | 1 | 0.00590 | 0.00590 | 1.6102 | 0.2084 | |
| year | 1 | 0.13722 | 0.13722 | 37.4189 | 4.133e-08 | *** |
| Residuals | 74 | 0.27136 | 0.00367 | | | |

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Manufacturer and displ are highly significant predictors of log(mpg). Consistent with the earlier plot, a 1 L increase in displacement reduces mpg by roughly 15%.
Transmission is not significant in the presence of the other predictors, which matches the impression from the earlier plot.
Class is not significant either, although this may be due to confounding with displacement, as suspected before. A formal test supports this: in an ANOVA of displacement on class and model, class is a highly significant predictor of displacement.
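To make the significance calls above concrete, the following sketch recomputes the F values from the degrees of freedom and sums of squares printed in the log(mpg) ANOVA table. Each F value is the term's mean square divided by the residual mean square. The `f_values` helper is a name introduced for this illustration, and the inputs are the rounded figures from the table, so results agree with the printed F values only up to rounding.

```python
# (term, degrees of freedom, sum of squares) as printed in the ANOVA table.
terms = [
    ("manufacturer", 7, 0.66676),
    ("displ", 1, 0.43577),
    ("class", 2, 0.01534),
    ("trans", 1, 0.00590),
    ("year", 1, 0.13722),
]
resid_df, resid_ss = 74, 0.27136


def f_values(terms, resid_df, resid_ss):
    """Return {term: F}, where F = (SS / df) / residual mean square."""
    resid_ms = resid_ss / resid_df
    return {name: (ss / df) / resid_ms for name, df, ss in terms}


fs = f_values(terms, resid_df, resid_ss)
for name, f in fs.items():
    print(f"{name:>12}: F = {f:.2f}")
```

The large F values for manufacturer, displ, and year, against values near 2 for class and trans, are what drive the decision to drop the latter two.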

Analysis of Variance Table (response: displ)
| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) | Signif. |
|---|---|---|---|---|---|---|
| class | 2 | 11.759 | 5.8796 | 27.0251 | 1.876e-09 | *** |
| model | 13 | 12.390 | 0.9531 | 4.3807 | 2.222e-05 | *** |
| Residuals | 71 | 15.447 | 0.2176 | | | |

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Consequently, we simplify the model by removing trans and class. The reduced model fits the data reasonably well, and its diagnostics are satisfactory.

Mixed effects model
Because models are nested within manufacturers and the specific models in the data are essentially a random sample, we fit mixed‑effects models that include random intercepts for manufacturer and for model nested within manufacturer. This accounts for the correlation between observations from the same model and between models from the same manufacturer.
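One way to write the model just described is sketched below. The notation is an assumption made for illustration (the original post does not give an equation): $i$ indexes manufacturers, $j$ models within manufacturer $i$, and $k$ observations within a model. The fixed effects are displacement and an indicator for year 2008.

```latex
\begin{aligned}
\log(\mathrm{mpg}_{ijk}) &= \beta_0 + \beta_1\,\mathrm{displ}_{ijk}
  + \beta_2\,\mathbb{1}[\mathrm{year}_{ijk} = 2008]
  + u_i + v_{ij} + \varepsilon_{ijk}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2) \quad \text{(manufacturer)}, \\
v_{ij} &\sim \mathcal{N}(0, \sigma_v^2) \quad \text{(model within manufacturer)}, \\
\varepsilon_{ijk} &\sim \mathcal{N}(0, \sigma^2) \quad \text{(residual)},
\end{aligned}
```

with all random terms mutually independent. In lme4 syntax, a formula such as `log(mpg) ~ displ + year + (1 | manufacturer/model)`, with year as a factor, would be consistent with this specification, though the original call is not shown.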
Fixed effects:
| Term | Estimate | Std. Error |
|---|---|---|
| (Intercept) | 3.33 | 0.04 |
| displ | -0.15 | 0.01 |
| year2008 | 0.08 | 0.01 |

Random effects:
| Group | Name | Std. Dev. |
|---|---|---|
| manufacturer:model | (Intercept) | 0.05 |
| manufacturer | (Intercept) | 0.06 |
| Residual | | 0.05 |

Number of obs: 87; groups: manufacturer:model, 15; manufacturer, 8
AIC = -199.6, DIC = -250.4, deviance = -231.0
The standard deviations attributable to manufacturer and to model nested within manufacturer are comparable in magnitude. Variation in mpg is therefore due to both, which is a sensible outcome.
Whether including the model effect is meaningful can be tested with a restricted likelihood ratio test (Scheipl, Greven, and Kuechenhoff 2008).
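As a rough illustration of how comparable the components are, the sketch below squares the printed standard deviations and expresses each variance as a share of the total. The `variance_shares` helper is a name introduced here. Because the printed values are rounded to two decimals, the shares are only approximate.

```python
# Random-effect standard deviations as printed in the model output (log-mpg scale).
sd = {
    "manufacturer": 0.06,
    "manufacturer:model": 0.05,
    "residual": 0.05,
}


def variance_shares(sd):
    """Square each standard deviation and return its share of the total variance."""
    var = {name: s ** 2 for name, s in sd.items()}
    total = sum(var.values())
    return {name: v / total for name, v in var.items()}


shares = variance_shares(sd)
for name, share in shares.items():
    print(f"{name:>20}: {share:.0%}")
```

On these rounded inputs, manufacturer accounts for roughly 42% of the variance left after the fixed effects, with model within manufacturer and the residual at roughly 29% each.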
simulated finite sample distribution of RLRT.
(p-value based on 10000 simulated values)
data:
RLRT = 13.167, p-value < 2.2e-16
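So we reject the null hypothesis that the manufacturer:model variance is zero. Because the p-value is based on 10,000 simulated values, the printed "< 2.2e-16" means that none of the simulated statistics reached the observed one; the simulation itself resolves p-values only down to about 1e-4.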
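Additionally, we can check bootstrap confidence intervals for the variance components. Under lme4's default labels, which follow the order of the random-effects table (.sig01 for the manufacturer:model standard deviation, .sig02 for the manufacturer standard deviation, .sigma for the residual), the manufacturer:model interval excludes 0, while the manufacturer interval reaches 0 at its lower bound. This is in line with the singular-fit messages produced during the bootstrap.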
Bootstrap confidence intervals (the bootstrap produced 101 messages: boundary (singular) fit: see help('isSingular')):
| | 2.5 % | 97.5 % |
|---|---|---|
| .sig01 | 0.0061072 | 0.0746714 |
| .sig02 | 0.0000000 | 0.1061337 |
| .sigma | 0.0446244 | 0.0623480 |
| (Intercept) | 3.2466563 | 3.4197283 |
| displ | -0.1767928 | -0.1253388 |
| year2008 | 0.0587856 | 0.1048725 |
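Because the response is log(mpg), the fixed-effect estimates and their intervals can be read as percentage changes after exponentiating. The sketch below back-transforms the printed displ and year2008 estimates together with their bootstrap limits. The `pct_change` helper and the dictionary labels are names introduced for this illustration.

```python
import math


def pct_change(b):
    """Percentage change in mpg implied by a coefficient b on the log(mpg) scale."""
    return (math.exp(b) - 1.0) * 100.0


# (estimate, lower 2.5 %, upper 97.5 %) as printed in the model output and CI table.
effects = {
    "displ (per 1 L)": (-0.15, -0.1767928, -0.1253388),
    "year 2008 vs 1999": (0.08, 0.0587856, 0.1048725),
}

for name, (est, lo, hi) in effects.items():
    print(f"{name}: {pct_change(est):+.1f}% [{pct_change(lo):+.1f}%, {pct_change(hi):+.1f}%]")
```

This gives about -13.9% per additional litre of displacement (interval roughly -16.2% to -11.8%) and about +8.3% for 2008 relative to 1999 (roughly +6.1% to +11.1%). For small coefficients, the raw coefficient times 100 is already a close approximation.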
Diagnostics
Residual diagnostics suggest approximate normality and roughly constant variance. Random‑effect Q–Q plots indicate the manufacturer:model random effects are reasonably close to normal; manufacturer effects are also approximately normal.

The normality assumption for the random effects also holds up reasonably well, particularly for manufacturer:model.

Conclusion
We set out to test whether manufacturers differ in city mpg after controlling for other attributes. Accounting for model‑level clustering, year and displacement, we find remaining manufacturer‑level variation in city mpg. In other words, even after controlling for model and engine size, manufacturers differ in typical city mpg.
The variability among manufacturers is on the order of ±6% (the manufacturer standard deviation of 0.06 on the log scale), which translates to about ±1.2 mpg at the sample mean of 20 mpg.
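The conversion from a log-scale standard deviation to mpg can be sketched as follows, using the printed manufacturer standard deviation (0.06) and the sample mean from the data summary (20 mpg). The variable names are assumptions made for this example. The shift is evaluated at the mean only; it would be larger for high-mpg cars and smaller for low-mpg ones.

```python
import math

manufacturer_sd = 0.06  # manufacturer standard deviation on the log(mpg) scale
mean_mpg = 20           # sample mean city mpg from the data summary

# One standard deviation above and below, converted back to the mpg scale.
up = mean_mpg * (math.exp(manufacturer_sd) - 1.0)
down = mean_mpg * (1.0 - math.exp(-manufacturer_sd))

print(f"+1 SD: +{up:.2f} mpg, -1 SD: -{down:.2f} mpg")
```

Both directions come out close to 1.2 mpg, with the upward shift slightly larger because the back-transform is multiplicative.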
Limitations: this analysis is limited to petrol, front‑wheel‑drive cars in the compact/midsize/subcompact classes. Results should not be generalized beyond these groups without further study. Results are also sensitive to the treatment of manufacturers as random effects due to the sampling design.