My dog pants a lot. Don’t judge him. He’s brachycephalic (shorter skull/muzzle than other dogs), so he has issues thermoregulating. But he doesn’t always pant. Let me illustrate.
Panting!
Not panting! And a bonus skeptical look (“really? another post about me?”).
I’ve noticed that his panting seems to be related to temperature (shocking, I know). But it might be useful to know exactly when he may or may not be panting (and thus on the verge of overheating). Especially if I’m debating some exercise for the both of us. What if I constructed a model for this? So naturally, I collected some data and used logistic regression (or binomial modeling) to put a temperature on this panting situation.
Here’s the data.
temp <- c(10, 13.2, 12.1, 15, 17, 18, 15, 14.9, 16.0, 19, 21, 22, 24.5, 28, 27, 20.1, 25.6, 27.2, 29, 28.2)
dog.panting <- c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1)
data <- data.frame(temp, dog.panting)
temp = temperature (degrees C)
dog.panting = binary (1 = panting, 0 = not panting)
Always plot your data first!
plot(dog.panting~temp)
Ok, now we can construct a basic model using “glm”.
m1 <- glm(data = data, dog.panting ~ temp, family = "binomial")
summary(m1)
Here’s an excerpt of the summary output.
Call:
glm(formula = dog.panting ~ temp, family = "binomial", data = data)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -6.3202     2.7210  -2.323   0.0202 *
temp          0.3345     0.1409   2.373   0.0176 *

Null deviance: 27.526 on 19 degrees of freedom
Residual deviance: 17.201 on 18 degrees of freedom
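Those two deviance lines can also be read as a likelihood ratio test: the drop from the null deviance to the residual deviance is compared against a chi-square distribution. Here is a minimal sketch of that arithmetic, assuming the same data as above. The names dev.drop, df.drop and p.value are mine, purely for illustration.

```r
# Rebuild the data and model so this block runs on its own
temp <- c(10, 13.2, 12.1, 15, 17, 18, 15, 14.9, 16.0, 19,
          21, 22, 24.5, 28, 27, 20.1, 25.6, 27.2, 29, 28.2)
dog.panting <- c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0,
                 1, 1, 1, 1, 1, 1, 0, 1, 1, 1)
data <- data.frame(temp, dog.panting)
m1 <- glm(dog.panting ~ temp, data = data, family = "binomial")

# Drop in deviance when temp is added to the intercept-only model
dev.drop <- m1$null.deviance - m1$deviance

# Degrees of freedom used up by the extra term
df.drop <- m1$df.null - m1$df.residual

# Upper-tail chi-square probability for that drop
p.value <- pchisq(dev.drop, df = df.drop, lower.tail = FALSE)

c(dev.drop = dev.drop, df.drop = df.drop, p.value = p.value)
```

Calling anova(m1, test = "Chisq") should report the same comparison without the hand arithmetic.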
This suggests some evidence for temperature in predicting panting (surprise!). But we should probably validate the model first. Overdispersion (greater variance than predicted by the model) is probably the first thing you would check for glms. However, given a Bernoulli glm (response variable is a vector with zeros and ones), overdispersion cannot occur (McCullagh and Nelder 1989). You could also look at predicted values versus residuals.
plot(m1)
For a gaussian linear model, you’d normally (heheh, get it, normally?) look for no pattern in residuals. But it’s a bit tricky for binomial glms.
Zuur et al. (2009) are quite comforting when they say:
“The graphical model validation in a Binomial GLM with a 0-1 response variable is some sort of an art…Because the observed data are zeros and ones, we can now see two clear bands in these graphs. This makes it rather difficult to say anything sensible about these graphs, and one can wonder whether there is any point in using them.”
Thanks Zuur et al! There are some other diagnostics to use (e.g. binning predictors and summarizing residuals, ROC plots), but there is no clear consensus. So we’ll leave that issue for a future post.
One bonus of binomial glms is that you can express the effect size of a predictor using the odds ratio. Wait, that’s odd. What are the odds again? Odds are a ratio of probabilities: the probability of occurrence (panting) divided by the probability of non-occurrence (not panting). Check this out and come back. I’ll wait.
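While you're waiting, a quick sketch of the arithmetic may help. The probabilities here are made up for illustration and do not come from the panting data.

```r
# Made-up probabilities of panting (illustrative only)
p <- c(0.2, 0.5, 0.8)

# Odds = P(panting) / P(not panting)
odds <- p / (1 - p)

# Converting odds back to a probability
p.back <- odds / (1 + odds)

data.frame(p, odds, p.back)
```

A probability of 0.5 is even odds (1), and a probability of 0.8 is odds of 4, or "4 to 1" in favor of panting.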
Ok, good?
Good!
exp(coef(m1))
Here, I’ve exponentiated the coefficients from the model, which returns this:
(Intercept) temp
0.001799644 1.397234438
So for every 1 degree Celsius increase in temperature, the odds of Wilson panting increase by a factor of about 1.4!
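Since the whole point was to put a temperature on the panting, here is a minimal sketch of how one might pull predicted probabilities out of the model. The temperatures in new.temps are arbitrary picks, and temp.50 is just an illustrative name for the temperature at which the predicted probability of panting reaches 0.5 (the point where the linear predictor is zero, so minus the intercept divided by the slope).

```r
# Rebuild the data and model so this block runs on its own
temp <- c(10, 13.2, 12.1, 15, 17, 18, 15, 14.9, 16.0, 19,
          21, 22, 24.5, 28, 27, 20.1, 25.6, 27.2, 29, 28.2)
dog.panting <- c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0,
                 1, 1, 1, 1, 1, 1, 0, 1, 1, 1)
data <- data.frame(temp, dog.panting)
m1 <- glm(dog.panting ~ temp, data = data, family = "binomial")

# Predicted probability of panting at a few arbitrary temperatures
new.temps <- data.frame(temp = c(15, 20, 25))
p.hat <- predict(m1, newdata = new.temps, type = "response")
cbind(new.temps, p.hat)

# Temperature where the predicted probability of panting is 0.5:
# intercept + slope * temp = 0, so temp = -intercept / slope
temp.50 <- unname(-coef(m1)["(Intercept)"] / coef(m1)["temp"])
temp.50
```

With the coefficients reported above, that is 6.3202 / 0.3345, or roughly 18.9 degrees C. With only 20 data points, treat it as a rough guide rather than a hard threshold.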
We might wish to plot the model predictions on top of the data. ggplot2 has some functionality here.
library(ggplot2)
ggplot(data, aes(x = temp, y = dog.panting)) +
  geom_point() +
  theme_bw() +
  labs(x = "Temperature (C)", y = "Dog Panting") +
  geom_smooth(method = "glm", method.args = list(family = "binomial"))
Clearly I’ve got to collect some more data. But dare I say that this is enough data for an NSF???
For more complicated models (binomial mixed models), you may have to construct the predictions from scratch. We’ll do that next time!
Thanks again for the data Wilson!
If you’re smitten with Wilson, check out this script modeling his logistic growth!
Literature Cited
McCullagh, P and J Nelder. 1989. Generalized Linear Models. Second edition. Chapman and Hall, London, UK.
Zuur, AF, EN Ieno, NJ Walker, AA Saveliev and GM Smith. 2009. Mixed Effects Models and Extensions in Ecology with R. Springer, New York, New York, USA.