Logistic Regression in R

My dog pants a lot. Don’t judge him. He’s brachycephalic (shorter skull/muzzle than other dogs), so he has issues thermoregulating. But he doesn’t always pant. Let me illustrate.

Panting!

Not panting! And a bonus skeptical look (“really? another post about me?”).

I’ve noticed that his panting seems to be related to temperature (shocking, I know). But it might be useful to know exactly when he may or may not be panting (and thus on the verge of overheating), especially if I’m debating some exercise for both of us. What if I constructed a model for this? So naturally, I collected some data and used logistic regression (or binomial modeling) to put a temperature on this panting situation.

Here’s the data.

temp <- c(10, 13.2, 12.1, 15, 17, 18, 15, 14.9, 16.0, 19, 21, 22, 24.5, 28, 27, 20.1, 25.6, 27.2, 29, 28.2)
dog.panting <- c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1)
data <- data.frame(temp, dog.panting)

temp = temperature (degrees C)
dog.panting = binary (1 = panting, 0 = not panting)

Always plot your data first!


plot(dog.panting~temp)

[Figure: dog panting (0/1) versus temperature (C)]

Ok, now we can construct a basic model using glm().

m1 <- glm(dog.panting ~ temp, family = "binomial", data = data)
summary(m1)

Here’s an excerpt of the summary output.

Call:
glm(formula = dog.panting ~ temp, family = "binomial", data = data)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -6.3202     2.7210  -2.323   0.0202 *
temp          0.3345     0.1409   2.373   0.0176 *

    Null deviance: 27.526  on 19  degrees of freedom
Residual deviance: 17.201  on 18  degrees of freedom
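The two deviance lines can also be turned into a likelihood ratio test for temperature: adding temp drops the deviance from 27.526 to 17.201, about 10.3 on 1 degree of freedom, which gets compared to a chi-square distribution. Here’s a minimal base R sketch of that check (my addition, not something the summary reports directly); it rebuilds the data and model so it runs on its own.

```r
# Rebuild the data and model so this snippet runs on its own
temp <- c(10, 13.2, 12.1, 15, 17, 18, 15, 14.9, 16.0, 19, 21, 22, 24.5, 28, 27,
          20.1, 25.6, 27.2, 29, 28.2)
dog.panting <- c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1)
data <- data.frame(temp, dog.panting)
m1 <- glm(dog.panting ~ temp, family = "binomial", data = data)

# Drop in deviance from adding temp (null deviance minus residual deviance)
dev_drop <- m1$null.deviance - m1$deviance
df_drop <- m1$df.null - m1$df.residual   # 19 - 18 = 1

# Likelihood ratio (chi-square) test for temp
lrt_p <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)
lrt_p

# The same test, laid out as an analysis of deviance table
anova(m1, test = "Chisq")
```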

This suggests some evidence for temperature in predicting panting (surprise!). But we should probably validate the model first. Overdispersion (greater variance than the model predicts) is probably the first thing you would check for GLMs. However, for a Bernoulli GLM (the response variable is a vector of zeros and ones), overdispersion cannot occur (McCullagh and Nelder 1989). You could also plot the residuals against the fitted values.

plot(m1)

[Figure: glm diagnostic plots for m1]

For a Gaussian linear model, you’d normally (heheh, get it, normally?) look for no pattern in the residuals. But it’s a bit tricky for binomial GLMs.
Zuur et al. (2009) are quite comforting when they say:

“The graphical model validation in a Binomial GLM with a 0-1 response variable is some sort of an art… Because the observed data are zeros and ones, we can now see two clear bands in these graphs. This makes it rather difficult to say anything sensible about these graphs, and one can wonder whether there is any point in using them.”

Thanks, Zuur et al.! There are some other diagnostics to use (e.g. binning predictors and summarizing residuals, ROC plots), but there is no clear consensus. So we’ll leave that issue for a future post.
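If you want a head start on one of those options, here’s a rough sketch of a binned residual check in base R: sort the observations into bins by fitted probability and average the raw residuals (observed 0/1 minus fitted probability) within each bin. If the model is reasonable, the bin averages should hover around zero. Five equal-count bins is an arbitrary choice on my part, and with only 20 points each bin holds about 4 observations, so treat this as illustrative rather than a verdict.

```r
# Rebuild the data and model so this snippet runs on its own
temp <- c(10, 13.2, 12.1, 15, 17, 18, 15, 14.9, 16.0, 19, 21, 22, 24.5, 28, 27,
          20.1, 25.6, 27.2, 29, 28.2)
dog.panting <- c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1)
data <- data.frame(temp, dog.panting)
m1 <- glm(dog.panting ~ temp, family = "binomial", data = data)

fitted_p <- fitted(m1)                     # predicted probability of panting
raw_resid <- data$dog.panting - fitted_p   # observed (0/1) minus fitted

# Five bins of roughly equal size, cut on the fitted probabilities
breaks <- quantile(fitted_p, probs = seq(0, 1, length.out = 6))
bins <- cut(fitted_p, breaks = breaks, include.lowest = TRUE)

# Average fitted probability and average raw residual within each bin
binned <- data.frame(
  mean_fitted = tapply(fitted_p, bins, mean),
  mean_resid  = tapply(raw_resid, bins, mean),
  n           = as.vector(table(bins))
)
binned

# Bin means should scatter around the dashed zero line
plot(binned$mean_fitted, binned$mean_resid,
     xlab = "Mean fitted probability", ylab = "Mean raw residual")
abline(h = 0, lty = 2)
```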

One bonus of binomial GLMs is that you can express the effect size of a predictor using the odds ratio. Wait, that’s odd. What are the odds again? Odds are a ratio of probabilities: the probability of occurrence (panting) over the probability of non-occurrence (not panting). Check this out and come back. I’ll wait.

Ok, good?

Good!
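To make the odds definition concrete, take a made-up probability of panting of 0.75 (an illustrative number, not an estimate from Wilson’s data). The odds are then 0.75 / 0.25 = 3, and logistic regression models the log of those odds as a straight line in temperature, which is why exponentiating a coefficient gives a ratio of odds. Base R’s qlogis() and plogis() handle the probability to log-odds conversion in each direction.

```r
# A made-up probability of panting (illustrative only)
p <- 0.75

# Odds: probability of panting over probability of not panting
odds <- p / (1 - p)    # 0.75 / 0.25 = 3
log_odds <- log(odds)  # the scale a logistic regression is linear on

# qlogis() is the logit, log(p / (1 - p)); plogis() is its inverse
all.equal(qlogis(p), log_odds)   # TRUE
all.equal(plogis(log_odds), p)   # TRUE
```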

exp(coef(m1))

Here, I’ve exponentiated the coefficients from the model, which returns this:
(Intercept)        temp
0.001799644 1.397234438

So for every 1 degree Celsius increase in temperature, the odds of Wilson panting increase by a factor of about 1.4!
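One way to convince yourself of what that 1.4 means: predict the log-odds at two temperatures one degree apart and compare the odds. The sketch below does that (20 and 21 degrees C are arbitrary picks; any pair one degree apart gives the same ratio), adds a Wald 95% confidence interval for the odds ratio via confint.default(), and solves for the temperature where the predicted probability of panting is 50%, i.e. where the log-odds are zero. From the coefficients above that works out to 6.3202 / 0.3345, or roughly 18.9 degrees C.

```r
# Rebuild the data and model so this snippet runs on its own
temp <- c(10, 13.2, 12.1, 15, 17, 18, 15, 14.9, 16.0, 19, 21, 22, 24.5, 28, 27,
          20.1, 25.6, 27.2, 29, 28.2)
dog.panting <- c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1)
data <- data.frame(temp, dog.panting)
m1 <- glm(dog.panting ~ temp, family = "binomial", data = data)

b <- coef(m1)

# Predicted log-odds of panting at 20 and 21 degrees C, converted to odds
log_odds <- predict(m1, newdata = data.frame(temp = c(20, 21)), type = "link")
odds <- exp(log_odds)

# Ratio of the odds for a 1-degree increase; matches exp(slope)
odds[2] / odds[1]
exp(b["temp"])

# Wald (normal-approximation) 95% confidence intervals, on the odds scale
exp(confint.default(m1))

# Temperature where predicted probability = 0.5, i.e. b0 + b1 * temp = 0
temp_50 <- -b[["(Intercept)"]] / b[["temp"]]
temp_50
```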

We might wish to plot the model predictions on top of the data. ggplot2’s geom_smooth() can fit and draw a binomial glm for us.

library(ggplot2)
ggplot(data,aes(x=temp,y=dog.panting))+geom_point()+theme_bw()+
  labs(x="Temperature (C)", y="Dog Panting")+
  geom_smooth(method="glm", method.args=list(family="binomial"))

[Figure: Wilson’s panting data with the fitted logistic curve]

Clearly I’ve got to collect some more data. But dare I say that this is enough data for an NSF???

For more complicated models (binomial mixed models), you may have to construct the predictions from scratch. We’ll do that next time!
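For a plain glm like m1, building the predictions yourself only takes a few lines, and it’s a handy warm-up for the mixed-model version. A minimal sketch, assuming an approximate 95% interval computed on the log-odds (link) scale with ±1.96 standard errors and then back-transformed with plogis() so the band stays between 0 and 1. That interval choice is my assumption; I’m not claiming it reproduces geom_smooth()’s internals exactly. Base graphics keep the snippet free of extra packages.

```r
# Rebuild the data and model so this snippet runs on its own
temp <- c(10, 13.2, 12.1, 15, 17, 18, 15, 14.9, 16.0, 19, 21, 22, 24.5, 28, 27,
          20.1, 25.6, 27.2, 29, 28.2)
dog.panting <- c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1)
data <- data.frame(temp, dog.panting)
m1 <- glm(dog.panting ~ temp, family = "binomial", data = data)

# A fine grid of temperatures across the observed range
new_temps <- data.frame(temp = seq(min(data$temp), max(data$temp), length.out = 100))

# Predictions on the link (log-odds) scale, with standard errors
pred <- predict(m1, newdata = new_temps, type = "link", se.fit = TRUE)

# Approximate 95% interval on the link scale, back-transformed to probabilities
new_temps$fit   <- plogis(pred$fit)
new_temps$lower <- plogis(pred$fit - 1.96 * pred$se.fit)
new_temps$upper <- plogis(pred$fit + 1.96 * pred$se.fit)

# Empty frame, then confidence band, fitted curve, and the raw data on top
plot(dog.panting ~ temp, data = data, type = "n",
     xlab = "Temperature (C)", ylab = "Dog Panting")
polygon(c(new_temps$temp, rev(new_temps$temp)),
        c(new_temps$lower, rev(new_temps$upper)),
        col = "grey85", border = NA)
lines(fit ~ temp, data = new_temps)
points(dog.panting ~ temp, data = data)
```

The new_temps data frame can just as easily go into ggplot2 (geom_ribbon() plus geom_line()) if you prefer the look of the smoother plot.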

Thanks again for the data, Wilson!

If you’re smitten with Wilson, check out this script modeling his logistic growth!

Literature Cited

McCullagh, P and JA Nelder. 1989. Generalized Linear Models. Second edition. Chapman and Hall, London, UK.

Zuur, AF, EN Ieno, NJ Walker, AA Saveliev and GM Smith. 2009. Mixed Effects Models and Extensions in Ecology with R. Springer, New York, New York, USA.
