My dog rocks. Wilson is friendly to almost everyone (mailmen excepted) and he’s very soft. We’ve had him since he was a puppy and because the wife and I are dorky scientists, we’ve collected (non-invasive) data from him since day one. So today we’ll be modeling growth data, courtesy of Wilson, using R, the “nls” function, and the packages “car” and “ggplot2”. For reference, I drew on Fox and Weisburg (2010).
library("car"); library("ggplot2") #Here's the data mass<-c(6.25,10,20,23,26,27.6,29.8,31.6,37.2,41.2,48.7,54,54,63,66,72,72.2, 76,75) #Wilson's mass in pounds days.since.birth<-c(31,62,93,99,107,113,121,127,148,161,180,214,221,307, 452,482,923, 955,1308) #days since Wilson's birth data<-data.frame(mass,days.since.birth) #create the data frame plot(mass~days.since.birth, data=data) #always look at your data first!
Wilson’s growth looks like a logistic function. As a puppy, he put on the pounds quickly (yep, I remember that), and he has flattened out around 75 lbs (thank god). Although I will say that he still thinks he is a lap dog.
A logistic growth model can be implemented in R using the nls function. “nls” stands for non-linear least squares.
The logistic growth function can be written as
y = Wilson’s mass, or could be a population, or any response variable exhibiting logistic growth
phi1 = the first parameter and is the asymptote (e.g. Wilson’s stable adult mass)
phi2 = the second parameter and there’s not much else to say about it
phi3 = the third parameter and is also known as the growth parameter, describes how quickly y approaches the asymptote
x = the input variable, in our case, days since Wilson’s birth
One important difference between “nls” and other models (e.g. ordinary least squares) is that “nls” requires initial starting parameters. This is because R will iteratively evaluate and tweak model parameters to minimize model error (hence the least squares part), but R needs a place to start. There are functions in R that obviate the need for imputing the initial parameters, these are called “self starting” functions and in our case it would be the “SSlogis” function. But for now, we’ll skip that and give R some initial parameters manually.
This calls for the coefficients of a linear model (slope and intercept) using the logit transform (log of the odds) and scaling the y by a reasonable first approximation of the asymptote (e.g. 100 lbs)
(Intercept) days.since.birth -1.096091866 0.002362622
Then, we can plug these values into the nls function as starting parameters.
Here, we construct the model using the starting parameters. Trace returns the iterations.
3825.744 : 100.000 -1.096 0.002 3716.262 : 81.463877204 -0.886292540 0.002512256 3489.696 : 66.115027751 -0.731862991 0.003791928 1927.422 : 63.447368011 -1.036245947 0.007113719 204.0813 : 71.10755282 -2.06528377 0.01379199 123.3499 : 71.76966631 -2.39321544 0.01639836 121.3052 : 71.62701380 -2.45163194 0.01692932 121.2515 : 71.58084567 -2.46029145 0.01701556 121.2502 : 71.57256235 -2.46167633 0.01702943 121.2501 : 71.57121657 -2.46189524 0.01703163 121.2501 : 71.57100257 -2.46192995 0.01703198 121.2501 : 71.57096862 -2.46193546 0.01703204
The first column is the error (sums of squared error?) and the remaining columns are the model parameters. R took 11 iterations to reach model parameters it was happy with.
Formula: mass ~ phi1/(1 + exp(-(phi2 + phi3 * days.since.birth))) Parameters: Estimate Std. Error t value Pr(>|t|) phi1 71.570969 1.201983 59.54 < 2e-16 *** phi2 -2.461935 0.162985 -15.11 6.88e-11 *** phi3 0.017032 0.001227 13.88 2.42e-10 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 2.753 on 16 degrees of freedom Number of iterations to convergence: 11 Achieved convergence tolerance: 1.985e-06
We can see that our initial parameters weren’t too far off.
Next, let’s create the model predictions and plot the data. (I’ve noticed that copying and pasting this ggplot script isn’t working in R because of the quotation marks. Anybody know the solution for this? Temporarily, just substitute the quotation marks from this text with regular ones within R or R Studio.
#set parameters phi1<-coef(wilson) phi2<-coef(wilson) phi3<-coef(wilson) x<-c(min(data$days.since.birth):max(data$days.since.birth)) #construct a range of x values bounded by the data y<-phi1/(1+exp(-(phi2+phi3*x))) #predicted mass predict<-data.frame(x,y) #create the prediction data frame#And add a nice plot (I cheated and added the awesome inset jpg in another program) ggplot(data=data,aes(x=days.since.birth,y=mass))+ geom_point(color='blue',size=5)+theme_bw()+ labs(x='Days Since Birth',y='Mass (lbs)')+ scale_x_continuous(breaks=c(0,250,500,750, 1000,1250))+ scale_y_continuous(breaks=c(0,10,20,30,40,50,60,70,80))+ theme(axis.text=element_text(size=18),axis.title=element_text(size=24))+ geom_line(data=predict,aes(x=x,y=y), size=1)
Hey, that looks like a pretty good model! It suggests that Wilson will asymptote at 71.57 lbs (Wilson, lose some weight buddy!). The model is under predicting his weight in the later region of the data, probably because of the two data points near the inflection point. Overall, I’m pretty happy with the model though!
There are some other packages (e.g. package “grofit” looks promising) and growth functions (e.g. gompertz) worth exploring because they can streamline some of the code, but we’ll save that for a future post.
Thanks for the data Wilson!
Check this out if you want to see how temperature affects whether Wilson is panting or not.