agecurveAb uses ensemble machine learning with SuperLearner to fit population mean antibody curves by age. If covariates W are included, the predicted curve is marginally adjusted for W.
agecurveAb(Y, Age, W = NULL, id = NULL, family = gaussian(), SL.library = c("SL.mean", "SL.glm", "SL.gam", "SL.loess"), cvControl = list(), RFnodesize = NULL, gamdf = NULL)
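Argument | Description |
---|---|
Y | Antibody measurement. Must be a numeric vector. |
Age | Age of the individual at the time of measurement. Must be a numeric vector. |
W | An optional vector, matrix, or data.frame of covariates for each individual, used to marginally adjust the curve. |
id | An optional cluster or repeated-measures id variable. For cross-validation splits, id keeps observations from the same cluster or individual in the same validation fold. |
family | Outcome family: choose gaussian (the default) for a continuous outcome or binomial for a binary outcome. |
SL.library | Library of algorithms to include in the ensemble (see the SuperLearner package for details). |
cvControl | Optional list to control cross-validation (see SuperLearner.CV.control for details). |
RFnodesize | Optional argument to specify a range of minimum node sizes for the random forest algorithm. If SL.library includes SL.randomForest and RFnodesize is NULL, node sizes between 15 and 40 are searched. |
gamdf | Optional argument to specify a range of degrees of freedom for natural smoothing splines in a generalized additive model. If SL.library includes SL.gam and gamdf is NULL, degrees of freedom between 2 and 10 are searched. |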
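Before any fitting, the arguments above have to become a clean estimation dataset. The description of the return value on this page states that agecurveAb keeps complete cases only, expands factors in W into indicator variables, and sorts by Age. A minimal base-R sketch of that preparation step follows; `prep_agecurve_data` is a hypothetical helper written for illustration, not a tmleAb function, and the package's own code may differ in details such as indicator column names.

```r
# Hypothetical helper (NOT part of tmleAb): mimics the documented input
# handling of agecurveAb before model fitting.
prep_agecurve_data <- function(Y, Age, W = NULL, id = NULL) {
  stopifnot(is.numeric(Y), is.numeric(Age), length(Y) == length(Age))
  # Without an id, every observation is its own cluster
  if (is.null(id)) id <- seq_along(Y)
  if (!is.null(W)) W <- as.data.frame(W)
  # Keep only rows with non-missing Y, Age, id and all columns of W
  keep <- !is.na(Y) & !is.na(Age) & !is.na(id)
  if (!is.null(W)) keep <- keep & stats::complete.cases(W)
  X <- NULL
  if (!is.null(W)) {
    # model.matrix() expands factors into 0/1 indicator columns;
    # the first column (the intercept) is dropped
    X <- stats::model.matrix(~ ., data = W[keep, , drop = FALSE])[, -1, drop = FALSE]
  }
  Y <- Y[keep]; Age <- Age[keep]; id <- id[keep]
  # Sort everything by Age for convenient plotting
  ord <- order(Age)
  list(Y = Y[ord], Age = Age[ord], id = id[ord],
       W = if (is.null(X)) NULL else X[ord, , drop = FALSE])
}

# Usage with toy data: row 2 (missing Y) and row 5 (missing sex) are dropped
d <- prep_agecurve_data(
  Y   = c(2.1, NA, 1.5, 3.0, 0.8),
  Age = c(5, 3, 1, 10, 2),
  W   = data.frame(sex = factor(c("F", "M", "M", "F", NA)))
)
```

Because `model.matrix()` uses treatment contrasts by default, a two-level factor such as sex becomes a single indicator column (`sexM` here).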
agecurveAb returns a list of objects, which includes the inputs used for estimation, along with fitted results (pY) and the SuperLearner object itself (SLfit). Note that the estimation dataset excludes any observations with missing values in Y, Age, W (if not NULL), or id (if specified). Also note that factors in W are converted to design-matrix-style indicator variables. Objects are sorted by Age for more convenient plotting. If covariates are included, then pY is the mean predicted antibody level at Age = a, averaged over the covariates W.
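Averaging predictions at Age = a over the observed covariates is a standardization (G-computation) step: every individual keeps their own W, their age is set to a, and the predictions are averaged. The base-R sketch below illustrates that averaging under stated assumptions: an ordinary linear model with a natural spline in Age stands in for the SuperLearner ensemble, and `marginal_age_curve` and the simulated data are hypothetical.

```r
library(splines)

# Hypothetical helper: marginally adjusted curve by standardization.
# For each age a, set Age = a for everyone, predict, and average over W.
marginal_age_curve <- function(fit, data, ages) {
  sapply(ages, function(a) {
    counterfactual <- data
    counterfactual$Age <- a
    mean(predict(fit, newdata = counterfactual))
  })
}

# Simulated data (illustrative only): antibody level rises with age,
# with an additive shift for sex
set.seed(1)
n <- 200
dat <- data.frame(
  Age = runif(n, 0, 60),
  sex = factor(sample(c("F", "M"), n, replace = TRUE))
)
dat$Y <- 1 + 2 * log1p(dat$Age) + 0.5 * (dat$sex == "M") + rnorm(n, sd = 0.3)

# A linear model stands in for the ensemble fit of E[Y | Age, W]
fit <- lm(Y ~ ns(Age, df = 3) + sex, data = dat)
ages <- seq(5, 55, by = 5)
pY <- marginal_age_curve(fit, dat, ages)
```

With an additive model like this one, the averaged curve is just the age effect shifted by the average covariate effect; with a flexible ensemble that allows Age-by-W interactions, the averaging step is what makes pY a population-level (marginal) curve.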
The agecurveAb function is a wrapper for SuperLearner that provides a convenient interface for this specific estimation problem. If the SL.library argument includes just one model or algorithm, then there is no 'ensemble', but the function still provides a standard interface for using single algorithms (e.g., SL.loess for loess in the stats package).
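With a single-algorithm library such as `SL.library = "SL.loess"`, the curve is essentially one smoother of Y on Age. The base-R sketch below fits `stats::loess` directly with its default span (0.75) and degree (2) on simulated data and sorts the fitted values by Age; it does not claim to reproduce the exact settings SuperLearner's `SL.loess` wrapper passes to `loess`.

```r
# Simulated, unadjusted antibody data (illustrative only)
set.seed(2)
Age <- runif(150, 0, 60)
Y <- 1 + 2 * log1p(Age) + rnorm(150, sd = 0.3)

# One smoother, no ensemble: loess with its default span and degree
fit <- stats::loess(Y ~ Age, span = 0.75, degree = 2)

# Sort by Age so the curve can be drawn with plot(..., type = "l")
ord <- order(Age)
curve <- data.frame(Age = Age[ord], pY = fitted(fit)[ord])
```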
The function assumes a continuous outcome by default (family="gaussian"). If a binary outcome is passed to the function with family="gaussian", it will still estimate seroprevalence as a function of age and other covariates, but it will not necessarily bound the predictions between 0 and 1. If you specify family="binomial", the predictions will be bounded between 0 and 1. Note that some estimation routines do not support binary outcomes (e.g., SL.loess), and you will see an error if you specify a binomial family with them in the library.
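The difference between the two families can be seen with a single GLM in base R. In the sketch below, simulated seropositivity that saturates with age is fit with a gaussian family (a linear probability model) and with a binomial family (logistic regression); the data-generating values are arbitrary assumptions chosen so the effect is visible.

```r
# Simulated binary serostatus that rises steeply with age (illustrative only)
set.seed(3)
Age <- runif(300, 0, 60)
seropos <- rbinom(300, 1, plogis(-3 + 0.25 * Age))
grid <- data.frame(Age = seq(0, 60, by = 1))

# gaussian family: a straight line, so predictions are not bounded
p_gauss <- predict(glm(seropos ~ Age, family = gaussian()), newdata = grid)

# binomial family: predictions on the probability scale, always in [0, 1]
p_binom <- predict(glm(seropos ~ Age, family = binomial()), newdata = grid,
                   type = "response")

range(p_gauss)
range(p_binom)
```

Because prevalence plateaus near 1 at older ages, the straight-line fit overshoots 1 at the upper end of the age range, while the logistic fit stays inside the unit interval.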
Use cvControl to optionally control the V-fold cross-validation. The default is to use V=10 folds, without stratification (see SuperLearner.CV.control for details).
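When id is supplied, the V-fold splits should assign whole clusters (for example, all repeated measurements from one person) to a single fold, so that no individual appears in both the training and validation sets. The base-R sketch below shows one way to build such folds; `make_cluster_folds` is a hypothetical helper and is not the code SuperLearner uses internally.

```r
# Hypothetical helper: assign whole ids to V folds, unstratified.
make_cluster_folds <- function(id, V = 10) {
  ids <- unique(id)
  # Deal fold labels round-robin over the unique ids, then shuffle them
  fold_of_id <- sample(rep_len(seq_len(V), length(ids)))
  names(fold_of_id) <- as.character(ids)
  # Every observation inherits the fold of its id
  unname(fold_of_id[as.character(id)])
}

# Usage: 25 individuals, 4 measurements each, split into 5 folds
set.seed(4)
id <- rep(1:25, times = 4)
folds <- make_cluster_folds(id, V = 5)
table(folds)
```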
If SL.randomForest is included in the library, agecurveAb will select the minimum node size (between 15 and 40) with cross-validation to avoid over-fitting. To search a range of node sizes other than 15-40, pass it to this function through the RFnodesize argument.
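The idea of choosing a minimum node size by cross-validated risk can be illustrated without the randomForest package. The sketch below swaps in a single regression tree from `rpart` (a recommended package shipped with standard R distributions), whose `minbucket` control plays the role of a minimum terminal node size. It picks the one value with the lowest 10-fold mean squared error; the helper name `cv_mse_tree`, the step of 5 in the grid, and the simulated data are assumptions, and agecurveAb itself may combine candidate node sizes through SuperLearner rather than pick a single one.

```r
library(rpart)

# Hypothetical helper: 10-fold CV mean squared error of a regression tree
# grown with a given minimum terminal node size (minbucket).
cv_mse_tree <- function(dat, folds, minbucket) {
  sq_err <- numeric(nrow(dat))
  for (v in unique(folds)) {
    train <- folds != v
    fit <- rpart(Y ~ Age, data = dat[train, ],
                 control = rpart.control(minbucket = minbucket, cp = 0, xval = 0))
    pred <- predict(fit, newdata = dat[!train, ])
    sq_err[!train] <- (dat$Y[!train] - pred)^2
  }
  mean(sq_err)
}

# Simulated data (illustrative only)
set.seed(5)
dat <- data.frame(Age = runif(400, 0, 60))
dat$Y <- 1 + 2 * log1p(dat$Age) + rnorm(400, sd = 0.5)
folds <- sample(rep_len(1:10, nrow(dat)))

# Candidate node sizes spanning the documented default range of 15 to 40
nodesizes <- seq(15, 40, by = 5)
risks <- sapply(nodesizes, function(m) cv_mse_tree(dat, folds, m))
best_nodesize <- nodesizes[which.min(risks)]
```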
Similarly, if SL.gam is included in the library, agecurveAb will select the degrees of freedom for natural splines (between 2 and 10) with cross-validation to choose the amount of smoothing. To control the GAM df search, pass a range to this function through the gamdf argument.
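Choosing the spline degrees of freedom by cross-validation amounts to comparing held-out prediction error across the candidate df values. The sketch below uses `lm()` with `splines::ns()` as a stand-in for the GAM fit that SL.gam performs, searches the documented default range 2 to 10, and keeps the df with the lowest 10-fold mean squared error; the helper `cv_mse_ns` and the simulated data are hypothetical.

```r
library(splines)

# Simulated data (illustrative only)
set.seed(6)
dat <- data.frame(Age = runif(400, 0, 60))
dat$Y <- 1 + 2 * log1p(dat$Age) + rnorm(400, sd = 0.5)
folds <- sample(rep_len(1:10, nrow(dat)))

# Hypothetical helper: 10-fold CV mean squared error of a natural-spline
# regression of Y on Age with k degrees of freedom
cv_mse_ns <- function(k) {
  sq_err <- numeric(nrow(dat))
  for (v in 1:10) {
    test <- folds == v
    fit <- lm(Y ~ ns(Age, df = k), data = dat[!test, ])
    sq_err[test] <- (dat$Y[test] - predict(fit, newdata = dat[test, ]))^2
  }
  mean(sq_err)
}

dfs <- 2:10
risks <- sapply(dfs, cv_mse_ns)
best_df <- dfs[which.min(risks)]
```

Too few degrees of freedom underfit the steep rise at young ages, while too many chase noise; the cross-validated risk balances the two.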
van der Laan MJ, Polley EC, Hubbard AE. Super Learner. Stat Appl Genet Mol Biol. 2007;6: Article 25. http://www.ncbi.nlm.nih.gov/pubmed/17910531
library(tmleAb)
# load the Garki project serology data
data("garki_sero")
garki_sero$village <- factor(garki_sero$village)
garki_sero$sex <- factor(garki_sero$sex)
# control village measurements in round 5
dc <- subset(garki_sero, serosvy == 5 & tr == "Control")
# intervention village measurements in round 5
di <- subset(garki_sero, serosvy == 5 & tr == "Intervention")
# fit an age-antibody curve in control and intervention villages,
# adjusted for sex and village;
# set a seed for perfectly reproducible splits in the V-fold cross-validation
set.seed(12345)
ccurve <- agecurveAb(Y = log10(dc$ifatpftitre + 1), Age = dc$ageyrs,
                     W = dc[, c("sex", "village")], id = dc$id)
set.seed(12345)
icurve <- agecurveAb(Y = log10(di$ifatpftitre + 1), Age = di$ageyrs,
                     W = di[, c("sex", "village")], id = di$id)
# plot the curves
plot(ccurve$Age, ccurve$pY, type = "l", ylim = c(0, 4), bty = "l", las = 1)
lines(icurve$Age, icurve$pY, col = "blue")