agecurveAb uses ensemble machine learning with SuperLearner to fit population mean antibody curves by age. If covariates W are included, the predicted curve is marginally adjusted for W.

agecurveAb(Y, Age, W = NULL, id = NULL, family = gaussian(),
  SL.library = c("SL.mean", "SL.glm", "SL.gam", "SL.loess"),
  cvControl = list(), RFnodesize = NULL, gamdf = NULL)

Arguments

Y

Antibody measurement. Must be a numeric vector.

Age

Age of the individual at the time of measurement. Must be a numeric vector.

W

An optional vector, matrix, or data.frame of covariates for each individual, used to marginally adjust the curve.

id

An optional cluster or repeated measures id variable. For cross-validation splits, id forces observations in the same cluster or for the same individual to be in the same validation fold.

family

Outcome family: choose gaussian for continuous outcomes and binomial for binary outcomes (default: gaussian).

SL.library

Library of algorithms to include in the ensemble (see the SuperLearner package for details).

cvControl

Optional list to control cross-validation (see SuperLearner for details).

RFnodesize

Optional argument to specify a range of minimum node sizes for the randomForest algorithm. If SL.library includes SL.randomForest, then the default is to search over node sizes of 15, 20, ..., 40. Specifying this option will override the default.

gamdf

Optional argument to specify a range of degrees of freedom for natural smoothing splines in a generalized additive model. If SL.library includes SL.gam, then the default is to search over df = 2 to 10. Specifying this option will override the default.
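
The fold constraint described for the id argument can be illustrated in base R. This is a minimal sketch of the idea, not the package's internal code: the helper name make_cluster_folds and the toy ids are hypothetical.

```r
# Hypothetical helper: assign V folds at the cluster level so that all
# observations sharing an id land in the same validation fold.
make_cluster_folds <- function(id, V = 10) {
  clusters <- unique(id)
  # randomly assign each cluster (not each row) to one of V folds
  cluster_fold <- sample(rep(seq_len(V), length.out = length(clusters)))
  names(cluster_fold) <- as.character(clusters)
  # map the cluster-level fold back to every row
  unname(cluster_fold[as.character(id)])
}

set.seed(1)
id   <- rep(1:20, each = 3)   # 20 individuals, 3 measurements each
fold <- make_cluster_folds(id, V = 5)

# check: each individual appears in exactly one fold
all(tapply(fold, id, function(f) length(unique(f))) == 1)
```

Splitting by row instead would let repeated measurements of one individual appear in both the training and validation sets, which tends to make the cross-validated risk look better than it is.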

Value

agecurveAb returns a list of objects, which includes the inputs used for estimation, along with fitted results (pY) and the SuperLearner object itself (SLfit). Note that the estimation dataset excludes any observations with missing values in Y, Age, W (if not NULL), or id (if specified). Also note that factors in W are converted to design-matrix-style indicator variables. Objects are sorted by Age for more convenient plotting. If covariates are included, then pY is the mean predicted antibody level at Age=a, averaged over the covariates W.
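
To make "averaged over the covariates W" concrete, the base R sketch below computes a marginally adjusted curve by hand, with a single linear model standing in for the ensemble. The simulated data, the lm() fit, and all object names are illustrative assumptions; the internals of agecurveAb may differ.

```r
set.seed(2)
n <- 200
d <- data.frame(Age = runif(n, 1, 20),
                sex = factor(sample(c("F", "M"), n, replace = TRUE)))
d$Y <- 1 + 0.1 * d$Age + 0.5 * (d$sex == "M") + rnorm(n, sd = 0.3)

# stand-in for the ensemble: one regression of Y on Age and W
fit <- lm(Y ~ Age + sex, data = d)

# for each observed age a: set Age = a for everyone, predict, and
# average the predictions over the empirical distribution of W
ages <- sort(unique(d$Age))
pY <- vapply(ages, function(a) {
  mean(predict(fit, newdata = transform(d, Age = a)))
}, numeric(1))

head(cbind(Age = ages, pY = pY))
```

Because the average is taken over everyone's covariates at each age, the resulting curve describes the population as a whole and not any single covariate stratum.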

Details

The agecurveAb function is a wrapper for SuperLearner that provides a convenient interface for this specific estimation problem. If the SL.library argument includes just one model or algorithm, then there is no 'ensemble' but the function provides a standard interface for using single algorithms (e.g., SL.loess for loess in the stats package).

The function assumes a continuous outcome by default (family="gaussian"). If a binary outcome is passed with family="gaussian", the function will still estimate seroprevalence as a function of age and other covariates, but predictions are not guaranteed to lie between 0 and 1. If you specify family="binomial", the predictions will be bounded between 0 and 1. Note that some estimation routines (e.g., SL.loess) do not support binary outcomes, and you will see an error if you specify a binomial family with them in the library.
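
The difference between the two families can be seen with plain glm() fits on simulated seropositivity data. This is only a sketch of the general behaviour, using base R and made-up data; it does not call agecurveAb or SuperLearner.

```r
set.seed(3)
n   <- 300
age <- runif(n, 0, 60)
# simulated binary seropositivity that rises steeply with age
sero <- rbinom(n, 1, plogis(-2 + 0.15 * age))

fit_gaus <- glm(sero ~ age, family = gaussian())
fit_bin  <- glm(sero ~ age, family = binomial())

grid   <- data.frame(age = c(0, 30, 60, 90))
p_gaus <- predict(fit_gaus, newdata = grid, type = "response")
p_bin  <- predict(fit_bin,  newdata = grid, type = "response")

range(p_gaus)  # a linear fit is free to leave [0, 1]
range(p_bin)   # a logistic fit always stays within [0, 1]
```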

Use cvControl to optionally control the V-fold cross-validation. The default is V=10 folds without stratification (see SuperLearner.CV.control for details).
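
A control list might be passed along as sketched below. The toy data and object names are illustrative, and the sketch assumes the function is provided by a package named tmleAb; the call is guarded so that nothing is fitted if that package is not installed.

```r
# request 5 folds instead of the default 10
cv5 <- list(V = 5)

# toy data, purely for illustration
set.seed(4)
age <- runif(100, 1, 20)
y   <- 1 + 0.1 * age + rnorm(100, sd = 0.3)

# assumption: agecurveAb lives in a package called tmleAb
if (requireNamespace("tmleAb", quietly = TRUE)) {
  library(tmleAb)
  fit5 <- agecurveAb(Y = y, Age = age,
                     SL.library = c("SL.mean", "SL.glm"),
                     cvControl = cv5)
}
```

Fewer folds shorten the run time but leave less data in each training split, so the choice is a trade-off.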

If SL.randomForest is included in the library, agecurveAb will select the minimum node size (between 15 and 40) with cross-validation to avoid over-fitting. If you wish to control the randomForest node size options using a range other than 15-40, you can do so by passing an argument RFnodesize through this function.

Similarly, if SL.gam is included in the library, agecurveAb will select the degrees of freedom for natural splines (between 2 and 10) with cross-validation to choose an appropriate amount of smoothing. If you wish to control the GAM df search, you can do so by passing an argument gamdf through this function.
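
The idea of choosing a tuning value by cross-validation can be sketched in base R. The example below selects the degrees of freedom of a natural regression spline (splines::ns inside lm) by 10-fold cross-validated mean squared error. It is a stand-in under stated assumptions: the data are simulated, and the spline basis and search loop are not the ones agecurveAb or SL.gam use internally.

```r
library(splines)  # ships with R

set.seed(5)
n   <- 200
age <- runif(n, 1, 20)
y   <- sin(age / 3) + rnorm(n, sd = 0.3)
d   <- data.frame(age = age, y = y)

V    <- 10
fold <- sample(rep(seq_len(V), length.out = n))
dfs  <- 2:10   # candidate degrees of freedom

# cross-validated mean squared error for each candidate df
cv_mse <- vapply(dfs, function(k) {
  errs <- vapply(seq_len(V), function(v) {
    train <- d[fold != v, ]
    valid <- d[fold == v, ]
    fit <- lm(y ~ ns(age, df = k), data = train)
    mean((valid$y - predict(fit, newdata = valid))^2)
  }, numeric(1))
  mean(errs)
}, numeric(1))

best_df <- dfs[which.min(cv_mse)]
best_df
```

The same loop with node sizes in place of degrees of freedom describes the kind of search the text attributes to SL.randomForest.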

References

van der Laan MJ, Polley EC, Hubbard AE. Super Learner. Stat Appl Genet Mol Biol. 2007;6: Article 25. http://www.ncbi.nlm.nih.gov/pubmed/17910531

See also

tmleAb, SuperLearner

Examples

# NOT RUN {
# load the Garki project serology data
data("garki_sero")
garki_sero$village <- factor(garki_sero$village)
garki_sero$sex <- factor(garki_sero$sex)

# control village measurements in round 5
dc <- subset(garki_sero,serosvy==5 & tr=="Control")

# intervention village measurements in round 5
di <- subset(garki_sero,serosvy==5 & tr=="Intervention")

# fit an age-antibody curve in control and intervention villages
# adjusted for sex and village
# set a seed for perfectly reproducible
# splits in the V-fold cross validation
set.seed(12345)
ccurve <- agecurveAb(Y=log10(dc$ifatpftitre+1),
                     Age=dc$ageyrs,
                     W=dc[,c("sex","village")],
                     id=dc$id)
set.seed(12345)
icurve <- agecurveAb(Y=log10(di$ifatpftitre+1),
                     Age=di$ageyrs,
                     W=di[,c("sex","village")],
                     id=di$id)

# plot the curves
plot(ccurve$Age,ccurve$pY,type="l",ylim=c(0,4),bty="l",las=1)
lines(icurve$Age,icurve$pY,col="blue")
# }