washb_tmle.Rd
Estimate intention-to-treat parameters using targeted maximum likelihood estimation (TMLE), potentially adjusted for covariates and missing outcomes
washb_tmle(Y, tr, W = NULL, id = 1:length(Y), pair = NULL, Delta = rep(1, length(Y)), family = "gaussian", contrast, Q.SL.library = c("SL.mean", "SL.glm", "SL.bayesglm", "SL.gam", "SL.glmnet"), g.SL.library = Q.SL.library, pval = 0.2, FECR = NULL, seed = NULL, print = TRUE)
Argument | Description |
---|---|
Y | Outcome variable (continuous, such as LAZ, or binary, such as diarrhea) |
tr | Treatment group variable (binary or factor) |
W | Data frame that includes adjustment covariates |
id | ID variable for independent units. For pair-matched designs, this is the matched pair and should be the same as the |
pair | An optional ID variable to identify the matched pair unit (in WASH Benefits, blocks) if conducting a matched-pair analysis. This argument is used to drop pairs that are missing one or more treatment groups. Incomplete pairs are not an issue in the overall Bangladesh trial (there were no incomplete blocks), but they are an issue in the Kenya trial, where there were some incomplete blocks. |
Delta | Indicator of missing outcome (1 = observed, 0 = missing). For values with |
family | Outcome family: |
contrast | Vector of length 2 that includes the treatment groups to contrast from the |
g.SL.library | Library of algorithms to include in the SuperLearner for the treatment model Pr(A|W) and for the missingness model Pr(Delta|A,W) (if Delta is specified) |
pval | The p-value threshold used to pre-screen covariates ( |
FECR | (default is |
seed | A seed for the pseudo-random cross-validation split used in model selection (use for perfectly reproducible results). |
print | Logical for printed output; defaults to TRUE. If FALSE, no output is printed to the console when the returned object is saved to an R object. |
Q.SL.library | Library of algorithms to include in the SuperLearner for the outcome model
A tmle() fit object (see the tmle package for details). The $estimates list includes parameter estimates along with variance estimates and confidence intervals. If FECR is specified ('arithmetic' or 'geometric'), then the tmle object also includes results in $estimates$FECR.
The washb_tmle function is mainly a convenience wrapper for the tmle function. It estimates intention-to-treat effects in a trial using targeted maximum likelihood estimation (TMLE). In brief, the function does the following: it restricts the data to complete observations in the two arms listed in the contrast argument; it pre-screens covariates (W), if specified, to select those that have a univariate association with the outcome; and then it estimates the intention-to-treat effect using TMLE. If family='binomial', then the function returns effects on the absolute, relative, and odds ratio scales. If Delta is specified (i.e., observations with missing outcomes are included), then the function adjusts the effects for missingness using inverse probability of censoring weights, with the weights estimated using super learning of Pr(Delta|A,W).
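The first step, restricting the data to complete observations in the two contrast arms, can be pictured with a few lines of base R. This is only a sketch on made-up data, not the package's internal code: the data frame `d`, its column names, and the arm labels are hypothetical. It also ignores the Delta case, in which rows with missing outcomes are retained.

```r
# Hypothetical toy data: three arms, one missing outcome, one missing covariate
d <- data.frame(
  Y  = c(-1.2, NA, -0.8, -1.5, -0.3, -0.9),
  tr = factor(c("Control", "Control", "Water", "Water", "Handwashing", "Control")),
  W1 = c(24, 31, NA, 28, 22, 35)
)
contrast <- c("Control", "Water")  # reference arm first, comparison arm second

# Keep only the two arms named in the contrast and drop unused factor levels
in_contrast <- d[d$tr %in% contrast, ]
in_contrast$tr <- factor(in_contrast$tr, levels = contrast)

# Keep only rows with no missing outcome, treatment, or covariate values
analysis <- in_contrast[complete.cases(in_contrast), ]
analysis
```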
If the analysis is pair-matched (as for primary outcomes), be sure to specify the pair ID in the id argument. Do not include pair IDs in the adjustment covariate set.
If adjustment covariates (W) are specified, then by default they are pre-screened using the pval threshold (default 0.2), and the subset that is associated with the outcome based on a likelihood ratio test is used in the estimation. There are some other important defaults to be aware of. First, the washb_tmle function estimates the treatment mechanism even though it is a randomized trial. There are two reasons for this: one theoretical and one practical. The theoretical reason is that estimating the treatment mechanism gains efficiency (see Balzer et al. 2016). The practical reason is that, unless the analysis is conducted at the cluster level (i.e., providing cluster means to the washb_tmle function), the empirical treatment probabilities differ slightly due to varying cluster sizes. Estimating the treatment mechanism ensures that the variance calculation correctly accounts for the empirical treatment probabilities in the data.
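The idea behind pre-screening by a likelihood ratio test can be sketched in base R. The example below is an illustration under assumptions, not the internal washb_prescreen code: the helper `lr_pvalue`, the simulated covariates, and the strict `<` comparison against the threshold are all hypothetical choices.

```r
set.seed(1)
n <- 200
# Simulated covariates: one related to the outcome, one pure noise
W <- data.frame(w_signal = rnorm(n), w_noise = rnorm(n))
Y <- 0.8 * W$w_signal + rnorm(n)
pval <- 0.2  # screening threshold, matching the documented default

# Hypothetical helper: likelihood ratio test of a univariate model vs. the null
lr_pvalue <- function(w, y) {
  fit0 <- glm(y ~ 1, family = gaussian)
  fit1 <- glm(y ~ w, family = gaussian)
  anova(fit0, fit1, test = "LRT")[2, "Pr(>Chi)"]
}

# One p-value per covariate; keep those below the threshold
pvals <- vapply(W, lr_pvalue, numeric(1), y = Y)
selected <- names(pvals)[pvals < pval]
selected
```

A liberal threshold such as 0.2 keeps covariates that are even weakly predictive of the outcome and discards those that would add only noise to the adjustment set.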
Another default is that washb_tmle uses the SuperLearner algorithm to adjust for covariates and to predict the treatment mechanism and censoring mechanism (if adjusting for missing outcomes). The default algorithm library includes the simple mean, main terms GLM, main terms Bayes GLM with non-informative priors, generalized additive models (degree 2), and lasso (glmnet). These are the pre-specified algorithms from the original trial statistical analysis plan. You can type listWrappers() to see the full set of algorithms implemented in the super learner. If you just wish to use a main effects GLM model to adjust for the covariates, then you can specify Q.SL.library="SL.glm". If you are dealing with very small sample sizes (e.g., in a substudy), then you may wish to use even simpler libraries, such as a set of univariate regressions (as in Balzer et al. 2016).
By default the function uses the same algorithm library to predict the outcome (Q.SL.library) and the treatment and censoring mechanisms (g.SL.library). You can specify a different library for the treatment and censoring mechanisms using the g.SL.library argument.
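As a rough illustration of overriding the default libraries, the sketch below simulates a small pair-matched data set and requests a main-terms GLM for the outcome model with a smaller library for the treatment mechanism. The data, column names, and arm labels are hypothetical, and the call runs only if the washb package (with tmle and SuperLearner) happens to be installed.

```r
set.seed(2)
n_blocks <- 50
# Hypothetical pair-matched data: one control and one intervention unit per block
d <- data.frame(
  block = rep(seq_len(n_blocks), each = 2),
  tr    = factor(rep(c("Control", "Water"), times = n_blocks)),
  age   = rnorm(2 * n_blocks, mean = 24, sd = 3)
)
d$laz <- -6 + 0.1 * (d$tr == "Water") + 0.2 * d$age + rnorm(nrow(d))

# Run only when the package is available; argument names follow the usage line
if (requireNamespace("washb", quietly = TRUE)) {
  fit <- washb::washb_tmle(
    Y = d$laz, tr = d$tr, W = d["age"],
    id = d$block, pair = d$block,           # pair ID goes in id, not in W
    family = "gaussian",
    contrast = c("Control", "Water"),
    Q.SL.library = "SL.glm",                # outcome model: main-terms GLM only
    g.SL.library = c("SL.mean", "SL.glm"),  # treatment mechanism library
    seed = 12345
  )
}
```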
If you want to adjust for missing outcomes in the analysis using inverse probability of censoring weights (IPCW), then you need to include observations that have a missing outcome, with Y set to an arbitrary value (e.g., 9) and Delta=0 for those observations. Observations with missing outcomes also need to have treatment (tr) and covariate (W) information, which are used to estimate Pr(Delta|A,W) and construct the weights. Please see the package's vignette for a detailed example of how to estimate an IPCW-TMLE parameter.
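The data preparation described above might look like the following in base R. This is a minimal sketch on made-up data; the data frame `d`, its columns, and the name `Y_filled` are hypothetical.

```r
# Hypothetical data with two missing outcomes but complete treatment and covariates
d <- data.frame(
  Y  = c(-1.1, NA, -0.4, NA, -2.0),
  tr = factor(c("Control", "Water", "Water", "Control", "Control")),
  W1 = c(24, 30, 27, 33, 21)
)

# Delta: 1 = outcome observed, 0 = outcome missing
Delta <- as.integer(!is.na(d$Y))

# Replace missing outcomes with an arbitrary placeholder value (here 9)
Y_filled <- ifelse(is.na(d$Y), 9, d$Y)

# Treatment and covariates must be complete, including for rows with Delta = 0
stopifnot(!anyNA(d$tr), !anyNA(d$W1))

data.frame(Y = Y_filled, Delta = Delta, tr = d$tr, W1 = d$W1)
```

The vectors `Y_filled` and `Delta` would then be passed as the Y and Delta arguments, with `tr` and the covariates supplied for every row.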
A standard parameter of interest for soil-transmitted helminth infection intensity (eggs per gram) is the fecal egg count reduction (FECR) percentage, which is defined as FECR = (EY1-EY0)/EY0 = (EY1/EY0)-1. To estimate the FECR using washb_tmle, simply specify FECR='arithmetic' or FECR='geometric' (the default is FECR=NULL), and a list of FECR estimates will be added to the returned object in $estimates$FECR. If estimating the FECR using geometric means (computed on log(Y) outcomes), ensure that you pass washb_tmle log-transformed outcomes. washb_tmle estimates the standard error and 95% confidence intervals for the FECR with the influence curve and the delta method. The FECR parameter estimate and its variance account for missing outcomes if specified (Delta) and repeated observations (id). The result is returned as a proportion, (EY1-EY0)/EY0; simply multiply the FECR estimate by 100 to report the percentage reduction.
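The arithmetic FECR and a delta-method interval can be illustrated with unadjusted arm means. The sketch below is not the package's influence-curve estimator: it assumes independent observations, no covariates, and no missing outcomes, and both the simulated egg counts and the helper `fecr_delta` are hypothetical.

```r
set.seed(3)
# Simulated eggs per gram in the control (epg0) and intervention (epg1) arms
epg0 <- rgamma(300, shape = 0.5, scale = 400)
epg1 <- rgamma(300, shape = 0.5, scale = 200)

# Hypothetical helper: FECR = EY1/EY0 - 1 with a delta-method standard error
fecr_delta <- function(y1, y0) {
  m1 <- mean(y1)
  m0 <- mean(y0)
  v1 <- var(y1) / length(y1)  # variance of the intervention-arm mean
  v0 <- var(y0) / length(y0)  # variance of the control-arm mean
  est <- m1 / m0 - 1
  # First-order delta method for a ratio of two independent means
  se <- sqrt(v1 / m0^2 + (m1^2 / m0^4) * v0)
  c(estimate = est, se = se, lower = est - 1.96 * se, upper = est + 1.96 * se)
}

res <- fecr_delta(epg1, epg0)
round(100 * res, 1)  # multiply by 100 to report on the percentage scale
```

A negative estimate corresponds to a reduction in egg counts in the intervention arm relative to control.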
Note: washb_tmle depends on the tmle package, the SuperLearner package, as well as the internal washb_prescreen and design_matrix functions.
Gruber S, van der Laan M. tmle: An R Package for Targeted Maximum Likelihood Estimation. J Stat Softw. 2012;51: 1–35. (https://www.jstatsoft.org/article/view/v051i13)
Balzer LB, van der Laan MJ, Petersen ML, SEARCH Collaboration. Adaptive pre-specification in randomized trials with and without pair-matching. Stat Med. 2016; doi:10.1002/sim.7023 (http://onlinelibrary.wiley.com/doi/10.1002/sim.7023/abstract)
#TBD