Targeted maximum likelihood estimation of intention-to-treat effects in the WASH Benefits trials

Estimate intention-to-treat parameters using targeted maximum likelihood estimation (TMLE), potentially adjusted for covariates and missing outcomes

washb_tmle(Y, tr, W = NULL, id = 1:length(Y), pair = NULL,
  Delta = rep(1, length(Y)), family = "gaussian", contrast,
  Q.SL.library = c("SL.mean", "SL.glm", "SL.bayesglm", "SL.gam",
  "SL.glmnet"), g.SL.library = Q.SL.library, pval = 0.2, FECR = NULL,
  seed = NULL, print = TRUE)

Arguments

Y	Outcome variable (continuous, such as LAZ, or binary, such as diarrhea)
tr	Treatment group variable (binary or factor)
W	Data frame that includes adjustment covariates
id	ID variable for independent units. For pair-matched designs, this is the matched pair and should be the same as the `pair` argument. For analyses that are not pair-matched, it should typically be the cluster.
pair	An optional ID variable to identify the matched pair unit (in WASH Benefits, blocks) if conducting a matched-pair analysis. This argument is used to drop pairs that are missing one or more treatment groups. Incomplete pairs are not an issue in the overall Bangladesh trial (there were no incomplete blocks), but they are an issue in the Kenya trial, where there were some incomplete blocks.
Delta	Indicator of missing outcome: 1 = observed, 0 = missing. For observations with `Delta=0`, ensure the missing outcomes in `Y` are set to an arbitrary constant (e.g., 9).
family	Outcome family: `gaussian` (continuous outcomes, like LAZ) or `binomial` (binary outcomes like diarrhea or stunting)
contrast	Vector of length 2 that includes the treatment groups to contrast from the `tr` variable, reference group first (e.g., `contrast=c('Control','Nutrition')`).
g.SL.library	Library of algorithms to include in the SuperLearner for the treatment model Pr(A|W) and for the missingness model Pr(Delta|A,W) (if Delta is specified)
pval	The p-value threshold used to pre-screen covariates (`W`) based on a likelihood ratio test in a univariate regression with the outcome (`Y`). Variables with a univariate association p-value below this threshold will be used in the final model. Defaults to 0.2.
FECR	(default is `NULL`). Estimate the fecal egg count reduction (FECR) proportion by specifying either `FECR='arithmetic'` to estimate it on the arithmetic mean scale or `FECR='geometric'` to estimate it on the geometric mean scale. If `FECR='geometric'`, ensure that you use log-transformed eggs per gram. When estimating the FECR, also ensure that you specify `family='gaussian'` (see details).
seed	A seed for the pseudo-random cross-validation split used in model selection (use for perfectly reproducible results).
print	Logical for printed output, defaults to `TRUE`. If `FALSE`, nothing is printed to the console when the returned object is assigned to an R object.
Q.SL.library	Library of algorithms to include in the SuperLearner for the outcome model

Value

A tmle() fit object (see the tmle package for details). The $estimates list includes parameter estimates along with variance estimates and confidence intervals. If FECR is specified (FECR='arithmetic' or FECR='geometric'), then the tmle object also includes results in $estimates$FECR.

Details

The washb_tmle function is mainly a convenience wrapper for the tmle function. It estimates intention-to-treat effects in a trial using targeted maximum likelihood estimation (TMLE). In brief, the function does the following: it restricts the data to complete observations in the two arms listed in the contrast argument, it pre-screens covariates (W), if specified, to select those that have a univariate association with the outcome, and then it estimates the intention-to-treat effect using TMLE. If family='binomial', then the function returns effects on the absolute, relative, and odds ratio scale. If Delta is specified (i.e., observations with missing outcomes are included), then the function will adjust the effects for missingness using inverse probability of censoring weights, with the weights estimated using super learning of Pr(Delta|A,W).
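The first two steps (restricting to the contrasted arms and pre-screening covariates) can be sketched in base R. This is an illustration only: it does not call the internal washb_prescreen function, and the simulated data and the helper name prescreen_lr are hypothetical.

```r
set.seed(1)
n <- 600
d <- data.frame(
  tr    = factor(sample(c("Control", "Nutrition", "WSH"), n, replace = TRUE)),
  age   = rnorm(n),   # simulated to be associated with the outcome
  noise = rnorm(n)    # simulated to be unrelated to the outcome
)
d$Y <- 0.5 * d$age + rnorm(n)
d$Y[sample(n, 30)] <- NA   # a few missing outcomes

contrast <- c("Control", "Nutrition")

# Step 1: keep complete observations in the two contrasted arms
keep <- d$tr %in% contrast & complete.cases(d)
dd   <- droplevels(d[keep, ])

# Step 2: univariate likelihood ratio test of each covariate against Y
# (hypothetical helper, not the package's washb_prescreen)
prescreen_lr <- function(Y, W, family = "gaussian", pval = 0.2) {
  null_fit <- glm(Y ~ 1, family = family)
  p <- vapply(names(W), function(v) {
    fit <- glm(Y ~ W[[v]], family = family)
    lr  <- 2 * (as.numeric(logLik(fit)) - as.numeric(logLik(null_fit)))
    df  <- attr(logLik(fit), "df") - attr(logLik(null_fit), "df")
    pchisq(lr, df = df, lower.tail = FALSE)
  }, numeric(1))
  names(p)[p < pval]   # covariates that pass the screen
}

selected <- prescreen_lr(dd$Y, dd[, c("age", "noise")])
selected
```

Only the covariates in `selected` would then be passed on to the adjusted estimator.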

If the analysis is pair-matched (as for primary outcomes), be sure to specify the pair ID in the id argument. Do not include pair IDs in the adjustment covariate set.

If adjustment covariates (W) are specified, then by default they are pre-screened, and the subset that is associated with the outcome based on a likelihood ratio test is used in the estimation. There are some other important defaults to be aware of. First, the washb_tmle function estimates the treatment mechanism even though it is a randomized trial. There are two reasons for this, one theoretical and one practical. The theoretical reason is that estimating the treatment mechanism gains efficiency (see Balzer et al. 2016). The practical reason is that unless the analysis is conducted at the cluster level (i.e., providing cluster means to the washb_tmle function), the empirical treatment probabilities differ slightly from the design probabilities due to varying cluster sizes. Estimating the treatment mechanism ensures that the variance calculation correctly accounts for the empirical treatment probabilities in the data.
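The practical point about cluster sizes can be seen in a small simulation. The numbers below (40 clusters, 5 to 15 individuals each, 1:1 allocation) are made up for illustration and are not taken from the trials.

```r
set.seed(2)
n_clusters <- 40
cluster_tr <- rep(c(0, 1), each = n_clusters / 2)        # clusters allocated 1:1
cluster_n  <- sample(5:15, n_clusters, replace = TRUE)   # varying cluster sizes

# Expand the cluster-level assignment to individual-level rows
tr_individual <- rep(cluster_tr, times = cluster_n)

mean(cluster_tr)      # treatment probability at the cluster level: exactly 0.5
mean(tr_individual)   # at the individual level: usually close to, but not exactly, 0.5
```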

Another default is that washb_tmle uses the SuperLearner algorithm to adjust for covariates and to predict the treatment mechanism and censoring mechanism (if adjusting for missing outcomes). The default algorithm library includes the simple mean, main terms GLM, main terms Bayes GLM with non-informative priors, generalized additive models (degree 2), and lasso (glmnet). These are the pre-specified algorithms from the original trial statistical analysis plan. You can type listWrappers() to see the full set of algorithms implemented in the super learner. If you just wish to use a main effects GLM model to adjust for the covariates, then you can specify Q.SL.library="SL.glm". If you are dealing with very small sample sizes (e.g., in a substudy), then you may wish to use even simpler libraries, such as a set of univariate regressions (as in Balzer et al. 2016).

By default the function uses the same algorithm library to predict the outcome (Q.SL.library) and the treatment and censoring mechanisms (g.SL.library). You can specify a different library for the treatment and censoring mechanisms using the g.SL.library argument.

If you want to adjust for missing outcomes in the analysis using inverse probability of censoring weights (IPCW), then you need to include observations that have a missing outcome (Y) set to an arbitrary value (e.g., 9) with Delta=0 for those observations. Observations with missing outcomes also need to have treatment (tr) and covariate (W) information, which are used to create weights for Pr(Delta|A,W). Please see the package's vignette for a detailed example of how to estimate an IPCW-TMLE parameter.
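A minimal sketch of the data preparation described above, using a small made-up data frame (the column names age and clusterid are hypothetical). The washb_tmle call is shown commented out and follows the usage signature on this page; see the vignette for a complete worked example.

```r
d <- data.frame(
  Y  = c(-1.2, NA, 0.3, NA, -0.8, 0.1),
  tr = factor(c("Control", "Control", "Nutrition",
                "Nutrition", "Control", "Nutrition")),
  age       = c(10, 12, 9, 11, 13, 10),
  clusterid = c(1, 1, 2, 2, 3, 4)
)

# Delta = 1 if the outcome was observed, 0 if it is missing
d$Delta <- as.numeric(!is.na(d$Y))

# Replace missing outcomes with an arbitrary constant
d$Y[d$Delta == 0] <- 9

# Treatment and covariates must be complete for every row that is kept
stopifnot(complete.cases(d[, c("tr", "age")]))

# fit <- washb_tmle(Y = d$Y, tr = d$tr, W = d["age"], id = d$clusterid,
#                   Delta = d$Delta, family = "gaussian",
#                   contrast = c("Control", "Nutrition"))
```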

A standard parameter of interest for soil-transmitted helminth infection intensity (eggs per gram) is the fecal egg count reduction (FECR) percentage, which is defined as FECR = (EY1-EY0)/EY0 = (EY1/EY0)-1. To estimate the FECR using washb_tmle, simply specify FECR='arithmetic' or FECR='geometric' (the default is FECR=NULL), and a list of FECR estimates will be added to the returned object in $estimates$FECR. If estimating the FECR using geometric means (computed on log(Y) outcomes), ensure that you pass washb_tmle log-transformed outcomes. washb_tmle estimates the standard error and 95 percent confidence intervals for the FECR with the influence curve and the delta method. The FECR parameter estimate and its variance account for missing outcomes if specified (Delta) and repeated observations (id). The result is returned as a proportion (EY1-EY0)/EY0; simply multiply the FECR estimate by 100 to report the percentage reduction.
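To make the FECR definition and the delta method concrete, here is a simplified sketch on the arithmetic mean scale. It assumes two independent arms with independent observations and uses plain sample means and variances, so it is not the influence-curve-based calculation that washb_tmle performs. The simulated egg counts are invented.

```r
set.seed(3)
epg0 <- rgamma(200, shape = 0.5, scale = 400)  # control arm, eggs per gram
epg1 <- rgamma(200, shape = 0.5, scale = 200)  # intervention arm, eggs per gram

m0 <- mean(epg0)
m1 <- mean(epg1)
v0 <- var(epg0) / length(epg0)   # variance of the control mean
v1 <- var(epg1) / length(epg1)   # variance of the intervention mean

# FECR = (EY1 - EY0) / EY0 = EY1 / EY0 - 1
fecr <- m1 / m0 - 1

# Delta method for the ratio m1 / m0 with independent means:
# gradient is (1 / m0, -m1 / m0^2)
se <- sqrt(v1 / m0^2 + (m1^2 / m0^4) * v0)
ci <- fecr + c(-1, 1) * qnorm(0.975) * se

# Report as a percentage reduction
round(100 * c(estimate = fecr, lower = ci[1], upper = ci[2]), 1)
```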

Note: washb_tmle depends on the tmle package, the SuperLearner package, as well as the internal washb_prescreen and design_matrix functions.

References

Gruber S, van der Laan M. tmle: An R Package for Targeted Maximum Likelihood Estimation. J Stat Softw. 2012;51: 1–35. (https://www.jstatsoft.org/article/view/v051i13)

Balzer LB, van der Laan MJ, Petersen ML, SEARCH Collaboration. Adaptive pre-specification in randomized trials with and without pair-matching. Stat Med. 2016; doi:10.1002/sim.7023 (http://onlinelibrary.wiley.com/doi/10.1002/sim.7023/abstract)

Examples
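
The following is a hedged sketch of a basic call on simulated data, not an example from the package itself. The data, variable names (laz, age, sex, clusterid), and effect sizes are invented. The call uses only arguments listed in the usage above, with a main-terms GLM library (Q.SL.library = "SL.glm") to keep it quick, and it is skipped if the washb, tmle, and SuperLearner packages are not installed.

```r
set.seed(12345)
n_clusters <- 80
per_clust  <- 5
clusterid  <- rep(seq_len(n_clusters), each = per_clust)

# Randomize clusters 1:1 to the two arms
arm <- rep(sample(rep(c("Control", "Nutrition"), n_clusters / 2)),
           each = per_clust)

d <- data.frame(
  clusterid = clusterid,
  tr  = factor(arm, levels = c("Control", "Nutrition")),
  age = rnorm(length(clusterid)),
  sex = rbinom(length(clusterid), 1, 0.5)
)
# Simulated continuous outcome (e.g., a LAZ-like score)
d$laz <- -1.8 + 0.25 * (d$tr == "Nutrition") + 0.3 * d$age + rnorm(nrow(d))

fit <- NULL
have_pkgs <- requireNamespace("washb", quietly = TRUE) &&
  requireNamespace("tmle", quietly = TRUE) &&
  requireNamespace("SuperLearner", quietly = TRUE)

if (have_pkgs) {
  library(SuperLearner)
  library(tmle)
  library(washb)
  fit <- washb_tmle(
    Y = d$laz, tr = d$tr, W = d[, c("age", "sex")], id = d$clusterid,
    family = "gaussian", contrast = c("Control", "Nutrition"),
    Q.SL.library = "SL.glm", seed = 12345
  )
}
```

If the packages are available, the parameter estimates are in fit$estimates, as described under Value.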