Pre-screen covariates using a likelihood ratio test.

washb_prescreen(Y, Ws, family = "gaussian", pval = 0.2, print = TRUE)

Arguments

Y

Outcome variable (continuous, such as LAZ, or binary, such as diarrhea)

Ws

data frame that includes candidate adjustment covariates to screen

family

GLM model family (gaussian, binomial, poisson, or negative binomial). Use "neg.binom" for negative binomial.

pval

The p-value threshold: any variable with a likelihood ratio test p-value below this threshold will be returned. Defaults to 0.2.

print

Logical for whether to print function output, defaults to TRUE.

Value

Returns the list of variable names whose likelihood ratio test p-value is below the pval threshold (0.2 by default).
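To make the screening rule concrete, the sketch below applies a likelihood ratio test to each candidate covariate separately, using only base R and simulated data. It illustrates the general technique, not the package's actual implementation: the helper name `lrt_pvalue`, the simulated `Ws` and `Y`, and the choice to compare against an intercept-only model are all assumptions made for this example.

```r
# Hypothetical helper (not part of washb): likelihood ratio test p-value
# for a single candidate covariate `w` against an intercept-only model.
lrt_pvalue <- function(y, w, family = gaussian()) {
  keep <- !is.na(y) & !is.na(w)        # fit both models on the same rows
  y <- y[keep]
  w <- w[keep]
  fit0 <- glm(y ~ 1, family = family)  # null model: intercept only
  fit1 <- glm(y ~ w, family = family)  # model that adds the covariate
  # LR statistic: twice the gain in log-likelihood
  stat <- 2 * as.numeric(logLik(fit1) - logLik(fit0))
  # Degrees of freedom: number of extra parameters in the larger model
  df <- attr(logLik(fit1), "df") - attr(logLik(fit0), "df")
  pchisq(stat, df = df, lower.tail = FALSE)
}

# Simulated data (assumption): x1 is related to the binary outcome, x2 is noise
set.seed(1)
n  <- 500
Ws <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y  <- rbinom(n, 1, plogis(-1 + 1.0 * Ws$x1))

# Screen every column of Ws and keep those below the threshold
pvals    <- sapply(Ws, lrt_pvalue, y = Y, family = binomial())
selected <- names(pvals)[pvals < 0.2]
print(round(pvals, 5))
print(selected)
```

Because each covariate is tested on its own, a factor with several levels contributes several degrees of freedom to its test, which the `df` difference above accounts for.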

Examples

# Prescreen function applied to the Bangladesh diarrheal disease outcome.
# The function will test a data frame of covariates and return those related to
# child diarrheal disease with a <0.2 p-value from a likelihood ratio test.

# Load diarrhea data:
library(washb)
data(washb_bangladesh_diar)
washb_bangladesh_diar <- washb_bangladesh_diar
data(washb_bangladesh_enrol)
washb_bangladesh_enrol <- washb_bangladesh_enrol

# Drop svydate and month because they are superseded in the child-level diarrhea data
washb_bangladesh_enrol$svydate <- NULL
washb_bangladesh_enrol$month <- NULL

# Merge the baseline dataset to the follow-up dataset
ad <- merge(washb_bangladesh_enrol, washb_bangladesh_diar,
            by = c("dataid", "clusterid", "block", "tr"),
            all.x = FALSE, all.y = TRUE)

# Subset to the relevant measurement: Year 1 or Year 2
ad <- subset(ad, svy == 1 | svy == 2)

# Subset the diarrhea data to children <36 mos at enrollment
# (exclude new births that are not target children)
ad <- subset(ad, sibnewbirth == 0)
ad <- subset(ad, gt36mos == 0)

# Exclude children with missing data
ad <- subset(ad, !is.na(ad$diar7d))

# Re-order the tr factor for convenience
ad$tr <- factor(ad$tr, levels = c("Control", "Water", "Sanitation", "Handwashing",
                                  "WSH", "Nutrition", "Nutrition + WSH"))

# Ensure that month is coded as a factor
ad$month <- factor(ad$month)

# Sort the data for perfect replication when using V-fold cross-validation
ad <- ad[order(ad$block, ad$clusterid, ad$dataid, ad$childid), ]

# Subset to a new data frame the variables to be screened:
Ws <- subset(ad, select = c("fracode", "month", "agedays", "sex", "momage", "momedu",
                            "momheight", "hfiacat", "Nlt18", "Ncomp", "watmin", "elec",
                            "floor", "walls", "roof", "asset_wardrobe", "asset_table",
                            "asset_chair", "asset_khat", "asset_chouki", "asset_tv",
                            "asset_refrig", "asset_bike", "asset_moto", "asset_sewmach",
                            "asset_mobile"))

# Run the washb_prescreen function
prescreened_varnames <- washb_prescreen(Y = ad$diar7d, Ws, family = "binomial")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#>
#> Likelihood Ratio Test P-values:
#>                P-value
#> fracode        0.12509
#> month          0.00000
#> agedays        0.00001
#> sex            0.15910
#> momage         0.85834
#> momedu         0.00113
#> momheight      0.83709
#> hfiacat        0.00044
#> Nlt18          0.14600
#> Ncomp          0.85845
#> watmin         0.01749
#> elec           0.00166
#> floor          0.00882
#> walls          0.17286
#> roof           0.44633
#> asset_wardrobe 0.00334
#> asset_table    0.27762
#> asset_chair    0.26366
#> asset_khat     0.05397
#> asset_chouki   0.88290
#> asset_tv       0.10924
#> asset_refrig   0.01527
#> asset_bike     0.00498
#> asset_moto     0.23256
#> asset_sewmach  0.00352
#> asset_mobile   0.71326
#>
#>
#> Covariates selected (P<0.2):
#>                       P-value
#> fracode        0.125086717080
#> month          0.000001277665
#> agedays        0.000008584708
#> sex            0.159101014584
#> momedu         0.001131482118
#> hfiacat        0.000436393481
#> Nlt18          0.145996390524
#> watmin         0.017492576342
#> elec           0.001659255089
#> floor          0.008816329210
#> walls          0.172858463464
#> asset_wardrobe 0.003338351150
#> asset_khat     0.053968452008
#> asset_tv       0.109235025959
#> asset_refrig   0.015267527279
#> asset_bike     0.004977085684
#> asset_sewmach  0.003515782703
# Rerun the function with a more lenient p-value threshold
prescreened_varname2s <- washb_prescreen(Y = ad$diar7d, Ws, family = "binomial", pval = 0.5)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#>
#> Likelihood Ratio Test P-values:
#>                P-value
#> fracode        0.12509
#> month          0.00000
#> agedays        0.00001
#> sex            0.15910
#> momage         0.85834
#> momedu         0.00113
#> momheight      0.83709
#> hfiacat        0.00044
#> Nlt18          0.14600
#> Ncomp          0.85845
#> watmin         0.01749
#> elec           0.00166
#> floor          0.00882
#> walls          0.17286
#> roof           0.44633
#> asset_wardrobe 0.00334
#> asset_table    0.27762
#> asset_chair    0.26366
#> asset_khat     0.05397
#> asset_chouki   0.88290
#> asset_tv       0.10924
#> asset_refrig   0.01527
#> asset_bike     0.00498
#> asset_moto     0.23256
#> asset_sewmach  0.00352
#> asset_mobile   0.71326
#>
#>
#> Covariates selected (P<0.5):
#>                       P-value
#> fracode        0.125086717080
#> month          0.000001277665
#> agedays        0.000008584708
#> sex            0.159101014584
#> momedu         0.001131482118
#> hfiacat        0.000436393481
#> Nlt18          0.145996390524
#> watmin         0.017492576342
#> elec           0.001659255089
#> floor          0.008816329210
#> walls          0.172858463464
#> roof           0.446334998090
#> asset_wardrobe 0.003338351150
#> asset_table    0.277621120615
#> asset_chair    0.263662641534
#> asset_khat     0.053968452008
#> asset_tv       0.109235025959
#> asset_refrig   0.015267527279
#> asset_bike     0.004977085684
#> asset_moto     0.232560670543
#> asset_sewmach  0.003515782703