washb_prescreen.Rd
Pre-screen covariates using a likelihood ratio test.
washb_prescreen(Y, Ws, family = "gaussian", pval = 0.2, print = TRUE)
Argument | Description |
---|---|
Y | Outcome variable (continuous, such as LAZ, or binary, such as diarrhea) |
Ws | data frame that includes candidate adjustment covariates to screen |
family | GLM model family (gaussian, binomial, poisson, or negative binomial). Use "neg.binom" for Negative binomial. |
pval | The p-value threshold: any variable whose likelihood ratio test p-value is below this threshold will be returned. Defaults to 0.2. |
print | Logical for whether to print function output. Defaults to TRUE. |
Returns the list of variable names whose likelihood ratio test p-value is below pval (0.2 unless a custom threshold is specified).
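To make the screening idea concrete, here is a minimal sketch of a likelihood ratio pre-screen on simulated data. It is not the washb implementation: the helper name lr_prescreen_sketch, the simulated covariates x_signal and x_noise, and the choice of an intercept-only null model are all assumptions made for illustration. Each covariate is fit alone in a GLM, compared with the null model fit to the same rows, and kept if its p-value is below the threshold.

```r
# Hypothetical helper, NOT the washb source: screen each covariate with a
# likelihood ratio test against an intercept-only GLM.
lr_prescreen_sketch <- function(Y, Ws, family = "gaussian", pval = 0.2) {
  pvals <- vapply(names(Ws), function(v) {
    dat <- data.frame(Y = Y, W = Ws[[v]])
    dat <- dat[complete.cases(dat), ]  # fit both models on the same rows
    fit_null <- glm(Y ~ 1, family = family, data = dat)
    fit_full <- glm(Y ~ W, family = family, data = dat)
    # LR statistic: twice the gain in log-likelihood from adding W
    stat <- 2 * (as.numeric(logLik(fit_full)) - as.numeric(logLik(fit_null)))
    # Degrees of freedom: number of extra parameters in the full model
    df <- attr(logLik(fit_full), "df") - attr(logLik(fit_null), "df")
    pchisq(stat, df = df, lower.tail = FALSE)
  }, numeric(1))
  names(pvals)[pvals < pval]
}

# Simulated data (assumption): one covariate drives the binary outcome,
# the other is pure noise.
set.seed(1)
n <- 500
Ws_sim <- data.frame(x_signal = rnorm(n), x_noise = rnorm(n))
Y_sim <- rbinom(n, 1, plogis(-1 + Ws_sim$x_signal))

selected <- lr_prescreen_sketch(Y_sim, Ws_sim, family = "binomial")
print(selected)
```

With a threshold of 0.2, a pure-noise covariate still passes about one time in five, so the pre-screen is deliberately permissive: it is meant to drop clearly unrelated covariates, not to do final variable selection.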
# Prescreen function applied to the Bangladesh diarrheal disease outcome.
# The function will test a set of covariates and return those related to
# child diarrheal disease with a <0.2 p-value from a likelihood ratio test.

# Load diarrhea data:
library(washb)
data(washb_bangladesh_diar)
washb_bangladesh_diar <- washb_bangladesh_diar
data(washb_bangladesh_enrol)
washb_bangladesh_enrol <- washb_bangladesh_enrol

# Drop svydate and month because they are superseded in the child-level diarrhea data
washb_bangladesh_enrol$svydate <- NULL
washb_bangladesh_enrol$month <- NULL

# Merge the baseline dataset to the follow-up dataset
ad <- merge(washb_bangladesh_enrol, washb_bangladesh_diar,
            by = c("dataid", "clusterid", "block", "tr"),
            all.x = FALSE, all.y = TRUE)

# Subset to the relevant measurement: Year 1 or Year 2
ad <- subset(ad, svy == 1 | svy == 2)

# Subset the diarrhea data to children <36 mos at enrollment
# (exclude new births that are not target children)
ad <- subset(ad, sibnewbirth == 0)
ad <- subset(ad, gt36mos == 0)

# Exclude children with missing data
ad <- subset(ad, !is.na(ad$diar7d))

# Re-order the tr factor for convenience
ad$tr <- factor(ad$tr, levels = c("Control", "Water", "Sanitation", "Handwashing",
                                  "WSH", "Nutrition", "Nutrition + WSH"))

# Ensure that month is coded as a factor
ad$month <- factor(ad$month)

# Sort the data for perfect replication when using V-fold cross-validation
ad <- ad[order(ad$block, ad$clusterid, ad$dataid, ad$childid), ]

# Subset to a new data frame the variables to be screened:
Ws <- subset(ad, select = c("fracode", "month", "agedays", "sex", "momage", "momedu",
                            "momheight", "hfiacat", "Nlt18", "Ncomp", "watmin", "elec",
                            "floor", "walls", "roof", "asset_wardrobe", "asset_table",
                            "asset_chair", "asset_khat", "asset_chouki", "asset_tv",
                            "asset_refrig", "asset_bike", "asset_moto", "asset_sewmach",
                            "asset_mobile"))

# Run the washb_prescreen function
prescreened_varnames <- washb_prescreen(Y = ad$diar7d, Ws, family = "binomial")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#>
#> Likelihood Ratio Test P-values:
#>                P-value
#> fracode        0.12509
#> month          0.00000
#> agedays        0.00001
#> sex            0.15910
#> momage         0.85834
#> momedu         0.00113
#> momheight      0.83709
#> hfiacat        0.00044
#> Nlt18          0.14600
#> Ncomp          0.85845
#> watmin         0.01749
#> elec           0.00166
#> floor          0.00882
#> walls          0.17286
#> roof           0.44633
#> asset_wardrobe 0.00334
#> asset_table    0.27762
#> asset_chair    0.26366
#> asset_khat     0.05397
#> asset_chouki   0.88290
#> asset_tv       0.10924
#> asset_refrig   0.01527
#> asset_bike     0.00498
#> asset_moto     0.23256
#> asset_sewmach  0.00352
#> asset_mobile   0.71326
#>
#>
#> Covariates selected (P<0.2):
#>                P-value
#> fracode        0.125086717080
#> month          0.000001277665
#> agedays        0.000008584708
#> sex            0.159101014584
#> momedu         0.001131482118
#> hfiacat        0.000436393481
#> Nlt18          0.145996390524
#> watmin         0.017492576342
#> elec           0.001659255089
#> floor          0.008816329210
#> walls          0.172858463464
#> asset_wardrobe 0.003338351150
#> asset_khat     0.053968452008
#> asset_tv       0.109235025959
#> asset_refrig   0.015267527279
#> asset_bike     0.004977085684
#> asset_sewmach  0.003515782703

# Rerun the function with a more lenient p-value threshold (pval = 0.5)
prescreened_varname2s <- washb_prescreen(Y = ad$diar7d, Ws, family = "binomial", pval = 0.5)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#>
#> Likelihood Ratio Test P-values:
#>                P-value
#> fracode        0.12509
#> month          0.00000
#> agedays        0.00001
#> sex            0.15910
#> momage         0.85834
#> momedu         0.00113
#> momheight      0.83709
#> hfiacat        0.00044
#> Nlt18          0.14600
#> Ncomp          0.85845
#> watmin         0.01749
#> elec           0.00166
#> floor          0.00882
#> walls          0.17286
#> roof           0.44633
#> asset_wardrobe 0.00334
#> asset_table    0.27762
#> asset_chair    0.26366
#> asset_khat     0.05397
#> asset_chouki   0.88290
#> asset_tv       0.10924
#> asset_refrig   0.01527
#> asset_bike     0.00498
#> asset_moto     0.23256
#> asset_sewmach  0.00352
#> asset_mobile   0.71326
#>
#>
#> Covariates selected (P<0.5):
#>                P-value
#> fracode        0.125086717080
#> month          0.000001277665
#> agedays        0.000008584708
#> sex            0.159101014584
#> momedu         0.001131482118
#> hfiacat        0.000436393481
#> Nlt18          0.145996390524
#> watmin         0.017492576342
#> elec           0.001659255089
#> floor          0.008816329210
#> walls          0.172858463464
#> roof           0.446334998090
#> asset_wardrobe 0.003338351150
#> asset_table    0.277621120615
#> asset_chair    0.263662641534
#> asset_khat     0.053968452008
#> asset_tv       0.109235025959
#> asset_refrig   0.015267527279
#> asset_bike     0.004977085684
#> asset_moto     0.232560670543
#> asset_sewmach  0.003515782703