Pre-screen covariates using a likelihood ratio test.

washb_prescreen(Y, Ws, family = "gaussian", pval = 0.2, print = TRUE)

Arguments

Y	Outcome variable (continuous, such as LAZ, or binary, such as diarrhea)
Ws	Data frame that includes candidate adjustment covariates to screen
family	GLM model family (gaussian, binomial, poisson, or negative binomial). Use "neg.binom" for negative binomial.
pval	The p-value threshold: any variable with a likelihood ratio test p-value below this threshold will be returned. Defaults to 0.2.
print	Logical for whether to print function output, defaults to TRUE.

Value

Function returns the list of names of the variables in Ws whose likelihood ratio test p-value is below the pval threshold (<0.2 unless a custom threshold is specified).
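
As a rough illustration of the kind of screening described above, the following sketch computes a likelihood ratio test p-value for each covariate by comparing an intercept-only GLM with a GLM that adds that covariate. It is not the package's source code: the helper lrt_pvalue and the simulated objects Ws_toy and Y_toy are hypothetical names introduced only for this example, and the actual function may differ in how it handles missing data, factors, and the negative binomial family.

```r
# Hypothetical sketch of a likelihood ratio pre-screen (not the washb source).
set.seed(1)
n <- 500

# Simulated covariates: w1 is related to the outcome, w2 is pure noise.
Ws_toy <- data.frame(w1 = rnorm(n), w2 = rnorm(n))
Y_toy <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * Ws_toy$w1))

# Likelihood ratio test p-value for one covariate:
# intercept-only model versus the model that adds the covariate.
lrt_pvalue <- function(w, y, family = "binomial") {
  dat <- data.frame(y = y, w = w)
  dat <- dat[complete.cases(dat), ]  # fit both models on the same rows
  fit0 <- glm(y ~ 1, data = dat, family = family)
  fit1 <- glm(y ~ w, data = dat, family = family)
  ll0 <- logLik(fit0)
  ll1 <- logLik(fit1)
  lr_stat <- 2 * (as.numeric(ll1) - as.numeric(ll0))
  df <- attr(ll1, "df") - attr(ll0, "df")
  pchisq(lr_stat, df = df, lower.tail = FALSE)
}

# Screen every column and keep the names below the threshold.
pvals <- sapply(Ws_toy, lrt_pvalue, y = Y_toy)
selected <- names(pvals)[pvals < 0.2]
print(round(pvals, 5))
print(selected)
```

Using the log-likelihood difference rather than the raw deviance difference keeps the sketch valid for the gaussian family as well, where the deviance would otherwise need to be scaled by the dispersion.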

Examples


#Prescreen function applied to the Bangladesh diarrheal disease outcome.
#The function will test a matrix of covariates and return those related to child diarrheal disease with
#a <0.2 p-value from a likelihood ratio test.

#Load diarrhea data:
library(washb)

data(washb_bangladesh_diar)
washb_bangladesh_diar <- washb_bangladesh_diar

data(washb_bangladesh_enrol)
washb_bangladesh_enrol <-washb_bangladesh_enrol

# drop svydate and month because they are superseded in the child level diarrhea data
washb_bangladesh_enrol$svydate <- NULL
washb_bangladesh_enrol$month <- NULL

# merge the baseline dataset to the follow-up dataset
ad <- merge(washb_bangladesh_enrol,washb_bangladesh_diar,by=c("dataid","clusterid","block","tr"),all.x=FALSE,all.y=TRUE)

# subset to the relevant measurement
# Year 1 or Year 2
ad <- subset(ad,svy==1|svy==2)

#subset the diarrhea data to children <36 mos at enrollment
### (exclude new births that are not target children)
ad <- subset(ad,sibnewbirth==0)
ad <- subset(ad,gt36mos==0)

# Exclude children with missing data
ad <- subset(ad,!is.na(ad$diar7d))

#Re-order the tr factor for convenience
ad$tr <- factor(ad$tr,levels=c("Control","Water","Sanitation","Handwashing","WSH","Nutrition","Nutrition + WSH"))

#Ensure that month is coded as a factor
ad$month <- factor(ad$month)

#Sort the data for perfect replication when using V-fold cross-validation
ad <- ad[order(ad$block,ad$clusterid,ad$dataid,ad$childid),]


###Subset to a new dataframe the variables to be screened:
Ws <- subset(ad,select=c("fracode","month","agedays","sex","momage","momedu","momheight","hfiacat","Nlt18","Ncomp","watmin","elec","floor","walls","roof","asset_wardrobe","asset_table","asset_chair","asset_khat","asset_chouki","asset_tv","asset_refrig","asset_bike","asset_moto","asset_sewmach","asset_mobile"))

###Run the washb_prescreen function
prescreened_varnames<-washb_prescreen(Y=ad$diar7d,Ws,family="binomial")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> 
#> Likelihood Ratio Test P-values:
#>                P-value
#> fracode        0.12509
#> month          0.00000
#> agedays        0.00001
#> sex            0.15910
#> momage         0.85834
#> momedu         0.00113
#> momheight      0.83709
#> hfiacat        0.00044
#> Nlt18          0.14600
#> Ncomp          0.85845
#> watmin         0.01749
#> elec           0.00166
#> floor          0.00882
#> walls          0.17286
#> roof           0.44633
#> asset_wardrobe 0.00334
#> asset_table    0.27762
#> asset_chair    0.26366
#> asset_khat     0.05397
#> asset_chouki   0.88290
#> asset_tv       0.10924
#> asset_refrig   0.01527
#> asset_bike     0.00498
#> asset_moto     0.23256
#> asset_sewmach  0.00352
#> asset_mobile   0.71326
#> 
#> 
#> Covariates selected (P<0.2):
#>                       P-value
#> fracode        0.125086717080
#> month          0.000001277665
#> agedays        0.000008584708
#> sex            0.159101014584
#> momedu         0.001131482118
#> hfiacat        0.000436393481
#> Nlt18          0.145996390524
#> watmin         0.017492576342
#> elec           0.001659255089
#> floor          0.008816329210
#> walls          0.172858463464
#> asset_wardrobe 0.003338351150
#> asset_khat     0.053968452008
#> asset_tv       0.109235025959
#> asset_refrig   0.015267527279
#> asset_bike     0.004977085684
#> asset_sewmach  0.003515782703

###Rerun the function with a more lenient p-value threshold
prescreened_varnames2<-washb_prescreen(Y=ad$diar7d,Ws,family="binomial", pval=0.5)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> 
#> Likelihood Ratio Test P-values:
#>                P-value
#> fracode        0.12509
#> month          0.00000
#> agedays        0.00001
#> sex            0.15910
#> momage         0.85834
#> momedu         0.00113
#> momheight      0.83709
#> hfiacat        0.00044
#> Nlt18          0.14600
#> Ncomp          0.85845
#> watmin         0.01749
#> elec           0.00166
#> floor          0.00882
#> walls          0.17286
#> roof           0.44633
#> asset_wardrobe 0.00334
#> asset_table    0.27762
#> asset_chair    0.26366
#> asset_khat     0.05397
#> asset_chouki   0.88290
#> asset_tv       0.10924
#> asset_refrig   0.01527
#> asset_bike     0.00498
#> asset_moto     0.23256
#> asset_sewmach  0.00352
#> asset_mobile   0.71326
#> 
#> 
#> Covariates selected (P<0.5):
#>                       P-value
#> fracode        0.125086717080
#> month          0.000001277665
#> agedays        0.000008584708
#> sex            0.159101014584
#> momedu         0.001131482118
#> hfiacat        0.000436393481
#> Nlt18          0.145996390524
#> watmin         0.017492576342
#> elec           0.001659255089
#> floor          0.008816329210
#> walls          0.172858463464
#> roof           0.446334998090
#> asset_wardrobe 0.003338351150
#> asset_table    0.277621120615
#> asset_chair    0.263662641534
#> asset_khat     0.053968452008
#> asset_tv       0.109235025959
#> asset_refrig   0.015267527279
#> asset_bike     0.004977085684
#> asset_moto     0.232560670543
#> asset_sewmach  0.003515782703
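
A common next step is to keep only the selected covariates for an adjusted analysis. The sketch below assumes the returned value can be used as a character vector of column names, which is consistent with the Value section but should be checked against the installed package. Ws_demo and selected_demo are hypothetical stand-ins for Ws and prescreened_varnames so that the snippet runs on its own.

```r
# Hypothetical stand-ins for Ws and prescreened_varnames from the example above.
Ws_demo <- data.frame(a = 1:3, b = c(2.5, 1.0, 4.2), c = c("x", "y", "z"))
selected_demo <- c("a", "c")

# Keep only the selected covariates; drop = FALSE preserves the data frame
# structure even when a single column is selected.
W_adj <- Ws_demo[, selected_demo, drop = FALSE]
print(names(W_adj))
```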
