Abstract
Objectives
Respondent-Driven Sampling (RDS) is a variant of link-tracing sampling, a technique for surveying hard-to-reach communities that uses community members' social networks to reach potential participants. The RDS sampling mechanism and the associated methods for adjusting for it at the analysis stage are well documented in the statistical literature. However, methodological work has focused largely on estimating population means and proportions, giving little to no consideration to the estimation of population network parameters. As a network-based sampling method, RDS faces the fundamental problem of sampling from population networks whose features are sensitive to the choice of sampling method. Two such features are homophily (the tendency for individuals with similar traits to share social ties) and differential activity (the ratio of the mean number of connections, or mean degree, between groups defined by an attribute).
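To make the two network features concrete, the sketch below computes them for a small undirected network with a binary attribute. Differential activity follows the definition above: the mean degree of group 1 divided by that of group 0. For homophily we assume a ratio-based definition: the number of cross-group ties expected under degree-preserving random mixing, divided by the number actually observed. Under this definition, values above 1 indicate homophily. The function names and the toy network are hypothetical and used for illustration only.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def degrees(n_nodes: int, edges: List[Edge]) -> List[int]:
    """Degree of every node in an undirected simple graph."""
    deg = [0] * n_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def differential_activity(group: List[int], edges: List[Edge]) -> float:
    """Mean degree of group 1 divided by mean degree of group 0."""
    deg = degrees(len(group), edges)
    d1 = [d for d, g in zip(deg, group) if g == 1]
    d0 = [d for d, g in zip(deg, group) if g == 0]
    return (sum(d1) / len(d1)) / (sum(d0) / len(d0))


def homophily_ratio(group: List[int], edges: List[Edge]) -> float:
    """Expected / observed number of cross-group ties (assumed definition).

    With D1 and D0 the total degree of each group and M the number of edges,
    degree-preserving random mixing gives about D1 * D0 / (2M) cross ties.
    Values above 1 indicate homophily.
    """
    deg = degrees(len(group), edges)
    m = len(edges)
    total1 = sum(d for d, g in zip(deg, group) if g == 1)
    total0 = sum(d for d, g in zip(deg, group) if g == 0)
    observed_cross = sum(1 for u, v in edges if group[u] != group[v])
    expected_cross = total1 * total0 / (2 * m)
    return expected_cross / observed_cross


# Hypothetical toy network: two triangles joined by a single bridge (2, 3).
group = [0, 0, 0, 1, 1, 1]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(differential_activity(group, edges))  # 1.0
print(homophily_ratio(group, edges))        # 3.5
```

In the toy network, both groups have the same mean degree, so differential activity is 1. Random mixing would produce 3.5 cross-group ties where only one is observed, which signals strong homophily.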
Methods
Many simple approaches exist for generating simulated RDS data with specified levels of network features (mainly homophily and differential activity), typically with a focus on estimating means and proportions (Gile 2011; Gile et al. 2015; Spiller et al. 2018). However, recent findings on the inconsistency of estimators of network features such as homophily in partially observed networks (Crawford et al. 2017; Shalizi and Rinaldo 2013) raise the question of whether these target features can be recovered from the observed RDS data alone. Recovering this information is critical if we wish to condition upon these features. In this paper, we conduct a simulation study to assess the accuracy of existing RDS simulation methods, that is, their ability to generate RDS samples with the desired levels of two network parameters: homophily and differential activity.
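The sketch below illustrates the kind of pipeline such simulation methods follow: first generate a population network with a binary trait, then draw an RDS sample by link tracing. It uses a hypothetical two-group random graph in the style of a stochastic block model. This is a simplification we swap in for illustration, not one of the simulation methods assessed in this paper. Recruitment is assumed to proceed without replacement, with each respondent receiving a fixed number of coupons and passing them to neighbours chosen uniformly at random among those not yet sampled. All function names and parameter values are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def two_group_network(n: int, prop1: float, p00: float, p11: float, p01: float,
                      rng: random.Random) -> Tuple[List[int], Dict[int, Set[int]]]:
    """Hypothetical two-group random graph (stochastic-block style).

    p00 and p11 are within-group tie probabilities and p01 is the between-group one.
    Setting p00, p11 > p01 induces homophily; group mean degrees (and hence
    differential activity) depend on these probabilities and on prop1.
    """
    group = [1 if rng.random() < prop1 else 0 for _ in range(n)]
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if group[i] != group[j]:
                p = p01
            else:
                p = p11 if group[i] == 1 else p00
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return group, adj


def simulate_rds(adj: Dict[int, Set[int]], n_seeds: int, n_coupons: int,
                 target_size: int, rng: random.Random) -> List[Tuple[int, Optional[int]]]:
    """Link-tracing sample without replacement, up to n_coupons recruits per respondent.

    Returns (respondent, recruiter) pairs in recruitment order; seeds have recruiter None.
    """
    seeds = rng.sample(sorted(adj), n_seeds)
    sampled = set(seeds)
    sample: List[Tuple[int, Optional[int]]] = [(s, None) for s in seeds]
    queue = deque(seeds)  # respondents whose coupons have not yet been distributed
    while queue and len(sample) < target_size:
        recruiter = queue.popleft()
        # Only neighbours not already in the sample can be recruited.
        eligible = sorted(adj[recruiter] - sampled)
        k = min(n_coupons, len(eligible), target_size - len(sample))
        for recruit in rng.sample(eligible, k):
            sampled.add(recruit)
            sample.append((recruit, recruiter))
            queue.append(recruit)
    return sample


rng = random.Random(2021)
group, adj = two_group_network(500, prop1=0.3, p00=0.04, p11=0.04, p01=0.01, rng=rng)
sample = simulate_rds(adj, n_seeds=10, n_coupons=3, target_size=200, rng=rng)
print(len(sample), sum(group[v] for v, _ in sample))  # sample size, group-1 count
```

Repeating the sampling step many times on the same population network, and comparing network features estimated from each sample with their known population values, is the basic logic of a simulation study of this kind.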
Results
The results show that (1) homophily cannot be consistently estimated from simulated RDS samples and (2) differential activity estimators are more precise when the groups defined by a trait are equally active and equally represented in the population. We also apply this approach to mimic features of the Engage Study, an RDS sample of gay, bisexual and other men who have sex with men in Montréal, Canada.
Conclusions
In this paper, we show that it is possible, in some cases, to simulate population networks that mimic the characteristics of real-world RDS data while retaining accuracy and precision for the target network features in the resulting samples.
Funding source: Natural Sciences and Engineering Research Council (NSERC) of Canada
Award Identifier / Grant number: RGPIN-2019-04230
Acknowledgment
The authors would like to thank the Engage study participants, office staff, and community engagement committee members, as well as our community partner agencies REZO, ACCM and Maison Plein Coeur. The authors also wish to acknowledge the support of David M. Moore, Nathan J. Lachowsky and Jody Jollimore and their contributions to the work presented here. Engage/Momentum II is funded by the Canadian Institutes of Health Research (CIHR, TE2-138299), the CIHR Canadian HIV/AIDS Trials Network (CTN300), the Canadian Foundation for AIDS Research (CANFAR, Engage), the Ontario HIV Treatment Network (OHTN, 1051), the Public Health Agency of Canada (Ref: 4500370314), Canadian Blood Services (MSM2017LP-OD), and the Ministère de la Santé et des Services sociaux (MSSS) du Québec. Erica E. M. Moodie acknowledges a chercheur boursier senior career award from the Fonds de recherche du Québec – Santé.
Research funding: MY is funded by a Postdoctoral Fellowship jointly sponsored by the Statistical and Applied Mathematical Sciences Institute (SAMSI) and the Canadian Statistical Sciences Institute (CANSSI). The methodological developments in this manuscript are supported by a Discovery Grant to EEMM from the Natural Sciences and Engineering Research Council of Canada (NSERC), grant #RGPIN-2019-04230.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: The authors state no conflict of interest.
Informed consent: Not applicable.
Ethical approval: Not applicable.
Characteristics of the Engage population network and RDS sample.
Parameter | Estimated value | 95% CI
---|---|---
Population network | |
Population size | 40,400 |
Mean degree | 16.63 |
Prevalence (%) | |
Condomless anal sex in the past six months | 57.9 | [52.7, 63.0]
Currently in a relationship | 43.9 | [38.8, 49.0]
HIV positive | 12.7 | [9.3, 16.0]
RDS sample | |
Number of seeds | 27 | –
Participants with 0 recruits | 651 | –
Participants with 1 recruit | 236 | –
Participants with 2 recruits | 117 | –
Participants with 3 recruits | 81 | –
Participants with 4 recruits | 49 | –
Participants with 5 recruits | 27 | –
Participants with 6 recruits | 18 | –
Sample size | 1,179 | –
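The recruitment counts in this table can be checked arithmetically. Assuming the 1,179 participants include the 27 seeds, every non-seed was recruited by exactly one participant. The total number of recruits should therefore equal the sample size minus the number of seeds. The short check below reproduces this identity from the table values.

```python
# Participants (value) by number of recruits they brought in (key), from the table.
recruits_distribution = {0: 651, 1: 236, 2: 117, 3: 81, 4: 49, 5: 27, 6: 18}
n_seeds = 27

n_participants = sum(recruits_distribution.values())
n_recruited = sum(k * count for k, count in recruits_distribution.items())

print(n_participants)                          # 1179
print(n_recruited)                             # 1152
print(n_seeds + n_recruited)                   # 1179
print(round(n_recruited / n_participants, 3))  # 0.977 recruits per participant
```

The identity holds: 27 seeds plus 1,152 recruits gives the sample size of 1,179. On average, each participant recruited just under one other participant.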
Pearson correlation matrix of three nodal covariates for the Engage RDS sample. Unweighted correlations are shown, with weighted correlations in parentheses.
Variable | 1. CAS | 2. CIR | 3. HIV+
---|---|---|---
1. Condomless anal sex (CAS) | 1 | 0.104*** (0.115***) | 0.023 (0.018)
2. Currently in a relationship (CIR) | | 1 | 0.046 (0.002)
3. HIV positive (HIV+) | | | 1
***p < 0.001.
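A weighted Pearson correlation replaces the sample means, variances and covariance with their weighted counterparts. For binary covariates such as these, Pearson's correlation is the phi coefficient. The sketch below assumes inverse-degree (RDS-II-style) weights purely for illustration; the source does not state which weights were used. The data are hypothetical, and p-values are not computed.

```python
import math
from typing import Sequence


def weighted_pearson(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Pearson correlation with non-negative observation weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / total
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / total
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(vx * vy)


# Hypothetical binary covariates (1 = yes) and reported degrees for five respondents.
cas = [1, 0, 1, 1, 0]
cir = [1, 0, 0, 1, 0]
degree = [10, 5, 20, 8, 4]
inverse_degree = [1.0 / d for d in degree]  # assumed RDS-II-style weights

print(weighted_pearson(cas, cir, [1.0] * len(cas)))  # unweighted (equal weights)
print(weighted_pearson(cas, cir, inverse_degree))    # weighted
```

With equal weights the function reduces to the ordinary Pearson correlation. Any gap between the unweighted and weighted values, such as 0.046 versus 0.002 for CIR and HIV+, reflects how strongly the weights shift the relative influence of respondents.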
References
Barbiero, A., and P. A. Ferrari. 2017. “An R Package for the Simulation of Correlated Discrete Variables.” Communications in Statistics – Simulation and Computation 46: 5123–40, https://doi.org/10.1080/03610918.2016.1146758.
Biernacki, P., and D. Waldorf. 1981. “Snowball Sampling: Problems and Techniques of Chain Referral Sampling.” Sociological Methods & Research 10: 141–63, https://doi.org/10.1177/004912418101000205.
Butts, C. 2008a. “Network: A Package for Managing Relational Data in R.” Journal of Statistical Software, Articles 24: 1–36, https://doi.org/10.18637/jss.v024.i02.
Butts, C. 2008b. “Social Network Analysis with sna.” Journal of Statistical Software, Articles 24: 1–51, https://doi.org/10.18637/jss.v024.i06.
Camirand, H., I. Traoré, and J. Baulne. 2016. L’Enquête québécoise sur la santé de la population, 2014-2015: pour en savoir plus sur la santé des Québécois.
Costenbader, E., and T. W. Valente. 2003. “The Stability of Centrality Measures when Networks are Sampled.” Social Networks 25: 283–307, https://doi.org/10.1016/s0378-8733(03)00012-1.
Crawford, F. W., P. M. Aronow, L. Zeng, and J. Li. 2017. “Identification of Homophily and Preferential Recruitment in Respondent-Driven Sampling.” American Journal of Epidemiology 187: 153–60, https://doi.org/10.1093/aje/kwx208.
Durrett, R. 2006. Erdős–Rényi Random Graphs, 27–69, Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, https://doi.org/10.1017/CBO9780511546594.003.
Gile, K. J. 2011. “Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation.” Journal of the American Statistical Association 106: 135–46, https://doi.org/10.1198/jasa.2011.ap09475.
Gile, K. J., and M. S. Handcock. 2010. “Respondent-driven Sampling: An Assessment of Current Methodology.” Sociological Methodology 40: 285–327, https://doi.org/10.1111/j.1467-9531.2010.01223.x.
Gile, K. J., and M. S. Handcock. 2015. “Network Model-Assisted Inference from Respondent-Driven Sampling Data.” Journal of the Royal Statistical Society – Series A: Statistics in Society 178: 619–39, https://doi.org/10.1111/rssa.12091.
Gile, K. J., L. G. Johnston, and M. J. Salganik. 2015. “Diagnostics for Respondent-Driven Sampling.” Journal of the Royal Statistical Society – Series A: Statistics in Society 178: 241–69, https://doi.org/10.1111/rssa.12059.
Gile, K. J., I. S. Beaudry, M. S. Handcock, and M. Q. Ott. 2018. “Methods for Inference from Respondent-Driven Sampling Data.” Annual Review of Statistics and Its Application 5: 65–93, https://doi.org/10.1146/annurev-statistics-031017-100704.
Goodman, L. A. 1961. “Snowball Sampling.” The Annals of Mathematical Statistics 32: 148–70, https://doi.org/10.1214/aoms/1177705148.
Handcock, M. S., D. R. Hunter, C. T. Butts, S. M. Goodreau, and M. Morris. 2003. Statnet: Software Tools for the Statistical Modeling of Network Data. Also available at http://statnetproject.org.
Harris, J. 2014. An Introduction to Exponential Random Graph Modeling, Quantitative Applications in the Social Sciences. United States: SAGE Publications, https://doi.org/10.4135/9781452270135.
Heckathorn, D. D. 1997. “Respondent-driven Sampling: A New Approach to the Study of Hidden Populations.” Social Problems 44: 174–99, https://doi.org/10.1525/sp.1997.44.2.03x0221m.
Heckathorn, D. D. 2002. “Respondent-driven Sampling II: Deriving Valid Population Estimates from Chain-Referral Samples of Hidden Populations.” Social Problems 49: 11–34, https://doi.org/10.1525/sp.2002.49.1.11.
Hunter, D. R., M. S. Handcock, C. T. Butts, S. M. Goodreau, and M. Morris. 2008. “Ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks.” Journal of Statistical Software 24: 1–29, https://doi.org/10.18637/jss.v024.i03.
Lambert, G., J. Cox, M. Messier-Peet, H. Apelian, and E. E. M. Moodie. 2019. Engage Montréal, Portrait de la santé sexuelle des hommes de la région métropolitaine de Montréal ayant des relations sexuelles avec des hommes, Cycle 2017-2018, Faits saillants. Canada: Direction régionale de santé publique du CIUSSS du Centre-Sud-de-l’Île-de-Montréal.
Lin, S.-D., M.-Y. Yeh, and C.-T. Li. 2013. “Sampling and Summarization for Social Networks.” In 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) (tutorial). United States: Society for Industrial and Applied Mathematics.
Newman, M. E. 2002. “Assortative Mixing in Networks.” Physical Review Letters 89: 208701, https://doi.org/10.1103/physrevlett.89.208701.
Salganik, M. J., and D. Heckathorn. 2004. “Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling.” Sociological Methodology 34: 193–240, https://doi.org/10.1111/j.0081-1750.2004.00152.x.
Shalizi, C. R., and A. Rinaldo. 2013. “Consistency under Sampling of Exponential Random Graph Models.” Annals of Statistics 41: 508, https://doi.org/10.1214/12-aos1044.
Spiller, M. W., K. J. Gile, M. S. Handcock, C. M. Mar, and C. Wejnert. 2018. “Evaluating Variance Estimators for Respondent-Driven Sampling.” Journal of Survey Statistics and Methodology 6: 23–45, https://doi.org/10.1093/jssam/smx018.
WHO. 2013. Introduction to HIV/AIDS and Sexually Transmitted Infection Surveillance: Module 4: Introduction to Respondent-Driven Sampling.
© 2021 Walter de Gruyter GmbH, Berlin/Boston