Published by De Gruyter, January 20, 2021

Sampling from networks: respondent-driven sampling

Mamadou Yauck, Erica E.M. Moodie, Herak Apelian, Marc Messier-Peet, Gilles Lambert, Daniel Grace, Nathan J. Lachowsky, Trevor A. Hart and Joseph Cox
From the journal Epidemiologic Methods

Abstract

Objectives

Respondent-Driven Sampling (RDS) is a variant of link-tracing, a sampling technique for surveying hard-to-reach communities that takes advantage of community members' social networks to reach potential participants. While the RDS sampling mechanism and associated methods of adjusting for the sampling at the analysis stage are well documented in the statistical sciences literature, methodological focus has largely been restricted to the estimation of population means and proportions, with little to no consideration given to the estimation of population network parameters. As a network-based sampling method, RDS faces the fundamental problem of sampling from population networks in which features such as homophily (the tendency for individuals with similar traits to share social ties) and differential activity (the ratio of the average number of connections in one attribute group to that in another) are sensitive to the choice of sampling method.
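To make the two network features concrete, the following minimal Python sketch computes them on a toy, fully observed network. The function names, the toy data, and the choice of the share of same-trait ties as a homophily summary are illustrative assumptions; the paper may use different estimators.

```python
from collections import defaultdict


def differential_activity(edges, trait):
    """Ratio of the mean degree of trait-1 nodes to that of trait-0 nodes."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    group1 = [n for n in trait if trait[n] == 1]
    group0 = [n for n in trait if trait[n] == 0]
    mean1 = sum(degree[n] for n in group1) / len(group1)
    mean0 = sum(degree[n] for n in group0) / len(group0)
    return mean1 / mean0


def same_trait_tie_share(edges, trait):
    """Share of ties joining two nodes with the same trait value.

    One simple homophily summary among several (assumption, not the
    paper's definition).
    """
    same = sum(1 for u, v in edges if trait[u] == trait[v])
    return same / len(edges)


# Hypothetical toy network: binary trait and undirected ties.
trait = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0, "f": 0}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("d", "e"), ("e", "f")]

da = differential_activity(edges, trait)
share = same_trait_tie_share(edges, trait)
print(f"differential activity: {da:.3f}")
print(f"share of same-trait ties: {share:.3f}")
```

A differential activity of 1 means both groups are equally active; values above 1 mean the trait-1 group has more connections on average.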

Methods

Many simple approaches exist to generate simulated RDS data with specific levels of network features (mainly homophily and differential activity), where the focus is on estimating means and proportions (Gile 2011; Gile et al. 2015; Spiller et al. 2018). However, recent findings on the inconsistency of estimators of network features such as homophily in partially observed networks (Crawford et al. 2017; Shalizi and Rinaldo 2013) raise the question of whether those target features can be recovered using the observed RDS data alone, since recovering information about these features is critical if we wish to condition upon them. In this paper, we conduct a simulation study to assess the accuracy of existing RDS simulation methods in terms of their ability to generate RDS samples with the desired levels of two network parameters: homophily and differential activity.
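As a rough illustration of what generating simulated RDS data involves, the sketch below draws a link-tracing sample without replacement from a known network. It is not the simulation procedure evaluated in the paper: the random graph generator, the function names, the coupon limit, and all parameter values are hypothetical choices made only for illustration.

```python
import random


def random_network(n, p, rng):
    """Independent-edge random graph as an adjacency dict (illustrative only)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def simulate_rds(adj, n_seeds, coupons, target_size, rng):
    """Link-tracing without replacement.

    Returns a dict mapping each sampled node to its recruiter
    (None for seeds).
    """
    seeds = rng.sample(sorted(adj), n_seeds)
    recruiter = {s: None for s in seeds}
    queue = list(seeds)
    while queue and len(recruiter) < target_size:
        current = queue.pop(0)
        # Only network neighbours not yet in the sample can be recruited.
        candidates = sorted(v for v in adj[current] if v not in recruiter)
        rng.shuffle(candidates)
        for peer in candidates[:coupons]:
            if len(recruiter) >= target_size:
                break
            recruiter[peer] = current
            queue.append(peer)
    return recruiter


rng = random.Random(2021)  # fixed seed so the run is reproducible
adj = random_network(200, 0.05, rng)
sample = simulate_rds(adj, n_seeds=5, coupons=3, target_size=50, rng=rng)
print(f"sampled {len(sample)} of {len(adj)} nodes")
```

Network features computed on the sampled nodes can then be compared with the same features computed on the full network `adj`, which is the kind of comparison a simulation study of this sort relies on.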

Results

The results show that (1) homophily cannot be consistently estimated from simulated RDS samples and (2) differential activity estimators are more precise when groups, defined by traits, are equally active and equally represented in the population. We use this approach to mimic features of the Engage Study, an RDS sample of gay, bisexual and other men who have sex with men in Montréal, Canada.

Conclusions

In this paper, we highlight that it is possible, in some cases, to simulate population networks by mimicking the characteristics of real-world RDS data while retaining accuracy and precision for target network features in the samples.


Corresponding author: Mamadou Yauck, Department of Epidemiology, Biostatistics & Occupational Health, McGill University, Montréal, QC, Canada, E-mail:

Funding source: Natural Sciences and Engineering Research Council (NSERC) of Canada

Award Identifier / Grant number: RGPIN-2019-04230

Acknowledgment

The authors would like to thank the Engage study participants, office staff, and community engagement committee members, as well as our community partner agencies REZO, ACCM and Maison Plein Coeur. The authors also wish to acknowledge the support of David M. Moore, Nathan J. Lachowsky and Jody Jollimore and their contributions to the work presented here. Engage/Momentum II is funded by the Canadian Institutes for Health Research (CIHR, TE2-138299), the CIHR Canadian HIV/AIDS Trials Network (CTN300), the Canadian Foundation for AIDS Research (CANFAR, Engage), the Ontario HIV Treatment Network (OHTN, 1051), the Public Health Agency of Canada (Ref: 4500370314), Canadian Blood Services (MSM2017LP-OD), and the Ministère de la Santé et des Services sociaux (MSSS) du Québec. Erica E. M. Moodie acknowledges a chercheur boursier senior career award from the Fonds de recherche du Québec – Santé.

  1. Research funding: MY is funded by a Postdoctoral Fellowship jointly sponsored by the Statistical and Applied Mathematical Sciences Institute (SAMSI) and the Canadian Statistical Sciences Institute (CANSSI). The methodological developments in this manuscript are supported by a Discovery Grant to EEMM from the Natural Sciences and Engineering Research Council (NSERC) of Canada, grant #RGPIN-2019-04230.

  2. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Competing interests: Authors state no conflict of interest.

  4. Informed consent: Not applicable.

  5. Ethical approval: Not applicable.

Appendix

Table 4: Characteristics of the Engage population network and RDS sample.

Parameter                                     Estimated value  95% CI
Population network
Population size                               40,400
Mean degree                                   16.63
Prevalence, %
  Condomless anal sex in the past six months  57.9             [52.7, 63.0]
  Currently in a relationship                 43.9             [38.8, 49.0]
  HIV positive                                12.7             [9.3, 16.0]
RDS sample
Number of seeds                               27
Number of recruits
  0                                           651
  1                                           236
  2                                           117
  3                                           81
  4                                           49
  5                                           27
  6                                           18
Sample size                                   1,179
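
The RDS sample figures in Table 4 can be cross-checked with a few lines of arithmetic. The sketch below assumes that each "Number of recruits" row counts the participants who recruited that many peers; this reading is an assumption, although the totals are consistent with it.

```python
# Counts copied from Table 4: number of recruits -> number of participants.
recruit_counts = {0: 651, 1: 236, 2: 117, 3: 81, 4: 49, 5: 27, 6: 18}
n_seeds = 27

# Every participant falls in exactly one row, so the rows sum to the sample size.
sample_size = sum(recruit_counts.values())

# Every non-seed participant was recruited by exactly one other participant.
total_recruits = sum(k * n for k, n in recruit_counts.items())

mean_recruits = total_recruits / sample_size

print(sample_size)                      # 1179
print(total_recruits)                   # 1152
print(sample_size - n_seeds)            # 1152
print(round(mean_recruits, 3))          # 0.977
```

Under this reading, the total number of recruitments equals the sample size minus the number of seeds, as expected when every non-seed participant has exactly one recruiter.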

Table 5: (Pearson) correlation matrix of three nodal covariates for the Engage RDS sample. Unweighted and weighted correlations are displayed, with weighted correlations in parentheses.

                                      1. CAS   2. CIR               3. HIV+
1. Condomless anal sex (CAS)          1        0.104*** (0.115***)  0.023 (0.018)
2. Currently in a relationship (CIR)           1                    0.046 (0.002)
3. HIV positive (HIV+)                                              1

***p-value < 0.001.
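
For readers unfamiliar with weighted correlations, the following sketch shows one common way to compute a weighted Pearson correlation. The weighting scheme behind Table 5 is not specified in this section; the inverse-degree weights and the toy data below are assumptions chosen purely for illustration.

```python
import math


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with non-negative observation weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: two binary covariates and self-reported degrees.
x = [1, 0, 1, 1, 0]
y = [1, 0, 0, 1, 0]
degrees = [10, 5, 20, 4, 8]

# Assumed weights: inverse degree, so highly connected respondents count less.
weights = [1 / d for d in degrees]

r_unweighted = weighted_pearson(x, y, [1] * len(x))
r_weighted = weighted_pearson(x, y, weights)
print(f"unweighted: {r_unweighted:.3f}, weighted: {r_weighted:.3f}")
```

With equal weights the function reduces to the ordinary Pearson correlation, which gives a direct way to produce both entries of a cell in a table of this kind.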

References

Barbiero, A., and P. A. Ferrari. 2017. “An R Package for the Simulation of Correlated Discrete Variables.” Communications in Statistics – Simulation and Computation 46: 5123–40, https://doi.org/10.1080/03610918.2016.1146758.

Biernacki, P., and D. Waldorf. 1981. “Snowball Sampling: Problem and Techniques of Chain Referral Sampling.” Sociological Methods & Research 10: 141–63, https://doi.org/10.1177/004912418101000205.

Butts, C. 2008a. “Network: A Package for Managing Relational Data in R.” Journal of Statistical Software 24: 1–36, https://doi.org/10.18637/jss.v024.i02.

Butts, C. 2008b. “Social Network Analysis with sna.” Journal of Statistical Software 24: 1–51, https://doi.org/10.18637/jss.v024.i06.

Camirand, H., I. Traoré, and J. Baulne. 2016. L’Enquête québécoise sur la santé de la population, 2014-2015: pour en savoir plus sur la santé des Québécois.

Costenbader, E., and T. W. Valente. 2003. “The Stability of Centrality Measures when Networks are Sampled.” Social Networks 25: 283–307, https://doi.org/10.1016/s0378-8733(03)00012-1.

Crawford, F. W., P. M. Aronow, L. Zeng, and J. Li. 2017. “Identification of Homophily and Preferential Recruitment in Respondent-Driven Sampling.” American Journal of Epidemiology 187: 153–60, https://doi.org/10.1093/aje/kwx208.

Durrett, R. 2006. Erdös–Rényi Random Graphs, 27–69, Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, https://doi.org/10.1017/CBO9780511546594.003.

Gile, K. J. 2011. “Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation.” Journal of the American Statistical Association 106: 135–46, https://doi.org/10.1198/jasa.2011.ap09475.

Gile, K. J., and M. S. Handcock. 2010. “Respondent-driven Sampling: An Assessment of Current Methodology.” Sociological Methodology 40: 285–327, https://doi.org/10.1111/j.1467-9531.2010.01223.x.

Gile, K. J., and M. S. Handcock. 2015. “Network Model-Assisted Inference from Respondent-Driven Sampling Data.” Journal of the Royal Statistical Society – Series A: Statistics in Society 178: 619–39, https://doi.org/10.1111/rssa.12091.

Gile, K. J., L. G. Johnston, and M. J. Salganik. 2015. “Diagnostics for Respondent-Driven Sampling.” Journal of the Royal Statistical Society – Series A: Statistics in Society 178: 241–69, https://doi.org/10.1111/rssa.12059.

Gile, K. J., I. S. Beaudry, M. S. Handcock, and M. Q. Ott. 2018. “Methods for Inference from Respondent-Driven Sampling Data.” Annual Review of Statistics and Its Application 5: 65–93, https://doi.org/10.1146/annurev-statistics-031017-100704.

Goodman, L. A. 1961. “Snowball Sampling.” The Annals of Mathematical Statistics 32: 148–70, https://doi.org/10.1214/aoms/1177705148.

Handcock, M. S., D. R. Hunter, C. T. Butts, S. M. Goodreau, and M. Morris. 2003. Statnet: Software Tools for the Statistical Modeling of Network Data. Also available at http://statnetproject.org.

Harris, J. 2014. An Introduction to Exponential Random Graph Modeling, Quantitative Applications in the Social Sciences. United States: SAGE Publications, https://doi.org/10.4135/9781452270135.

Heckathorn, D. D. 1997. “Respondent-driven Sampling: A New Approach to the Study of Hidden Populations.” Social Problems 44: 174–99, https://doi.org/10.1525/sp.1997.44.2.03x0221m.

Heckathorn, D. D. 2002. “Respondent-driven Sampling II: Deriving Valid Population Estimates from Chain-Referral Samples of Hidden Populations.” Social Problems 49: 11–34, https://doi.org/10.1525/sp.2002.49.1.11.

Hunter, D. R., M. S. Handcock, C. T. Butts, S. M. Goodreau, and M. Morris. 2008. “Ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks.” Journal of Statistical Software 24: 1–29, https://doi.org/10.18637/jss.v024.i03.

Lambert, G., J. Cox, M. Messier-Peet, H. Apelian, and E. E. M. Moodie. 2019. Engage Montréal, Portrait de la santé sexuelle des hommes de la région métropolitaine de Montréal ayant des relations sexuelles avec des hommes, Cycle 2017-2018, Faits saillants. Canada: Direction régionale de santé publique du CIUSSS du Centre-Sud-de-l’Île-de-Montréal.

Lin, S.-D., M.-Y. Yeh, and C.-T. Li. 2013. “Sampling and Summarization for Social Networks.” In 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) (tutorial). United States: Society for Industrial and Applied Mathematics.

Newman, M. E. 2002. “Assortative Mixing in Networks.” Physical Review Letters 89: 208701, https://doi.org/10.1103/physrevlett.89.208701.

Salganik, M. J., and D. Heckathorn. 2004. “Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling.” Sociological Methodology 34: 193–240, https://doi.org/10.1111/j.0081-1750.2004.00152.x.

Shalizi, C. R., and A. Rinaldo. 2013. “Consistency under Sampling of Exponential Random Graph Models.” Annals of Statistics 41: 508, https://doi.org/10.1214/12-aos1044.

Spiller, M. W., K. J. Gile, M. S. Handcock, C. M. Mar, and C. Wejnert. 2018. “Evaluating Variance Estimators for Respondent-Driven Sampling.” Journal of Survey Statistics and Methodology 6: 23–45, https://doi.org/10.1093/jssam/smx018.

WHO. 2013. Introduction to HIV/AIDS and Sexually Transmitted Infection Surveillance: Module 4: Introduction to Respondent-Driven Sampling.

Received: 2020-08-20
Accepted: 2020-12-29
Published Online: 2021-01-20

© 2021 Walter de Gruyter GmbH, Berlin/Boston
