What is Respondent-Driven Sampling?

Respondent-driven sampling (RDS) combines "snowball sampling" (getting individuals to refer those they know, who in turn refer those they know, and so on) with a mathematical model that weights the sample to compensate for the fact that it was collected in a non-random way.

RDS represents an advance in sampling methodology because it resolves what had previously been an intractable dilemma. The dilemma is especially severe when sampling hard-to-reach groups, that is, groups that are small relative to the general population and for which no exhaustive list of population members is available. These include groups relevant to public health, such as drug injectors, prostitutes, and gay men; groups relevant to public policy, such as street youth and the homeless; and groups relevant to arts and culture, such as jazz musicians and other performance and expressive artists.

One horn of the dilemma is that if a study focuses only on the most accessible part of the target population, standard probability sampling methods can be used, but coverage of the target population is limited. For example, drug injectors can be sampled from needle exchanges and from the streets on which drugs are sold, but this approach misses many women, youth, and those who only recently started injecting. The result is a statistically representative sample of an unrepresentative part of the target population, so valid conclusions cannot be drawn about the target population as a whole.

This same fundamental problem faced pollsters during the recent presidential election. Phone-based polls could not reach voters who had abandoned landline phones in favor of cell and Internet phones, or voters who simply refused to be interviewed. What little was known about these inaccessible voters showed that they differed from other voters; for example, they tended to be younger. However, whether they differed in political attitudes was not known, so pollsters had no way of knowing how to adjust their estimates to compensate for those they had missed. Similarly, public health researchers have had no reliable way to determine how those they could access through location-based sampling differed from those who were inaccessible.

The other horn of the dilemma arises if priority is placed on coverage rather than statistical validity. Network-based methods can provide comprehensive coverage of the target population. They start with a set of initial respondents, who refer their peers; these in turn refer their peers, and so on, as the sample expands from wave to wave. Based on the principle of “six degrees of separation,” this approach could potentially reach any member of a population in only six waves, so total coverage is possible, at least theoretically. However, this approach is prey to a host of biases. For example, most people recruit those whom they resemble in race, ethnicity, education, income, and religion. In addition, well-connected individuals tend to be over-sampled because many recruitment paths lead to them. The peer recruitment upon which network-based sampling is based is therefore anything but random.
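The over-sampling of well-connected individuals can be illustrated with a small sketch. The Python below is an illustration only, not part of the RDS method itself: the five-person network and the function name referral_shares are hypothetical, and it assumes that each respondent refers exactly one contact chosen uniformly at random.

```python
# Toy undirected friendship network (hypothetical data): person -> contacts.
network = {
    "A": ["B", "C", "D", "E"],  # well connected: 4 contacts
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["A", "C"],
    "E": ["A"],                 # poorly connected: 1 contact
}


def referral_shares(network, waves=200):
    """Long-run share of referrals received by each person.

    Assumes each respondent refers one contact chosen uniformly at random,
    and that every person is equally likely to be an initial respondent.
    """
    people = list(network)
    share = {p: 1.0 / len(people) for p in people}
    for _ in range(waves):
        nxt = {p: 0.0 for p in people}
        for person, prob in share.items():
            # Split this person's probability evenly among their contacts.
            for contact in network[person]:
                nxt[contact] += prob / len(network[person])
        share = nxt
    return share


shares = referral_shares(network)
total_ties = sum(len(contacts) for contacts in network.values())
for person in network:
    expected = len(network[person]) / total_ties
    print(person, round(shares[person], 3), round(expected, 3))
```

On this toy network the long-run shares are proportional to the number of contacts: A, with four contacts, receives about a third of all referrals, while E, with one contact, receives about a twelfth.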

Because of this dilemma, researchers had to choose between a statistically valid sample of the most accessible part of the target population and a statistically invalid sample with broader coverage. The dilemma was widely assumed to be impossible to resolve. Standard statistical sampling texts describe network-based sampling methods, also known as chain-referral and snowball sampling, as afflicted by biases of unknown size and unknown direction, so any inferences based upon data from such a sample would be nothing more than mere "subjective evaluation."

The situation changed with the advent of respondent-driven sampling (RDS), a sampling method that overcomes this dilemma by showing that the breadth of coverage of network-based methods can be combined with the statistical validity of standard probability sampling methods. This makes it possible for the first time to draw statistically valid samples of previously unreachable groups. In essence, respondents recruit their peers, as in network-based samples, and researchers keep track of who recruited whom and their numbers of social contacts. A mathematical model of the recruitment process then weights the sample to compensate for non-random recruitment patterns. This model is based on a synthesis and extension of two areas of mathematics, Markov chain theory and biased network theory, which were not a part of the standard tool kit of mathematical sampling theory. The resulting statistical theory, termed RDS, enables researchers to provide both unbiased population estimates and measures of the precision of those estimates. This extends the realm within which statistically valid samples can be drawn, to include many groups of importance to public health, public policy, and arts and culture.
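The idea of weighting by network size can be sketched in a few lines. The Python below is a simplified illustration under stated assumptions, not the full RDS model described above: it ignores who recruited whom and simply weights each respondent by the inverse of the reported number of contacts, on the assumption that a person with more contacts is proportionally more likely to be recruited. The sample data and the function name degree_weighted_proportion are hypothetical.

```python
def degree_weighted_proportion(sample, trait):
    """Estimate the population share with `trait`.

    Each respondent is weighted by 1 / (reported number of contacts), so
    that well-connected respondents, assumed to be over-recruited, count less.
    """
    total_weight = sum(1.0 / r["degree"] for r in sample)
    trait_weight = sum(1.0 / r["degree"] for r in sample if r[trait])
    return trait_weight / total_weight


# Hypothetical respondents: reported number of contacts and one trait.
sample = [
    {"degree": 10, "female": False},
    {"degree": 10, "female": False},
    {"degree": 5, "female": False},
    {"degree": 2, "female": True},
    {"degree": 5, "female": True},
]

naive = sum(r["female"] for r in sample) / len(sample)
weighted = degree_weighted_proportion(sample, "female")
print(f"unweighted share: {naive:.2f}")
print(f"degree-weighted share: {weighted:.2f}")
```

In this made-up sample the unweighted share of women is 0.40, while the weighted estimate is about 0.64, because the women report fewer contacts and are therefore assumed to be under-recruited.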

RDS was developed by Douglas Heckathorn less than a decade ago, in 1997, as part of a National Institute on Drug Abuse-funded HIV-prevention research project targeting drug injectors in several Connecticut cities. RDS served as the recruitment mechanism for an intervention design developed with Robert Broadhead termed "peer-driven intervention" (PDI).

RDS was further elaborated in 2002, as part of a CDC-funded project focusing on young injectors, to include means for calculating confidence intervals and for weighting the sample to control for differences in network size and clustering across groups.

An article appearing in the journal Sociological Methodology offers a further important advance in the sampling method. It shows, using both analytic methods and simulations, that when RDS is applied in a way that fits the statistical theory on which it is based, it produces estimates that are "asymptotically unbiased." This means that bias is only on the order of one divided by the sample size, so the bias is negligible for samples of meaningful size. The article also improves the means for controlling for the effects of differences in network sizes. It was co-authored by Heckathorn and Matthew Salganik, who was then a Cornell sociology graduate student in the university's National Science Foundation-funded IGERT (Integrative Graduate Education and Research Traineeship) program. Salganik is now in the Columbia University sociology department.
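The statement about bias can be written compactly. The notation here is an assumption introduced for illustration and does not come from the article: $\hat{p}_n$ stands for the RDS estimate of a population proportion from a sample of size $n$, and $p$ for the true proportion. Read this way, the claim is roughly:

```latex
\[
  \bigl|\, \mathbb{E}[\hat{p}_n] - p \,\bigr| \;\le\; \frac{C}{n}
  \quad \text{for some constant } C \text{ and all sufficiently large } n,
  \qquad \text{so} \qquad
  \lim_{n \to \infty} \mathbb{E}[\hat{p}_n] = p .
\]
```

In other words, the bias shrinks in proportion to one over the sample size and vanishes in the limit.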

RDS has been applied to study a variety of populations. In collaboration with Joan Jeffri, Director of Columbia University's Center for the Study of Arts and Culture, it was employed to study jazz musicians in New York City, San Francisco, New Orleans, and Detroit. Funded by the National Endowment for the Arts, this was the first application of the method to the study of arts and culture. Heckathorn and Jeffri are now conducting studies of both aging artists in the New York City metropolitan area and the national network of professional and semi-professional storytellers.

RDS has also been used by the CDC’s Global AIDS Program to study injection drug users (IDUs) in Bangkok and IDUs and prostitutes in Vietnam. It has been used by Family Health International, the largest non-profit agency in international public health, to study gay men, IDUs, and prostitutes in more than a dozen countries, including Bangladesh, Burma, Cambodia, Egypt, Honduras, India, Kosovo, Mexico, Nepal, Vietnam, Pakistan, Papua New Guinea, and Russia. Consequently, though less than a decade old, it has been used in more than fifteen countries.