## APPENDIX A

# Measuring Deaths in a Population: The Cluster Survey Method

Measuring population characteristics is almost always based on sampling. The characteristic could be deaths, births, or the presence of disease, but it could just as well be voting intentions, the number of pets in a household, or the most-watched television programs.

The basis of sampling is that a characteristic of the whole can be determined from parts selected at random from that whole. Various mathematical formulae allow calculation of how many parts are required to be assured that the result is indeed representative of the whole. It is usual for a survey to be designed with a 95% confidence that the result is within 80% (or some number that we chose for other reasons) of the true value for the whole (precision). Using such formulae, we can determine how many persons we would have to survey to satisfy the criteria set for the confidence and precision of the results. This is called the *sample size*. The more confidence we desire for our results, and the more precision we want, the larger the sample must be. Sometimes this involves just more people and more training, but as in the case of the Iraq study, it involves increasing the exposure to danger. Survey takers have been killed in Iraq, but fortunately none from our teams.
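As a rough illustration of how confidence and precision drive sample size, the sketch below uses the common textbook formula for estimating a proportion under simple random sampling. It is an assumption made for illustration, not necessarily the formula used for the Iraq study; the function name and the example values are hypothetical.

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """Simple random sample size needed to estimate a proportion.

    p      -- expected proportion (0.5 is the most conservative guess)
    margin -- desired precision, as +/- this amount around the estimate
    z      -- normal score for the confidence level (1.96 for 95%)
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Halving the margin (more precision) roughly quadruples the sample.
n_wide = sample_size_for_proportion(p=0.5, margin=0.05)
n_tight = sample_size_for_proportion(p=0.5, margin=0.025)
print(n_wide, n_tight)
```

The point of the sketch is the trade-off described above: asking for a tighter margin or a higher confidence level raises the number of people who must be surveyed.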

Once we decide on the sample size, we need to determine how to find that number of parts within the whole. For people, we speak of the whole as the *population*. These parts should be found by *random* (not haphazard) methods. The simplest way is to make a list of everyone in the population (the whole), and randomly choose the number needed from this list. This is known as simple random sampling.
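Simple random sampling from a complete list can be sketched in a few lines. The household list here is invented for illustration, and the `seed` argument is only there to make the example repeatable.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members from a complete population list."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical list of every household in the population.
households = [f"household-{i}" for i in range(1, 1001)]
chosen = simple_random_sample(households, 50, seed=1)
```

Every household on the list has the same chance of being drawn, which is what separates a random selection from a haphazard one. The method depends on having the full list in the first place.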

Cluster sampling, which involves the random selection of clusters of people (or households) instead of individual people, is a valid alternative to simple random sampling. In conflicts and in many developing countries, a listing of all persons or households in the area to be sampled is seldom available. The cluster survey instead selects a certain number of clusters (usually no fewer than 30) from the area to be surveyed. In the area of interest, each of the towns or administrative units is listed with the best estimate of its population, and a running total of these populations is made.

The total population is then divided by the number of clusters to give the *sampling interval*. If we were to visit 30 clusters in a county with a population of 120,000, our sampling interval would be 120,000 ÷ 30 = 4,000. This means every 4,000th person in the county would live in one of the clusters we will want to visit. We then list all the towns in the county with their populations (in any order). Our first cluster is the town where person 4,000 lives, our second cluster is in the town where person 8,000 lives, the third cluster is where person 12,000 lives, and so on until we have our 30 clusters. We really don’t need to know who person 4,000 is, just the town where he or she lives. If we listed the towns alphabetically, then we would know automatically which town would be the first cluster chosen, and it would not be random. However, if we pick our first cluster with a random number, then it could fall in any of several places. If the random number is too big (bigger than 4,000), then we will not get 30 clusters for the county. So the rule is that the random number chosen for the first cluster must not be larger than the sampling interval, which in this example is 4,000. Each later cluster is then found by adding the sampling interval to the previous number.
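The selection procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for any actual survey: the town names and populations are invented, and `select_clusters` is a hypothetical helper.

```python
import random
from bisect import bisect_left
from collections import Counter
from itertools import accumulate


def select_clusters(towns, n_clusters, start=None, seed=None):
    """Assign clusters to towns by stepping along a running population total.

    towns      -- list of (name, estimated_population) pairs, in any order
    n_clusters -- number of clusters to select
    start      -- position of the first selected person; drawn at random
                  from 1..sampling interval when not supplied
    """
    names = [name for name, _ in towns]
    running_totals = list(accumulate(pop for _, pop in towns))
    interval = running_totals[-1] // n_clusters  # the sampling interval
    if start is None:
        start = random.Random(seed).randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the sampling interval")

    selected = []
    for k in range(n_clusters):
        person = start + k * interval
        # The person lives in the first town whose running total reaches them.
        selected.append(names[bisect_left(running_totals, person)])
    return selected


# Hypothetical county of 120,000 people (names and sizes are invented).
towns = [("A", 50_000), ("B", 30_000), ("C", 20_000), ("D", 12_000), ("E", 8_000)]

# Fixed start at person 4,000, as in the worked example in the text.
fixed = Counter(select_clusters(towns, 30, start=4_000))
# Random start between 1 and 4,000, as the rule requires.
randomized = select_clusters(towns, 30, seed=7)
```

With the fixed start, town A (50,000 people) receives 12 of the 30 clusters and town E (8,000 people) receives 2. A start larger than the sampling interval is rejected, because the last step would run past the end of the county's population.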

Because the towns or administrative units are listed with their populations, bigger towns are likely to be selected for more clusters. This reflects a basic sampling principle: the chance of being selected is equal for everyone, whether you live in a big city or a small town. In this way, all people and all households have an equal chance of being included in a survey.
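The arithmetic behind this principle can be checked with a small sketch. It rests on a simplifying assumption that is ours, not the source's: each cluster takes a fixed number of people, chosen with equal chance from within the town. All numbers are hypothetical.

```python
from fractions import Fraction


def selection_probability(town_population, sampling_interval, people_per_cluster):
    """Chance that one given resident ends up in the sample (idealized)."""
    # A town expects one cluster for every sampling interval of residents.
    expected_clusters = Fraction(town_population, sampling_interval)
    # Within a cluster, a fixed number of the town's residents are taken.
    chance_within_cluster = Fraction(people_per_cluster, town_population)
    return expected_clusters * chance_within_cluster


big_town = selection_probability(50_000, 4_000, 40)
small_town = selection_probability(8_000, 4_000, 40)
```

The town's population cancels out, so both calls give the same value: the number of people per cluster divided by the sampling interval. A bigger town gets more clusters, but each resident's chance of falling inside any one of them is proportionally smaller.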

Once the cluster is selected, additional sampling stages are required to locate neighborhoods and eventually a single house at which to start. For each of these selection stages, a random process is used so there will be no bias toward selecting one location over another. Once the “start house” or location is selected, the survey team moves to the next nearest (or sometimes the second or third nearest) house until the specified number of houses (often from 10 to 50) has been selected for interview in that cluster. The same is done for the other clusters.

A problem with cluster surveys is that households adjacent to each other are more likely to be similar than those located farther away. In the case of localized violent events, the same event is likely to affect households close together. This makes simple random sampling a stronger survey method where this is possible. But in war this is seldom possible.

To compensate for this “clustering effect” (sometimes called the *design effect*), the number of households or persons in a cluster sample is increased over that of a simple random sample in order to provide adequate precision. As one does not know the extent of “clustering” before the survey is started, it is usually estimated at two, meaning that a cluster survey would need twice the number of households as a simple random survey in order to have equal statistical power.

Afterwards, the clustering or design effect can be calculated from the results to see if the estimate of 2.0 was indeed correct. In the analysis of the 2006 Iraq mortality study, this effect was found to be only 1.6. That is, the number of households in the cluster sample needed to be 1.6 times the size of a completely random household sample in order to have the same statistical power or, in terms of confidence intervals, to give an equally precise result. This standard was achieved, because an effect of 2.0 had been allowed for in the design. In other words, the final number of households surveyed (1,849) was greater than what was needed for such precision.
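The effect of the design-effect adjustment can be sketched with a little arithmetic. The household count of 1,849 and the design effects of 2.0 and 1.6 come from the text above; the starting figure of 400 households and the function names are hypothetical.

```python
import math


def cluster_sample_size(srs_size, design_effect=2.0):
    """Households a cluster survey needs to match a simple random sample."""
    return math.ceil(srs_size * design_effect)


def equivalent_srs_size(cluster_size, design_effect):
    """Size of the simple random sample that a cluster sample is worth."""
    return cluster_size / design_effect


# Planning: a hypothetical simple random sample of 400 households doubles.
planned = cluster_sample_size(400, design_effect=2.0)

# Afterwards: what 1,849 households are worth under each design effect.
assumed = equivalent_srs_size(1_849, 2.0)   # design assumption
observed = equivalent_srs_size(1_849, 1.6)  # value found in the analysis
```

Under the assumed design effect of 2.0, the 1,849 households are worth about 925 households chosen by simple random sampling. Under the observed effect of 1.6 they are worth about 1,156, which is why the sample turned out larger than the target precision required.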

Cluster sampling is the method that gives us much of our information about the health of populations in developing countries. It has been accepted as an effective tool for measuring deaths in previous conflict situations such as in the Democratic Republic of Congo,^{1} during post-Gulf War sanctions,^{2} in Kosovo,^{3} in Darfur,^{4} and in Angola.^{5} The results of these studies were widely used to establish policy by governments and the United Nations. The US Government, the Canadian Government, UN agencies (especially UNICEF) and many other organizations have supported development of these methods both in peacetime and during conflict.

Validation of cluster sampling methodologies as an appropriate alternative to simple random sampling is difficult in conflict situations. However, there have been multiple initiatives to validate this method in measuring public health outcomes such as mortality, demographics, and nutritional and disease status in stable circumstances. The Standardized Monitoring and Assessment of Relief and Transitions Initiative (SMART), a collaborative network through USAID seeking to standardize and evaluate methodologies among humanitarian organizations, has established cluster sampling as an acceptable method of sampling in conflict.^{6} In stable situations, the USAID-supported Demographic and Health Surveys (DHS), which frequently use cluster sampling to measure death rates, have obtained results that are almost identical to data measured through a national census.^{7} The data derived from DHS cluster sampling have been used to inform many health policy decisions by donor countries, and are one of the United States’ major contributions to public health knowledge.