Humans are pattern-seeking animals and are thus prone to detect patterns where none exist. We are also very interested in categorizing things, presumably because it is easier to handle cognitively. Imagine the difficulty we would have if we had to mentally treat each leaf as a separate entity and could not consider them “just a bunch of leaves”! But there is a downside to this as well, because we can be misled and neglect complicated patterns because our categories are easy and psychologically influential. These issues and questions often appear in discussions about human genetic diversity. This is enhanced by the fact that complicated genetic and computational analyses feed us with visually striking graphs that tickle our imagination, while we do not pay equal attention to the underlying methodology.
However, reality is more complicated. Genetic clusters overemphasize differences, largely ignore similarities and are confounded by low sampling density and geographic distance. Thus, a modern analysis of human genetic variation reveals that it is, with a few exceptions, mostly clinal in nature and that notions of discrete genetic races are not an accurate description.
It is often said that ethnicity is useful in medicine, but this is also more complicated due to confounders such as health disparities, bias, discrimination, healthcare seeking behavior and compliance, as well as socioeconomic status. It turns out that ethnic status is at best a crude proxy for the alleles of a person and sequencing individuals will be much more useful. Finally, a focus on racial medicine has led to misdiagnosis of some diseases, such as sickle-cell anemia, thalassemia and cystic fibrosis.
Recently, Steven Novella wrote a mostly balanced discussion of the issue of human genetic variation and its connections to genetic research and medicine. Novella gets most things right, but the discussion of genetic clustering and racial medicine lacks vital details that show that the evidence either is inconsistent with discrete genetic races or not as supportive as once thought. This led him astray to the faulty conclusion that modern science supports the existence of discrete genetic races.
Are there genetic differences between humans and populations, and can they sometimes be useful in medicine? Yes, but those genetic differences are not accurately described by discrete genetic races and there are medical downsides and limitations to racial medicine that we also need to consider.
How does genetic clustering work?
To understand why genetic clustering does not support the notion of discrete human races, we need to understand how genetic clustering works and how we can be led astray by not fully appreciating these details.
Genetic clustering works in two steps. The first step involves calculating pairwise distance measures between your samples. This tells you how different any two samples are from each other in your data set. The second step is called (hierarchical) clustering, whereby the samples are put into clusters based on the distance measures. There are other clustering methods besides hierarchical clustering, and there are many different ways both to calculate distance and to do hierarchical clustering.
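The two steps can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used in any of the studies discussed here: the genotypes are made up, Euclidean distance stands in for the distance measure, and single linkage is just one of the many ways to do hierarchical clustering. The names `samples`, `pairwise_distances` and `single_linkage` are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical genotypes: one row per sample, one column per site,
# coded as the number of copies (0, 1 or 2) of an arbitrary allele.
samples = {
    "s1": [0, 1, 2, 0, 1],
    "s2": [0, 1, 2, 0, 2],
    "s3": [2, 1, 0, 0, 1],
    "s4": [2, 1, 0, 1, 1],
}


def euclidean(a, b):
    """Euclidean distance between two equally long genotype vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(data):
    """Step 1: distance for every unordered pair of sample names."""
    return {
        frozenset((i, j)): euclidean(data[i], data[j])
        for i, j in combinations(data, 2)
    }


def single_linkage(data, n_clusters):
    """Step 2: merge the two closest clusters until n_clusters remain."""
    dist = pairwise_distances(data)
    clusters = [{name} for name in data]
    while len(clusters) > n_clusters:
        # Single linkage: the distance between two clusters is the
        # smallest distance between any member of one and any of the other.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(
                dist[frozenset((i, j))] for i in pair[0] for j in pair[1]
            ),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return clusters


clusters = single_linkage(samples, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

Note that the number of clusters is chosen by whoever runs the analysis. The algorithm will return two groups, or three, or four, whether or not the underlying data are discrete.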
In this case, the most important step to understand is the calculation of the distance measure. Two very common methods to do this are Euclidean distance and Pearson correlation. To put it simply, you can think of two samples as two points in a graph and the distance measure as the distance between those points. The key take-home message is that this calculation strongly emphasizes differences at the expense of identities or similarities. Large differences are given high distance values, high similarities are given low distance values and identities contribute zero to the genetic distance.
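The point about identities can be checked directly. The sketch below uses made-up genotypes and covers Euclidean distance only (correlation-based distances behave differently and are not shown). It compares two samples at 5 sites, then again after appending 995 sites where both samples are identical.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equally long genotype vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical genotypes at 5 sites (0, 1 or 2 copies of an allele).
sample_a = [0, 1, 2, 0, 1]
sample_b = [0, 1, 2, 0, 2]  # differs from sample_a at the last site only

# Append 995 further sites at which both samples are identical.
shared = [1] * 995
long_a = sample_a + shared
long_b = sample_b + shared

d_short = euclidean(sample_a, sample_b)
d_long = euclidean(long_a, long_b)

# Each identical site adds (x - x) ** 2 == 0 to the sum, so the distance
# is the same whether 4 or 999 sites are shared.
print(d_short, d_long)
```

Both distances come out the same, so the distance measure carries no information about how much the two samples have in common.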
A primer on genetic clustering can be found in D’haeseleer (2005). Although it uses gene expression data as an example, the general principles are universal regardless of what data you are clustering.
Why genetic clustering does not support discrete genetic races
So when you are looking at genetic clusters, you are only looking at differences while similarities have been greatly downplayed and identities have been ignored. This is exactly what we want when we look at gene expression differences between samples in response to some environmental challenge or drug. But if we use it to make claims about the genetic structure of populations, there is a substantial bias for difference built into this method.
This is particularly dangerous when it comes to species where the genetic variation is very low. In humans, for instance, the genetic difference between individuals is very low. If we look at single nucleotide polymorphisms (SNPs), that is, single base variations, the average difference between two individuals is 0.1% (National Human Genome Research Institute, 2016). So on average, 999 out of 1000 base positions are identical between two individuals, and 1 out of 1000 differs. Genetic clustering would ignore 99.9% of the dataset and focus on the remaining 0.1%. This becomes even more pernicious given that modern high-throughput studies looking at ~650 000 SNPs and ~400 microsatellites have found that the vast majority of genetic variation (84.7%-95% depending on the study and genetic element) occurs between individuals and within continental groups (Li et al., 2008; Rosenberg et al., 2002).
The other key issue is that genetic clustering analyses on humans typically have a low sampling density and if we are not careful, we can confuse geographical distance for genetic differences. If we, for instance, sequence a dozen people from Sweden, a dozen people from Somalia and a dozen people from India and do a clustering analysis on that data, we will get clusters. But if we had also sequenced people living in between these areas at high resolution, we would see that human genetic variation is mostly continuous and clinal and the illusion of discrete genetic clusters disappears. Here is Serre and Pääbo (2004):
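To get a feel for these proportions, here is the arithmetic as a short Python sketch. The 0.1% and 84.7%-95% figures come from the text above. The genome length of 3 billion positions is a round number assumed purely for illustration.

```python
# Figures taken from the text above.
differing_fraction = 0.001          # 0.1% average difference between two people
within_group_share = (0.847, 0.95)  # share of variation within continental groups

# Assumption for illustration only: a round genome length of 3 billion positions.
genome_length = 3_000_000_000

differing_sites = round(genome_length * differing_fraction)
identical_sites = genome_length - differing_sites

# Whatever is not found within groups is the share found between groups.
between_group_share = tuple(round(1 - s, 3) for s in reversed(within_group_share))

print(f"{differing_sites:,} differing vs {identical_sites:,} identical positions")
print(
    "between-group share of variation: "
    f"{between_group_share[0]:.1%} to {between_group_share[1]:.1%}"
)
```

Under these assumptions, clustering works with a small slice (the between-group share) of what is already a small slice (the positions that differ at all).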
In the light of these results, and in agreement with extensive studies of classical genetic markers (Cavalli-Sforza et al. 1994), it seems that gradual variation and isolation by distance rather than major genetic discontinuities is typical of global human genetic diversity. Obviously, this does not imply that genetic discontinuities do not exist on a more local scale, for example, between people from different linguistic groups (e.g., Barbujani and Sokal 1990; Sokal et al. 1990). It also does not mean that no differences whatsoever exist between continental groups. In fact, what Rosenberg et al. (2002) have shown is that given enough markers and the extraordinary power of Structure, the tiny amounts of genetic differences that exist between continents can also be discerned. However, this should not obscure the fact that on a worldwide scale, clines are a better representation of the human diversity than clades, and that continents do not represent more substantial discontinuities in such clines than many other geographical and cultural barriers.
Thus, the evidence does not support the notion of discrete human races and we should think critically about the sampling strategy and computational tools that are used and crucially consider what kind of conclusions we can draw from what data. Humans are pattern-seeking animals, and it is easy to read genetic clusters as discrete genetic races, but the reality is very different.
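The effect of sampling density can be illustrated with a toy model. The sketch below assumes a perfectly smooth, hypothetical cline in which the frequency of some allele rises linearly along a geographic transect. It then compares the largest jump in allele frequency between neighbouring sampling sites under a sparse and a dense sampling design. The function names and all numbers are invented for illustration and do not come from any of the cited studies.

```python
def allele_frequency(position):
    """Hypothetical smooth cline: frequency rises linearly from 0.2 to 0.8
    as position goes from 0.0 to 1.0 along a transect."""
    return 0.2 + 0.6 * position


def largest_gap(positions):
    """Largest jump in allele frequency between neighbouring sampling sites."""
    freqs = sorted(allele_frequency(p) for p in positions)
    return max(b - a for a, b in zip(freqs, freqs[1:]))


# Sparse design: three far-apart locations.
sparse = [0.0, 0.5, 1.0]
# Dense design: the same transect sampled at 21 evenly spaced locations.
dense = [i / 20 for i in range(21)]

sparse_gap = largest_gap(sparse)
dense_gap = largest_gap(dense)
print(round(sparse_gap, 3), round(dense_gap, 3))
```

The underlying variation is identical in both designs. Only the sparse design produces large jumps between neighbouring samples, and it is those jumps that a clustering algorithm turns into seemingly discrete groups.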
Although there is some debate between researchers about the details (such as whether this or that model for allele correlations should be used), even the critics of Serre and Pääbo, namely Rosenberg and colleagues (2005), admit that genetic clusters do not correspond to biological races:
Our evidence for clustering should not be taken as evidence of our support of any particular concept of “biological race.” In general, representations of human genetic diversity are evaluated based on their ability to facilitate further research into such topics as human evolutionary history and the identification of medically important genotypes that vary in frequency across populations. Both clines and clusters are among the constructs that meet this standard of usefulness: for example, clines of allele frequency variation have proven important for inference about the genetic history of Europe, and clusters have been shown to be valuable for avoidance of the false positive associations that result from population structure in genetic association studies. The arguments about the existence or nonexistence of “biological races” in the absence of a specific context are largely orthogonal to the question of scientific utility, and they should not obscure the fact that, ultimately, the primary goals for studies of genetic variation in humans are to make inferences about human evolutionary history, human biology, and the genetic causes of disease.
It should be noted that proponents of the idea that genetic clusters demonstrate discrete genetic races typically cite the original Rosenberg et al. (2002) paper that Serre and Pääbo objected to. So even the researchers who did the research disagree with the race interpretation.
Rosenberg and colleagues (2005) decide to take a more pragmatic approach to the genetic clusters. The clusters might not be objectively real and might merely be an artifact of geographical distance and the methodology of clustering, but could racial classifications still be useful in e.g. medicine? It turns out that this issue is much more complicated than it first seems.
But before we examine that issue, we will take a short detour into fish genetics and evolution.
Something smells fishy…
Novella writes that:
I agree with the premise [that most genetic variation occurs within continental groups] but not the conclusion [that discrete races is an inaccurate description of human genetic variation]. Genetically speaking, all vertebrates are fish. There is much greater genetic variation within the fish clade than there is between fish and other vertebrates. Land dwelling vertebrates represent a tiny twig on the vast fish genetic tree.
In the exact same way, there is much more genetic variation within Africans, then between Africans an all other human populations. This simply reflects the fact that humans lived in Africa for a long time, evolving extensive genetic diversity, and the population that migrated out of African represents a tiny twig on the African genetic tree. We are all Africans in the exact same way that we are all fish.
Here we should be very careful with phylogenetics. Fishes are not a clade, but a paraphyletic group that excludes a lot of descendants (amphibians, avian and non-avian reptiles, mammals etc.). This means that fishes are just a form group where different organisms have been lumped together because of similarities in appearance and way of life rather than evolutionary history. Thus, fishes have an artificially inflated genetic diversity. It is also a poor analogy because of the high genetic diversity in chordates (since it is a phylum) and the low genetic diversity in humans (because our species recently had a genetic bottleneck ~50k years ago). It should also be pointed out that there is probably more genetic difference within any continent than between continents, not just in Africa.
The core idea behind the fact that most genetic variation occurs within continental groups rather than between them is that if you want to accurately describe human genetic diversity, you would not typically emphasize the 5-15% of genetic diversity that occurs between continental groups, while downplaying the other 85-95% that can be found elsewhere. You would rather say that human genetic variation is largely clinal.
Another common objection is that the ends of a gradient are different even if the changes when moving across the gradient are mostly continuous. While this is true, the rejection of discrete races does not assume that all humans or populations are genetically identical and disproving the latter does not disprove the former. The key idea is that human genetic variation is best represented by mostly clinal variation, not discrete races. There are certainly genetic differences between people, but those differences look nothing like most proponents of discrete races think they do.
The twilight of racial medicine
At this point, some people might concede that discrete genetic races are not an accurate view of human genetic variation, but still suggest that the idea has so much pragmatic merit in e.g. medicine that it can still be justified. After all, is it not the case that differences in mortality between whites and African-Americans account for 260 African-Americans dying prematurely every day on average? Are there not diseases that affect different continental groups? Do they not metabolize medication differently? Are there not gene variations with implications for medicine that are more common in some groups than others?
This turns out to be much more complicated than it first seems, for a number of reasons.
First, a big chunk of observed differences can be attributed to disparities in health care, unconscious biases by providers, discrimination and differences in income, education and unemployment. Thus, when we compare raw data from individuals from different ethnic backgrounds, we are not comparing apples with apples, but apples with oranges. Research has found that socioeconomic status is a more important factor than ethnic group for variation in health and health care disparities. Discrimination has been linked to health outcomes such as hypertension, all-cause mortality, incidence of asthma, poor mental health, inflammation, coronary artery calcification, obesity, cortisol dysregulation, poor sleep, smoking and other substance use, and also to less healthcare seeking and compliance behavior (Williams and Wyatt, 2015). This is, by the way, probably a good reason to stratify medical studies by ethnic background.
Second, some differences are a result of the geography of pathogens, rather than continental group per se. While it is true that sickle-cell anemia is more prevalent in Africa, this is mostly the case for regions where malaria is prevalent. If we are not careful with geographical confounders and similar issues (most studies use self-identified ethnic group rather than genetic data), we might detect spurious associations (Weiss and Fullerton, 2005).
Third, differences in allele frequencies are modest most of the time and only a few of them have been found to be related to differential response to medical treatments. For instance, the allele frequency for the nitric oxide synthase G894T SNP that is involved in arterial stiffness only differs by ~20 percentage points between African-Americans and whites (Chen et al., 2004). In this particular case, it means that a special medication for African-Americans will not benefit most of them, and will benefit a minority of whites too.
Fourth, since ethnic group is a very crude proxy of which allele you have, doctors might as well just sequence the patient and find out which version they actually have. This would be much more useful.
Fifth, a lot of the racial medicine discoveries have been either refuted or shown not to be as relevant as first thought. For instance, it was once thought that angiotensin-converting enzyme (ACE) inhibitors were less effective in African-Americans for blood pressure control, before a better study showed that they were effective regardless of ethnic group (Saunders and Gavin, 2003). The medication BiDil has been shown to have a better result for African-Americans with heart failure than a competitor, but the main study did not include people from other groups (Brody and Hunt, 2006), thus making the idea that it is a racial medicine doubtful.
Sixth, racial medicine can lead to misdiagnoses of diseases such as sickle-cell anemia, thalassemia and cystic fibrosis. The latter disease appears to be underdiagnosed in Africa because it is considered a disease that white people have (Yudell et al., 2016). Thus, there are not just pragmatic benefits with racial medicine for disease diagnosis, but also downsides.
In the end, it is not possible to show that ethnic background is completely irrelevant in medicine. But we should keep in mind that (1) it is confounded by socioeconomic status and health care disparities, biases and discrimination, (2) it is confounded by geographical and methodological considerations, (3) allele frequency differences are most often only modest, (4) self-identified ethnic group is a crude proxy for which allele you have, so if we focus on usefulness, we might as well just sequence the individual, (5) some of the icons of racial medicine have been refuted, and (6) racial medicine also has downsides when diagnosing diseases that many people consider strongly associated with ethnicity.
Thus, the utility of ethnic background in medicine is probably non-zero, but it is much cruder than commonly believed and not necessarily connected or relevant to notions of discrete genetic races.
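How crude the proxy is can be made concrete with a small calculation. The sketch below is a deliberately simplified illustration: the two carrier frequencies are hypothetical and were chosen only so that they differ by 20 percentage points, the groups are assumed to be equally large, and `proxy_accuracy` and `baseline_accuracy` are made-up helper names.

```python
# Hypothetical carrier frequencies, chosen only so that they differ by
# 20 percentage points as in the example in the text.
carrier_freq = {"group_a": 0.35, "group_b": 0.15}

# Policy: give the treatment to everyone in group_a and no one in group_b.
treated_without_variant = 1 - carrier_freq["group_a"]  # treated, no benefit
untreated_with_variant = carrier_freq["group_b"]       # would benefit, missed


def proxy_accuracy(freqs):
    """Share of people classified correctly when group membership is used
    to guess carrier status, assuming equally sized groups."""
    correct_a = freqs["group_a"]      # guessed 'carrier' and is one
    correct_b = 1 - freqs["group_b"]  # guessed 'non-carrier' and is one
    return (correct_a + correct_b) / 2


def baseline_accuracy(freqs):
    """Share classified correctly when everyone is guessed 'non-carrier'."""
    return sum(1 - f for f in freqs.values()) / len(freqs)


print(f"treated without the variant: {treated_without_variant:.0%}")
print(f"untreated despite the variant: {untreated_with_variant:.0%}")
print(f"accuracy of group as proxy: {proxy_accuracy(carrier_freq):.0%}")
print(f"accuracy of guessing 'non-carrier': {baseline_accuracy(carrier_freq):.0%}")
```

With these made-up numbers, most of the treated group does not carry the variant, a sizeable minority of the untreated group does, and using group membership as a proxy performs worse than simply guessing that nobody is a carrier. Testing the individual sidesteps the guessing altogether.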
References and further reading
Brody, H., & Hunt, L. M. (2006). BiDil: Assessing a Race-Based Pharmaceutical. The Annals of Family Medicine, 4(6), 556-560.
Chen, W., Srinivasan, S. R., Bond, M. G., Tang, R., Urbina, E. M., Li, S., . . . Berenson, G. S. (2004). Nitric oxide synthase gene polymorphism (G894T) influences arterial stiffness in adults. Am J Hypertens, 17(7), 553-559.
D’haeseleer, P. (2005). How does gene expression clustering work? Nat Biotech, 23(12), 1499-1501.
Li, J. Z., Absher, D. M., Tang, H., Southwick, A. M., Casto, A. M., Ramachandran, S., . . . Myers, R. M. (2008). Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation. Science, 319(5866), 1100-1104.
National Human Genome Research Institute. (2016). Frequently Asked Questions About Genetic and Genomic Science. Accessed: 2016-07-23.
Rosenberg, N. A., Pritchard, J. K., Weber, J. L., Cann, H. M., Kidd, K. K., Zhivotovsky, L. A., & Feldman, M. W. (2002). Genetic Structure of Human Populations. Science, 298(5602), 2381-2385.
Rosenberg, N. A., Mahajan, S., Ramachandran, S., Zhao, C., Pritchard, J. K., & Feldman, M. W. (2005). Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure. PLoS Genet, 1(6), e70. doi:10.1371/journal.pgen.0010070
Saunders, E., & Gavin, J. R. (2003). Blockade of the renin-angiotensin system in African Americans with hypertension and cardiovascular disease. J Clin Hypertens, 5, 12-17.
Serre, D., & Pääbo, S. (2004). Evidence for Gradients of Human Genetic Diversity Within and Among Continents. Genome Research, 14(9), 1679-1685.
Weiss, K. M., & Fullerton, S. M. (2005). Racing around, getting nowhere. Evolutionary Anthropology: Issues, News, and Reviews, 14(5), 165-169.
Williams, D. R., & Wyatt, R. (2015). Racial Bias in Health Care and Health: Challenges and Opportunities. JAMA, 314(6), 555-556.
Yudell, M., Roberts, D., DeSalle, R., & Tishkoff, S. (2016). Taking race out of human genetics. Science, 351(6273), 564-565.