Thursday, May 12, 2016

Putting the Genes in Genealogy

Random sciency-looking picture of DNA


When I had my DNA tested by 23andMe, I was not surprised that my ancestry came back as 100% European. (Yup, I'm the whitest person you've ever met.) But what do these DNA tests really tell us about ancestry, and how? DNA can't conclusively place you in any ancestral group, partly because of the limitations of the tests themselves, and partly because of how DNA actually differs (and doesn't differ) between human groups.

I'm an excellent example of the limitations of DNA ancestry tests. The test showed I was European, because that is the ancestry of the vast majority of my ancestors, but my 11th great-grandmother was a Nipissing woman whose name has been lost to history. She married a French trader, and their daughter, Euphrosine Madeline Nicolet, was recorded in the Church records of Quebec.

But if I have any non-European ancestry, that would show up in my DNA test, wouldn't it? Not necessarily, for a variety of reasons:

1) Small amounts of DNA from one group are swamped by DNA from others. My 11th great-grandmother was the ultimate heir of millions of Indigenous ancestors who lived in what today we call North America. People have lived here at least 12,000 years, or around 600 generations. If we calculate how many ancestors one person would have over that time period, the answer is, frankly, a ridiculously large number that basically boils down to "everyone who reproduced in the region back then" (see my post We're All Kings Now). That's a lot of Native ancestors on my family tree, but I have an equally large number of European ancestors for each of my other 11th great-grandparents, all 8,191 of them. Therefore, although this one 11th great-grandmother represents a large number of Native ancestors, she is just one of my 8,192 11th great-grandparents and contributed, on average, only 0.012% of my DNA (and at that distance, her share could have been lost entirely to chance). Such a small amount is not going to show up on a DNA test, unless she was a direct maternal-line ancestor from whom I inherited mitochondrial DNA. (She wasn't. My mtDNA is German.)
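
For anyone who wants to check my math, here is a minimal Python sketch of the arithmetic in point 1. It assumes a simple family tree where every slot is a different person (no cousin marriages) and treats each ancestor's share as a plain average. The function names are my own, made up for illustration.

```python
def generations_back(nth_great: int) -> int:
    """An Nth great-grandparent sits N + 2 generations back
    (parents = 1, grandparents = 2, great-grandparents = 3, ...)."""
    return nth_great + 2


def ancestor_slots(generations: int) -> int:
    """Number of slots in the family tree at that generation.
    Assumes no slot is filled twice by the same person."""
    return 2 ** generations


def average_dna_share(generations: int) -> float:
    """Average fraction of DNA from one ancestor at that generation.
    Real inheritance is lumpier, and the true share can be zero."""
    return 1 / ancestor_slots(generations)


gen = generations_back(11)          # 11th great-grandparent
slots = ancestor_slots(gen)
share = average_dna_share(gen)

print(f"{gen} generations back: {slots} ancestors, {share:.4%} each")
# 13 generations back: 8192 ancestors, 0.0122% each
```

Try plugging in 600 generations instead of 13 and you'll see why the number of "slots" quickly outruns the number of people who were actually alive.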

2) There are no genes for race or ethnicity, and even ancestry isn't that simple. For the sake of simplicity, DNA testing companies will tell you that your genes are "European" or "West African", but these labels are not as easy to interpret as it might seem. 

First, some vocabulary: race is a socially defined category, a set of labels our culture has developed to categorize physical and ancestral variation in people, but it's not a reflection of biological reality. Racial categories, like "Black" or "White", are like our units of time. Time is real, and it really changes, but our units of "minutes" and "hours" are just abstract concepts, agreed-upon ways of measuring and categorizing the continuous sweep of time. They don't exist in nature, and we could just as easily have decided to make 100 hours in a day, or 10 minutes in an hour.

The same is true of race. Human variation exists, and people really are different from each other, but our racial categories are just arbitrary ways of breaking people into different groups. In the U.S. we use skin color and facial features as the major way of doing this, but other cultures use different criteria. There are no genetic divisions between the racial categories we've chosen. Therefore, "race" can't be studied genetically. (Note: this isn't the same as saying that race doesn't exist. It does, and it's immensely important for understanding social and power dynamics in the United States; it's just not biological.)

Similarly, ethnicity is difficult to study genetically. Ethnicity relates to the culture in which we were raised: our language, religion, customs, etc. Ethnicity is tied into our identity, but it may have little (or nothing) to do with genetics. For example, I was raised to think of myself as Irish in ethnic identity, but the truth is that I'm mostly German. A lot of German-Americans abandoned their ethnic identity during the first half of the 20th century, because of the world wars. My grandmother used to tell the story of her older sister, during WWI, telling her mother (Mabel Ruffertschofer) and her grandmother (Christiana Wiemer) how much she hated and feared the Germans.

So we're left with ancestry, pure and simple. Statistically speaking, how likely are each of our genes to come from any particular region of the world? And putting all our genes together, what does that tell us about where all of our different ancestors came from? The truth is, there's no such thing as a "European" or an "African" gene. Rather, certain genetic traits are more common in some populations than others, just like certain expressed traits. However, there are no hard and fast rules in biology that state a particular gene or trait can only be found in one general region or another.

Certain lineages (that is, all the people descended from a common ancestor) have unique markers. A new variation of a gene appears through mutation and can be passed on to the mutated person's children. If the gene is advantageous, or even neutral, then it might spread throughout the community and beyond. This will lead to a group of people, all descendants of that original mutant, who carry particular variants of genes that are absent or rare in other groups. For example, a population living high in the Andes may have a lot of people who carry a gene that helps them maintain higher blood oxygen levels. If you carry that same gene, it could suggest you were a descendant of that population. However - and this is an important 'however' - some people in that population may not carry that gene, and some people outside that population may have had genetic mutations that created the same gene variant. The presence or absence of that gene is not a 100% guarantee that you are (or are not) a descendant of that population. That is one reason why the ancestry data you are given from a DNA testing company is probabilistic. You are given confidence intervals that tell you, statistically speaking, how likely it is that a particular gene comes from a particular source.

In other words, there are no categorical genetic differences between different ethnic groups, certainly not groups as large and diverse as "European" or "West African". There can be categorical differences between lineages, however.
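
To make the "probabilistic" part concrete, here is a minimal sketch using Bayes' rule. The numbers are invented for illustration, and this is not a description of how 23andMe or any other company actually computes its estimates. It only shows why carrying a marker that is common in one population and rare elsewhere raises the odds of descent from that population without proving it.

```python
def chance_of_descent(prior: float, freq_in_group: float, freq_elsewhere: float) -> float:
    """Bayes' rule: probability of descending from a group, given that
    you carry a marker.

    prior          -- probability of descent before looking at the marker
    freq_in_group  -- share of the group's descendants who carry the marker
    freq_elsewhere -- share of everyone else who carries it anyway
    """
    carriers_from_group = prior * freq_in_group
    carriers_from_elsewhere = (1 - prior) * freq_elsewhere
    return carriers_from_group / (carriers_from_group + carriers_from_elsewhere)


# Hypothetical numbers: the marker is found in 80% of the highland
# population, in 2% of everyone else, and we start from a 1% prior.
p = chance_of_descent(prior=0.01, freq_in_group=0.80, freq_elsewhere=0.02)
print(f"Chance of descent given the marker: {p:.0%}")
```

With these made-up numbers the answer comes out to roughly 29%: a big jump from 1%, but a long way from certainty. Only if the marker never appeared outside the group (freq_elsewhere = 0) would it be a guarantee.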

3) The genetic differences between ancestral groups are small. Much of our DNA is so similar to the DNA of all other humans that it has no discernible regional origin. Our DNA is 95-98.8% identical to that of a chimpanzee (the range is because there are multiple ways that variation in DNA can be measured) (1). That leaves only 1.2-5% of our DNA that is uniquely human. Within that human DNA, very little of the variation can be attributed to different ancestry on the large scale (that is, between continents or regions, rather than between families or individuals). Samples from around the world suggest only 5-10% of human genetic variation is related to differences between large ancestral groups, and much of that diversity is related to local environmental adaptations (such as the high-elevation community I mentioned earlier), rather than genes shared across a whole continent or region (2). It's no surprise, then, that my one Native American 11th great-grandmother's contribution to my DNA (if any) is not identified in my DNA test.

4) Not all distinct ancestries have genetic markers, or the DNA tests choose not to record those markers. DNA tests can tell you if you have Ashkenazi Jewish ancestry. That's because the relatively isolated Ashkenazi community has a number of distinctive genetic markers. But French-Canadians are also an isolated community (see my previous blog post: French Canadians: What We Mean by an Isolated Population), and they also have distinctive genetic markers, but 23andMe doesn't report on French-Canadian ancestry. Why not? Perhaps the French-Canadians have not been a unique sociocultural group long enough to develop distinctive genetic markers (although a study suggests otherwise (3)), or it may be that, from the perspective of the test makers, they are not perceived as being a separate ethnic group (see above for social vs. biological definitions of human communities). It may simply be that companies like 23andMe do not consider ethnic groups that developed after European colonization of North America to be the appropriate field of study. Regardless, it's important to recognize that not every ethnic group is represented in the ancestry reports you receive from a DNA testing company.

23andMe is actually rather vague about how they determine ancestry. I assume (because it's what I would do) that they are looking for snippets of non-coding DNA that are characteristic of certain lineages, and then telling you where those lineages are found. Their database of lineages, however, is far from complete. As more people take advantage of DNA testing, the accuracy with which we can determine ancestry will improve.

References:

1) Ebersberger, I., Metzler, D., Schwarz, C., Pääbo, S., 2002. Genomewide comparison of DNA sequences between humans and chimpanzees. American Journal of Human Genetics 70, 1490–1497.

2) Lewontin, R., 1972. The apportionment of human diversity. Evolutionary Biology 6, 381–398; and Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.G., Frazer, K.A., Cox, D.R., 2005. Whole-genome patterns of common DNA variation in three human populations. Science 307, 1072–1079.

3) Casals, F., Hodgkinson, A., Hussin, J., Idaghdour, Y., Bruat, V., de Maillard, T., et al., 2013. Whole-exome sequencing reveals a rapid change in the frequency of rare functional variants in a founding population of humans. PLoS Genetics 9(9), e1003815. doi:10.1371/journal.pgen.1003815
