The problem with sub-Saharan Africa and DNA analysis tools

This is the first post in a series covering issues I’ve experienced with the reporting of sub-Saharan African results in DNA analysis, with a particular emphasis on DNA testing for African Americans. Over the coming posts, I’ll be looking at the strengths and weaknesses of DNA admixture analysis tools – with tips for things to look out for.

I recently had the opportunity to upload my Ancestry.com DNA results to Gedmatch.com. And what a revelatory experience Gedmatch.com has been. To be honest, this DNA analysis service is proving fascinating. There is just so much to explore and comprehend. I have been doing a LOT of research in order to get my head around all of the information Gedmatch has provided.

My experience with Gedmatch has helped me pin down a quibble I’ve had with my Ancestry.com results. Don’t get me wrong, Ancestry’s DNA test has done exactly what I wanted it to – put me in touch with distant (and not so distant) relations from my various family lines. It’s allowed me to find my 4x great-grandfather on the Sheffey line. And it put me on the right track towards identifying my 4x great-grandfather on the Roane line.

My niggle with Ancestry’s results has to do with my admixtures and the countries it genetically tied me to. These results were always going to be general in nature. Ancestry.com states as much. The quibble I had has to do with Africa. And my recent experience with Gedmatch has allowed me to better understand the nature of my quibble.

DNA test results are based on data sets, which are compiled into DNA test result databases. A database can only be as precise as the data that’s put into it. In this case, precise DNA results rely on large numbers of a population 1) having a DNA test and 2) having those results added to a data set which is imported into a database. For instance, a data set with 200,000 DNA results from the Baltic region of Eastern Europe will provide more precise insights than a data set of 50,000 individuals from the same region. It also depends on how each individual is classified and sub-classified (e.g. Bulgarian, Caucasian Bulgarian, Central Asian Bulgarian, Altaic Bulgarian, etc).
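To put a rough number on the data set size point, here is a minimal Python sketch. It assumes each tested person is an independent sample and uses the textbook standard error of a proportion; the 30% marker frequency is a made-up illustrative value, not a figure from any testing company.

```python
import math


def standard_error(frequency: float, sample_size: int) -> float:
    """Standard error of a proportion estimated from independent samples."""
    return math.sqrt(frequency * (1 - frequency) / sample_size)


# Hypothetical marker carried by 30% of people in the region (illustrative only).
frequency = 0.30

small = standard_error(frequency, 50_000)
large = standard_error(frequency, 200_000)

print(f"50,000 samples:  +/- {small:.5f}")
print(f"200,000 samples: +/- {large:.5f}")
print(f"The smaller data set is {small / large:.1f}x less precise")
```

Under these assumptions, quadrupling the number of results halves the uncertainty on the estimate. Real reference panels are more complicated than this, so treat it as an illustration of the trend only.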

This brings me to my quibble about Africa. The way African DNA test results are classified, you would think Africa was one large country populated by a homogeneous people. This simply is not the case. The continental African population is arguably one of the most heterogeneous populations. The admixture analysis tools and reports I’ve used on Ancestry.com and Gedmatch simply don’t reflect this diversity of African peoples.

For instance, I know that the central African pygmy populations have contributed roughly 2% to my genetic makeup. This comes from my mother’s mtDNA as well as through my paternal grandmother’s DNA as evidenced by my Genebase Y-DNA and mtDNA tests as well as my father’s mtDNA test.

Now where things get tricky is what’s classed as ‘Sub-Saharan Africa’.

[Image: map of Africa]

Ancestry.com, along with a number of Gedmatch’s DNA analysis tools, takes the literal approach: all countries below the Sahara desert. Genebase, on the other hand, does not. Genebase, for instance, has categorized the territory from Western Sahara to Niger and south to Nigeria as Northwestern Africa. On its service you will also find North Central Africa, West Africa, Eastern Africa, Central Africa and so on. These sub-classifications of Sub-Saharan regions (and their peoples) allow for far more accurate interpretation for DNA analysis purposes. They are also much more meaningful.

Based on this classification, my 18% African result is primarily spread across: Northwest (4%), Western (2%), Northern (5%), North Central (3%) and Eastern (4%) Africa. This is more meaningful than a report that simply says either 18% African or, more specifically, 12% sub-Saharan African.
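As a quick sanity check on those regional figures, the short Python sketch below adds them up and shows each region as a share of the African total. The percentages are the ones quoted above; the variable names are just for illustration.

```python
# Regional percentages quoted above (share of my overall DNA).
regions = {
    "Northwest": 4,
    "Western": 2,
    "Northern": 5,
    "North Central": 3,
    "Eastern": 4,
}

total = sum(regions.values())
print(f"Total African result: {total}%")

# Express each region as a fraction of the African total.
for region, pct in regions.items():
    print(f"{region}: {pct}% overall, {pct / total:.0%} of the African result")
```

The five regions add up to the full 18%, so the breakdown accounts for the whole African result rather than only part of it.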

For someone who is developing a travel-adventure series based on his DNA results, I’m a stickler for DNA reporting accuracy.

Gedmatch & the MDLP DNA analysis tool

So first up is the MDLP DNA analysis tool which can be found on Gedmatch.

MDLP is a bio-geographical analysis project for the territories of the former Grand Duchy of Lithuania. Lithuania should have been my first clue. It was only after I saw the first set of results that I discovered that MDLP was designed for individuals with European and some Eurasian ancestry (mostly Finno-Uralic and Altaic). This tool is not recommended for inferring African-American, East-Asian etc. ancestry.

You’ll see why this tool wouldn’t be particularly useful to people of largely African or East Asian ancestry:

MDLP World-22 Admixture Proportions

Population  
Pygmy 2.63%
West-Asian 3.99%
North-European-Mesolithic 0.53%
Indo-Tibetan
Mesoamerican
Arctic-Amerind
South-America_Amerind 0.09%
Indian 1.86%
North-Siberean 0.31%
Atlantic_Mediterranean_Neolithic 13.71%
Samoedic
Indo-Iranian 1.61%
East-Siberean
North-East-European 12.89%
South-African 0.78%
North-Amerind 1.38%
Sub-Saharian 54.86%
East-South-Asian
Near_East 5.30%
Melanesian 0.08%
Paleo-Siberian
Austronesian

The sub-Saharan results were all out of proportion to what I already knew, which made me go back and do some more research on this particular analysis. That’s when I found it was actually created to analyze European and Eurasian admixtures. Basically, this tool takes quite a literal and generous view of what’s meant by sub-Saharan.

However, where this tool has been interesting, for me, is in analyzing exactly what it was meant to – my European and Eurasian admixtures.

Variations of this test can be found below. Each has a different emphasis, and I’m still researching what the emphasis of each actually is. There isn’t much information available. My DNA contact is off doing his own research about this series of tools. The basic clue is in the name: “proportions”. However, I’m in the dark about what’s being proportionally measured – or why results for each geographical region can differ so staggeringly from one sub-test to another.

If anyone out there actually understands what aspects of a person’s admixtures these analyses measure, feel free to post in the comment section below.

MDLP World Admixture Proportions

Population
Caucaus_Parsia 5.26%
Middle_East 5.45%
Indian 2.04%
South_and_West_European 17.20%
Melanesian 0.07%
Sub_Saharian 49.22%
North_and_East_European 11.00%
Arctic_Amerind 0.74%
East_Asian
Paleo_African 8.48%
Mesoamerican 0.56%
North_Asian

 

MDLP K=5 Admixture Proportions

Population
East-Eurasian 24.68%
West_Eurasian 4.08%
Caucasian 32.99%
South-Asian 12.02%
Paleo_Mediterranean 26.24%

 

MDLP K=6 Admixture Proportions

Population
South_Asian 11.92%
Caucasian 32.59%
North_West_Eurasian 4.29%
West_Eurasian 1.85%
Paleo_Mediterranean 26.01%
East_Euroasian 23.34%

 

MDLP K=7 Admixture Proportions

Population
Volga_Uralic 3.78%
Paleo_Mediterranean 25.80%
Altaic_Turkic 22.87%
South_Central_Asian 11.78%
Caucasian 32.27%
Paleo_Scandinavian 1.97%
West_Eurasian 1.54%

 

MDLP K=8 Admixture Proportions

Population
Altaic_Turkic 22.81%
Paleo_Scandinavian 1.38%
South_Central_Asian 11.65%
East_European
West_European 10.73%
Caucasian 25.41%
Paleo_Mediterranean 24.75%
Volga_Finnic 3.27%

My question with the above results is: Where has the Eastern European share from the earlier results gone? From this point onwards, East_European is listed without a figure.
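Two things can be checked directly from the K=5 to K=8 figures above: how many components each run lists, and what the percentages add up to. The Python sketch below does that. It assumes a component listed without a figure (East_European at K=8) counts as 0%, which is my reading of the blank entry and not something the tool states.

```python
# Figures copied from the MDLP K=5 to K=8 results above.
# Assumption: a component shown without a figure is treated as 0%.
results = {
    5: {"East-Eurasian": 24.68, "West_Eurasian": 4.08, "Caucasian": 32.99,
        "South-Asian": 12.02, "Paleo_Mediterranean": 26.24},
    6: {"South_Asian": 11.92, "Caucasian": 32.59, "North_West_Eurasian": 4.29,
        "West_Eurasian": 1.85, "Paleo_Mediterranean": 26.01,
        "East_Euroasian": 23.34},
    7: {"Volga_Uralic": 3.78, "Paleo_Mediterranean": 25.80,
        "Altaic_Turkic": 22.87, "South_Central_Asian": 11.78,
        "Caucasian": 32.27, "Paleo_Scandinavian": 1.97, "West_Eurasian": 1.54},
    8: {"Altaic_Turkic": 22.81, "Paleo_Scandinavian": 1.38,
        "South_Central_Asian": 11.65, "East_European": 0.0,
        "West_European": 10.73, "Caucasian": 25.41,
        "Paleo_Mediterranean": 24.75, "Volga_Finnic": 3.27},
}

for k, components in results.items():
    total = sum(components.values())
    print(f"K={k}: {len(components)} components, total {total:.2f}%")
```

Each run lists exactly K components, and each adds up to roughly 100%. That is consistent with K being the number of components a run splits my results into, with the whole result shared out among whatever components that run offers. If so, none of these runs has an African component to assign anything to, which may be part of why the figures shift so much between sub-tests. I can't confirm this from the tool's own documentation, so take it as a working guess.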

MDLP K=9 Admixture Proportions

Population
Paleo_Balkanic 0.39%
Caucasian 25.06%
East_European
Volga_Finnic 3.32%
South_Central_Asian 11.62%
Paleo_Mediterranean 25.54%
Altaic_Turkic 22.72%
West_European 9.97%
Paleo_Scandinavian 1.38%

 

MDLP K=10 Admixture Proportions

Population
Altaic_Turkic 22.62%
South_Central_Asian 11.56%
Paleo_North_European 1.28%
Paleo_Mediterranean 25.44%
Iberian 5.23%
Caucasian 23.00%
East_European
Paleo_Balkanic 0.40%
British 7.42%
Volga_Finnic 3.05%

 

MDLP K=11 Admixture Proportions

Population
Paleo_Balkanic 0.39%
Celto_Germanic 7.37%
Caucasian 22.80%
Volga_Uralic 1.22%
Iberian 5.04%
Altaic_Turkic 22.56%
Paleo_North_European 1.27%
South_Central_Asian 11.47%
East_European
Uralic_Permic 2.55%
Mediterranean 25.34%

 

MDLP K=12 Admixture Proportions

Population
East_European
Paleo_Mediterranean 25.19%
Iberian 5.08%
Caucasian 22.52%
Uralic_Permic 2.63%
Balto_Finnic 1.21%
Paleo_Balkanic 0.37%
Celto_Germanic 7.23%
Paleo_North_European 0.25%
South_Central_Asian 11.48%
Volga_Uralic 1.27%
Altaic_Turkic 22.77%

So, while this tool is not particularly insightful for my African DNA associations, it has been very insightful for my other ancestry. The Paleo Mediterranean results are largely in line with my Genebase results and incorporate my results associated with Sicily, Smyrna (the historically Greek city, now İzmir in Turkey), and what we would think of as the Phoenicians (Malta, Cyprus and present-day Lebanon).

The other Paleo findings are new. So I’m definitely looking forward to finding out more about them.

I remain absolutely fascinated by my Altaic and Caucasus results…a probable legacy from the ancient Silk Road trade route.

If you’re African American and your Ancestry.com or 23andMe results are showing European and/or Eurasian results, this DNA analysis tool is worth investigating.
