Royal Society Publishing

A private allele ubiquitous in the Americas

K.B. Schroeder, T.G. Schurr, J.C. Long, N.A. Rosenberg, M.H. Crawford, L.A. Tarskaia, L.P. Osipova, S.I. Zhadanov, D.G. Smith


The three-wave migration hypothesis of Greenberg et al. has permeated the genetic literature on the peopling of the Americas. Greenberg et al. proposed that Na-Dene, Aleut-Eskimo and Amerind are language phyla which represent separate migrations from Asia to the Americas. We show that a unique allele at autosomal microsatellite locus D9S1120 is present in all sampled North and South American populations, including the Na-Dene and Aleut-Eskimo, and in related Western Beringian groups, at an average frequency of 31.7%. This allele was not observed in any sampled putative Asian source populations or in other worldwide populations. Neither selection nor admixture explains the distribution of this regionally specific marker. The simplest explanation for the ubiquity of this allele across the Americas is that the same founding population contributed a large fraction of ancestry to all modern Native American populations.


1. Introduction

There has been extensive debate over the number of migrations into the Americas. Greenberg et al. (1986) hypothesized that Amerind, Na-Dene and Aleut-Eskimo are language phyla which represent three migrations from Asia, occurring in that sequence. This hypothesis stimulated a multitude of genetic investigations into the number and timing of migrations (reviewed in Schurr 2004).

Still, genetic studies have not produced a consensus on the number of migrations into the Americas; we suggest this is because the number of migrations cannot be inferred from genetic data. Migrations may have occurred that have not significantly influenced the current distribution of genetic variation. Distinct migrations from the same source population may produce patterns of variation similar to those produced by a single migration.

Although the number of migrations might not be inferable from genetic data, whether all Native American populations descend from the same founding population can be addressed if a unique autosomal variant absent from Asian populations is identified throughout the Americas. In their analysis of the HGDP–CEPH human genome diversity panel (henceforth HGDP) genotypes for 377 microsatellites, Zhivotovsky et al. (2003) noted that only in a single instance could a regional group be distinguished by a private marker. A 275 bp allele at D9S1120 (also known as GATA81C04 or GATA11E11) was observed at high frequencies in all American populations (all of which are Amerind: Pima; Maya; Colombian; Karitiana; and Surui) and was absent from 47 other worldwide populations. This allele had a frequency of 36.5% in the pooled American sample, while no other allele among the 4688 studied was private to a major geographical region (defined as sub-Saharan Africa, Europe and the part of Asia south and west of the Himalayas (including North Africa), East Asia, Oceania and the Americas) with a frequency above 13%. Expansion of the dataset to 783 loci and 9346 alleles (Rosenberg et al. 2005) did not reveal any additional regionally private allele with a frequency above 13% (figure 1, inset).
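The notion of a regionally private allele used above can be illustrated with a minimal sketch. It assumes allele counts are already tallied per region; the function name `private_alleles` and the toy counts are hypothetical and are not data or code from the study.

```python
from collections import defaultdict


def private_alleles(counts):
    """Return {allele: (region, frequency)} for alleles seen in exactly one region.

    counts maps region -> {allele: number of chromosomes carrying it}.
    The frequency is computed within the single region where the allele occurs.
    """
    regions_with = defaultdict(list)
    for region, allele_counts in counts.items():
        for allele, n in allele_counts.items():
            if n > 0:
                regions_with[allele].append(region)

    result = {}
    for allele, regions in regions_with.items():
        if len(regions) == 1:
            region = regions[0]
            total = sum(counts[region].values())
            result[allele] = (region, counts[region][allele] / total)
    return result


# Toy counts invented for illustration only (not data from the study).
toy = {
    "Americas": {9: 30, 15: 25, 16: 25, 17: 20},
    "East Asia": {15: 40, 16: 35, 17: 25},
}
print(private_alleles(toy))  # {9: ('Americas', 0.3)}
```

Applied to real genotype data, a frequency threshold (such as the 13% mentioned above) could then be used to pick out the rare private alleles that reach a high regional frequency.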

Figure 1

Distribution of frequencies of private alleles (with frequency of 2% or above) in the HGDP among 9346 alleles at 783 microsatellites studied by Rosenberg et al. (2005). Inset. Frequency distribution of 9RA (represented by red-shaded area) at D9S1120 by population in Asia and the Americas. Numbers next to pie charts reference table 1.

Table 1

Frequency of each allele at D9S1120 in all sampled populations. (Three alleles of non-standard fragment size, each observed once in the HGDP (Han, Burusho and Biaka Pygmy), are not represented.)

The 275 bp allele was the smallest one observed at D9S1120 in the HGDP. We have determined through sequencing that this allele contains nine tetranucleotide repeats and is the result of slippage in the repetitive section, as opposed to a deletion elsewhere in the amplicon. Henceforth, we shall refer to alleles at this locus by the corresponding number of repeats and to the 9-repeat allele as ‘9RA’. The next largest allele in the HGDP outside the Americas is 11 repeats and was observed only in three chromosomes. The lack of regionally specific private alleles at a high frequency (figure 1, inset), the striking distribution of 9RA and the rarity of intermediate-sized alleles (table 1) strongly suggest that all or nearly all copies of 9RA descend from a single mutational event.

We hypothesized that if the Aleut-Eskimo, Na-Dene and other indigenous populations throughout the Americas share common ancestry with the American populations in the HGDP, then we would observe 9RA across the Americas. Additionally, if further sampling did not reveal 9RA in putative Asian source populations, then we could conclude that modern Native American populations share more recent common ancestry with each other than with any Asian population.

2. Material and methods

(a) Populations sampled

We sampled two Aleut-Eskimo, two Na-Dene and nine North American Amerind populations (tables 1 and 2) for 9RA. We use the grouping ‘Amerind’ so that our results may be interpreted within the framework of the tripartite migration hypothesis of Greenberg et al. (1986), but note that many historical linguists do not accept Amerind (see Greenberg 1987 and Campbell 1997 for opposing views).

Table 2

Average frequency of 9RA in linguistic and geographical groups.

Populations in the Altaian region of east central Asia are among those thought to be most closely related to modern Native Americans on the basis of Y-chromosome and mtDNA evidence, yet some East Siberian populations also share markers with modern Americans (reviewed in Schurr 2004). Thus, we sampled seven populations in the Altaian region of east central Asia and in East Siberia (table 1).

See the electronic supplementary material for further information on the populations sampled and genotyping of 9RA.

(b) Selection

If the distribution of 9RA across the Americas has been significantly influenced by balancing or positive selection at a linked locus, we would expect D9S1120 to be an outlier when compared with empirical distributions of heterozygosity and FST for neutrally evolving loci. We assume the majority of the 783 microsatellites for which the American HGDP populations have been genotyped (Rosenberg et al. 2005) to be selectively neutral. We excluded the Surui from this dataset as an extreme outlier (Zhivotovsky et al. 2003). For each locus, we estimated FST using $\hat{\theta}$ (Weir & Cockerham 1984) and calculated expected heterozygosity (pooling samples), as given by Weir (1996).
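As an illustration of the heterozygosity step, the following is a minimal sketch of expected heterozygosity computed from pooled allele counts. It assumes the common form H = 1 − Σ p_i², with an optional n/(n − 1) small-sample correction; whether this matches the exact expression in Weir (1996) is an assumption, and the function names are hypothetical. The FST estimator of Weir & Cockerham (1984) is more involved and is not reproduced here.

```python
def pool(samples):
    """Sum allele counts across population samples (each a dict allele -> count)."""
    pooled = {}
    for sample in samples:
        for allele, count in sample.items():
            pooled[allele] = pooled.get(allele, 0) + count
    return pooled


def expected_heterozygosity(allele_counts, unbiased=True):
    """Expected heterozygosity, 1 - sum(p_i^2), from allele counts at one locus.

    n is the number of chromosomes sampled. With unbiased=True the estimate
    is multiplied by n / (n - 1), an assumed small-sample correction.
    """
    n = sum(allele_counts.values())
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    homozygosity = sum((c / n) ** 2 for c in allele_counts.values())
    h = 1.0 - homozygosity
    return h * n / (n - 1) if unbiased else h


# Toy counts invented for illustration only (not data from the study).
pooled = pool([{9: 5, 15: 5}, {15: 5, 16: 5}])
print(expected_heterozygosity(pooled, unbiased=False))  # 0.625
```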

Under mutation–drift equilibrium, a positive correlation is expected between the mean heterozygosity and the number of alleles. For this dataset, mean heterozygosity is not significantly different for loci with eight alleles (which includes D9S1120), compared with nine (see electronic supplementary material). Thus, we created empirical distributions of FST and heterozygosity at neutral loci using 116 microsatellites with eight or nine alleles.
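The outlier check against an empirical distribution can be sketched as follows. This is a minimal illustration, assuming one statistic per locus (for example FST or heterozygosity) is already computed; the function names and the 0.05/0.95 cut-offs used as defaults are assumptions chosen for the example, and the toy values are not data from the study.

```python
def empirical_quantile(values, x):
    """Fraction of reference values less than or equal to x."""
    if not values:
        raise ValueError("reference distribution is empty")
    return sum(v <= x for v in values) / len(values)


def is_outlier(values, x, lower=0.05, upper=0.95):
    """True if x falls in either tail of the empirical distribution."""
    q = empirical_quantile(values, x)
    return q < lower or q > upper


# Toy reference distribution invented for illustration only.
reference = list(range(1, 101))
print(empirical_quantile(reference, 50))  # 0.5
print(is_outlier(reference, 50))          # False
print(is_outlier(reference, 99))          # True
```

Running the same check separately on FST and on heterozygosity treats the two statistics marginally; a joint (bivariate) assessment would need an additional step not shown here.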

(c) Admixture

Supposing that the Aleut-Eskimo or the Na-Dene descend from different founding populations in which 9RA was not present, we calculated the amount of Amerind admixture required to bring 9RA to the frequencies observed in the Aleut-Eskimo or Na-Dene using Bernstein's (1931) formula $m = (p_h - p_2)/(p_1 - p_2)$, where $m$ is the proportion of Amerind admixture; $p_h$ is the observed average frequency of 9RA in the putatively admixed Aleut-Eskimo or Na-Dene; $p_1$ is the observed average frequency of 9RA in the Amerind; and $p_2$ is the frequency of 9RA in the Aleut-Eskimo or the Na-Dene prior to admixture, which is zero.
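Bernstein's admixture estimator is commonly written m = (p_h − p_2)/(p_1 − p_2); with p_2 = 0, as assumed here, it reduces to m = p_h/p_1. A minimal sketch follows. The function name is hypothetical and the input frequencies are invented for illustration; they are not the values from table 2.

```python
def bernstein_admixture(p_h, p_1, p_2=0.0):
    """Admixture proportion m = (p_h - p_2) / (p_1 - p_2).

    p_h: allele frequency in the putatively admixed population
    p_1: allele frequency in the donor population
    p_2: allele frequency in the recipient population before admixture
    """
    if p_1 == p_2:
        raise ValueError("p_1 and p_2 must differ for m to be defined")
    return (p_h - p_2) / (p_1 - p_2)


# Hypothetical frequencies, chosen only to show the arithmetic.
m = bernstein_admixture(p_h=0.27, p_1=0.30)
print(f"{m:.1%}")  # 90.0%
```

Because p_2 is zero, the estimate depends only on the ratio of the two observed frequencies, so similar frequencies in the two groups necessarily imply an admixture proportion close to one.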

3. Results

9RA was present in all sampled North and South American populations at an average frequency of 32.9% (figure 1; table 1). It was also observed in the Koryaks and the Chukchi of Western Beringia. 9RA was not observed in putative Asian ancestral populations. The populations most divergent in their frequency of 9RA, the Seri (10.0%) and the Surui (97.1%), have been identified as potential isolates on the basis of linguistic (Campbell 1997) and genetic (Zhivotovsky et al. 2003) data, respectively, and thus the divergent frequencies in these populations probably reflect genetic drift.

The frequencies of the other alleles at D9S1120 in the populations in which 9RA was observed are not outliers. Aside from 9RA, the three most common alleles in the Americas and Western Beringia, the 15-, 16- and 17-repeat alleles, are also those most common in the rest of the world. Relative to populations without 9RA, the average frequencies of these common alleles in the Americas and Western Beringia are reduced by about 30% (between 28.8% and 37.1%).

Figure 2 shows FST plotted against expected heterozygosity for 116 microsatellites in four American HGDP populations. D9S1120, located at (0.793, 0.042), is not an outlier with respect to the bivariate distribution of FST and heterozygosity. D9S1120 falls between the 0.2 and 0.8 quantiles of both distributions.

Figure 2

Plot of FST against heterozygosity for 116 microsatellites in four American HGDP populations. Red bars show the 0.05 and 0.95 quantiles, anchored at the medians. D9S1120 is represented by the red diamond located at (0.793, 0.042).

Under the assumptions of the admixture model, if the Aleut-Eskimo or Na-Dene had acquired 9RA through admixture with the Amerind, the proportions of these populations derived from the Amerind would be 91.9 and 92.8%, respectively. These results are consistent with the similarity of frequencies of 9RA in the Aleut-Eskimo, Na-Dene and Amerind (table 2).

4. Discussion

Irrespective of the evolutionary history of other unlinked loci, the remarkable distribution of 9RA severely constrains the possible evolutionary histories of modern Native American populations. The simplest explanation for the homogeneous frequency of 9RA across the Americas is that the Americas were settled by a single founding population in which 9RA was present and from which all modern Native American populations descend. While homogenization can occur through selection or gene flow, we show it is unlikely either of these processes is solely responsible for the distribution of 9RA.

Were selection, rather than inheritance from a common founding population, responsible for the observed distribution of 9RA, nearly identical selection pressures would be required from the Arctic to the Amazon. Humans in the Americas have coped with remarkable geographical and temporal variations in ecology and hence they have probably been subject to variable selection pressures. In addition, data from four American HGDP populations show that, compared with 115 other microsatellites, D9S1120 is not unusual in FST or heterozygosity. Thus, there is no compelling evidence to suggest that the allele frequency distribution at D9S1120 in the Americas has been strongly influenced by balancing or positive selection.

Despite the simplicity of the admixture model we used, it demonstrates that the frequency of 9RA is too high in the Aleut-Eskimo and Na-Dene to be explained solely by moderate gene flow from the Amerind. If the Aleut-Eskimo or Na-Dene had acquired 9RA through admixture rather than inheritance from a common founding population, then it is probable that their respective gene pools would have been almost completely replaced and that their prehistory cannot be recovered with genetic data. This degree of admixture does not correspond to the traditional concept of multiple founding populations resulting in biologically distinct population expansions.

Hence, the distribution of 9RA is most consistent with the hypothesis that all modern Native Americans descend from a common founding population. The data are not, however, informative about the number of source populations that contributed to this founding population and do not exclude the possibility of small genetic contributions from other populations.

There are a number of possible explanations for the apparent absence of 9RA in putative Asian source populations. 9RA could have (i) arisen in the founding American population, (ii) been sampled from a putative Asian source population, in which it was subsequently lost by genetic drift, or (iii) been segregating in an Asian source population at the time of migration, where that source population has not been identified because it either went extinct or has not been included in modern-day samples. If the allele were segregating in an Asian population, it is improbable that all the copies of 9RA in the Americas descend from more than one ancient sampling event from that population. It is unlikely that an allele at a frequency sufficiently low to destine it for extinction, or an allele the sole source of which is a small, geographically restricted population, would have been included in multiple migratory groups and maintained multiple times.

The presence of 9RA in the Koryaks and Chukchi is consistent with other genetic evidence of shared ancestry between Western Beringians and Native Americans (e.g. Karafet et al. 1997; Lell et al. 1997; Schurr et al. 1999). The observed geographical distribution of 9RA is quite similar to that of two other alleles that descend from unique mutational events, the 16111T and DYS199T transitions which define Native American mtDNA lineage A2 and Y-chromosome lineage Q-M3 (Underhill et al. 1996), respectively. Hence, three independent lines of genetic evidence support the claim (Shields et al. 1993) of an ancient gene pool that included the ancestors of the modern inhabitants of Western Beringia and the Americas.


This study was funded by an NSFGRF and a UC Davis Humanities grant to K.B.S., University of Pennsylvania Faculty Research Funds to T.G.S. and NIH grant RR05090 to D.G. Smith. In memory of John McDonough. We thank M.N. Grote, D.A. Bolnick, K. Hunley and R.S. Malhi for helpful comments.


