Showing posts with label Caribbean. Show all posts

March 10, 2015

DNA of 17th century African slaves from the Caribbean

PNAS doi: 10.1073/pnas.1421784112

Genome-wide ancestry of 17th-century enslaved Africans from the Caribbean

Hannes Schroeder, María C. Ávila-Arcos et al.

Between 1500 and 1850, more than 12 million enslaved Africans were transported to the New World. The vast majority were shipped from West and West-Central Africa, but their precise origins are largely unknown. We used genome-wide ancient DNA analyses to investigate the genetic origins of three enslaved Africans whose remains were recovered on the Caribbean island of Saint Martin. We trace their origins to distinct subcontinental source populations within Africa, including Bantu-speaking groups from northern Cameroon and non-Bantu speakers living in present-day Nigeria and Ghana. To our knowledge, these findings provide the first direct evidence for the ethnic origins of enslaved Africans, at a time for which historical records are scarce, and demonstrate that genomic data provide another type of record that can shed new light on long-standing historical questions.

Link

July 26, 2014

Ancestry of Cubans

PLoS Genet 10(7): e1004488. doi:10.1371/journal.pgen.1004488

Cuba: Exploring the History of Admixture and the Genetic Basis of Pigmentation Using Autosomal and Uniparental Markers

Beatriz Marcheco-Teruel et al.

We carried out an admixture analysis of a sample comprising 1,019 individuals from all the provinces of Cuba. We used a panel of 128 autosomal Ancestry Informative Markers (AIMs) to estimate the admixture proportions. We also characterized a number of haplogroup diagnostic markers in the mtDNA and Y-chromosome in order to evaluate admixture using uniparental markers. Finally, we analyzed the association of 16 single nucleotide polymorphisms (SNPs) with quantitative estimates of skin pigmentation. In the total sample, the average European, African and Native American contributions as estimated from autosomal AIMs were 72%, 20% and 8%, respectively. The Eastern provinces of Cuba showed relatively higher African and Native American contributions than the Western provinces. In particular, the highest proportion of African ancestry was observed in the provinces of Guantánamo (40%) and Santiago de Cuba (39%), and the highest proportion of Native American ancestry in Granma (15%), Holguín (12%) and Las Tunas (12%). We found evidence of substantial population stratification in the current Cuban population, emphasizing the need to control for the effects of population stratification in association studies including individuals from Cuba. The results of the analyses of uniparental markers were concordant with those observed in the autosomes. These geographic patterns in admixture proportions are fully consistent with historical and archaeological information. Additionally, we identified a sex-biased pattern in the process of gene flow, with a substantially higher European contribution from the paternal side, and higher Native American and African contributions from the maternal side. This sex-biased contribution was particularly evident for Native American ancestry. Finally, we observed that SNPs located in the genes SLC24A5 and SLC45A2 are strongly associated with melanin levels in the sample.

Link

December 27, 2013

Reconstructing Native American migrations

Of wider interest might be the authors' estimation of the autosomal mutation rate as 1.44×10⁻⁸ mutations/bp/generation. Of course, this might depend on the archaeological calibration used (where/when did the bottleneck in the ancestry of Native Americans occur?). It might also depend on recent evidence that Native Americans are of mixed origin and thus did not really split from CHB/JPT; only part of their ancestry did. Nonetheless, this is another fairly "low" autosomal mutation rate.
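To compare the per-generation rate quoted above with rates that are usually given per year, it has to be divided by a generation time. The sketch below does that arithmetic; the generation times of 25 and 29 years are assumptions chosen for illustration and do not come from the paper.

```python
# Convert a per-generation mutation rate to a per-year rate.
# The generation times used below are illustrative assumptions,
# not values taken from the paper.

def per_year_rate(per_generation_rate, generation_time_years):
    """Return mutations/bp/year for a given mutations/bp/generation."""
    return per_generation_rate / generation_time_years


PER_GENERATION = 1.44e-8  # mutations/bp/generation, as quoted in the post

for generation_time in (25, 29):  # assumed generation times in years
    rate = per_year_rate(PER_GENERATION, generation_time)
    print(f"{generation_time} years/generation: {rate:.2e} mutations/bp/year")
```

Under either assumption the per-year rate comes out near 0.5×10⁻⁹ to 0.6×10⁻⁹ mutations/bp/year.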

(This was previously released as a preprint on the arXiv).

PLoS Genet 9(12): e1004023. doi:10.1371/journal.pgen.1004023

Reconstructing Native American Migrations from Whole-Genome and Whole-Exome Data

Simon Gravel et al.

Link

November 15, 2013

Population history of the Caribbean

PLoS Genet 9(11): e1003925. doi:10.1371/journal.pgen.1003925

Reconstructing the Population Genetic History of the Caribbean

Andrés Moreno-Estrada et al.

The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, we investigate the population genetic history of this region by characterizing patterns of genome-wide variation among 330 individuals from three of the Greater Antilles (Cuba, Puerto Rico, Hispaniola), two mainland (Honduras, Colombia), and three Native South American (Yukpa, Bari, and Warao) populations. We combine these data with a unique database of genomic variation in over 3,000 individuals from diverse European, African, and Native American populations. We use local ancestry inference and tract length distributions to test different demographic scenarios for the pre- and post-colonial history of the region. We develop a novel ancestry-specific PCA (ASPCA) method to reconstruct the sub-continental origin of Native American, European, and African haplotypes from admixed genomes. We find that the most likely source of the indigenous ancestry in Caribbean islanders is a Native South American component shared among inland Amazonian tribes, Central America, and the Yucatan peninsula, suggesting extensive gene flow across the Caribbean in pre-Columbian times. We find evidence of two pulses of African migration. The first pulse—which today is reflected by shorter, older ancestry tracts—consists of a genetic component more similar to coastal West African regions involved in early stages of the trans-Atlantic slave trade. The second pulse—reflected by longer, younger tracts—is more similar to present-day West-Central African populations, supporting historical records of later transatlantic deportation. Surprisingly, we also identify a Latino-specific European component that has significantly diverged from its parental Iberian source populations, presumably as a result of small European founder population size. 
We demonstrate that the ancestral components in admixed genomes can be traced back to distinct sub-continental source populations with far greater resolution than previously thought, even when limited pre-Columbian Caribbean haplotypes have survived.
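The abstract's link between the age of a migration pulse and the length of the ancestry tracts it leaves can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a simple single-pulse model in which recombination breaks up tracts at roughly (1 - m) × T crossovers per Morgan, where T is the number of generations since admixture and m is the ancestry fraction of the pulse. That approximation, the function name and the example values are all assumptions for illustration, not the authors' method or estimates.

```python
# Rough expected ancestry tract length under a single-pulse admixture model.
# Assumption: tracts are approximately exponentially distributed with rate
# (1 - m) * T per Morgan, a first-order approximation only.

def mean_tract_length_cm(generations_ago, ancestry_fraction):
    """Approximate mean tract length in centimorgans."""
    if generations_ago <= 0 or not 0.0 <= ancestry_fraction < 1.0:
        raise ValueError("need generations_ago > 0 and 0 <= fraction < 1")
    rate_per_morgan = (1.0 - ancestry_fraction) * generations_ago
    return 100.0 / rate_per_morgan


# Hypothetical pulses: an older one (16 generations) and a younger one (8).
for label, generations in (("older pulse", 16), ("younger pulse", 8)):
    length = mean_tract_length_cm(generations, ancestry_fraction=0.1)
    print(f"{label}: about {length:.1f} cM on average")
```

Under this model the older pulse yields shorter tracts, which is the pattern the abstract describes for the two African migration pulses.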

Link

September 06, 2013

ASHG 2013 abstracts

Feel free to point me to more interesting abstracts than the ones I noticed during my "first pass".

Morphometric and ancient DNA study of human skeletal remnants in Indian Subcontinent.
N. Rai et al.
Recovery and sequencing of mtDNA from ancient human remnants is a daunting task but provides valuable information about human migrations and evolution. Our present study is the first to recover, amplify and sequence (HVR and coding regions of mtDNA) inadequately preserved and highly degraded (1.5 Ky to ≤1.0 Ky ago) hominids mitochondrial DNA of three most intriguing and indigenous ancient population of South and South-East Asia (Myanmar=20 Buried individuals, Nicobar Islands=15 and Andaman Island=6). Following all parameters and to avoid the chance of contamination we independently extracted and sequenced the DNA in two different labs and measured the cranial variability in all hominid skulls using 128 cranial landmarks, compiled 3D morphometrics, genetic data of ancient DNA samples and analyzed the admixture and genetic affinities of above three populations. Results showed the predominant frequency of F1a1 and complete absence of 9bp deletion in ancient Nicobarese. Unlike in previous reports on modern Nicobarese, the high frequency of F1a1 haplogroup in ancient Nicobarese show the probable migration of Nicobarese from South East Asia and the complete absence of 9bp deletion suggests the different events of settlement. This study failed to detect genetic affinities of Burmese with Nicobarese even though their phenotype and language appears to be same. We first time report any kind of population study on Burmese populations and with the genetic affinity of Burmese with East Asian, East Indian (Including Gadhwal region of Himalaya) and Bangladeshi populations, we found significant admixture with West Eurasians. Our study strongly supports the West Eurasian and East Asian route of migration and settlement of early Burmese population. 
The three populations in the present study are quite different in their genetic structure but 3D morphometric study using huge number of landmarks explains a close homology among these populations and this can be explained by the role of climatic signature on these populations.
Y chromosomes of ancient Hunnu people and its implication on the phylogeny of East Asian linguistic families.
LL. Kang et al.
The Hunnu (Xiongnu) people, also called Huns in Europe, were the largest ethnic group to the north of Han Chinese until the 5th century. The ethno-linguistic affiliation of the Hunnu is controversial among Yeniseian, Altaic, Uralic, and Indo-European. Ancient DNA analyses on the remains of the Hunnu people had shown some clues to this problem. Y chromosome haplogroups of Hunnu remains included Q-M242, N-Tat, C-M130, and R1a1. Recently, we analyzed three samples of Hunnu from Barköl, Xinjiang, China, and determined Q-M3 haplogroup. Therefore, most Y chromosomes of the Hunnu samples examined by multiple studies are belonging to the Q haplogroup. Q-M3 is mostly found in Yeniseian and American Indian peoples, suggesting that Hunnu should be in the Yeniseian family. The Y chromosome diversity is well associated with linguistic families in East Asia. According to the similarity in the Y chromosome profiles, there are four pairs of congenetic families, i.e., Austronesian and Tai-Kadai, Mon-Khmer and Hmong-Mien, Sino-Tibetan and Uralic, Yeniseian and Palaesiberian. Between 4,000-2,000 years before present, Tai-Kadai, Hmong-Mien, Sino-Tibetan, and Yeniseian languages transformed into toned analytic languages, becoming quite different from the rest four. Since Hunnu was in the Yeniseian family, all these four toned families were distributed in the inland of China during the transformations. There must be some social or biological factors induced the transformations at that time, which is worth doing more linguistic and genetic researches.
Genomic scans for haplotypes of Denisova and Neanderthal ancestry in modern human populations.
F. L. Mendez, M. F. Hammer University of Arizona, Tucson, AZ., USA.
Evidence of archaic introgression into modern humans has accumulated in recent years. While most efforts to characterize the introgression process have relied on genome averages, only a small number of introgressive haplotypes have been shown to have an archaic origin after rejection of the alternative hypothesis of incomplete lineage sorting. Accurate identification of introgressive haplotypes is crucial both to characterize potentially functional consequences of archaic admixture and to quantify more precisely the genomic impact of archaic introgression. We perform two independent genomic scans for haplotypes of Denisova and of Neanderthal origin in a geographically diverse sample of complete genome sequences. These scans are based on the local sharing of polymorphisms and linkage disequilibrium, respectively. The analysis of concordance between the methods is then used to estimate the power and to compare demographic inference when performed using either all the data or just the genomic regions with no evidence of introgression. Moreover, we evaluate the extent to which Denisova haplotypes are observed in non-Melanesian populations, and investigate whether the presence of such haplotypes is better explained by their persistence in the population since introgression or by more recent gene flow from Melanesians.
Admixture Estimation in a Founder Population. 
Y. Banda et al.
Admixture between previously diverged populations yields patterns of genetic variation that can aid in understanding migrations and natural selection. An understanding of individual admixture (IA) is also important when conducting association studies in admixed populations. However, genetic drift, in combination with shallow allele frequency differences between ancestral populations, can make admixture estimation by the usual methods challenging. We have, therefore, developed a simple but robust method for ancestry estimation using a linear model to estimate allele frequencies in the admixed individual or sample as a function of ancestral allele frequencies. The model works well because it allows for random fluctuation in the observed allele frequencies from the expected frequencies based on the admixture estimation. We present results involving 3,366 Ashkenazi Jews (AJ) who are part of the Kaiser Permanente Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort and genotyped at 674,000 SNPs, and compare them to the results of identical analyses for 2,768 GERA African Americans (AA). For the analysis of the AJ, we included surrogate Middle Eastern, Italian, French, Russian, and Caucasus subgroups to represent the ancestral populations. For the African Americans, we used surrogate Africans and Northern Europeans as ancestors. For the AJ, we estimated mean ancestral proportions of 0.380, 0.305, 0.113, 0.041 and 0.148 for Middle Eastern, Italian, French, Russian and Caucasus ancestry, respectively. For the African Americans, we obtained estimated means of 0.745 and 0.248 for African and European ancestry, respectively. We also noted considerably less variation in the individual admixture proportions for the AJ (s.d. = .02 to .05) compared to the AA (s.d.= .15), consistent with an older age of admixture for the former. From the linear model regression analysis on the entire population, we also obtain estimates of goodness of fit by r2. 
For the analysis of AJ, the r2 was 0.977; for the analysis of the AA, the r2 was 0.994, suggesting that genetic drift has played a more prominent role in determining the AJ allele frequencies. This was confirmed by examination of the distribution of differences for the observed versus predicted allele frequencies. As compared to the African Americans, the AJ differences were significantly larger, and presented some outliers which may have been the target of selection (e.g. in the HLA region on chromosome 6p).
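The idea of modelling allele frequencies in an admixed sample as a linear function of ancestral allele frequencies can be sketched for the simplest case of two source populations, as in the African American analysis. The code below is a minimal ordinary least-squares illustration with an r² goodness of fit. The function name and the toy frequencies are hypothetical, and this is not the authors' implementation, which handles more sources and real genotype data.

```python
# Two-source admixture estimate by least squares on allele frequencies.
# Model (simplified assumption): f = a * p_A + (1 - a) * p_B + noise,
# where f is the admixed frequency at each SNP.

def estimate_two_way_admixture(admixed, source_a, source_b):
    """Return (fraction from source_a, r-squared of the fit)."""
    dx = [pa - pb for pa, pb in zip(source_a, source_b)]
    dy = [f - pb for f, pb in zip(admixed, source_b)]
    denom = sum(x * x for x in dx)
    if denom == 0:
        raise ValueError("source populations have identical frequencies")
    alpha = sum(x * y for x, y in zip(dx, dy)) / denom

    # Goodness of fit: compare observed and predicted admixed frequencies.
    predicted = [alpha * pa + (1 - alpha) * pb
                 for pa, pb in zip(source_a, source_b)]
    mean_f = sum(admixed) / len(admixed)
    ss_res = sum((f - p) ** 2 for f, p in zip(admixed, predicted))
    ss_tot = sum((f - mean_f) ** 2 for f in admixed)
    return alpha, 1 - ss_res / ss_tot


# Toy frequencies at four SNPs (made up for illustration).
pop_a = [0.1, 0.8, 0.5, 0.3]
pop_b = [0.6, 0.2, 0.9, 0.4]
mixed = [0.75 * a + 0.25 * b for a, b in zip(pop_a, pop_b)]

alpha, r2 = estimate_two_way_admixture(mixed, pop_a, pop_b)
print(f"estimated fraction from A: {alpha:.3f}, r2: {r2:.3f}")
```

With real data the observed frequencies deviate from the model prediction, for example through genetic drift, and r² drops below 1; the abstract uses that difference in r² between the AJ and AA analyses as evidence of drift.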
Admixture in the Pre-Columbian Caribbean. 
J. C. Martinez-Cruzado et al.
The biological origin of the Caribbean aborigines that greeted Columbus is one of the most controversial issues regarding the population history of this region. Genome studies suggest an Equatorial-Tucanoan origin, consistent with the Arawakan language spoken by most natives of the region. However, the archaeological evidence suggests an early arrival from Mesoamerica, and their admixture with the more recent Arawak-speaking group stemming from the Amazon remains a possibility. The lineages comprehending most Puerto Rican samples belonging to haplogroups B1 and C1, which in turn encompass 44% of all Native American mtDNAs in the island, have an unambiguous South American origin. However, none of those belonging to haplogroup A2, encompassing 52% of all Native American mtDNAs, have been related to South America or any other continental region. To augment the scarce data from Mesoamerican countries other than Mexico, we present the complete mtDNA sequence of 6 Honduran samples belonging to distinct control region lineages in addition to 3 from the Dominican Republic and 3 from Puerto Rico. Interestingly, maximum likelihood phylogenetic reconstruction including 40 published haplogroup A2 sequence haplotypes from Mesoamerica, Central America and South America clusters 8 out of 10 Mesoamerican and Andean haplotypes in a deep rooted group, separate from, and excluding all Costa Rican, Panamian and Brasilian haplotypes, suggesting a relatively recent origin for Chibchan-Paezan and Amazonian groups. Furthermore, 4 of the 5 Greater Antillean A2 haplotypes are included in the deeply rooted Mesoamerican-Andean cluster. Moreover, the only Cuban haplotype in the literature and the remaining A2 haplotype from the Dominican Republic form even more deeply rooted private branches. 
Similarly, the only haplogroup C1d sample sequenced from the Dominican Republic forms a private branch with the deepest root in a maximum likelihood tree containing 19 additional C1d haplotypes from Mexico to Brasil plus the CRS. In conclusion, our preliminary results suggest that a substantial proportion of the Native American mtDNA lineages from the Greater Antilles do not share an Amazonian origin with the language their people spoke in 1492. Furthermore, the position of two Dominican lineages at the earliest split in both their respective trees suggests an early origin that could be explained by extensive lineage extinctions in Mesoamerica and the Andes or an origin in North America.
The possible role of social selection in the distribution of the "Proto-Mongolian" haplotype in Kazakhs, Kyrgyz, Mongols and other Eurasian populations.
M. Zhabagin et al.
Social factors may be important contributors to reproductive success and determination of the selective survival of individuals. Therefore, social selection and other social factors are important for understanding population structure and its formation. The role of social selection on the distribution and formation of Y-chromosomal gene pool has been studied. There is a strong connection between social selection and birth rate of the descendants, whose fathers had achieved high social status during the expansion of the Mongol Empire and associated historical events. A total of 783 haplotypes, including 687 newly obtained and 96 retrieved from the literature were assigned to the haplogroup C3*-M217 (xM48) based on genotyping 17 Y-chromosomal STR markers. These haplotypes represent 11 populations of Eurasia: Kazakhs, Mongols, Kyrgyz, Telengits, Circassians, Balkar, Temirgoys, Karachai, Evenki, Kizhi and the Pashtuns. As the result, a major haplotype 13-16-25-15-16-18-14-10-22-11-10-11-13-10-21 (DYS389a-DYS389b-DYS390-DYS456-DYS19-DYS458-DYS437-DYS438-DYS448-GATA4-DYS391-DYS392-DYS393-DYS439-DYS635, N=94) was found to have 12.00% frequency within haplogroup C3*. This haplotype includes and extends the previously described “star-cluster” haplotype. Noteworthy, the frequency of this major haplotype within haplogroup C3* was 16.80% in Kazakhs, 10.13% in Mongols and 2.63% in Kirgiz who are not considered as direct descendants of Genghis Khan. 35.10% of the major haplotype was represented by Kazakh tribe Ashamayly-Kerey, 17.02% by the Khalkh Mongols and 7.44% by the Barguts. Therefore, we suppose this major ancestral haplotype to be the "proto-Mongolian haplotype", inherited by Genghis Khan and his descendants. It is important to mention that Temujin belongs to Kiyat-Borjigin tribe that in turn is a branch of the bigger Borjigin tribe, part of the Khalkh Mongols. Thus, Genghis Khan might be considered as a carrier rather than founder of the star-cluster haplotype. 
He and his descendants are the ones who contributed to a positive effect of social selection in the distribution of this haplotype. Other examples are the Barguts, who had Genghis Khan’s credit and were granted with a number of privileges, or the Kerey, based on the fact that Temujin had been brought up at the court of the Togrul Khan, belonging to the Kerey tribe.
Y-chromosomal variation in native South Americans: bright dots on a gray canvas.
M. Nothnagel et al.
While human populations in Europe and Asia have often been reported to reveal a concordance between their extant genetic structure and the prevailing regional pattern of geography and language, such evidence is lacking for native South Americans. In the largest study of South American natives to date, we examined the relationship between Y-chromosomal genotype on the one hand, and male geographic origin and linguistic affiliation on the other. We observed virtually no structure for the extant Y-chromosomal genetic variation of South American males that could sensibly be related to their inter-tribal geographic and linguistic relationships, augmented by locally confined Y-STR autocorrelation. Analysis of repeatedly taken random subsamples from Europe adhering to the same sampling scheme excluded the possibility that this finding was due to our specific scheme. Furthermore, for the first time, we identified a distinct geographical cluster of Y-SNP lineages C-M217 (C3*) in South America, which are virtually absent from North and Central America, but occur at high frequency in Asia. Our data suggest a late introduction of C3* into South America no more than 6,000 years ago and low levels of migration between the ancestor populations of C3* carrier and non-carriers. Our findings are consistent with a rapid peopling of the continent, followed by long periods of isolation in small groups, and highlight the fact that a pronounced correlation between genetic and geographic/cultural structure can only be expected under very specific conditions.
The timing and history of Neandertal gene flow into modern humans. 
S. Sankararaman et al.
   Previous analyses of modern human variation in conjunction with the Neandertal genome have revealed that Neandertals contributed 1-4% of the genes of non-Africans with the time of last gene flow dated to 37,000-86,000 years before present. Nevertheless, many aspects of the joint demographic history of modern humans and Neandertals are unclear. We present multiple analyses that reveal details of the early history of modern humans since their dispersal out of Africa.
   1. We analyze the difference between two allele frequency spectra in non-Africans: the spectrum conditioned on Neandertals carrying a derived allele while Denisovans carry the ancestral allele and the spectrum conditioned on Denisovans carrying a derived allele while Neandertals carry the ancestral allele. This difference spectrum allows us to study the drift since Neandertal gene flow under a simple model of neutral evolution in a panmictic population even when other details of the history before gene flow are unknown. Applying this procedure to the genotypes called in the 1000 Genomes Project data, we estimate the drift since admixture in Europeans of about 0.065 and about 0.105 in East Asians. These estimates are quite close to those in the European and East Asian populations since they diverged, implying that the Neandertal gene flow occurred close to the time of split of the ancestral populations.
   2. Assuming only one Neandertal gene flow event in the common ancestry of Europeans and East Asians, we estimate the drift since gene flow in the common ancestral population. We show that an upper bound on this shared drift is 0.018. Because this is far less than the drift associated with the out-of-Africa bottleneck of all non-African populations, this shows that the Neandertal gene flow occurred after the out-of-Africa bottleneck.
   3. We use the genetic drift shared between Europeans and East Asians, in conjunction with the observation of large regions deficient in Neandertal ancestry obtained from a map of Neandertal ancestry in Eurasians, to estimate the number of generations and effective population size in the period immediately after gene flow. These analyses suggest that only a few dozen Neandertals may have contributed to the majority of Neandertal ancestry in non-Africans today.
Genetic characterisation of two Greek population isolates. 
K. Hatzikotoulas et al.
   Genetic association studies of low-frequency and rare variants can be empowered by focusing on isolated populations. It is important to genetically characterize population isolates for substructure and recent admixture events as these may give rise to spurious associations. Under the auspices of the HELlenic Isolated Cohorts study (HELIC; www.helic.org) we have collected >3,000 samples from two isolated populations in Greece: the Pomak villages (HELIC Pomak), a set of religiously-isolated mountainous villages in the North of Greece; and Anogia and surrounding mountainous villages on Crete (HELIC MANOLIS). All samples have information on anthropometric, cardiometabolic, biochemical, haematological and diet-related traits. 1,500 individuals from each population isolate have been typed on the Illumina OmniExpress and Human Exome Beadchip platforms. Multidimensional scaling analysis with the 1000 Genomes Project data shows similarities of the two population isolates with Mediterranean populations such as the Tuscans from Italy and Iberians from Spain. We also observe evidence for structure within the isolates, with the Kentavros village in the Pomak strand demonstrating high levels of differentiation. To characterise the degree of isolatedness in these populations we estimated the proportion of individuals with at least one “surrogate parent” (using only the subset of samples with pairwise pi-hat<0.2) and compared this to an outbred Greek population, the TEENAGE study, which comprises 707 unrelated adolescents from the Attica district. We find that the proportion of individuals with at least one surrogate parent for random regions of the genome in the MANOLIS isolate is >60% and in the Pomak isolate is >65% compared to ~1% in the outbred Greek population. 
Our results establish these populations as isolates and provide some insights into the genomic architecture of Greek populations, which have not been previously characterised.
Efficient and Accurate Whole-Genome Human Phasing.
T. Blauwkamp et al.
   High throughput DNA sequencing allows whole human genomes to be resequenced rapidly and inexpensively producing a comprehensive list of variants relative to the reference genome. However, short read sequencing technologies are limited in their ability to determine phasing information, thus resulting in heterozygous calls being represented as the average of the maternal and paternal chromosomes. Phasing information is of critical importance to personal medicine as it provides a better linkage between genotype and phenotype, permitting new advances in our understanding of compound heterozygote linked diseases, pharmacogenomics, HLA typing, and prenatal genome sequencing. Here, we describe a new sample prep method that enables whole human genome haplotyping at high accuracy using only 30Gb of sequence data. Genomic DNA was fragmented into ~10Kb fragments, end repaired, and ligated to adapters. Hundreds of aliquots with approximately 50MB of DNA in each were amplified, fragmented and converted into individual shotgun libraries. The pooled libraries were sequenced in a single lane of a HiSeq2500 at 2x100bp to generate ~30Gb of sequence. The resulting sequence information was analyzed to obtain a set of long blocks of ~10Kb, covering multiple heterozygous SNPs, allowing phasing of these SNPs relative to each other. An HMM-based phasing algorithm was used to compute the most likely phase and confidence intervals based on the observed coverage and sequencer quality scores. Phasing of those blocks relative to each other was done by another HMM-based algorithm which uses a panel of previously phased genomes. Comparing our results with phase information inferred by transmission from the parents, we found that over 98% of heterozygous SNPs were phased within long blocks (N50=500kb) at a switch error rate below 1 switch per megabase of phased sequence. We present results obtained from multiple cell lines and human samples. 
This new library prep method and data analysis pipeline enables whole human genome phasing with only 30Gb of raw sequence, which represents only ~30% more sequencing than current 30x baseline run for human sequencing. Compared to other published reports, this method is capable of phasing a greater fraction of SNPS with ~75% less sequencing. Coupling our higher percentage of SNPs phased with high accuracy and the lowest sequencing requirement, this new technology is the most affordable approach to generating completely phased whole human genomes.
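The switch error rate quoted in the abstract measures how often an inferred haplotype flips between the two parental chromosomes relative to a trusted phasing, such as one inferred from parental transmission. The sketch below shows one common way to count switches at heterozygous sites and express them per megabase. The function names, the 0/1 encoding and the toy data are assumptions for illustration, not the authors' pipeline.

```python
# Count switch errors between a trusted phasing and an inferred phasing.
# Encoding (assumed): for each heterozygous SNP, 0 or 1 says which allele
# sits on the first haplotype. A phasing that is flipped everywhere is
# equivalent to the truth, so only changes in agreement are counted.

def count_switch_errors(true_phase, inferred_phase):
    """Number of adjacent het-site pairs whose relative phase is wrong."""
    if len(true_phase) != len(inferred_phase):
        raise ValueError("phase vectors must have the same length")
    mismatch = [t != i for t, i in zip(true_phase, inferred_phase)]
    return sum(1 for prev, cur in zip(mismatch, mismatch[1:]) if prev != cur)


def switch_errors_per_mb(true_phase, inferred_phase, phased_bp):
    """Switch errors per megabase of phased sequence."""
    return count_switch_errors(true_phase, inferred_phase) / (phased_bp / 1e6)


# Toy example: eight het SNPs spanning a hypothetical 4 Mb block.
truth = [0, 1, 0, 0, 1, 1, 0, 1]
inferred = [0, 1, 0, 1, 0, 0, 1, 1]

print(count_switch_errors(truth, inferred))              # two switches
print(switch_errors_per_mb(truth, inferred, 4_000_000))  # 0.5 per Mb
```

Counting changes in agreement, rather than raw mismatches, keeps a single wrong switch from being penalised at every downstream site.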
Inference of Natural Selection and Demographic History for African Pygmy Hunter-Gatherers.
P. H. Hsieh et al.
   African Pygmies are hunter-gatherers primarily inhabiting the Central African rainforests, where they are exposed to high temperatures, high humidity, and a pathogen and parasite-enriched woody habitat. These factors undoubtedly influenced their evolutionary history as they adapted to this environment. Many Pygmy populations have historically been in socio-economic contact with neighboring Niger-Kordofanian speaking farmer populations, particularly since the agriculture expansion in sub-Saharan Africa that began five thousand years ago (kya). To look for the true signatures of adaptation to the rainforest habitat of pygmies we must control for this complex demographic history. We sequenced and combined 40x whole genome sequence data from 3 Baka pygmies from Cameroon, 4 Biaka pygmies from the Central African Republic, and 9 Niger-Kordofanian speaking Yoruba farmers from Nigeria. We used ∂a∂i, a model-based demographic inference tool, to infer the history of these populations. Our best-fit model suggests that the ancestors of the farmer and pygmy populations diverged 150 kya and remained isolated from each other until 40 kya. This divergence is more ancient than estimated by previous studies that included fewer loci, but is consistent with a PSMC analysis, a separate inference tool that uses different aspects of the genomic data than ∂a∂i. Interestingly, our analysis shows that models with bi-directional asymmetric gene flow between farmers and pygmies are statistically better supported than previously suggested models with a single wave of uni-directional migration from farmers to pygmies. To identify possible targets of positive selection, we conducted a genomic scan using complementary methods, including the frequency-spectrum based G2D test, the population differentiation based XP-CLR test, and the haplotype based iHS test. 
We performed 10,000 simulations based on the above best-fit demographic model in order to assign statistical significance to each reported target of natural selection. Our results reveal that genes involved in cell adhesion, cellular signaling, olfactory perception, and immunity were likely targeted by natural selection in the pygmies or their recent ancestors. Our analysis also shows that genes involved in the function of lipid binding are enriched in highly differentiated non-synonymous mutations, suggesting that this function may have acted differently on the Pygmies and farmers after their divergence from their common ancestor.
Population demography and maternal history of Oceania.
A. T. Duggan et al.
   We present a large-scale study of mtDNA diversity across Near and Remote Oceania with whole-genome mtDNA sequencing and a sample collection of more than 1,300 individuals spanning from the Bismarck Archipelago in the west to the Cook Islands in the east. As the location of at least two major migration events (initial colonization over 40,000 years ago, followed by an expansion of Austronesian-speaking migrants around 3,500 years ago), Oceania provides a unique opportunity to study the effects of population admixture. Our results support the idea of sex-biased admixture between the resident populations and the migrants of the Austronesian expansion. We find that haplogroups of putative Asian origin which are thought to have spread with the Austronesian expansion are found at high frequency in all but two populations and, in general, we see little evidence of distinction between Papuan and Austronesian speaking populations. Santa Cruz, which is part of the Solomon Islands but geographically distinct from the main island chain and considered part of Remote Oceania, has long been considered a linguistic oddity and is now accepted to represent a very deep branch in the Oceanic language family. We find that it is also a genetic outlier, with potential direct connections to the Bismarck Archipelago not evident in the main Solomon Islands chain. In this expanded dataset, we find additional evidence of instability and increased heteroplasmy at the ‘Polynesian motif’ position 16247, further confirming previous findings restricted to the Solomon Islands. 

 Reconstructing Austronesian population history. 
M. Lipson et al.
   Present-day populations that speak Austronesian languages are spread across half the globe, from Easter Island in the Pacific Ocean to Madagascar in the Indian Ocean. Evidence from linguistics and archaeology suggests that the "Austronesian expansion," a vast cultural and linguistic dispersal that began 4-5 thousand years ago, had its origin in Taiwan. However, genetic studies of Austronesian ancestry have been inconclusive, with some finding affinities with aboriginal Taiwanese, others advancing an autochthonous origin within Island Southeast Asia, and others proposing a model involving multiple waves of migration from Asia. Here, we analyze genome-wide data from a diverse set of 31 Austronesian-speaking and 25 other groups typed at 18,412 overlapping single nucleotide polymorphisms (SNPs) to trace the genetic origins of Austronesians. We use a recently developed computational tool for building phylogenetic models of population relationships incorporating the possibility of admixture, which allows us to infer ancestry proportions and sources of genetic material for 26 admixed Austronesian-speaking populations. Our analysis provides strong confirmation of widespread ancestry of Taiwanese origin: at least a quarter of the genetic material in all Austronesian-speaking populations that we studied (including all of the Asian ancestry in populations from eastern Indonesia and Oceania) is more closely related to aboriginal Taiwanese than to any populations we sampled from the mainland. Surprisingly, we also show that western Austronesian-speaking populations have inherited substantial proportions of their Asian ancestry from a source that falls within the variation of present-day Austro-Asiatic populations in Southeast Asia. No Austro-Asiatic languages are spoken in Island Southeast Asia today, although there are some linguistic and archaeological suggestions of an early connection between mainland and island populations.
The most plausible explanation for these findings, in light of the historical evidence, is that western Island Southeast Asia was settled by Austronesian groups who had previously mixed with Austro-Asiatic speakers on the mainland.
 No significant differences in the accumulation of deleterious mutations across diverse human populations. 
R. Do et al.
   Differences in demographic history across populations are expected to cause differences in the accumulation of deleterious mutations because natural selection works less efficiently when population sizes are small. Surprisingly, however, the relative burden of deleterious mutations has never been directly measured across human populations on a per-haploid genome basis, despite the fact that this is what matters biologically in the absence of dominance and epistasis. Here we empirically measure the relative accumulation of deleterious mutations in 13 diverse populations (Yoruba, Mandenka, San, Mbuti, Dinka, Australian, French, Sardinian, Han, Dai, Mixe, Karitiana and Papuan) along with one archaic population (Denisova). All the present-day populations have statistically indistinguishable accumulations of coding mutations. We highlight two examples. First, we find no evidence for a lower mutational load in West Africans than in Europeans despite the approximately 30% higher genetic diversity in West Africans: the accumulation of nonsynonymous mutations in West Africans is 1.01±0.02 times that in Europeans, and for “probably damaging” mutations, the ratio is 1.03±0.04. Second, we find no evidence for a lower mutational load in populations that have experienced agriculture-related expansions over the last 10,000 years and those that have not: the ratio in Chinese to Karitiana hunter gatherers from Brazil is 0.99±0.07. We determined that these null results are not an artifact of insensitivity of our method to differences in demographic history. As a positive control, we also analyzed archaic Denisovans who are known to have had a small population size for hundreds of thousands of years since separation from modern humans. We show that the Denisovan lineage has accumulated “probably damaging” mutations 1.33±0.06 times more rapidly than modern humans since they split. 
These analyses are important because of the new constraints they place on the distribution of selection coefficients in humans. Given the currently estimated demographic histories of West Africans and Europeans, combined with the fact that we do not detect a lower accumulation of deleterious mutations in West Africans than Europeans, we can conclude that only a small proportion of nonsynonymous mutations have selection coefficients in the range s=-0.01 to -0.001, which is the range of selection coefficients which would be expected to show a lower accumulation in West Africans than in Europeans.
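As a rough illustration of why the ratios above read as null results, each ratio can be compared with 1 in units of its reported uncertainty. The sketch below assumes the ± values are one standard error, which the abstract does not state; the function name `z_from_ratio` is hypothetical and this is not the authors' test.

```python
def z_from_ratio(ratio, se, null=1.0):
    """Number of standard errors separating a ratio from the null value."""
    return (ratio - null) / se


# Ratios quoted in the abstract. Treating "±" as one standard error is an assumption.
reported = {
    "West African / European, nonsynonymous": (1.01, 0.02),
    "West African / European, probably damaging": (1.03, 0.04),
    "Chinese / Karitiana": (0.99, 0.07),
    "Denisovan / modern human, probably damaging": (1.33, 0.06),
}

for label, (ratio, se) in reported.items():
    print(f"{label}: z = {z_from_ratio(ratio, se):+.2f}")
```

Under that assumption the three present-day comparisons fall within one standard error of 1, while the Denisovan comparison is more than five standard errors away.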
Deep coverage Bedouin genomes reveal Bedouin haplotypes shared among worldwide populations in the 1000 Genomes Project. 
J. L. Rodriguez-Flores et al.
   The 1000 Genomes Project (1000G) has sampled and sequenced over 2500 genomes that are representative of the genetic diversity in populations worldwide. The Arabian Peninsula has not been previously included in 1000G, hence the connections between genetic variation in the indigenous Bedouin people and worldwide populations are unknown. We have sampled genomes from Bedouin individuals in the nation of Qatar as a window into the genetic variation in this understudied region. Our goal was to use this sample to assess the hypothesis that there is detectable shared ancestry between Bedouin and Southern European populations resulting from the history of empires that spanned both the Mediterranean and Arabian regions and the hypothesis that there is shared ancestry between Bedouin and contemporary Latin American populations, since the majority of European settlers in Latin America over the past half millennium came from Southern European countries. We selected 60 Qataris with over 95% Bedouin ancestry and at least 3 generations of ancestry in Qatar for deep coverage genome sequencing. Genomes were sequenced by the Illumina Genome Network using TruSeq DNA PCR-free sample preparation, generating over 120 gigabases of paired-end 100 base pair reads per genome on a HiSeq 2500, yielding over 30x depth and genotypes for >96% of the genome using both the ELAND/CASAVA and BWA/GATK pipelines. Using these genotypes, we inferred haplotypes using SHAPEIT for Bedouin Qataris and for 1000G populations on a set of sites polymorphic in both 1000G and Bedouins. We used admixture analysis to assess shared ancestry between our Bedouin sample and 1000G populations using the ancestry deconvolution method SUPPORTMIX. Given the lack of appropriate ancestral populations, we conducted a leave-one-out approach, where for each population (1000G + Bedouin = n), we removed the population and used the remaining n-1 populations as an ancestral reference panel.
Using this approach, we observed up to 15% Bedouin ancestry in European, South Asian, and American populations. Likewise, we observed ancestry from Europe, South Asia, and America in the Bedouin. For individuals from the Americas, the analysis identified a considerable number of segments shared with Bedouins previously classified as European ancestry. 
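The leave-one-out design described above can be sketched as a loop that pairs each population with a reference panel made of all the others. This is a minimal illustration only: the function name and the population labels are placeholders, and the ancestry deconvolution step itself (SUPPORTMIX in the abstract) is not shown.

```python
def leave_one_out_panels(populations):
    """Yield (target, reference_panel) pairs, one per population."""
    for i, target in enumerate(populations):
        # The reference panel is every population except the target.
        reference = populations[:i] + populations[i + 1:]
        yield target, reference


# Placeholder labels for illustration; not the panel used in the study.
labels = ["Bedouin", "IBS", "TSI", "YRI", "CHB"]
for target, reference in leave_one_out_panels(labels):
    print(target, "<-", reference)
```

With n populations this produces n runs, each with n-1 reference populations, which matches the counting given in the abstract.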
Using a haplotype-based model to infer Native American colonization history.
C. Lewis et al.
   We apply a powerful haplotype-based model (described in Lawson et al. 2012) to infer the population history of 410 individuals from ~50 Native American groups, using data interrogated at >470,000 genome-wide autosomal single-nucleotide polymorphisms (SNPs). The model matches haplotype patterns among individuals' chromosomes to infer which individuals share recent common ancestry at each location of the genome, an approach that has previously been demonstrated to increase power substantially over widely-used alternative approaches that consider SNPs independently. We apply this methodology to 1861 samples described in Reich et al. (2012), incorporating 263 additional samples from 32 relevant world-wide regions collated from other publicly available resources and currently unavailable data. We use this methodology and these data in two ways. First, we infer intermixing (i.e. "admixture") events among different Native American groups by identifying the groups that share the most haplotype segments. Using additional unpublished techniques, we determine the dates of these intermixing events, the proportions of DNA contributed, and the precise genetic make-up of the groups involved. These unique characteristics set this methodology apart from all presently available software, allowing us to place these mixing events into a clear historical context and thus identify the factors (e.g. the rise or fall of various Native American empires) that have contributed most to the genetic architecture of present-day Native American groups. Second, we match DNA patterns from each Native American group to a set of over 30 populations from Siberia and East Asia, describing each Native American group as a mixture of DNA from these regions. This enables us to shed light on the widely debated number of distinct migrations into the Americas during the initial colonization across the Bering Strait, comparing our results to previous inference from the literature.
Our application demonstrates the power gained by using rich haplotype information relative to approaches that ignore this information.
Using Ancient Genomes to Detect Positive Selection on the Human Lineage. 
K. Prüfer et al.
   At least two distinct groups of archaic hominins inhabited Eurasia before the arrival of modern humans: Neandertals and Denisovans. The analysis of the genomes of these archaic humans revealed that they are more closely related to one another than they are to modern humans. However, since modern and archaic humans are so closely related, only about 10% of the archaic DNA sequences fall outside the present-day human variation whereas for 90% of the genome, Neandertal or Denisova DNA sequences are more closely related to some humans than to others. The fact that the archaic sequence often falls within the diversity of modern humans can be used to detect selective sweeps that affected all modern humans after their split from archaic humans since such sweeps will result in genomic regions where both the Neandertal and Denisova genomes fall outside the modern human variation. The genetic lengths of such external regions are proportional to the strength of selection, since stronger selection will lead to faster sweeps allowing less time for recombination to decrease their size. We have implemented a test for such external regions as a hidden Markov model. At each polymorphic position the model emits ancestral or derived based on whether the tested archaic genome carries the ancestral or derived variant of SNPs observed in present-day humans. The model was applied to 185 African genomes from the 1000 genomes phase 1 data. We identified thousands of external regions using the Neandertal and Denisova genomes, separately. Approximately one third of the regions are overlapping between the two genomes. These regions are significantly longer than regions identified in only one of the archaic genomes. Based on this excess of overlap for long regions, we devise a measure to identify a set of regions that are candidates for selective sweeps on the human lineage since the split from Neandertal and Denisova.
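The hidden Markov model described above can be sketched as a two-state chain decoded with the Viterbi algorithm. This is a minimal sketch and not the authors' implementation: the state names, the emission and switch probabilities, the constant switch probability between sites (rather than one scaled by genetic distance), and the use of Viterbi decoding are all assumptions made for illustration.

```python
import math

STATES = ("internal", "external")


def viterbi(obs, p_derived=(0.3, 0.01), p_switch=0.01, p_start=(0.9, 0.1)):
    """Most likely state path for a non-empty sequence of 0/1 observations.

    obs[i] is 1 if the archaic genome carries the derived allele at the i-th
    site that is polymorphic in present-day humans, and 0 if it carries the
    ancestral allele. All default probabilities are illustrative placeholders.
    """
    def log_emit(state, symbol):
        p = p_derived[state]
        return math.log(p if symbol == 1 else 1.0 - p)

    log_stay, log_move = math.log(1.0 - p_switch), math.log(p_switch)

    # score[s] = best log-probability of any path that ends in state s
    score = [math.log(p_start[s]) + log_emit(s, obs[0]) for s in (0, 1)]
    back = []
    for symbol in obs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [score[r] + (log_stay if r == s else log_move)
                          for r in (0, 1)]
            best = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best)
            new_score.append(candidates[best] + log_emit(s, symbol))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


# A long run of ancestral calls flanked by mixed calls.
calls = [1, 0, 1, 0, 1] + [0] * 60 + [1, 0, 1, 0, 1]
path = viterbi(calls)
print(path.count("external"), "of", len(path), "sites called external")
```

The idea behind the emission choice is that where the archaic lineage falls outside all present-day human variation, it should almost always carry the ancestral allele at sites that are polymorphic in humans, so a long unbroken run of ancestral calls is what the external state is meant to pick up.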
Pulling out the 1%: Whole-Genome In-Solution (WISC) capture for the targeted enrichment of ancient DNA sequencing libraries. 
C. D. Bustamante et al.
   The very low levels of endogenous DNA remaining in most ancient specimens has precluded the shotgun sequencing of many interesting samples due to cost. For example, ancient DNA (aDNA) libraries derived from bones and teeth often contain <1% endogenous DNA, meaning that the majority of sequencing capacity is taken up by environmental DNA. We will present a method for the targeted enrichment of the endogenous component of human aDNA sequencing libraries. Using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to significantly enrich for human-derived DNA fragments. This approach, which we call whole-genome in-solution capture (WISC), allows us to obtain genome-wide ancestral information from ancient samples with very low endogenous DNA contents. We demonstrate WISC on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased dramatically, with up to 59% of reads mapped to human and fold enrichment ranging from 5X to 139X. Furthermore, we maintained coverage of the majority of fragments present in the pre-capture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062-147,243) for the post-capture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217-73,266) for the pre-capture libraries, increasing resolution in population genetic analyses. We will also present the results of performing WISC on other aDNA libraries from both archaic human and non-human samples, including ancient domestic dog samples.
Our capture approach is flexible and cost-effective, allowing researchers to access aDNA from many specimens that were previously unsuitable for sequencing. Furthermore, this method has applications in other contexts, such as the enrichment of target human DNA in forensic samples.
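For readers unfamiliar with the term, fold enrichment can be read as the ratio of the post-capture to the pre-capture fraction of reads mapping to the human genome. The sketch below assumes that definition, which the abstract does not spell out (the authors may, for example, handle duplicate reads differently); the function name and the example library are hypothetical.

```python
def fold_enrichment(pre_fraction, post_fraction):
    """Ratio of post-capture to pre-capture fraction of human-mapped reads."""
    if pre_fraction <= 0:
        raise ValueError("pre-capture fraction must be positive")
    return post_fraction / pre_fraction


# Hypothetical library: 0.5% of reads map to human before capture, 30% after.
print(f"{fold_enrichment(0.005, 0.30):.0f}X")
```

Because the ratio is computed per library, a library that starts with a very low endogenous fraction can show a large fold enrichment while still ending with a modest post-capture fraction.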
Insights into population history from a high coverage Neandertal genome. 
D. Reich, for the Neandertal Genome Consortium
   We have sequenced to about 50-fold coverage a genome sequence from about 40 mg of a bone found in Denisova Cave in Southern Siberia. The genome of this female is much more closely related to the low-coverage Neandertal genomes from Croatia, Spain, Germany and the Caucasus than to the genome of archaic Denisovans, a sister group of Neandertals, and provides unambiguous evidence that both Neandertals and Denisovans inhabited the Altai Mountains in Siberia. The high-coverage Neandertal genome, combined with our earlier sequencing of a high quality Denisova genome, allows novel insights about the population history of archaic humans:
    • We document recent inbreeding in this Altai Neandertal. The inbreeding coefficient of about 1/8 corresponds to about the homozygosity that would be expected from a mating of half siblings.
    • The Altai Neandertal genome shares almost seven percent more derived alleles with present-day Africans than does the Denisova genome. This means that the Denisovans derived a proportion of their ancestry from a very archaic human lineage, and the amount of this ancestry they inherit is larger than in Neandertals.
    • The Denisovan genome is affected by major recent gene flow from an Altai-related Neandertal.
    • To further characterize the variation among Neandertals we sequenced the genome of a Neandertal from the Caucasus to about 0.5-fold coverage. Comparisons to present-day genomes show that the Neandertals who contributed genes to present-day non-Africans were more closely related to this Caucasian Neandertal than to the Neandertals we sequenced from the Altai.
    • We built a map of Neandertal ancestry in modern humans, using data from all non-Africans in the 1000 Genomes Project. We show that the average Neandertal ancestry on chromosome X of present-day non-Africans is about a fifth of the genome average. It is known that hybrid incompatibility loci concentrate on chromosome X. Thus, this observation is consistent with a model of hybrid incompatibility in which Neandertal variants that introgressed into modern humans were rapidly selected away due to epistatic interactions with the modern human genetic background.
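The 1/8 figure in the first point above can be checked with Wright's path-counting formula, in which each loop through a common ancestor of the two parents contributes (1/2)^n, where n is the number of individuals on the path from one parent to the other. The sketch below is a textbook calculation under the assumption of non-inbred common ancestors, not the consortium's estimation method, and the function name is hypothetical.

```python
from fractions import Fraction


def inbreeding_coefficient(path_lengths, ancestor_f=0):
    """Wright's path-counting formula for the offspring of two relatives.

    path_lengths: for each loop through a common ancestor, the number of
    individuals on the path from one parent to the other (both included).
    ancestor_f: inbreeding coefficient of the common ancestors (assumed equal).
    """
    half = Fraction(1, 2)
    return sum(half ** n * (1 + Fraction(ancestor_f)) for n in path_lengths)


print(inbreeding_coefficient([3]))     # parents are half siblings: 1/8
print(inbreeding_coefficient([3, 3]))  # parents are full siblings: 1/4
print(inbreeding_coefficient([5, 5]))  # parents are first cousins: 1/16
```

Other pedigrees, such as an uncle-niece mating, give the same value of 1/8, so the coefficient alone does not identify the exact relationship of the parents.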
Inferring complex demographies from PSMC coalescent rate estimates: African substructure and the Out-of-Africa event.
S. Gopalakrishnan et al.
   Human population history is an intriguing and complex story with many events like population growth, bottlenecks, time-dependent/non-homogeneous migration, population splits and mixtures. Estimating complete demographies with population sizes, rates of gene flow and population split times has proven to be a challenging endeavor. We propose a framework for jointly estimating the demography parameters, especially gene-flow rates and split times, for a large number of populations. We use coalescent rate estimates obtained from Pairwise Sequentially Markovian Coalescent (PSMC) as the starting point for our analysis. Since PSMC works on only two chromosomes at a time, we apply PSMC to all pairs of individuals to obtain the pairwise coalescent rates for lineages from every pair of sampled populations. Using a mathematical model for calculating coalescent probabilities given population parameters, we estimate demography using the parameters that best fit the observed coalescent rates.
   In this study, we focus on two aspects of African population genetics, 1. the nature of population structure in Africa going back in time and 2. the timing of the Out-of-Africa event. To address these questions, we assembled a dataset with whole genome sequences from 162 individuals using both in-house sequencing and publicly available sources. These samples span 22 populations worldwide. These include eleven African populations which we use to dissect the population substructure in Africa. In addition, we also have 2 Middle Eastern, 5 European and 4 East/Central Asian populations which inform the population split time estimates for the Out-of-Africa event and the European-Asian split.
   We find extensive population structure in Africa extending back to before the Out-of-Africa event. The Ethiopian populations, Amhara and Oromo, show evidence of mixing beyond 15 kya. The Maasai and Luhya merge with the Ethiopian populations to form a panmictic East African population ~40 kya. We find evidence for extensive mixing between east and west African populations before 50 kya. Among the pygmy populations, we see recent gene flow between the Batwa and Mbuti. All African populations except the San merge into a single population around 110 kya. The San exchange migrants with the other African populations beginning ~120 kya. We estimate the Out-of-Africa event to have occurred ~75 kya and the European-Asian split to ~25 kya.
Out of Africa, which way? 
L. Pagani et al.
While the African origin of all modern human populations is well-established, the dynamics of the diaspora that led anatomically modern humans to colonize the lands outside Africa are still under debate. Understanding the demographic parameters as well as the route (or routes) followed by the ancestors of all non-Africans could help to refine our understanding of the selection processes that occurred subsequently, as well as shedding light on a landmark process in our evolutionary history. Of the three possible gateways out of Africa (via Morocco across the Gibraltar strait, via Egypt through the Suez isthmus or via the Horn of Africa across Bab el Mandeb strait) only the latter two are supported by paleoclimatic and archaeological evidence. Furthermore, recent studies (Pagani et al. 2012) showed that, although the modern Ethiopian populations might be good candidates for the descendants of the source population of such a migration, modern Egyptians could be an even better candidate. Unfortunately, however, only a few Egyptian samples have been genotyped and, as yet, none have been fully sequenced. Here, we have generated 125 Ethiopian and 100 Egyptian whole genome sequences (Illumina HiSeq, 8x average depth). The genomes were partitioned using PCAdmix (Brisbin et al. 2012) to account for the confounding effects of recent introgression from neighboring non-African populations. To explore the genetic legacy of migration routes out of Africa, and in particular to test whether the observed genetic data support one route over another, the African components of Egyptians and Ethiopians were then compared to a panel of available non-African populations from the 1000 Genomes Project (1000 Genomes Project Consortium, 2012). The high resolution provided by whole genome sequencing allows us to shed new light on the paths followed by our ancestors as they left Africa, as well as refining the current knowledge of the demographic history of the populations analyzed.
The Saudi Arabian Genome Reveals a Two Step Out-of-Africa Migration. 
J. J. Farrell et al.
   Here we present the first high-coverage whole genome sequences from a Middle Eastern population consisting of 14 Eastern Province Saudi Arabians. Genomes from this region are of interest to further answer questions regarding “Out-of-Africa” human migration. Applying a pairwise sequentially Markovian coalescent model (PSMC), we inferred the history of population sizes between 10,000 years and 1,000,000 years before present (YBP) for the Saudi genomes and an additional 11 high-coverage whole genome sequences from Africa, Asia and Europe.
   The model estimated the initial separation from Africans at approximately 110,000 YBP. This intermediate population then underwent a long period of decreasing population size culminating in a bottleneck 50,000 YBP followed by an expansion into Asia and Europe. The split and subsequent bottleneck were thus two distinct events separated by a long intermediate period of genetic drift in the Middle East. The two most frequent mitochondrial haplogroups (30% each) were the Middle Eastern U7a and the African L. The presence of the L haplogroup common in Africa was unexpected given the clustering of the Saudis with Europeans in the phylogenetic tree and suggests some recent African admixture. To examine this further, we performed formal tests for a history of admixture and found no evidence of African admixture in the Saudis after the split. Taken together, these analyses suggest that the L3 haplogroup found in the Saudis was present before the bottleneck 50,000 YBP. Given the TMRCA estimates for the L3 haplogroup of approximately 70,000 YBP and the timing of the Out-of-Africa split, these analyses suggest that the L3 haplogroup arose in the Middle East with a subsequent back migration and expansion into Africa over the Horn of Africa during the lower sea levels found during the glacial period bottleneck.
   These results are consistent with the hypothesis that modern humans populated the Middle East before a split 110,000 YBP and underwent genetic drift for 60,000 years before expanding to Asia and Europe and migrating back into Africa. Examination of genetic variants discovered by Saudi whole genome sequencing in ancestral African populations and European/Asian populations will contribute to the understanding of human migration patterns and the origin of genetic variation in modern humans.
 Geographic Population Structure (GPS) of worldwide human populations infers biogeographical origin down to home village
E. Elhaik et al.
The search for a method that utilizes biological information to predict a human's place of origin has occupied scientists for millennia. Modern biogeography methods are accurate to 700 km in Europe but are highly inaccurate elsewhere, particularly in Southeast Asia and Oceania. The accuracy of these methods is bound by the choice of genotyping arrays, the size and quality of the reference dataset, and principal component (PC)-based algorithms. To overcome the first two obstacles, we designed GenoChip, a dedicated genotyping array for genetic anthropology with an unprecedented number of ~12,000 Y-chromosomal and ~3,300 mtDNA SNPs and over 130,000 autosomal and X-chromosomal SNPs carefully chosen to study ancestry without any known health, medical, or phenotypic relevance. We also genotyped 615 individuals from 54 worldwide populations collected as part of the Genographic Project and the 1000 Genomes Project. To overcome the last impediment, we developed an admixture-based Geographic Population Structure (GPS) method that infers the biogeography of worldwide individuals down to their village of origin. GPS’s accuracy was demonstrated on three data sets: worldwide populations, Southeast Asians and Oceanians, and Sardinians (Italy) using 40,000-130,000 GenoChip markers. GPS correctly placed 80% of worldwide individuals within their country of origin with an accuracy of 87% for Asians and Oceanians. Applied to over 200 Sardinian villagers of both sexes, GPS placed a quarter of them within their villages and most of the remaining within 50 km of their villages, allowing us to identify the demographic processes that shaped the Sardinian society. These findings are significantly more accurate than PCA-based approaches. We further demonstrate two GPS applications in tracing the poorly understood biogeographical origin of the Druze and North American (CEU) populations. Our findings demonstrate the potential of the GenoChip array for genetic anthropology.
Moreover, the accuracy and power of GPS underscore the promise of admixture-based methods for biogeography and have important ramifications for genetic ancestry testing, forensic and medical sciences, and genetic privacy.

June 19, 2013

Native American origins from whole-genome and exome data (Gravel et al. 2013)

arXiv:1306.4021 [q-bio.PE]

Reconstructing Native American Migrations from Whole-genome and Whole-exome Data

Simon Gravel et al.

There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR appears most closely related to Equatorial-Tucanoan-speaking populations, supporting a Southern America ancestry of the Taino people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a three-population demographic model. The ancestral populations to the three groups likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL ancestors split 12.2 kya, with a subsequent split of the ancestors to CLM and PUR 11.7 kya. The model also features a Mexican population of 62,000, a Colombian population of 8,700, and a Puerto Rican population of 1,900. Modeling identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest size and the earlier migration from Europe.

Link

June 06, 2013

Population history of the Caribbean (Moreno-Estrada et al. 2013)

The placement of Caribbeans on a European "genetic map" is fairly interesting, as they appear to be "ultra-Iberian" (on the far left). The authors invoke drift as an explanation, which makes sense, given that a small portion of the Iberian gene pool entered into the composition of these populations.

On the other hand, it'd be nice to have Iberian data from a few centuries ago to make sure. Iberia, being a part of Europe, may have had the opportunity to "right-shift" during the last few centuries due to gene flow. Even if it didn't, gene flow within Iberia may have dulled population differentiation, while immigration to the Caribbean may not have originated from all parts of Iberia equally (and, as I've shown, there is substantial population structure in Iberia to this day).

arXiv:1306.0558 [q-bio.PE]

Reconstructing the Population Genetic History of the Caribbean

Andres Moreno-Estrada et al.

The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, by making use of genome-wide SNP array data, we characterize ancestral components of Caribbean populations on a sub-continental level and unveil fine-scale patterns of population structure distinguishing insular from mainland Caribbean populations as well as from other Hispanic/Latino groups. We provide genetic evidence for an inland South American origin of the Native American component in island populations and for extensive pre-Columbian gene flow across the Caribbean basin. The Caribbean-derived European component shows significant differentiation from parental Iberian populations, presumably as a result of founder effects during the colonization of the New World. Based on demographic models, we reconstruct the complex population history of the Caribbean since the onset of continental admixture. We find that insular populations are best modeled as mixtures absorbing two pulses of African migrants, coinciding with early and maximum activity stages of the transatlantic slave trade. These two pulses appear to have originated in different regions within West Africa, imprinting two distinguishable signatures in present day Afro-Caribbean genomes and shedding light on the genetic impact of the dynamics occurring during the slave trade in the Caribbean.

Link

June 24, 2012

SMBE 2012 abstracts (Part II)

Some more abstracts from SMBE 2012.


The Neolithic trace in mitochondrial haplogroup U8 
Joana Barbosa Pereira (1,2), Marta Daniela Costa (1,2), Pedro Soares (2), Luísa Pereira (2,3), Martin Brian Richards (1,4)
(1) Institute of Integrative and Comparative Biology, Faculty of Biological Sciences, University of Leeds, Leeds, UK; (2) Instituto de Patologia e Imunologia Molecular da Universidade do Porto, Porto, Portugal; (3) Faculdade de Medicina da Universidade do Porto, Porto, Portugal; (4) School of Applied Sciences, University of Huddersfield, Huddersfield, UK

Mitochondrial DNA (mtDNA) remains an important marker in the study of human history, especially considering the increasing amount of data available. Among the several questions regarding human history that are under debate, the model of expansion of agriculture into Europe from its source in the Near East is still unclear. Recent studies have indicated that clusters belonging to haplogroup K, a major clade of U8, might be related to the Neolithic expansions. Therefore, it is crucial to identify the founder lineages of the Neolithic in Europe so that we may understand the real genetic input of the first Near Eastern farmers in the current European population and comprehend how agriculture spread so quickly throughout Europe.

In order to achieve this goal, a total of 55 U8 samples from the Near East, Europe and North Africa were selected for complete characterisation of mtDNA. A maximum-parsimony phylogenetic tree was constructed using all published sequences available so far. Coalescence ages of specific clades were estimated using the ρ statistic, maximum likelihood and Bayesian methods, considering a mutation rate for the complete molecule corrected for purifying selection.

Our results show that U8 dates to ~37-54 thousand years ago (ka), suggesting that this haplogroup might have been carried by the first modern humans to arrive in Europe, ~50 ka. Haplogroup K most likely originated in the Near East ~23-32 ka, where it might have remained during the Last Glacial Maximum, 26-19 ka. The majority of K subclades date to the Late Glacial and are related to the repopulation of Europe from the southern refugia. Only a few lineages appear to reflect postglacial, Neolithic or post-Neolithic expansions, mostly occurring within Europe. Most of the lineages dating to the Neolithic period seem to have a European origin, with the exception of haplogroups K1a4 and K1a3. Clade K1a4 appears to have originated in the Near East, where it also reaches its highest diversity. Although the main clades of K1a4 arose in the Near East during the Late Glacial, its subclade K1a4a1 dates to ~9-11 ka and is most likely related to the Neolithic dispersal into Europe. Similarly, K1a3 probably originated in the Near East during the Late Glacial, and its subclade K1a1a dispersed into Europe ~11-13 ka alongside the expansion of agriculture.
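
The abstract above names the ρ statistic as one of its dating methods. As a rough illustration only (not the authors' pipeline), ρ is the average number of mutations separating each sampled sequence from the inferred ancestral haplotype of a clade, and an age follows by multiplying ρ by a calibrated number of years per mutation. The mutation counts and the `YEARS_PER_MUTATION` value below are hypothetical placeholders, and a simple linear clock is assumed here, whereas the abstract uses a rate corrected for purifying selection.

```python
# Hypothetical calibration: years represented by one mutation in the
# complete mtDNA molecule. This is a placeholder, not the rate used above.
YEARS_PER_MUTATION = 3600.0


def rho_statistic(mutations_from_root):
    """Mean number of mutations from the clade's root haplotype to each sample."""
    return sum(mutations_from_root) / len(mutations_from_root)


def rho_age_years(mutations_from_root, years_per_mutation=YEARS_PER_MUTATION):
    """Point estimate of the clade's coalescence age under a linear clock."""
    return rho_statistic(mutations_from_root) * years_per_mutation


# Made-up mutation counts for four sequences in one clade.
example_counts = [8, 10, 9, 13]
example_rho = rho_statistic(example_counts)
example_age = rho_age_years(example_counts)
```

With these made-up counts, ρ is 10 mutations, which the placeholder calibration converts to 36,000 years. A real analysis would also report a confidence interval and cross-check against the maximum likelihood and Bayesian estimates.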

Late Glacial Expansions in Europe revealed through the fine-resolution characterisation of mtDNA haplogroup U8
Marta Daniela Costa 1,2 , Joana Barbosa Pereira 1,2 , Pedro Soares 2 , Luisa Pereira 2,3 , Martin Brian Richards 1,4 1 Institute of Integrative and Comparative Biology, Faculty of Biological Sciences, University of Leeds, Leeds, UK, 2 IPATIMUP - Instituto de Patologia e Imunologia Molecular da Universidade do Porto, Porto, Portugal,  3 Faculdade de  Medicina, Universidade do Porto, Porto, Portugal,  4 School of Applied Sciences, University of Huddersfield, Huddersfield,  UK  

The maternally inherited and fast evolving mitochondrial DNA (mtDNA) molecule is a highly informative tool with which to reconstruct human prehistory. This has become even more true in recent years, as mtDNA based studies are becoming more robust and powerful due to the availability of complete mtDNA genomes. These allow better mutation rate estimates and fine-resolution characterisation of the phylogeography of mtDNA haplogroups, or named clades. MtDNA haplogroup K, the major subclade of U8, occurs at low frequencies throughout West Eurasian populations, and is much more common in Ashkenazi Jews. However, the lack of variation on the first hypervariable segment (HVSI) has precluded any meaningful phylogeographic analysis to date. We therefore completely sequenced 50 haplogroup K and 5 non-K U8 mtDNA samples from across Europe and the Near East, and combined them with 343 genomes previously deposited in GenBank, in order to reconstruct a detailed phylogenetic tree. By combining several inference methods, including maximum parsimony, maximum likelihood and Bayesian inference, it was possible to trace the timescale and geography of the main expansions and dispersals associated with this lineage. We confirmed that haplogroup K, dating to ~32 thousand years (ka) ago, descended from the U8 clade, which coalesces ~48 ka ago. The latter is close to the timing of the first arrival of modern humans in Europe, and U8 could be one of the few surviving mtDNA lineages brought by the first settlers from the Near East. U8 split into the widespread U8b, at ~43 ka, and U8a, which seems to have expanded only in Europe ~24 ka ago. Considering the pattern of diversity and the geographic distribution, haplogroup K is most likely to have arisen in the Near East, ~32 ka ago. However, some subclades were evidently carried to Europe during the Last Glacial Maximum (LGM). We observed significant expansions of haplogroup K lineages in the Late Glacial period (14-19 ka), reflecting expansions out of refuge areas in southwest and possibly also southeast Europe.

Reticulated origin of domesticated tetraploid wheat 
Peter Civan Centro de Ciencias do Mar, Universidade do Algarve, Faro, Portugal  

The past 15 years have witnessed notable scientific interest in crop domestication and the emergence of agriculture in the Near East. Multi-disciplinary approaches have brought a significant amount of new data and a multitude of hypotheses and interpretations. However, some seemingly conflicting evidence, especially in the case of emmer wheat, has caused controversy, and a broad scientific consensus on the circumstances of wheat domestication has not yet been reached.

Past phylogenetic research has translated the issue of wheat domestication into a somewhat simplistic mono-/polyphyletic dilemma, where a monophyletic origin of a crop signals rapid and geographically localized domestication, while polyphyletic evidence suggests independent, geographically separated domestication events. Interestingly, the genome-wide and haplotypic data analyzed in several studies did not yield consistent results, and the proposed scenarios are usually in conflict with the archaeological evidence of lengthy domestication.

Here I suggest that the main cause of the above-mentioned inconsistencies might lie in the inadequacy of the divergent, tree-like evolutionary model. The inconsistent phylogenetic results and implicit archaeological evidence indicate a reticulate (rather than divergent) origin of domesticated emmer. Reticulated genealogy cannot be properly represented on a phylogenetic tree; hence different sets of samples and genetic loci are prone to yield different domestication scenarios. On a genome-wide super-tree, the conflicting phylogenetic signals are suppressed and the origin of a domesticated crop may appear monophyletic, leading to misinterpretations of the circumstances of the Neolithic transition.

The network analysis of multi-locus sequence data available for tetraploid wheat clearly supports the reticulated origin of domesticated emmer and durum wheat. The concept of reticulated genealogy of domesticated wheat sheds new light on the emergence of Near Eastern agriculture and is in agreement with current archaeological evidence of protracted and dispersed emmer domestication.

High-coverage population genomics of diverse African hunter-gatherers 
Joseph Lachance 1 , Benjamin Vernot 2 , Clara Elbers 1 , Bart Ferwerda 1 , Alain Froment 3 , Jean-Marie Bodo 4 , Godfrey  Lema 5 , Thomas Nyambo 5 , Timothy Rebbeck 1 , Kun Zhang 6 , Joshua Akey 2 , Sarah Tishkoff 1 1 University of Pennsylvania, Philadelphia, PA, USA,  2 University of Washington, Seattle, WA, USA,  3 IRD-MNHN, Musee  de l'Homme, Paris, France,  4 Ministere de la Recherche Scientifique et de l’Innovation, Yaounde, Cameroon,  5 Muhimbili  University College of Health Sciences, Dar es Salaam, Tanzania,  6 University of California at San Diego, San Diego, CA,  USA     
In addition to their distinctive subsistence patterns, African hunter-gatherers belong to some of the most genetically  diverse populations on Earth.  To infer demographic history and detect signatures of natural selection, we sequenced  the whole genomes of five individuals in each of three geographically and linguistically diverse African hunter-gatherer  populations at >60x coverage.  In these 15 genomes we identify 13.4 million variants, many of which are novel,  substantially increasing the set of known human variation.  These variants result in allele frequency distributions that are  free of SNP ascertainment bias.  This genetic data is used to infer population divergence times and demographic history  (including population bottlenecks and inbreeding).  We find that natural selection continues to shape the genomes of  hunter-gatherers, and that deleterious genetic variation is found at similar levels for hunter-gatherers and African  populations with agricultural or pastoral subsistence patterns.  In addition, the genomes of each hunter-gatherer  population contain unique signatures of local adaptation.  These highly-divergent genomic regions include genes  involved in immunity, metabolism, olfactory and taste perception, reproduction, and wound healing.

Reconstructing past Native American genetic diversity in Puerto Rico from contemporary populations
Marina Muzzio 1,2 , Fouad Zakharia 1 , Karla Sandoval 1 , Jake K. Byrnes 3 , Andres Moreno-Estrada 1 , Simon Gravel 1 , Eimear Kenny 1 , Juan L. Rodriguez-Flores 5 , Chris R. Gignoux 6 , Wilfried Guiblet 4 , Julie Dutil 7 , The 1000 Genomes Consortium 0 , Andres Ruiz-Linares 8 , David Reich 9,10 , Taras K. Oleksyk 4 , Juan Carlos Martinez-Cruzado 4 , Esteban Gonzalez Burchard 6 , Carlos D. Bustamante 1 1 Department of Genetics, Stanford University School of Medicine, Stanford, California, USA, 2 Facultad de Ciencias Naturales, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina, 3 Ancestry.com®, San Francisco, California, USA, 4 Department of Biology, University of Puerto Rico at Mayagüez, Mayagüez, Puerto Rico, 5 Department of Genetic Medicine, Weill Cornell Medical College, New York, New York, USA, 6 Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA, 7 Ponce School of Medicine, Ponce, Puerto Rico, 8 Department of Genetics, Evolution and Environment, University College London, London, UK, 9 Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA, 10 Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA

The Caribbean region has a rich cultural and biological diversity, including several countries with different languages,  and important historical events like the arrival of the Europeans in the late fifteenth century affected it deeply. Although it  has been said that two main Native American groups peopled the Caribbean at the time of Columbus’s voyages—the  Arawakan-speaking Tainos and the Caribs—this model has been questioned because it comes from the descriptions  written by the conquerors. The archaeological record shows a richer picture of trade among the islands, cultural change  and diversity than what colonial documents depict, from the early settlements around 8000 B.P. to the chiefdoms and  towns at the time of contact. How this area was peopled and how its inhabitants interacted with the surrounding  continent are questions that remain to be answered due to the fragmentary nature of the historical and archaeological  records.   
We aim to reconstruct the Native American genetic diversity from the time of the Spanish arrival at the island of Puerto  Rico from its contemporary population. We seek to find out how the original peopling of Puerto Rico occurred, along  with which contemporary Native American populations are the most closely related to the Native tracks found. We used  PCAdmix to trace Native American segments in admixed individuals, thus enabling us to reconstruct the original native  lineages previous to the European and African contact.   

Specifically, we generated local ancestry calls for the 70 parents of the 35 complete Puerto Rican trios from the whole-genome and Illumina Omni 2.5M chip genotype data of the 1000 Genomes Project, both to examine genome-wide admixture patterns and to infer demographic historical events from ancestry tract length distributions and an ancestry-specific PCA approach, adding 55 Native American groups as potential source populations (N=475 genotyped through Illumina’s 650K array) and 15 selected Mexican trios (genotyped on Affymetrix’s 6.0 array, including about 906,000 SNPs) to provide population context. ADMIXTURE analysis has shown that in Puerto Rico there is no single source of contribution for the Native component. Rather, this component seems to include a mixture of major Mexican and Andean components with little contribution from the Amazonian isolates. On the other hand, the ancestry-specific PCA plotted the Puerto Rican Native segments tightly clustered with the Native segments of groups from the same language family as the Tainos (Equatorial-Tucanoan), showing a clear association between linguistics and genetics instead of a geographical one.

Inference of demographic history and natural selection in African Pygmy populations from whole-genome sequencing data
Martin Sikora 1 , Etienne Patin 2 , Helio Costa 1 , Katherine Siddle 2 , Brenna M Henn 1 , Jeffrey M Kidd 1,3 , Ryosuke Kita 1 , Carlos D Bustamante 1 , Lluis Quintana-Murci 2 1 Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA, 2 Unit of Human Evolutionary Genetics, Institut Pasteur, CNRS URA3012, Paris, France, 3 Departments of Human Genetics and Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA

The Pygmy populations of Central Africa are some of the last remaining hunter-gatherers among present-day human populations, and can be broadly classified into two geographically separated groups, the Western and Eastern Pygmies. Compared to their neighboring populations of predominantly Bantu origin, Pygmy populations show distinct cultural and physical characteristics, most notably short stature, often referred to as the “Pygmy phenotype”. Given their distinct physical characteristics, the questions of the demographic history and origin of the Pygmy phenotype have attracted much attention. Previous studies have shown an ancient divergence (~60,000 years ago) of the ancestors of modern-day Pygmies from non-Pygmies, and a more recent split of the Eastern and Western Pygmy groups. However, these studies were generally based on a relatively small set of markers, precluding accurate estimations of demographic parameters. Furthermore, despite the considerable interest, to date there is still little known about the genetic basis of the small stature phenotype of Pygmy populations.
In order to address these questions, we sequenced the genomes of 47 individuals from three populations: 20 Baka, a Pygmy hunter-gatherer population from the Western subgroup of the African Pygmies; 20 Nzebi, a neighboring non-Pygmy agriculturist population from the Bantu ethnolinguistic group; and 7 Mbuti, an Eastern Pygmy population, from the Human Genome Diversity Project (HGDP). We performed whole-genome sequencing using Illumina Hi-Seq 2000 to a median sequencing depth of 5.5x per individual. After stringent quality control filters, we call over 17 million SNPs across the three populations, 32% of them novel (relative to dbSNP 132). Genotype accuracy after imputation was assessed using genotype data from the Illumina OMNI1 SNP array, and error rates were found to be comparable to other low-coverage studies (< 3% for most individuals). Preliminary results show relatively low genetic differentiation between the Baka and the Nzebi (mean FST = 0.026), whereas the Mbuti show higher differentiation from both Baka and Nzebi (mean FST = 0.060 and 0.070, respectively). Furthermore, we find that alleles previously found to be associated with height in other populations are not enriched for the “small” alleles in the Pygmy populations. We find a number of highly differentiated genomic regions as candidate loci for height differentiation, which will be verified using simulations under the best-fit demographic model, inferred from multi-dimensional allele frequency spectra using DaDi. Our dataset will allow a detailed investigation of the demographic history and the genomics of adaptation in these populations.
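
The mean FST values quoted above summarise how far allele frequencies differ between two populations. The abstract does not say which estimator was used, so the following is only a minimal sketch of one textbook definition (Wright's FST from expected heterozygosities, with two equally weighted populations, averaged per SNP). The allele frequencies are invented for illustration, and sample-size corrections used by real estimators are ignored.

```python
def wright_fst(p1, p2):
    """Wright's F_ST at one biallelic SNP for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Returns None when the SNP is monomorphic overall (F_ST undefined).
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # heterozygosity of pooled population
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return None
    return (h_total - h_within) / h_total


def mean_fst(freq_pairs):
    """Average per-SNP F_ST, skipping SNPs where it is undefined."""
    values = [wright_fst(p1, p2) for p1, p2 in freq_pairs]
    values = [v for v in values if v is not None]
    return sum(values) / len(values)


# Invented allele frequencies at three SNPs in two populations.
example_snps = [(0.4, 0.6), (0.5, 0.5), (0.1, 0.3)]
example_mean_fst = mean_fst(example_snps)
```

Identical frequencies give 0 and fixed differences give 1, so values of 0.026 to 0.070 indicate modest differentiation. Averaging per-SNP ratios is one design choice; many studies instead take a ratio of averages, which behaves better for rare variants.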

Genetic structure in North African human populations and the gene flow to Southern Europe
Laura R Botigué 1 , Brenna M Henn 2 , Simon Gravel 2 , Jaume Bertranpetit 1 , Carlos D Bustamante 2 , David Comas 1 1 Institut de Biologia Evolutiva (IBE, CSIC-UPF), Barcelona, Spain, 2 Stanford University, Stanford CA, USA

Despite being on the African continent and on the shores of the Mediterranean, North African populations might have experienced a different population history compared to their neighbours. However, the extent of their genetic divergence and gene flow from neighbouring populations is poorly understood. In order to establish the genetic structure of North Africans and the gene flow with the Near East, Europe and sub-Saharan Africa, genome-wide SNP genotyping array data (730,000 sites) from several North African and Spanish populations were analysed and compared to a set of African, European and Middle Eastern samples. We identify a complex pattern of autochthonous, European, Near Eastern, and sub-Saharan components in extant North African populations, where the autochthonous component diverged from the European and Near Eastern component more than 12,000 years ago, pointing to a pre-Neolithic "back-to-Africa" gene flow. To estimate the time of migration from sub-Saharan populations into North Africa, we implement a maximum likelihood dating method based on the frequency and length distribution of migrant tracts, which has suggested a migration of western African origin into Morocco ~1,200 years ago and a migration of individuals with Nilotic ancestry into Egypt ~750 years ago.

We characterize broad patterns of recent gene flow between Europe and Africa, with a gradient of recent African ancestry that is highest in southwestern Europe and decreases in northern latitudes. The elevated shared African ancestry in SW Europe (up to 20% of the individuals’ genomes) can be traced to populations in the North African Maghreb. Our results, based on both allele frequencies and shared haplotypes, demonstrate that recent migrations from North Africa substantially contribute to the higher genetic diversity in southwestern Europe.

Estimating a date of mixture of ancestral South Asian populations
Priya Moorjani 1,2 , Nick Patterson 2 , Periasamy Govindaraj 3 , Danish Saleheen 4 , John Danesh 4 , Lalji Singh* 3,5 , Kumarasamy Thangaraj* 3 , David Reich* 1,2 1 Harvard University, Boston, Massachusetts, USA, 2 Broad Institute, Cambridge, Massachusetts, USA, 3 Centre for Cellular and Molecular Biology, Hyderabad, Andhra Pradesh, India, 4 Dept of Public Health and Care, University of Cambridge, Cambridge, UK, 5 Genome Foundation, Hyderabad, Andhra Pradesh, India

Linguistic and genetic studies have demonstrated that almost all groups in South Asia today descend from a mixture of two highly divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners and Europeans, and Ancestral South Indians (ASI) not related to any populations outside the Indian subcontinent. ANI and ASI have been estimated to have diverged from a common ancestor as much as 60,000 years ago, but the date of the ANI-ASI mixture is unknown. Here we analyze data from about 60 South Asian groups to estimate that major ANI-ASI mixture occurred 1,200-4,000 years ago. Some mixture may also be older (beyond the time we can query using admixture linkage disequilibrium), since it is universal throughout the subcontinent: present in every group speaking Indo-European or Dravidian languages, in all caste levels, and in primitive tribes. After the ANI-ASI mixture that occurred within the last four thousand years, a cultural shift led to widespread endogamy, decreasing the rate of additional mixture.
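
Dating by admixture linkage disequilibrium, as mentioned above, rests on a standard idea: LD created by mixing decays roughly as A·exp(−n·d), where d is the genetic distance between markers in Morgans and n is the number of generations since admixture. The sketch below is not the authors' software; it fits n by ordinary least squares on the log of a synthetic, noise-free decay curve. The amplitude, the distance bins and the 29-year generation time are all assumptions chosen for illustration.

```python
import math

ASSUMED_YEARS_PER_GENERATION = 29.0  # assumption for converting to years


def fit_admixture_generations(distances_morgans, weighted_ld):
    """Fit log(LD) = log(A) - n * d by least squares and return n."""
    xs = list(distances_morgans)
    ys = [math.log(v) for v in weighted_ld]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return -slope  # decay rate equals generations since admixture


# Synthetic decay curve for a hypothetical admixture 100 generations ago,
# sampled at distances from 0.1 cM to 5 cM (expressed in Morgans).
true_generations = 100
distances = [0.001 * k for k in range(1, 51)]
ld_curve = [0.02 * math.exp(-true_generations * d) for d in distances]

estimated_generations = fit_admixture_generations(distances, ld_curve)
estimated_years = estimated_generations * ASSUMED_YEARS_PER_GENERATION
```

On this noise-free input the fit recovers the 100 generations that were put in. Real data are noisy and may reflect several pulses of mixture, which is one reason older events can fall outside what this kind of curve can resolve.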
Long IBD in Europeans and recent population history 
Peter Ralph, Graham Coop  UC Davis, Davis, CA, USA  
Numbers of common ancestors shared at various points in time across populations  can tell us about recent demography, migration, and population movements.  These rates of shared ancestry over tens of generations can be inferred from  genomic data, thereby dramatically increasing our ability to infer population  history much more recent than was previously possible with population genetic  techniques.  We have analyzed patterns of IBD in a dataset of thousands of  Europeans from across the continent, which provide a window into recent  European geographic structure and migration.   
Gene flow between human populations during the exodus from Africa, and the timeline of recent human  evolution  
Aylwyn Scally, Richard Durbin  Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK 
We present a novel test for historical gene flow between populations using unphased genotypes in present-day individuals, based on the sharing of derived alleles and making a minimal set of assumptions about their demographic history. We apply this test to data for three human individuals of African, European and Asian ancestry. We find that the joint distribution of European and Asian genotypes is compatible with these populations having separated cleanly at some time in the past without subsequent genetic exchange. However, the same is not true of the European-African and Asian-African distributions, which instead suggest an extended period of continued exchange between African and non-African populations after their initial separation.
We discuss this in comparison with recent models and estimates of separation time between these populations. We also consider the impact of recent direct experimental studies of the human mutation rate, which suggest rates of around 0.5 × 10⁻⁹ bp⁻¹ y⁻¹, substantially lower than prior estimates of 1 × 10⁻⁹ bp⁻¹ y⁻¹ obtained from calibration against the primate fossil record. We show that in several places the lower rate, implying older dates, yields better agreement between genetic and non-genetic (paleoanthropological and archaeological) evidence for events surrounding the exodus of modern humans from Africa and their dispersion worldwide.
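
Why a lower mutation rate implies older dates can be seen with a small worked example. Under a strict molecular clock, two lineages that split T years ago accumulate a per-site divergence of about 2·μ·T, so T = d / (2μ). The sketch below uses the two rates quoted in the abstract; the divergence value is a made-up placeholder and ancestral polymorphism is ignored.

```python
def split_time_years(divergence_per_bp, mu_per_bp_per_year):
    """Years since two lineages split, assuming a strict molecular clock."""
    return divergence_per_bp / (2.0 * mu_per_bp_per_year)


FOSSIL_CALIBRATED_RATE = 1.0e-9  # per bp per year, from the abstract
DIRECT_ESTIMATE_RATE = 0.5e-9    # per bp per year, from the abstract

hypothetical_divergence = 1.0e-4  # placeholder per-site divergence

older_rate_date = split_time_years(hypothetical_divergence, FOSSIL_CALIBRATED_RATE)
lower_rate_date = split_time_years(hypothetical_divergence, DIRECT_ESTIMATE_RATE)
```

For this placeholder divergence the date moves from about 50,000 to about 100,000 years. Halving the rate doubles every date inferred from the same sequence data, whatever the divergence.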
Long-term presence versus recent admixture: Bayesian and approximate-Bayesian analyses of genetic  diversity of human populations in Central Asia 
Friso Palstra, Evelyne Heyer, Frederic Austerlitz  Eco-anthropologie et Ethnobiologie UMR 7206 CNRS, Equipe Genetique des Populations Humaines, Museum National  d'Histoire Naturelle, Paris, France 
A long-standing goal in population genetics is to unravel the relative importance of evolutionary forces that shape  genetic diversity. Here we focus on human populations in Central Asia, a region that has long been known to contain  the highest genetic diversity on the Eurasian continent. However, whether this variation principally reflects long-term  presence, or rather the result of admixture associated with repeated migrations into this region in more recent historical  times, remains unclear. Here we investigate the underlying demographic history of Central Asian populations in explicit  relation to Western Europe, Eastern Asia and the Middle East. For this purpose we employ both full Bayesian and  approximate-Bayesian analyses of nuclear genetic diversity in 20 unlinked non-coding resequenced DNA regions,  known to be at least 200 kb apart from any known gene, mRNA or spliced EST (total length of 24 kb), and 22 unlinked  microsatellite loci.   
Using an approximate Bayesian framework, we find that present patterns of genetic diversity in Central Asia may be  best explained by a demographic history which combines long-term presence of some ethnic groups (Indo-Iranians)  with a more recent admixed origin of other groups (Turco-Mongols). Interestingly, the results also provide indications  that this region might have genetically influenced Western European populations, rather than vice versa. A further  evaluation in MCMC-based Bayesian analyses of isolation-with-migration models confirms the different times of  establishment of ethnic groups, and suggests gene flow into Central Asia from the east. The results from the  approximate Bayesian and full Bayesian analyses are thus largely congruent. In conclusion, these analyses illustrate  the power of Bayesian inference on genetic data and suggest that the high genetic diversity in Central Asia reflects both  long-term presence and admixture in more recent historical times. 
Population structure and evidence of selection in the Khoe-San and Coloured populations from southern Africa 
Carina Schlebusch 1 , Pontus Skoglund 1 , Per Sjödin 1 , Lucie Gattepaille 1 , Sen Li 1 , Flora Jay 2 , Dena Hernandez 3 , Andrew  Singleton 3 , Michael Blum 2 , Himla Soodyall 4,5 , Mattias Jakobsson 1 1 Uppsala University, Uppsala, Sweden,  2 Université Joseph Fourier, Grenoble, France,  3 National Institute on Aging (NIH),  Bethesda, USA,  4 University of the Witwatersrand, Johannesburg, South Africa,  5 National Health Laboratory Service,  Johannesburg, South Africa  

The San and Khoe people currently represent remnant groups of a much larger and widely distributed population of  hunter-gatherers and pastoralists who had exclusive occupation of southern Africa before the arrival of Bantu-speaking  groups in the past 1,200 years and sea-borne immigrants within the last 350 years. Mitochondrial DNA, Y-chromosome  and autosomal studies conducted on a few San groups revealed that they harbour some of the most divergent lineages  found in living peoples throughout the world.   

We used autosomal data to characterize patterns of genetic variation among southern African individuals in order to  understand human evolutionary history, in particular the demographic history of Africa. To this end, we successfully  genotyped ~ 2.3 million genome wide SNP markers in 220 individuals, comprising seven Khoe-San, two Coloured and  two Bantu-speaking groups from southern Africa. After quality filtering, the data were combined with publicly available  SNP data from other African populations to investigate stratification and demography of African populations.  

We also  applied a newly developed method of estimating population topology and divergence times. Genotypes and inferred  haplotypes were used to assess genetic diversity, patterns of haplotype variation and linkage disequilibrium in different  populations.  We found that six of the seven Khoe-San populations form a common population lineage basal to all other modern  human populations. The studied Khoe-San populations are genetically distinct, with diverse histories of gene flow with  surrounding populations. A clear geographic structuring among Khoe-San groups was observed, the northern and  southern Khoe-San groups were most distinct from each other with the central Khoe-San group being intermediate. The  Khwe group contained variation that distinguished it from other Khoe-San groups. Population divergence within the  Khoe-San group is approximately 1/3 as ancient as the divergence of the Khoe-San as a whole to other human  populations (on the same order as the time of divergence between West Africans and Eurasians). Genetic diversity in  some, but not all, Khoe-San populations is among the highest worldwide, but it is influenced by recent admixture. We  furthermore find evidence of a Nilo-Saharan ancestral component in certain Khoe-San groups, possibly related to the  introduction of pastoralism to southern Africa.   

We searched for signatures of selection in the different population groups by scanning for differentiated genome-regions  between populations and scanning for extended runs of haplotype homozygosity within populations. By means of the  selection scans, we found evidence for diverse adaptations in groups with different demographic histories and modes of  subsistence. 
Impacts of life-style on human evolutionary history: A genome-wide comparison of herder and farmer  populations in Central Asia 
Michael C. Fontaine 1,2 , Laure Segurel 2,3 , Christine Lonjou 4 , Tatiana Hegay 5 , Almaz Aldashev 6 , Evelyne Heyer 2 , Frederic Austerlitz 1,2 1 Ecology, Systematics & Evolution. UMR8079 Univ. Paris Sud - CNRS - AgroParisTech, Orsay, France, 2 EcoAnthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cite, Paris, France, 3 Department of Human Genetics, University of Chicago, Chicago, USA, 4 C2BiG (Centre de Bioinformatique/Biostatistique Genomique d’Ile de France), Plateforme Post-genomique P3S, Hopital Pitie Salpetriere, Paris, France, 5 Uzbek Academy of Sciences, Institute of Immunology, Tashkent, Uzbekistan, 6 Institute of Molecular Biology and Medicine, National Center of Cardiology and Internal Medicine, Bishkek, Kyrgyzstan

Human populations use a variety of subsistence strategies to exploit an exceptionally broad range of habitats and dietary components. These aspects of human environments have changed dramatically during human evolution, giving rise to new selective pressures. Here we focused on two populations in Central Asia with long-term contrasting lifestyles: the Kyrgyz, who are traditionally nomadic herders with a diet based on meat and milk products, and the Tajiks, who are traditionally agriculturalists with a diet based mostly on cereals. We genotyped 93 individuals for more than 600,000 SNP markers (Human-660W-Quad-V1.0 from Illumina) spread across the genome. We first analysed the population structure of these two populations in the world-wide context by combining our results with other available genome-wide data. Principal component and Bayesian clustering analyses revealed that Tajiks and Kyrgyz are both admixed populations that nevertheless differ from each other in their ancestry proportions: Tajiks display a much larger proportion of common ancestry with European populations, while Kyrgyz share a larger common ancestry with Asiatic populations. We then examined the regions of the genome displaying unusual population differentiation between these two populations to detect natural selection, and checked whether they were specific to Central Asia or not. We complemented these analyses with haplotype-based analyses of selection.
Bayesian inference of the demographic history of Niger-Congo speaking populations 
Isabel Alves 1,2 , Lounès Chikhi 2,3 , Laurent Excoffier 1,4 1 CMPG, Institute of Ecology and Evolution, Berne, Switzerland,  2 Population and Conservation Genetics Group, Instituto  Gulbenkian de Ciência, Oeiras, Portugal,  3 CNRS, Université Paul Sabatier, ENFA, Toulouse, France,  4 Swiss Institute of  Bioinformatics, Lausanne, Switzerland  
The Niger-Congo phylum encompasses more than 1500 languages spread over sub-Saharan Africa. This current wide range is mostly due to the spread of Bantu-speaking people across sub-equatorial regions in the last 4000-5000 years. Although several genetic studies have focused on the evolutionary history of Bantu-speaking groups, much less effort has been put into the relationship between Bantu and non-Bantu Niger-Congo groups. Additionally, archaeological and linguistic evidence suggest that the spread of these populations occurred in distinct directions from the core region located in what is now the border between Nigeria and Cameroon towards West and South Africa, respectively. We have performed coalescent simulations within an approximate Bayesian computation (ABC) framework in order to statistically evaluate the relative probability of alternative models of the spread of Niger-Congo speakers and to infer demographic parameters underlying these important migration events. We have analysed 61 high-quality microsatellite markers, genotyped in 130 individuals from three Bantu-speaking and three non-Bantu-speaking populations, representing a "Southern wave" of the Bantu expansion and a "Western wave", respectively. Preliminary results suggest that models inspired by a spatial spread of the populations are better supported than classical isolation with migration (IM) models. We also find that Niger-Congo populations currently maintain high levels of gene flow with their neighbours, and that they expanded from a single source between 200 and 600 generations ago, even though available genetic data do not provide enough information to accurately infer these demographic parameters.

A genetic study of skin pigmentation variation in India  
Mircea Iliescu 1 , Chandana Basu Mallick 2,3 , Niraj Rai 4 , Anshuman Mishra 4 , Gyaneshwer Chaubey 2 , Rakesh Tamang 4 , Märt Möls 3 , Rie Goto 1 , Georgi Hudjashov 2,3 , Srilakshmi Raj 1 , Ramasamy Pitchappan 5 , CG Nicholas Mascie-Taylor 1 , Lalji Singh 4,6 , Marta Mirazon-Lahr 7 , Mait Metspalu 2,3 , Kumarasamy Thangaraj 4 , Toomas Kivisild 1,3 1 Division of Biological Anthropology, University of Cambridge, Cambridge, UK, 2 Evolutionary Biology Group, Estonian Biocentre, Tartu, Estonia, 3 Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia, 4 Centre for Cellular and Molecular Biology, Hyderabad, India, 5 Chettinad Academy of Research and Education, Chettinad Health City, Chennai, India, 6 Banaras Hindu University, Varanasi, India, 7 Leverhulme Centre for Human Evolutionary Studies, Division of Biological Anthropology, University of Cambridge, Cambridge, UK

Human skin colour is a polygenic trait that is primarily determined by the amount and type of melanin produced in the skin. The pigmentation variation between human populations across the world is highly correlated with geographic latitude and the amount of UV radiation. Association studies, together with research involving different model organisms and coat colour variation, have largely contributed to the identification of more than 378 pigmentation candidate genes. These include TYR and OCA2, which are known to cause albinism, MC1R, responsible for the red hair phenotype, and genes such as MATP, SLC24A5 and ASIP that are involved in normal pigmentation variation. In particular, SLC24A5 has been shown to explain one third of the pigmentation difference between Europeans and Africans. However, the same gene cannot explain the lighter East Asian phenotype; therefore, light pigmentation could be the result of convergent evolution. A study on UK residents of Pakistani, Indian and Bangladeshi descent found significant association of the SLC24A5, SLC45A2 and TYR genes with skin colour. While these genes may explain a significant proportion of interethnic differences in skin colour, it is not clear how much variation such genes explain within Indian populations, who are known for their high level of diversity of pigmentation. We have tested 15 candidate SNPs for association with melanin index in a large sample of 1300 individuals from three related castes native to South India. Using a logistic regression model, we found that the functional SLC24A5 SNP, rs1426654, is strongly associated with pigmentation in our sample and alone explains more than half of the skin colour difference between the light and the dark groups of individuals. Conversely, the other tested SNPs fail to show any significance; this strongly argues in favour of one gene having a major effect on skin pigmentation within ethnic groups of South India, with other genes having small additional effects on this trait. We genotyped the SLC24A5 variant in over 40 populations across India and found that latitudinal differences alone cannot explain its frequency patterns in the subcontinent. Key questions arising from this research are when and where the light skin variant entered South Asia, and the manner and reason for its spread across the Indian subcontinent. Hence, a comprehensive view of skin colour evolution requires that in-depth sequence information be corroborated with population (genetic) history and with ancient DNA data of past populations of Eurasia.
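
As a rough illustration of the kind of test the abstract describes, the sketch below fits a one-predictor logistic regression of group membership (light versus dark) on allele dosage at a single SNP. Everything specific here is an assumption made for the example: the genotype counts are invented and are not the study's data, the additive 0/1/2 dosage coding is one common choice rather than the authors' stated model, and no covariates are included.

```python
import math


def fit_logistic(xs, ys, n_iter=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += y - p
            g1 += (y - p) * x
            # Entries of the information matrix X'WX.
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add the inverse information matrix times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented toy counts per allele dosage: (dark group, light group).
counts = {0: (30, 5), 1: (15, 15), 2: (5, 30)}

xs, ys = [], []
for dosage, (n_dark, n_light) in counts.items():
    xs += [dosage] * (n_dark + n_light)
    ys += [0] * n_dark + [1] * n_light

intercept, slope = fit_logistic(xs, ys)
odds_ratio = math.exp(slope)  # per-allele odds ratio for the light group
print(intercept, slope, odds_ratio)
```

For these symmetric toy counts the maximum-likelihood slope works out analytically to ln 6, so each additional copy of the allele multiplies the odds of belonging to the light group by six. A real analysis would also report standard errors and adjust for covariates such as sex and population structure.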