Friday, May 16, 2014

A Road Map for Population Genomics


In Monday's discussion of Andrew et al. (2013) "A Road Map for Molecular Ecology" we criticized this vision of the future. In particular, the sections on phylogeography, hybridization and speciation don't grapple with the essential problem of how genome data are changing our questions and analyses.

Here are some recent papers that do just that.

Ellegren (2012) "Genome sequencing and population genomics in non-model organisms" reviews methods for inferring historical demography, including the pairwise sequentially Markovian coalescent (PSMC). This is a genome-wide analogue of skyline plots, based on the varying time to coalescence of different genome segments. It was developed for a single diploid genome (you need some heterozygosity to estimate coalescence time) and has now been extended to multiple individuals. It should give better resolution than skyline plots built from one or a few genes, especially in detecting the signal of repeated expansions and contractions, but not necessarily for recent events (whose detection depends on sample size).
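
To make the coalescence-time idea concrete, here is a toy Python sketch of the raw signal PSMC works from. It is not PSMC itself, which fits a hidden Markov model over discretized coalescence times and turns their distribution into a population-size history. The sketch assumes that a segment whose two haplotypes coalesced T generations ago carries about 2μT heterozygous sites per base pair, so a crude local estimate is T ≈ h / (2μ). The function name, window size and mutation rate are illustrative assumptions.

```python
"""Toy illustration of the signal PSMC exploits: local heterozygosity in a
single diploid genome reflects the local time to coalescence (TMRCA).

This is NOT the PSMC algorithm (an HMM over TMRCA states). It just bins
heterozygous sites into fixed windows and converts each window's
heterozygosity into a crude TMRCA estimate (hypothetical helper).
"""

from collections import Counter


def window_tmrca(het_positions, genome_length, window_size, mu):
    """Return crude TMRCA estimates (in generations), one per full window.

    Assumes expected heterozygosity of a segment whose two haplotypes
    coalesced T generations ago is about 2 * mu * T, so T ~ h / (2 * mu).
    mu is the per-site, per-generation mutation rate (an assumption).
    """
    n_windows = genome_length // window_size  # a partial final window is dropped
    # Count heterozygous sites falling in each window index.
    counts = Counter(pos // window_size for pos in het_positions)
    estimates = []
    for w in range(n_windows):
        h = counts.get(w, 0) / window_size  # heterozygous sites per bp
        estimates.append(h / (2 * mu))
    return estimates
```

With toy numbers, `window_tmrca([5, 50, 150], 300, 100, 0.001)` gives roughly 10, 5 and 0 generations for the three windows. The spread of these values across windows is why one genome holds information about many coalescence times; fixed windows ignore the fact that recombination breakpoints do not line up with window edges, which is exactly what the HMM in PSMC deals with.
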
Sousa & Hey (2013) "Understanding the origin of species with genome-scale data: modelling gene flow" go straight to the heart of the matter. Much of the potential power of population genomics comes from recombination. The coalescent genealogical sampling methods that we currently use (BEAST, IMa...) are computationally intensive and can't accommodate recombination within loci. Recent population genomic studies instead use summary statistics like Fst and D (the ABBA-BABA statistic, built from single-SNP four-taxon site patterns). These throw out most of the information in the data and can't distinguish among the processes that may have caused the observed patterns. Current methods simply cannot consider the implications of many gene genealogies with varying degrees of linkage at once, but there are some promising approaches: computational shortcuts such as approximate Bayesian computation (ABC) and the product of approximate conditionals (PAC), and hidden Markov model (HMM) methods that allow the genealogy to change along the genome.
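
Here is a minimal Python sketch of the D statistic, which shows how little of the data it actually uses. It assumes the four-taxon tree (((P1, P2), P3), O), one haploid sequence per taxon, and alleles coded 0 (ancestral, matching the outgroup O) and 1 (derived); the function name and the coding are illustrative assumptions. Only two site patterns enter the statistic: D = (nABBA - nBABA) / (nABBA + nBABA).

```python
"""Patterson's D (ABBA-BABA) from a 4-taxon set of biallelic SNPs.

Assumed tree: (((P1, P2), P3), O). Alleles coded 0 (ancestral, as in the
outgroup O) and 1 (derived). One haploid sequence per taxon, for simplicity.
"""


def patterson_d(p1, p2, p3, outgroup):
    """Return (D, n_abba, n_baba) from four equal-length 0/1 sequences."""
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("all sequences must have the same length")
    n_abba = n_baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Keep only sites where O is ancestral and P3 carries the derived allele.
        if o != 0 or c != 1:
            continue
        if a == 0 and b == 1:
            n_abba += 1  # P2 shares the derived allele with P3
        elif a == 1 and b == 0:
            n_baba += 1  # P1 shares the derived allele with P3
        # Sites where P1 and P2 agree are uninformative and ignored.
    total = n_abba + n_baba
    d = (n_abba - n_baba) / total if total else 0.0
    return d, n_abba, n_baba
```

With P1 = [0, 0, 1, 0, 0], P2 = [1, 1, 0, 1, 0], P3 = [1, 1, 1, 1, 0] and an all-ancestral outgroup there are three ABBA sites and one BABA site, so D = 0.5. A positive D says P2 shares more derived alleles with P3 than P1 does, which is consistent with gene flow between P2 and P3 but, as Sousa & Hey point out, does not by itself identify the process. In practice the significance of D is assessed with a block jackknife along the genome, because linked SNPs are not independent.
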

Both of these reviews also have something to say about detecting selection and finding the genes underlying phenotypic differences in wild populations, including genome-wide association studies (GWAS), which were not mentioned in Andrew et al. (2013).

Apart from the emerging methods reviewed above, I think our road map needs to consider genome-wide inference of admixture. This year there have been some great new resources and approaches for studying individual origins and population histories using admixture. One of these is the Genetic Atlas of Human Admixture History (Hellenthal et al. 2014), which comes with a wonderful interactive web playground. Just a few weeks ago Elhaik et al. (2014, aka The Genographic Consortium) published the Geographic Population Structure algorithm (GPS, ha!), which uses admixture analysis of ancestry informative markers (AIMs) to infer the biogeographic origins of populations. Using 100,000 SNPs they were able to assign human individuals to geographic regions with astounding accuracy: to within 50 km, and for some European populations often to the right village (Sardinia, whose population is old and structured). This is far better than what previous methods such as PCA achieve.
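
Full admixture or GPS analyses are beyond a blog sketch, but the core idea of using reference-population allele frequencies to infer where an individual comes from can be shown with a classic likelihood-based assignment test. This is a swapped-in, simpler technique, not GPS and not an admixture model: it assigns an individual to one discrete reference population instead of estimating ancestry proportions or geographic coordinates. It assumes biallelic SNPs in Hardy-Weinberg and linkage equilibrium; the function names, the toy frequencies and the frequency floor `eps` are hypothetical.

```python
"""Toy likelihood-based assignment test (not GPS, not an admixture model).

Scores an individual's diploid SNP genotypes against reference allele
frequencies, assuming Hardy-Weinberg and linkage equilibrium.
"""
import math


def genotype_log_likelihood(genotypes, freqs, eps=1e-3):
    """Sum of log P(genotype | allele frequency) across independent SNPs.

    genotypes: count of the alternative allele per SNP (0, 1 or 2).
    freqs: alternative-allele frequency per SNP in one reference population.
    eps: hypothetical floor/ceiling on frequencies so an allele unseen in
         the reference does not produce log(0).
    """
    total = 0.0
    for g, p in zip(genotypes, freqs):
        p = min(max(p, eps), 1 - eps)
        q = 1 - p
        if g == 0:
            prob = q * q  # homozygous reference
        elif g == 1:
            prob = 2 * p * q  # heterozygote
        elif g == 2:
            prob = p * p  # homozygous alternative
        else:
            raise ValueError(f"genotype must be 0, 1 or 2, got {g}")
        total += math.log(prob)
    return total


def assign(genotypes, reference):
    """Return (best_population, {population: log-likelihood})."""
    scores = {name: genotype_log_likelihood(genotypes, f)
              for name, f in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

With reference frequencies {"north": [0.9, 0.8, 0.1], "south": [0.1, 0.2, 0.9]}, an individual with genotypes [2, 2, 0] is assigned to "north". Admixture methods generalize this by letting each individual draw its alleles from a mixture of source populations rather than from exactly one.
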

On the other end of the population structure spectrum, we need to know about the new Bayesian assignment methods for species delimitation (BPP), which may be particularly applicable to barcoding studies. As a parting shot, I might add that DNA-based trophic ecology (i.e. CSI: POO) is at best a scenic byway off the freeway of ecological metagenomics, with its as-yet-unfulfilled promise of detecting rare species, measuring abundance and analysing whole communities from non-invasive environmental samples. These curious absences in Andrew et al. (2013) are partly because the review aims at concepts, not methods. Given that our (potential) ability to generate data is way ahead of theory, we need more specific consideration of analyses.

The Harrison et al. (2014) review of evolutionary potential, which we didn't discuss, is an interesting essay on how we might use experimental analyses of fitness in model organisms to derive measures of adaptive and non-adaptive variation in the wild (giving evolutionary significance to ESUs). Unfortunately, the only two organisms cited in their consideration of wild populations are Arabidopsis and Drosophila; I guess we are not quite there yet.

PSMC - Pairwise Sequentially Markovian Coalescent
HMM - Hidden Markov Models
Admixture
AIMs - ancestry informative markers
GPS - Geographic Population Structure
BPP - Bayesian species delimitation
Ecological Metagenomics
GWAS - genome-wide association studies in non-model species

That's a long list of emerging techniques to leave off your road map of the future.