In Monday's discussion of
Andrew et al. (2013) Road Map for Molecular Ecology we criticized this vision of the future. In particular, the
sections on phylogeography, hybridization and speciation don't grapple with the essential problem of how genome data are changing our
questions and analyses.
Here are some recent papers that do just that.
Ellegren (2012) "Genome sequencing and population genomics in non-model organisms" reviews methods for
inferring historical demography, notably PSMC. This is a genome-wide analogue of
skyline plots, based on the varying time to coalescence of different genome
segments. It was developed for a single diploid genome (you need some
heterozygosity to estimate coalescence time) and has now been extended to
multiple individuals. It should be better resolved than one-to-few-gene skyline
plots, especially in detecting the signal of multiple expansion-contractions, but
not necessarily for recent events (whose resolution depends on sample size).
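To make the link between heterozygosity and coalescence time concrete, here is a minimal sketch. It is not PSMC itself, which fits a Hidden Markov Model along the genome; it only applies the expectation that two lineages separated for T generations differ at about 2 * mu * T sites. The function name `window_tmrca`, the toy genome and the mutation rate are all assumptions made for illustration.

```python
def window_tmrca(het_flags, window, mu):
    """Crude per-window coalescence time (in generations) for one diploid genome.

    het_flags: sequence of 0/1, with 1 where the two haplotypes differ.
    window:    window length in sites (non-overlapping windows).
    mu:        assumed per-site, per-generation mutation rate.
    """
    times = []
    for start in range(0, len(het_flags) - window + 1, window):
        het = sum(het_flags[start:start + window]) / window
        # Two lineages that coalesce T generations ago differ at ~2*mu*T sites,
        # so T is estimated as heterozygosity / (2*mu).
        times.append(het / (2 * mu))
    return times


# Toy data (hypothetical): a window with no heterozygous sites, then a
# window with 2 heterozygous sites per 1000.
genome = [0] * 1000 + ([1] + [0] * 499) * 2
times = window_tmrca(genome, window=1000, mu=1e-8)
```

The first window carries no information at all (zero heterozygosity gives a time of zero), which is why a single genome needs some heterozygosity before this kind of inference can work. The second window comes out at about 100 000 generations under the assumed mutation rate.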
Sousa & Hey (2013) "Understanding the origin of species with genome-scale data: modelling gene flow" go straight to the heart of the matter. Much of the potential power of population genomics comes from recombination. The coalescent genealogical sampling methods that we currently use (BEAST, IMa...) are computationally intensive and can't handle recombination. Recent
population genomic studies use summary statistics like Fst and D (built from single-SNP
4-taxon gene trees). These throw out most of the information in the data and can't distinguish among the processes that may have caused the observed patterns. Current methods are simply unable to consider the implications of many gene genealogies with varying degrees of linkage simultaneously, but there are some promising approaches that involve computational shortcuts (approximate Bayesian computation, ABC; product of approximate conditionals, PAC) and Hidden Markov Model (HMM) methods that allow genealogies to change along the genome.
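To see how much such a summary statistic compresses, here is a minimal sketch of the D statistic (ABBA-BABA) computed from site-pattern counts. The function name `abba_baba_d` and the toy site patterns are assumptions for illustration. Real analyses usually work from allele frequencies and assess significance with a block jackknife, neither of which is shown here.

```python
from collections import Counter


def abba_baba_d(sites):
    """D statistic from biallelic site patterns.

    Each site is a 4-character string giving the allele carried by
    P1, P2, P3 and the outgroup, in that order, where 'A' is the
    ancestral (outgroup) allele and 'B' the derived allele.
    """
    counts = Counter(sites)
    abba = counts["ABBA"]  # P2 and P3 share the derived allele
    baba = counts["BABA"]  # P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Toy data (hypothetical): an excess of ABBA sites, as expected if P3
# exchanged genes with P2. BBAA sites match the species tree and are ignored.
sites = ["ABBA"] * 30 + ["BABA"] * 10 + ["BBAA"] * 60
d = abba_baba_d(sites)
```

The whole data set collapses to a single number, and every site is treated as independent. Linkage and the lengths of shared haplotype blocks, which carry information about the timing of gene flow, are discarded.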
Both of these reviews have something to say about detecting selection and the genes underlying phenotypic differences in wild populations, including GWAS, which was not mentioned in Andrew et al. (2013).
Apart from the emerging methods reviewed above, I think our road map needs to consider genome-wide inference of
admixture. This year there have been some great new resources and approaches to studying individual origins and population histories using admixture. One of these is the Genetic Atlas of Human Admixture History (Hellenthal et al. 2014), with this wonderful web playground. Just a few weeks ago Elhaik et al. (2014, aka The Genographic Consortium) published the "Geographic Population Structure" algorithm (GPS...haha), which uses admixture analysis of Ancestry Informative Markers (AIMs) to infer the biogeographic origins of populations. Using 100 000 SNPs they were able to assign human individuals to geographic regions with astounding accuracy, to within 50 km and often to the right village for some European populations (Sardinia, which is old and structured). This is far better than what previous methods such as PCA achieve.
At the other end of the population structure spectrum, we need to know about the new Bayesian assignment methods for species delimitation (BPP), which may be particularly applicable to barcoding studies. As a parting shot I might add that DNA-based trophic ecology (i.e. CSI - POO) is at best a scenic byway off the freeway of ecological metagenomics, with its as yet unfulfilled promise of detecting rare species, measuring abundance and analysing whole species communities from non-invasive environmental samples. These curious absences in Andrew et al. (2013) are partly because the review aims at concepts, not methods. Given that our (potential) ability to generate data is way ahead of theory, we need more specific consideration of analyses.
The Harrison et al. (2014) review of evolutionary potential, which we didn't discuss, is an interesting essay on how we might use experimental analyses of fitness in model organisms to derive measures of adaptive and non-adaptive variation in the wild (giving evolutionary significance to ESUs). Unfortunately the only two organisms cited in their consideration of wild populations are Arabidopsis and Drosophila - I guess we are not quite there yet.
PSMC - Pairwise Sequentially Markovian Coalescent
HMM - Hidden Markov Models
Admixture
AIMs - ancestry informative markers
GPS - Geographic Population Structure
BPP - species assignment
Ecological Metagenomics
GWAS in non-model species
That's a long list of emerging techniques to leave off your road map of the future.