34 terms

Coalescence Theory and the Genealogy of Genes



probability of coalescence within a population depends on
population size
number of generations
to develop a deeper understanding of how drift operates and how it influences variation in a population we can look at
the genealogical relationships in that population. It is particularly useful to look at these genealogical relationships one locus at a time. By doing this we can see how gene copies spread through a finite population over generations. This is the fundamental idea behind an area of population genetics known as coalescent theory.
species tree
represents historical patterns of branching descent for a group of species or populations
gene tree
represents the genealogical relationships for a single locus. When we build a phylogenetic tree using sequence data from a single genetic locus, we are not reconstructing the species tree directly but rather inferring the pattern of descent with modification at this one specific locus.

It tells us about the history of that gene, not the history of the populations in which that gene appears.
as we trace back in time from the present
the gene copies coalesce: that is, two or more distinct gene copies at some time are all descended from the same ancestral gene copy
The coalescent point
gene copy that is the most recent common ancestor
coalescent tree
shows the branching pattern of relatedness among the gene copies in the population
using gene trees to understand the process of genetic drift in small populations
we will try to understand the genealogical pattern of ancestry among gene copies in a population of diploid organisms.

a genealogical
diagram—a depiction of which gene copy derived from which ancestral copy—
for a neutral locus in a population

In each generation, some gene copies manage to replicate themselves
and contribute to the next generation; other gene copies fail to replicate and are
lost. Because we are only interested in the genealogy of genes, not the genealogy
of individuals, we can ignore which gene copies are in which individual
and then "untangle" the genealogical graph to provide a clean picture, with
no crossing lines.
As we trace back in time from the present, the
gene copies coalesce: that is, two or more distinct gene copies at
some time point are all descended from the same ancestral gene copy.

coalescent point - the gene copy that is the most recent
common ancestor
dynamics of the coalescent process
The basic idea in mathematically modeling the coalescent process is to
think of a genealogy as a stochastic process running backward in time.
Suppose that we sample k gene copies from a population of N diploid
individuals. At the present, which we will call time t, these k gene copies
are all distinct. Now imagine that we take a step backward to time t −
1, and look at the previous generation. With some probability, any two
or more of our k gene copies may come from the same gene copy at t − 1. If that
occurs, we call it a coalescent event.
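A minimal sketch of this single backward step, assuming a diploid Wright-Fisher model in which each of the k sampled gene copies picks its parent uniformly at random from the 2N gene copies of the previous generation. The function names are illustrative and not taken from the source.

```python
def prob_no_coalescence(k, N):
    """Probability that k gene copies all have distinct parents one generation back.

    Assumption: diploid Wright-Fisher population, so there are 2N gene copies
    and each sampled copy draws its parent uniformly at random.
    """
    copies = 2 * N
    p = 1.0
    for i in range(k):
        # The (i+1)-th copy must avoid the i parents already taken.
        p *= (copies - i) / copies
    return p


def prob_pair_coalesced_within(t, N):
    """Probability that two gene copies share an ancestor within t generations."""
    return 1.0 - (1.0 - 1.0 / (2 * N)) ** t


if __name__ == "__main__":
    # Chance of a coalescent event in one step for a pair, then for ten copies.
    print(1 - prob_no_coalescence(2, 50))
    print(1 - prob_no_coalescence(10, 50))
    print(prob_pair_coalesced_within(100, 50))
```

Under these assumptions the chance of a coalescent event rises with the number of sampled copies and with the number of generations, and falls as the population gets larger.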
for a neutral locus
This model tells us
the distribution of times until coalescence and also the distribution of gene tree
topologies that arise at a neutral locus.

For a neutral locus in a diploid Wright-Fisher population of size N, the average
time to coalescence for any randomly chosen pair of gene copies turns out to be 2N generations
for a larger group of gene copies
the average time to
coalescence of all of these copies is approximately 4N generations.
the coalescent process for a neutral locus
much of the action happens early, shortly before the present. Thus most of the coalescent events between pairs of gene copies are expected to occur early on, within fewer than N generations.

The expected time for the population to coalesce down to just two parental lineages is only about 2N generations, but the final coalescence takes a very long time. Even when down to two lineages, it takes on average another 2N generations for the final two lineages to coalesce.
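The 2N and 4N figures can be checked with a short calculation. The sketch below assumes the standard large-population approximation in which the expected time spent with exactly j lineages is 4N / (j(j-1)) generations; that formula and the function names are assumptions added here, not statements from the source.

```python
def expected_interval(j, N):
    """Expected generations during which exactly j lineages exist (diploid size N)."""
    return 4 * N / (j * (j - 1))


def expected_time_to_two_lineages(k, N):
    """Expected generations for k sampled lineages to coalesce down to two."""
    return sum(expected_interval(j, N) for j in range(3, k + 1))


def expected_tmrca(k, N):
    """Expected generations back to the coalescent point of all k lineages."""
    return expected_time_to_two_lineages(k, N) + expected_interval(2, N)


if __name__ == "__main__":
    N, k = 1000, 50
    print(expected_time_to_two_lineages(k, N))  # a little under 2N
    print(expected_interval(2, N))              # exactly 2N for the last pair
    print(expected_tmrca(k, N))                 # a little under 4N
```

The sum works out to 4N(1 - 1/k), so the final pair alone accounts for roughly half of the expected total depth of the tree.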
coalescent times depend strongly on
the demography of a population.

In populations of constant size, we have seen that the average coalescent time of any pair of
alleles is 2N generations and the average coalescent time of a sample of k alleles is approximately
4N generations. Therefore, in a small population with small N, coalescence will take less time
to occur than it will in a large population with large N
coalescence in different populations
is more recent in a small population
occurs further back in a large population
is recent in a shrinking population
occurs further back in a growing population
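One way to see these four cases is to compute the chance that a pair of gene copies has already coalesced within the most recent generations. This is only a sketch: it assumes the per-generation coalescence probability is 1/(2N) for whatever size N the population had in that generation, and it reads "shrinking" as small near the present and "growing" as large near the present. The sizes and names are made up for illustration.

```python
def prob_pair_coalesced(sizes_back_in_time):
    """Probability that two gene copies coalesce within the listed generations.

    sizes_back_in_time[0] is the diploid population size one generation before
    the present, sizes_back_in_time[1] two generations before, and so on.
    """
    prob_still_distinct = 1.0
    for n in sizes_back_in_time:
        prob_still_distinct *= 1.0 - 1.0 / (2 * n)
    return 1.0 - prob_still_distinct


# Hypothetical demographic histories, listed from the present backward.
small = [100] * 200
large = [10_000] * 200
shrinking = [100] * 100 + [10_000] * 100   # small now, large in the past
growing = [10_000] * 100 + [100] * 100     # large now, small in the past

if __name__ == "__main__":
    recent = 100  # look only at the most recent 100 generations
    for name, sizes in [("small", small), ("large", large),
                        ("shrinking", shrinking), ("growing", growing)]:
        print(name, round(prob_pair_coalesced(sizes[:recent]), 4))
```

In this toy setup the small and shrinking populations show a much higher chance of recent coalescence than the large and growing ones.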
Bugs in a Box
Felsenstein's metaphor for thinking about the coalescent as a stochastic process that runs forward in time instead of backward. In the metaphor, the bugs represent gene copies. When one bug eats another, this represents a coalescent event; when there is only one bug left, the entire population has coalesced. In the beginning the rate is fast because there is more contact among bugs; as the number declines there is less contact, so it takes longer until a single bug is left.
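The metaphor can be turned into a small simulation. The sketch below is an assumption-laden illustration rather than Felsenstein's own formulation: it supposes that with j bugs in the box each of the j(j-1)/2 pairs meets at rate 1/(2N), so the wait for the next meal is exponential with rate j(j-1)/(4N). The function name and parameter values are hypothetical.

```python
import random


def bugs_in_a_box(k, N, rng):
    """Return the waiting times between successive meals until one bug is left.

    Assumption: with j bugs present, the wait for the next meal is
    exponentially distributed with rate j*(j-1)/(4N).
    """
    waits = []
    for j in range(k, 1, -1):
        rate = j * (j - 1) / (4 * N)
        waits.append(rng.expovariate(rate))
    return waits


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    runs = [bugs_in_a_box(20, 1000, rng) for _ in range(5000)]
    mean_first = sum(w[0] for w in runs) / len(runs)
    mean_last = sum(w[-1] for w in runs) / len(runs)
    mean_total = sum(sum(w) for w in runs) / len(runs)
    print("mean wait with 20 bugs:", round(mean_first, 1))
    print("mean wait with 2 bugs: ", round(mean_last, 1))
    print("mean time to one bug:  ", round(mean_total, 1))
```

The first meals come quickly and the last one takes far longer on average, which is the slowdown the metaphor describes.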
how does the coalescent process influence the amount of variation we see in populations, particularly in small populations
we have
focused on the genealogy of gene copies, irrespective of their allelic state. To
understand patterns of genetic variation, we now need to add allelic differences to
our coalescent model.

(figure caption) Mutations generate new alleles, shown in
orange and red. Notice that all of
the variation at this locus has arisen
subsequent to the coalescent point.
the fundamental observation that links coalescent trees with genetic variation
Any allelic differences among a set of gene copies at the same
locus must have arisen by mutation subsequent to the coalescent point for this set of gene copies.
Thus, if we know the shape of the coalescent tree and the places where mutations
arose after the coalescent point, we know everything about the variation in the
present population
The structure of coalescent trees in a population tells us a great deal about the
amount of variation we should expect to see
With all else equal, the deeper the coalescent point, the
more variation we expect to see in the population
for a neutral locus
we can separate the genealogical history of the locus from the mutational
process that takes place at that locus

the process by which variation arises at the locus as the result
of two separate processes: (1) the genealogical process by which a coalescent
tree is formed, and (2) the mutation process by which variation arises along
the coalescent tree
we can separate these processes because at a neutral locus
all gene copies are equally likely to leave descendants, irrespective
of their allelic state. Thus, the mutation process and the allelic states of gene
copies have no effect on the genealogical process and the resulting shape of the
coalescent tree.
in this case the coalescent process tells us about
the strength of genetic drift to eliminate variation. A small population has a much more recent coalescent time, so we expect that less variation will have been generated since the coalescent point. This is because drift acts more strongly to reduce heterozygosity in a small population than in a large one.
the pattern of variation that we see at
a neutral locus
the result
of two sources of randomness superimposed on one another: (1) the randomness
associated with which particular genealogical history happens to occur—that is,
the coalescent tree of the present population, and (2) the randomness associated
with where mutations arise along this coalescent tree
In a population of constant size with no selection, assortative mating,
or migration,
if two randomly selected alleles are separated by, on average, 4N
generations, and the mutation rate is μ per locus per generation, we expect two
randomly selected alleles to differ by an average of 4Nμ mutations. But there are
two sources of randomness that cause variation around this average number of
differences: (1) genealogical history is a random process, so the two alleles may be
separated by considerably more or less than 4N generations, and (2) the mutation
process varies, so if the two alleles are separated by say 1000 generations, we may
see more or less than 1000μ mutations distinguishing them.
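The two sources of randomness can be layered in a short simulation. This is a sketch under stated assumptions: the pairwise coalescence time is drawn as a geometric variable with success probability 1/(2N), mutations along the 2t generations separating the pair are drawn as Poisson with mean 2tμ, and every mutation is assumed to produce a visible difference. The helper names and parameter values are illustrative.

```python
import math
import random


def sample_pair_coalescence_time(N, rng):
    """Generations back to the common ancestor of two gene copies (geometric)."""
    p = 1.0 / (2 * N)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - p)) + 1


def sample_poisson(lam, rng):
    """Knuth's method for a Poisson draw; adequate for the small means used here."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_pairwise_differences(N, mu, rng):
    """Number of mutations separating two randomly chosen gene copies."""
    t = sample_pair_coalescence_time(N, rng)   # randomness source 1: genealogy
    return sample_poisson(2 * t * mu, rng)     # randomness source 2: mutation


if __name__ == "__main__":
    rng = random.Random(3)
    N, mu = 500, 1e-4
    draws = [sample_pairwise_differences(N, mu, rng) for _ in range(20000)]
    print("expected 4*N*mu:", 4 * N * mu)
    print("simulated mean: ", sum(draws) / len(draws))
    print("largest draw:   ", max(draws))
```

The simulated mean should sit close to 4Nμ, while individual pairs scatter around it because both the genealogy and the mutation count vary from draw to draw.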
selective processes also have a substantial influence on the shape of coalescent trees
selection drives alleles quickly to fixation leading to a more recent coalescent time.

the neutral mutation arises and slowly replaces the ancestral allele by drift. A new neutral allele arises by mutation as indicated. In this
particular example, the new allele drifts, by chance, to fixation. Note, however, that
most newly arisen neutral alleles will be lost, rather than fixed, by drift
alleles under positive selection
do not have to rely on drift alone to reach fixation. The new allele is positively selected and, because of selection, it
quickly replaces all other alleles in the population. As a result, the population
has a more recent coalescent point than in the neutral example. This is a useful
observation. Because a recent selective event results in a more recent coalescent
point, we expect to find less neutral variation—
that is, fewer silent substitutions—at the locus
under selection
balancing selection
forms of balancing
selection such as overdominance or negative frequency dependence can maintain balanced polymorphisms of two or more alleles. A new allele arises by mutation that is under balancing selection with the ancestral
allele. Because balancing selection favors the new allele when it is rare, but favors the ancestral allele when the new allele is common, neither allele is easily able to go to fixation. As a result, both remain in the population for an extended
period of time, and the coalescent point for this locus occurs further in the past than it did for the neutral and positively selected cases. Because the population is finite, we expect one allele will eventually replace the other by chance despite
balancing selection. But this may take a very long time to occur, and in the meantime we observe a balanced polymorphism with a coalescent point
far from the present
incomplete lineage sorting
population genetic theory tells us we won't always get such a simple pattern. For recent, or repeated and rapid, speciation processes there might not be enough time for the genetic lineages to sort. The gene tree can then differ from the species tree.
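How much time is "not enough" can be illustrated with a standard three-species result that is not stated in the source and is added here as an assumption: with one gene copy sampled per species and an internal branch of t generations in a diploid ancestral population of size N, the two lineages fail to coalesce on that branch with probability exp(-t / (2N)), and two of the three equally likely outcomes that follow disagree with the species tree. The function name is hypothetical.

```python
import math


def prob_gene_tree_discordant(t_generations, N):
    """Chance that a three-taxon gene tree disagrees with the species tree.

    Assumes one gene copy per species, a neutral locus, a diploid ancestral
    population of constant size N, and an internal branch of t_generations
    between the two speciation events.
    """
    prob_not_sorted = math.exp(-t_generations / (2 * N))
    return (2.0 / 3.0) * prob_not_sorted


if __name__ == "__main__":
    N = 10_000
    for t in (0, 1_000, 20_000, 200_000):
        print(t, round(prob_gene_tree_discordant(t, N), 4))
```

Short internal branches relative to population size leave a substantial chance of mismatch, while long branches make the gene tree and species tree agree almost always.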
gene trees vs species trees
failure to coalesce within species lineages drives divergence of relationships between gene and species trees
any allelic difference among a set of gene copies of the same locus (orthologs)
must have arisen by mutation subsequent to the coalescent point for this set of gene copies
paralogous genes are duplicated copies
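that arose by gene duplication within a genome; their divergence traces back to the duplication event, so they do not share the evolutionary history of the species in the way orthologs do.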
A gene tree is not necessarily a species tree. Orthologs: mutation marks divergence after speciation, with no duplication. Paralogs: descent and divergence following duplication while in the same species. Xenologs: genes that come from other species.
preadaptation vs exaptation
preadaptation: a character that comes to serve a new function by chance
exaptation: a preadaptation that has been derived (modified) to serve a new function
an adaptation enhances overall fitness
of a phenotype compared with other character states or different traits. When a trait has been under natural selection in a specific population, and that trait serves the same primary function or functions today as it did in the past, it is an adaptation.
the comparative method
for a character to be an adaptation it must be a derived character that evolved in response to a specific selective agent.