Skip to main content
  • Other Publications
    • Philosophical Transactions B
    • Proceedings B
    • Biology Letters
    • Open Biology
    • Philosophical Transactions A
    • Proceedings A
    • Royal Society Open Science
    • Interface
    • Interface Focus
    • Notes and Records
    • Biographical Memoirs

Advanced

  • Home
  • Content
    • Latest issue
    • All content
    • Subject collections
    • Special features
    • Videos
  • Information for
    • Authors
    • Reviewers
    • Readers
    • Institutions
  • About us
    • About the journal
    • Editorial board
    • Author benefits
    • Policies
    • Citation metrics
    • Publication times
    • Open access
  • Sign up
    • Subscribe
    • eTOC alerts
    • Keyword alerts
    • RSS feeds
    • Newsletters
    • Request a free trial
  • Submit
Open Access

Modelling mitochondrial site polymorphisms to infer the number of segregating units and mutation rate

Michael D. Hendy, Michael D. Woodhams, Andrew Dodd
Published 25 March 2009.DOI: 10.1098/rsbl.2009.0104
Michael D. Hendy
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Michael D. Woodhams
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Andrew Dodd
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • Article
  • Figures & Data
  • Info & Metrics
  • eLetters
  • PDF
Loading

Abstract

We present a mathematical model of mitochondrial inheritance evolving under neutral evolution to interpret the heteroplasmies observed at some sites. A comparison of the levels of heteroplasmies transmitted from mother to her offspring allows us to estimate the number Nx of inherited mitochondrial genomes (segregating units). The model demonstrates the necessity of accounting for both the multiplicity of an unknown number Nx, and the threshold θ, below which heteroplasmy cannot be detected reliably, in order to estimate the mitochondrial mutation rate μm in the maternal line of descent. Our model is applicable to pedigree studies of any eukaryotic species where site heteroplasmies are observed in regions of the mitochondria, provided neutrality can be assumed. The model is illustrated with an analysis of site heteroplasmies in the first hypervariable region of mitochondrial sequence data sampled from Adélie penguin families, providing an estimate Nx and μm. This estimate of μm was found to be consistent with earlier estimates from ancient DNA analysis.

Keywords:

1. Introduction

It has recently been found that estimates of the human mitochondrial mutation rate from pedigree data are many times higher than those estimated from sequence divergence at putatively neutral sites (e.g. Parsons et al. 1997; Howell et al. 2003; Santos et al. 2005). One possible reason for this is that heteroplasmy (variation among different organelle genomes within the same individual) may be maintained for many generations, after the origin by mutation of a new variant, provided that there is not a very small bottleneck in the number of organelle genomes during transmission from parent to offspring (reviewed by Birky 1991). The persistence of heteroplasmic mutations for many generations after their origin may lead to an inflated contribution to a pedigree-based estimate of mutation rate, if they are treated as mutations that have become fixed within individuals. We propose a method for estimating the mutation rate and size of the transmission bottleneck for maternally transmitted mitochondrial genomes. We illustrate this using the dataset of Millar et al. (2008) on the Adélie penguin.

Millar et al. (2008) sequenced a segment of 344 base pairs in a fast evolving region of first hypervariable region (a region they argued was not under strong selective pressure) from 1931 Adélie penguins (508 families) at Antarctic nesting sites. This was a follow-up to the study of Lambert et al. (2002) of ancient and contemporary samples of the same region of mitochondrial DNA. In the data reported in Millar et al. (2008), a low proportion of the sites in individual birds exhibited heteroplasmies, where two bases were called on the electropherogram at that site, each with a signal greater than θ=0.23 of the total signal. Signals below the detection threshold θ, could not be reliably distinguished from background noise and were not reported there. Fifty-five mothers (out of 508) exhibited site heteroplasmies (above θ), with seven having two heteroplasmic sites. Hence, the proportion of sites with an observed heteroplasmy across all mothers wasEmbedded Image(1.1)In all but three cases, the observed site heteroplasmies persisted above the threshold in both mother and chicks. Site heteroplasmies in a chick were only checked at sites where a heteroplasmy had been observed in its mother.

Following Lambert et al. (2002) and Millar et al. (2008), we define mutation rate as the rate at which a base substitution is incorporated into all mitochondrial genomes of an individual. Previous studies linking mutation rate with observed site heteroplasmies include two studies (Brandstätter et al. 2004; Santos et al. 2005) of human populations of up to three generations, where the mutation rate was estimated from assuming that each observed site heteroplasmy represented a transitional substitution. We find most of the substitutions giving rise to an observed site heteroplasmy are subsequently lost, and those that eventually become fixed may persist as an observed site heteroplasmy for many generations and be oversampled.

For our study, we follow the maternal ancestry of an individual, tracking the trajectory of a site substitution introduced into the germ line by some ancestor. At each generation, the population of mitochondrial genomes will pass through a bottleneck when the oocyte segregates a small number of its mother's genomes. Lambert et al. (2002) argued that this region of the mt-genome is not under strong selective pressure, so our model assumes random drift among the Nx segregating genomes.

This model should be applicable for any region of the mt-genome assumed neutral under selection for any eukaryote species.

2. Material and methods

Our model focuses on an individual site of the mitochondrial genomes in the maternal ancestry of one individual. When a heteroplasmy (comprising two different nucleotides, generically X, Y) is observed at a site, we presume, this is a consequence of a somatic substitution X→Y or Y→X at that site in a maternal ancestor some g generations prior, inherited by a daughter, which has persisted for g generations.

Consider the maternal ancestry of an individual A0, with A1 its mother and A2 its maternal grandmother, etc. Suppose a nucleotide substitution X→Y has occurred at a site in one genome of a maternal line ancestor Ag (g≥1), which is inherited by Ag−1. Our model shows that the probability of an additional substitution inherited at that site among Ag−1, …, A0 is less than 10−2, so we will neglect this possibility. Suppose this site heteroplasmy persists for (exactly) k≥1 generations, inherited by Ag−1, …, Ag−k, but not by Ag−k−1. If k≥g, then A0 will be heteroplasmic at that site, although not necessarily observable. If k<g, Ag−k−1, …, A0 are not heteroplasmic at that site, with their genomes either containing all X or all Y.

We propose a model where each oocyte recruits Nx mitochondrial genomes independently from the population of its mother's genomes. (The number Nx is sometimes referred to as the number of segregating units. For the Adélie penguins, we estimated Nx to have a 95% confidence interval (CI) of 25<Nx<69.)

The observed levels of most site heteroplasmies from the blood samples of a mother and her chicks closely agree, which reflects a close agreement in the germ line. We will assume that the variation, after accounting for measurement uncertainty, is due to sampling at the recruiting bottleneck and that the proportions in the blood sample estimate the inherited proportions.

If A1 (the mother of A0) contains a site heteroplasmy with the nucleotides Y and X appearing in proportions ϕ and 1−ϕ, then the probability (using a binomial selection with replacement) that A0 inherits i genomes with allele Y and Nx−i genomes of allele X isEmbedded ImageIf A1 had inherited j copies of allele Y and Nx−j of allele X from her mother A2, we assume she exhibits the proportions ϕ=j/Nx and 1−ϕ=1−j/Nx of the alleles in the genomes available for inheritance. Hence,Embedded Image(2.1)is the probability that A0 inherits i copies of allele Y, and Nx−i copies of allele X, given her mother had inherited j and Nx−j corresponding copies. LetEmbedded Image(2.2)be the matrix of these probabilities, where the Nx−1 rows and columns are indexed by i,j∈{1,2, …, Nx−1}.

Given that Ag has a somatic mutation in a descendant of one of her Nx founding genomes, the probability that Ag−1 inherits more than one mutated genome is very small. Hence, we will assume Ag−1 is heteroplasmic at that site, with proportion 1/Nx of its genomes containing Y. If the heteroplasmy is lost g generations later, then A0 has either all X or all Y at that site. Let hX,Y be the proportion of cases where the heteroplasmy persists and hX and hY be the proportions where the mutation is lost or fixed. The neutral model predicts that as g increasesEmbedded Image

In a simulation study of 107 heteroplasmic site histories, we followed the introduction of one mutation until the site heteroplasmy was lost, for Nx=20 and for Nx=40. We found (table 1) for θ=0.23 that hX≈((Nx−1)/Nx) and hY≈1/Nx. Table 1 also gives the average numbers of generations that the site heteroplasmies persist, and are observable. We note that over all histories, the average numbers of generations a site heteroplasmy was observable were almost identical for Nx=20 and 40, but the average variation in the levels of a heteroplasmy at a site between a mother and her chick differed significantly.

View this table:
  • View inline
  • View popup
Table 1

Site heteroplasmy histories: results from 107 simulations for Nx=20 and 40 segregating units. (In more than 99.9% of the histories, the introduced mutation is either lost or fixed within 200 generations, and no site heteroplasmy survived more than 520 generations. Statistics are presented for the cases HXn (X→Y lost and never observed at θ=0.23), HXo (lost but observable for at least one generation) and HY (the mutation X→Y is ultimately fixed). We note that the proportions fixed (HY) is close to its expected value of 1/Nx. gav records the mean number of generations the site heteroplasmy survives, and gav(obs) records the mean number of generations the site heteroplasmy is in the observable range. Note as Nx is doubled, the gav(obs) values double, but the proportions are halved so the mean number of generations for which a site heteroplasmy is observable is approximately constant, close to 2.30 in both cases. In the final row, we note the average mean square difference between the mother/chick pairs over all observed heteroplasmic sites differs for Nx=20 and Nx=40.)

For each Ak in the ancestry, let nk be the number of its founding genomes with nucleotide Y at that site. We have assumed that ng−1=1. If for k<g−1, nk=0 or Nx, the heteroplasmy is lost. Suppose 1≤nk+1=j<Nx, then the probability that nk=i, (1≤i<Nx) isEmbedded ImageIn lemma 1 of the electronic supplementary material, we show thatEmbedded Imagewhich is the ith entry of the leading column of Embedded Image. Assuming that the probability that Ag introduces a somatic mutation into the germ line at a selected site is α, then summing over all generations g≥1, the probability that A0 has a site heteroplasmy with n0=i (1≤i<Nx) isEmbedded Image(2.3)where Embedded Image is the first entry in the ith row ofEmbedded ImageAs Embedded Image is the expected number of generations that a new site heteroplasmy persists with i copies (figure 1), the expected number of generations a heteroplasmy is observable at that site isEmbedded ImageMost site heteroplasmies never reach the detection threshold, those that do, usually persist for many more than Embedded Image generations (table 1).

Figure 1
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 1

Values of Embedded Image (written as Embedded Image), the expected number of generations a site heteroplasmy persists, and of Embedded Image, the expected number of generations the site heteroplasmy are observable with threshold θ=0.23 (plotted on a logarithmic scale). Note that Embedded Image converges to the constant value 2 ln(θ−1−1)=2.417 (dotted line), while Embedded Image (solid line) grows logarithmically as Nx increases.

In figure 2, we plot the values of (Q20)i,1 and (Q40)i,1, noting that these two distributions are almost identical, and that Embedded Image is closely approximated by 2/i within the observed region. (The limit Embedded Image as Nx→∞ was noted by Fisher and Wright (Ewens 2004, eqn (1.56)).)

Figure 2
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 2

The values of per20(i), (a) for i=2, …, 19, and per40(i), (b) for i=2, …, 39, are plotted as dots. The solid lines are the curves 2/i. The shaded regions are outside the detection threshold for θ=0.23 with the observable region (unshaded) between them. We note that in both observed regions the points Embedded Image fit the curve 2/i closely.

We show in lemma 2 of the electronic supplementary material, that assuming Embedded Image, the probability β that a site has an observable heteroplasmy is closely approximated by 2α ln(θ−1−1). Assuming neutral evolution, 1/Nx of the substitutions entering the germ line will become fixed in the maternal line of descent, so that the mutation rate can be estimated asEmbedded Image(2.4)where t is the generation time.

3. Results

We have shown that majority of heteroplasmies cannot be observed, leading to undersampling; but that those that do may persist for many generations, leading to oversampling. These two effects do not balance. We have demonstrated that the estimate of the mutation rate from the density of observed heteroplasmic sites is dependent both on the number of segregating units Nx and on the detection threshold θ.

For the Adélie penguin data of Millar et al. (2008), with θ=0.23, t=6.46 and Embedded Image, we used Bayesian analysis (see ‘Estimating Nx from mother–chick comparisons’ of the electronic supplementary material) to obtain a maximum-likelihood estimate of Nx=36.5, with 95% CI of 25.0≤Nx≤66.9. Using equation (2.4), Millar et al. (2008) estimated the median value to be μm=0.55 s s−1Myr−1 (CI 0.29≤μm≤0.88), a value not significantly different from the ancient evolutionary rate of k=0.86 s s−1Myr−1 (CI 0.53≤k≤1.17) estimated by Lambert et al. (2002) for the same region. In their study, Millar et al. showed that if they had assumed that each observed heteroplasmy represented a mutation in transition, the estimate of μ would have been increased by a factor of nearly 100.

4. Discussion

This model has assumed neutral evolution for the regions of mitochondria genome under analysis. Whether specific regions are under selective pressure is outside the scope of this study; however, Rand et al. (1994), for example, addressed this issue. Our model is applicable wherever the assumption of neutrality can be assumed for pedigree studies of any species where multiple copies of the mitochondrial genome are inherited, and a sufficient number of heteroplasmies are observed.

Accounting for the effects modelled here may illuminate the apparent discrepancies reported between molecular substitution rates and those estimated from pedigree studies, such as for human studies, e.g. Parsons et al. (1997), Howell et al. (2003) and Santos et al. (2005). As these studies do not report observational thresholds, our model cannot be applied directly to their data.

Improvement in the accuracy of determining the relative expression levels of site heteroplasmies (with Embedded Image) might lead to a direct estimate of Nx. However, it is likely that Nx may vary across the sample, in which case the distribution would not clump around the i/Nx values. We have shown (by simulation) that the model is robust against moderate variations in Nx, provided we take Nx to be an idealized harmonic mean of the individual values in the sample.

Acknowledgments

We gratefully acknowledge support from the New Zealand Centres of Research Excellence Fund. We thank the following people who assisted in the development of this paper: J. Esins, D. M. Lambert, C. D. Millar, D. Penny, and K. Schliep. We thank the editor and referees for their very helpful comments in improving the presentation.

Footnotes

  • ↵† Present address: School of Information Technologies, University of Sydney, Sydney, New South Wales 2006, Australia.

  • ↵‡ Present address: Summit plc, Abingdon, Oxfordshire OX14 4RY, UK.

  • One contribution to a Special Feature on ‘Whole organism perspectives on understanding molecular evolution’.

    • Received February 8, 2009.
    • Accepted February 27, 2009.
  • This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

  • Copyright © 2009 The Royal Society

References

  1. ↵
    1. Selander R.K.,
    2. Clark A.G.,
    3. Whittam T.S.
    1. Birky C.W.
    Evolution and population genetics of organelle genes. In Evolution at the molecular level Selander R.K., Clark A.G., Whittam T.S. 1991pp. 112–134. Eds. Sunderland, MA:Sinauer. (See also pp. 202–221.)
  2. ↵
    1. Brandstätter A.,
    2. Niederstätter H.,
    3. Parson W.
    2004 Monitoring the inheritance of heteroplasmy by computer-assisted detection of mixed base-calls in the entire human mitochondrial control region. Int. J. Legal Med. 118, 47–54. doi:10.1007/s00414-003-0418-z.
    OpenUrlCrossRefPubMedWeb of Science
  3. ↵
    1. Ewens W.J.
    Mathematical population genetics. In Springer 2004 New York, NY:Springer
  4. ↵
    1. Howell N.,
    2. Smejkal C.B.,
    3. Mackey D.A.,
    4. Chinnery P.F.,
    5. Turnbull D.M.,
    6. Herrnstadt C.
    2003 The pedigree rate of sequence divergence in the human mitochondrial genome: there is a difference between phylogenetic and pedigree rates. Am. J. Hum. Genet. 72, 659–670. doi:10.1086/368264.
    OpenUrlCrossRefPubMedWeb of Science
  5. ↵
    1. Lambert D.M.,
    2. Ritchie P.A.,
    3. Millar C.D.,
    4. Holland B.,
    5. Drummond A.J.,
    6. Baroni C.
    2002 Rates of evolution in ancient DNA from Adélie penguins. Science. 295, 2270–2273. doi:10.1126/science.1068105.
    OpenUrlAbstract/FREE Full Text
  6. ↵
    1. Millar C.D.,
    2. Dodd A.,
    3. Anderson J.,
    4. Gibb G.C.,
    5. Ritchie P.A.,
    6. Baroni C.,
    7. Woodhams M.,
    8. Hendy M.D.,
    9. Lambert D.M.
    2008 Mutation and evolutionary rates in Adélie penguins from the Antarctic. PLoS Genet. 4, e000209 doi:10.1371/journal.pgen.1000209.
    OpenUrl
  7. ↵
    1. Parsons T.J.,
    2. et al.
    1997 A high observed substitution rate in the human mitochondrial DNA control region. Nat. Genet. 15, 363–368. doi:10.1038/ng0497-363.
    OpenUrlCrossRefPubMedWeb of Science
  8. ↵
    1. Rand D.M.,
    2. Dorfsman M.,
    3. Kann L.M.
    1994 Neutral and non-neutral evolution of Drosophila mitochondrial DNA. Genetics. 138, 741–756.
    OpenUrlAbstract/FREE Full Text
  9. ↵
    1. Santos C.,
    2. Monteil R.,
    3. Sierra B.,
    4. Bettencourt C.,
    5. Fernandez E.,
    6. Alvarez L.,
    7. Lima M.,
    8. Abade A.,
    9. Aluja M.P.
    2005 Understanding differences between phylogenetic and pedigree-derived mtDNA mutation rate: a model using families from the Azores Islands (Portugal). Mol. Biol. Evol. 22, 1490–1505. doi:10.1093/molbev/msi141.
    OpenUrlAbstract/FREE Full Text
View Abstract
Back to top
Share
Modelling mitochondrial site polymorphisms to infer the number of segregating units and mutation rate
Michael D. Hendy, Michael D. Woodhams, Andrew Dodd
Biol. Lett. 2009 -; DOI: 10.1098/rsbl.2009.0104. Published 25 March 2009
del.icio.us logo Digg logo Reddit logo Twitter logo CiteULike logo Connotea logo Facebook logo Google logo Mendeley logo
Email

Thank you for your interest in spreading the word on Biology Letters.

NOTE: We only request your email address so that the person you are recommending the page to knows that you wanted them to see it, and that it is not junk mail. We do not capture any email address.

Enter multiple addresses on separate lines or separate them with commas.
Modelling mitochondrial site polymorphisms to infer the number of segregating units and mutation rate
(Your Name) has sent you a message from Biology Letters
(Your Name) thought you would like to see the Biology Letters web site.
Print
Manage alerts

Please log in to add an alert for this article.

Sign In to Email Alerts with your Email Address
Citation tools

Modelling mitochondrial site polymorphisms to infer the number of segregating units and mutation rate

Michael D. Hendy, Michael D. Woodhams, Andrew Dodd
Biol. Lett. 2009 -; DOI: 10.1098/rsbl.2009.0104. Published 25 March 2009

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
Download

Article reuse

  • Article
    • Abstract
    • 1. Introduction
    • 2. Material and methods
    • 3. Results
    • 4. Discussion
    • Acknowledgments
    • Footnotes
    • References
  • Figures & Data
  • Info & Metrics
  • eLetters
  • PDF

See related subject areas:

  • molecular biology
  • evolution

Related articles

Cited by

Large datasets are available through Biology Letters' partnership with Dryad

Open biology

  • BIOLOGY LETTERS
    • About this journal
    • Contact information
    • Purchasing information
    • Submit
    • Author benefits
    • Open access membership
    • Recommend to your library
    • FAQ
    • Help

Royal society publishing

  • ROYAL SOCIETY PUBLISHING
    • Our journals
    • Open access
    • Publishing policies
    • Conferences
    • Podcasts
    • News
    • Blog
    • Manage your account
    • Terms & conditions
    • Cookies

The royal society

  • THE ROYAL SOCIETY
    • About us
    • Contact us
    • Fellows
    • Events
    • Grants, schemes & awards
    • Topics & policy
    • Collections
    • Venue hire
1744-957X

Copyright © 2018 The Royal Society