## Abstract

Because the fossil record is incomplete, the last fossil of a taxon is a biased estimate of its true time of extinction. Numerous methods have been developed in the palaeontology literature for estimating the true time of extinction using ages of fossil specimens. These methods, which typically give a confidence interval for the true time of extinction, differ in the assumptions they make and the nature and amount of data they require. We review the literature on such methods and make some recommendations for future directions.

## 1. Introduction

Palaeontologists have long been aware that the age of the last or youngest known fossil of a taxon inevitably underestimates its true time of extinction, because it is unlikely that the taxon's last individual will be preserved as a fossil and later recovered. Before 1980, there was little motivation for calculating high-precision estimates of times of extinction. However, the landmark hypothesis by Alvarez *et al.* [1] that the end-Cretaceous mass extinction was caused by a bolide impact brought new motivation to the study of extinction. The Alvarez hypothesis implied that a large number of taxa should have gone extinct simultaneously (or nearly so) coincident with the impact. A literal reading of the fossil record did not seem to support such a pattern of extinctions; instead, most taxa appeared to go extinct sequentially before the impact. But a seminal paper by Signor & Lipps [2] pointed out that owing to the incompleteness of the fossil record, the appearance of a gradual extinction would be expected even if the extinctions were truly simultaneous. According to Signor and Lipps, the size of the gap between the youngest fossil found and the true time of extinction will vary across taxa, thus giving the appearance that different taxa went extinct at different times preceding the impact. The recognition of this so-called Signor–Lipps effect stimulated an interest in estimating true extinction times with high precision, which has continued to this day.
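The Signor–Lipps effect described above can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are ours rather than those of Signor & Lipps [2]: all taxa share one true extinction time, fossils are scattered uniformly through each taxon's range, and the number of finds per taxon is drawn from an exponential distribution. The function name and all parameter values are hypothetical.

```python
import random

def simulate_last_occurrences(n_taxa, true_extinction=100.0, range_length=50.0,
                              mean_finds=8.0, seed=1):
    """Return sorted last-occurrence times for taxa that all go extinct together.

    Time runs forward from an arbitrary origin, so larger values are younger.
    Illustrative assumptions: every taxon is extant over the same window and
    its fossils are scattered uniformly at random within that window.
    """
    rng = random.Random(seed)
    last_occurrences = []
    for _ in range(n_taxa):
        # Taxa differ in how many fossils they leave; each leaves at least one.
        n_finds = max(1, round(rng.expovariate(1.0 / mean_finds)))
        finds = [true_extinction - range_length * rng.random()
                 for _ in range(n_finds)]
        last_occurrences.append(max(finds))
    return sorted(last_occurrences)


last = simulate_last_occurrences(n_taxa=20)
gaps = [100.0 - t for t in last]  # gap between last fossil and true extinction
```

Every simulated gap is non-negative and the gaps differ among taxa, so the last occurrences are spread out before time 100 even though all taxa were set to die out at exactly that time.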

Here we review quantitative methods for estimating times of extinction for fossil taxa, particularly confidence interval approaches that place bounds on possible extinction times. We note that there is a substantial body of work on inferring extinctions in the ecology and conservation biology literature [3]. In that literature, the goal is typically to determine the probability that a modern taxon is extinct based on past sightings, rather than estimating the time of extinction of a taxon known to be extinct (but see [4]). Here our focus is on methods from the palaeontology literature addressing the latter issue. We divide the methods into first-, second- and third-generation approaches depending on the assumptions made and information used in deriving the estimate (table 1). First-generation methods assume uniform preservation and recovery of fossils—that fossils are equally likely to be found at any time when the taxon was extant. Second-generation methods allow for non-uniform recovery, either by requiring quantitative information about fossil recovery potential, or by attempting to explicitly or implicitly infer recovery potential from the pattern of fossil occurrences. Third-generation methods allow for non-uniform recovery by explicitly modelling stratigraphic and environmental factors that affect fossil preservation. We review work in each category and offer a prospectus for future research.

## 2. First-generation methods

Strauss & Sadler [5] were the first to propose a formal method for estimating the time of extinction using a confidence interval approach. They assumed that a confidence interval would take the form of a range extension—an extension of the taxon's known temporal range in the fossil record, beyond its last fossil occurrence. They then determined the width of the extension needed to contain the time of extinction with a specified level of confidence. (A more approachable derivation is given in [20].) Marshall [27] expanded upon their approach, providing new applications for their method.

Strauss & Sadler [5] assumed uniform preservation and recovery of fossils. Although unrealistic, this assumption greatly simplifies mathematical computation. We refer to methods that make the assumption of uniformity as first-generation methods. (Such methods were called ‘Class 1’ methods by Rivadeneira *et al.* [3].) Under this assumption, Wang *et al.* [13] proved that the Strauss and Sadler interval is optimal in the sense that it has the shortest average width among intervals that are invariant to measurement scale.
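As a hedged illustration of the range extension under uniform recovery, the sketch below uses the commonly quoted one-sided form, in which the extension is a fraction (1 − C)^(−1/(n−1)) − 1 of the observed range for n occurrences and confidence level C. The function name and the input format (a list of occurrence times or stratigraphic positions) are our own choices for illustration.

```python
def range_extension(occurrences, confidence=0.95):
    """One-sided range extension beyond the last occurrence.

    Assumes uniform preservation and recovery, so that the n occurrences are
    scattered uniformly at random through the taxon's true range.
    Returns the length of the extension in the same units as the input.
    """
    n = len(occurrences)
    if n < 2:
        raise ValueError("at least two occurrences are needed")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    observed_range = max(occurrences) - min(occurrences)
    # Fraction of the observed range by which the range must be extended.
    alpha = (1.0 - confidence) ** (-1.0 / (n - 1)) - 1.0
    return alpha * observed_range


extension = range_extension([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], confidence=0.95)
```

With 10 occurrences, the 95% extension is a little under 40% of the observed range. The extension shrinks as occurrences are added, because the exponent −1/(n−1) approaches zero.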

Several authors adopted a Bayesian approach to estimating the time of extinction. Bayesian methods treat the time of extinction as a random variable rather than a fixed parameter as in the frequentist (classical) paradigm. The goal is then to estimate the posterior distribution, which summarizes our knowledge about the time of extinction given the observed data, and upon which point estimates and credible intervals of the time of extinction are based. The first Bayesian method was described in [5] and has received surprisingly little attention. A variant of this method was described in [28], who used a different prior distribution and a slightly different model. Another method was proposed in [6], which improved on previous methods by using a more realistic discrete sampling model.
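To make the Bayesian logic concrete, the sketch below computes a posterior distribution for the time of extinction on a grid. It is a generic illustration and not the method of [5], [6] or [28]: we assume that times are measured from a known origination at 0, that occurrences are uniform between 0 and the extinction time, and that the prior is flat between the last occurrence and a user-chosen maximum `theta_max`. All names are hypothetical.

```python
def posterior_extinction_time(occurrences, theta_max, grid_size=2000):
    """Grid approximation to the posterior of the extinction time theta.

    Assumptions (illustrative only): occurrences are uniform on [0, theta]
    and the prior on theta is flat on [last occurrence, theta_max].
    Returns (thetas, probs), where probs sums to 1 over the grid.
    """
    n = len(occurrences)
    last = max(occurrences)
    if min(occurrences) < 0 or last <= 0:
        raise ValueError("occurrences must be non-negative with a positive maximum")
    if theta_max <= last:
        raise ValueError("theta_max must exceed the last occurrence")
    step = (theta_max - last) / grid_size
    thetas = [last + (i + 0.5) * step for i in range(grid_size)]  # cell midpoints
    # Likelihood of n uniform finds on [0, theta] is theta**(-n); it is scaled
    # by last**n here for numerical stability. The flat prior cancels out.
    weights = [(last / t) ** n for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def credible_upper_bound(thetas, probs, level=0.95):
    """Smallest grid value whose cumulative posterior probability reaches level."""
    cumulative = 0.0
    for t, p in zip(thetas, probs):
        cumulative += p
        if cumulative >= level:
            return t
    return thetas[-1]


thetas, probs = posterior_extinction_time(list(range(1, 11)), theta_max=30.0)
upper = credible_upper_bound(thetas, probs, level=0.95)
```

The posterior is highest at the last occurrence and decays with increasing extinction time. A different prior can be substituted by multiplying each weight by the prior density at that grid value.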

Confidence intervals have also been applied to higher-order questions, that is, questions arising from a group of taxa rather than a single taxon. One example is whether taxa in a locality were extirpated simultaneously, a key question for testing the Alvarez impact hypothesis. Marshall [9] and Marshall & Ward [29] applied the classical intervals of Strauss & Sadler [5] to groups of Late Cretaceous ammonites, showing that their fossil record was consistent with a simultaneous extinction and thus with an impact. Wang & Marshall [19] extended the methodology of these papers to improve the precision of resulting confidence intervals, and Wang *et al.* [13] further adapted the classical intervals of Strauss & Sadler [5] to estimating the common extinction time of multiple taxa.

Other first-generation methods were also developed to evaluate higher-order questions. Springer [7] based a test for simultaneous extinction of a group of taxa on the uniformity of *p*-values under the null hypothesis, and Solow [10] derived a likelihood ratio test for simultaneous extinction and a confidence interval for the age of such an extinction. Solow *et al.* [8] introduced a model accounting for radiocarbon dating error and used a maximum-likelihood approach to compare the times of extinction of Pleistocene megafauna. Wang & Everson [12] described a similar hypothesis test and confidence interval for the separation between two extinction pulses, which they later generalized to a confidence interval for the duration of an extinction event occurring in any number of pulses [14]. The latter reframed the question of whether an extinction event was simultaneous or gradual by asking instead, how gradual was it? Marshall [9] also provided a way of assessing the range of gradual extinction scenarios consistent with a given fossil record. Further work reframed the question in a different way, computing a confidence interval on the number of pulses in which the extinction occurred, with a single pulse being equivalent to a simultaneous extinction [15].
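A likelihood ratio test for simultaneous extinction can be sketched as follows. This is an illustration in the spirit of such tests and may differ in detail from the test of Solow [10]. We assume that every taxon's occurrences are uniform between a common, known base at 0 and that taxon's own extinction time. Under those assumptions, −2 log Λ follows a chi-square distribution with 2(k − 1) degrees of freedom for k taxa when the extinctions are simultaneous. The function name and input format are hypothetical.

```python
import math

def simultaneous_extinction_test(last_occurrences, n_finds):
    """Likelihood ratio test of H0: all taxa share one extinction time.

    last_occurrences[i] is the last occurrence of taxon i, measured from a
    common known base at 0; n_finds[i] is its number of occurrences.
    Assumes uniform recovery for every taxon. Returns (statistic, p_value).
    """
    k = len(last_occurrences)
    if k < 2 or k != len(n_finds):
        raise ValueError("need matching inputs for at least two taxa")
    if min(last_occurrences) <= 0:
        raise ValueError("last occurrences must be positive")
    theta_hat = max(last_occurrences)  # MLE of the common extinction time
    # -2 log Lambda, where Lambda = prod (y_i / theta_hat) ** n_i
    stat = 2.0 * sum(n * math.log(theta_hat / y)
                     for y, n in zip(last_occurrences, n_finds))
    # Chi-square survival function for even degrees of freedom 2(k - 1):
    # exp(-x/2) * sum_{j=0}^{k-2} (x/2)**j / j!
    half = stat / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k - 1):
        term *= half / j
        total += term
    return stat, math.exp(-half) * total


stat, p_value = simultaneous_extinction_test([10.0, 5.0], [5, 5])
```

A small p-value indicates that the spread of last occurrences is larger than incomplete sampling alone would be expected to produce under the uniform model.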

## 3. Second-generation methods

It was apparent early on that the assumption of uniform recovery potential was mathematically convenient but unrealistic. Accordingly, a long-standing goal has been to relax this assumption and allow non-uniform recovery potential. We refer to methods that do so, without requiring detailed information on stratigraphic architecture, as second-generation methods.

To accomplish this goal, some methods required that the pattern of recovery potential be known quantitatively. These include methods [18,20] that generalized the approach of Strauss & Sadler [5] to the case when recovery potential was non-uniform and known. (Such methods were called ‘Class 2’ methods by Rivadeneira *et al.* [3].) The former method was used by Labandeira *et al.* [30] in their study of Late Cretaceous plant–insect associations.
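One way to read the generalization to a known, non-uniform recovery potential is that the range is extended until the area under the recovery function beyond the last occurrence equals the classical fraction of the area under the function across the observed range. The sketch below implements that reading numerically. The function names, the step size and the search limit are our own assumptions, and the recovery function must be supplied by the user.

```python
def integrate(f, a, b, steps=10000):
    """Midpoint-rule integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


def generalized_range_extension(first, last, n, recovery, confidence=0.95,
                                step=0.001, max_extension=1000.0):
    """Range extension beyond `last` when recovery potential is known.

    `recovery` is a user-supplied function giving relative recovery potential
    at each time or stratigraphic position (larger values are younger).
    Returns the extension length, or None if the target area is not reached
    within `max_extension`.
    """
    if n < 2 or last <= first:
        raise ValueError("need n >= 2 occurrences and last > first")
    # Classical fraction from the uniform case.
    alpha = (1.0 - confidence) ** (-1.0 / (n - 1)) - 1.0
    target = alpha * integrate(recovery, first, last)
    area, extension = 0.0, 0.0
    while extension < max_extension:
        # Add the area of one thin slice beyond the last occurrence.
        area += recovery(last + extension + 0.5 * step) * step
        extension += step
        if area >= target:
            return extension
    return None  # recovery potential too low to bound the extinction time


uniform = generalized_range_extension(0.0, 9.0, 10, lambda t: 1.0)
halved = generalized_range_extension(0.0, 9.0, 10,
                                     lambda t: 1.0 if t <= 9.0 else 0.5)
```

With constant recovery potential the result agrees with the uniform case to within the step size. Halving the recovery potential beyond the last occurrence roughly doubles the extension, and if recovery potential drops to zero there the sketch returns `None` because no finite bound is reached.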

However, it would be ideal to have a method that allows for non-uniform recovery without prior knowledge of the recovery function, which may be hard to come by. (Such methods were called ‘Class 3’ methods by Rivadeneira *et al.* [3].) An early attempt [16] made the restrictive assumption that the size of gaps between fossil occurrences is not correlated with time, an assumption that would be violated in many scenarios, for example, when recovery potential is steadily increasing or decreasing [31]. In fact, the only situation where this assumption is valid is when there is a catastrophic extinction over a time interval that has negligible stratigraphic thickness. Solow [21] proposed a method that, like [5], is based on the size of the gaps between fossil occurrences. However, whereas the method of Strauss & Sadler [5] is based on the size of the average gap between all fossil occurrences, the method of Solow [21] is based on only the gap between the last two occurrences. A different method was used by Roberts & Solow [4] to estimate the time of extinction of the dodo based on recorded sightings. This optimal linear estimation (OLE) method is based on the fact that the joint distribution of the last fossil occurrences approximately follows a Weibull extreme value distribution under a broad range of conditions. This method was found to provide ‘generally accurate and precise estimates’ ([32], p. 345), and was also recommended by Solow [33]. Another approach [22] was based on a first-generation method from the conservation biology literature [34]. This method, which gives higher weight to occurrences closer to the time of extinction and bases confidence limits on resampled dates to account for dating uncertainty, was recommended by Saltré *et al.* [35].

Several second-generation methods have adopted a Bayesian approach. Weiss *et al.* [17] extended the method of [6] to allow for non-uniform recovery potential based on abundance counts. Alroy [23] described a simple method based on a discrete sampling model, which appeared to work well in simulations when recovery potential was uniform or decreasing, although it was less effective when recovery potential was increasing. Alroy [23] also presented a variant based on runs of presences or absences, which better accounts for non-uniform recovery as long as ‘a reasonable sample size’ ([23], p. 597) is available. Wang *et al.* [24] proposed a method that explicitly models recovery potential using a modified beta distribution, which is able to take on a variety of increasing and decreasing shapes. Their method performed well in simulations even for sample sizes as small as five occurrences.

## 4. Third-generation methods

Research over the past several decades has made substantial advances in understanding the effect of environmental factors and stratigraphic architecture on the spatial and temporal distribution of taxa, and thus the probability of preservation and recovery of fossils [36]. We refer to methods that account for such information as third-generation methods. These methods differ in that they attempt to infer the process—a causal model or mechanism—rather than just the pattern of fossil occurrences.

For instance, gradients in environmental factors such as water depth [37–39], substrate consistency [38,40] and salinity [41] are known to correlate with the distribution of marine taxa, thereby influencing recovery potential. Sequence stratigraphic principles have also informed our understanding of the distribution of fossils, such as the size of gaps [42] and the clustering of first and last occurrences around sequence boundaries [42–45]. In fact, recent work has shown that last occurrences may reflect stratigraphic architecture more so than actual times of extinction, implying that the pattern of last occurrences is not simply the result of the Signor–Lipps effect [46]. In such cases, incorporating stratigraphic information will be essential for inferring the true time of extinction. Holland [25] and Schueth *et al.* [26] give examples of third-generation analyses. Holland's approach [25] uses Marshall's second-generation method [18] to compute confidence intervals, although the accuracy of the inferred fossil recovery potentials has been questioned [47]. A drawback of such methods is that they typically require large datasets with multiple taxa; Holland ([25], p. 476) recommends at least ‘20–30 taxa and 50–60 samples’.

## 5. Going forward

It might seem that second-generation methods should have rendered first-generation methods obsolete, and that third-generation methods should have done the same to second-generation methods. But nearly all analyses carried out thus far, including those in the past few years, have used first-generation methods [48–51]. Why have first-generation methods persisted? First, the Strauss & Sadler [5] method is widely known and well-established as the standard among first-generation methods, but no such standard exists among the various second-generation methods. Instead, second-generation methods offer a confusing variety of choices, each with different strengths and weaknesses [3,24,35], and none is known to be optimal in the way that the Strauss & Sadler interval is among first-generation methods [13]. As a result, there is a lack of clarity or consensus on which method to use. Second, some second-generation methods require quantitative knowledge of recovery potential, which is not typically available. Third, palaeontologists are typically interested not just in estimating the time of extinction of a single taxon, but in higher-order questions arising from groups of taxa, particularly whether all taxa went extinct simultaneously, and most methods for answering such higher-order questions are first-generation methods (although see [19] for an exception).

One could also argue that third-generation methods are not truly competing with first- and second-generation methods. The former typically require multiple taxa with a large number of occurrences and attempt to infer or model factors such as water depth, abundance, etc., whereas the latter may be attempting to estimate the time of extinction of one or a small number of taxa known only from a handful of occurrences. The choice of which type of method to use may therefore be determined primarily by the limitations of one's data. Certainly, if large sample sizes and detailed knowledge of stratigraphic architecture are available, a third-generation method is preferable. But a palaeontologist who wants to estimate the time of extinction of a single taxon having only five occurrences will not be able to carry out the kind of analyses done in [25], for example. In such a situation, second-generation methods such as those of Roberts & Solow [4] and Wang *et al.* [24] may be the best option. These methods often produce wide confidence intervals, sometimes to the point of being uninformative, particularly when applied to small datasets. This should not be construed as a drawback of such methods, however; it is a consequence of the fact that they account for variability that first-generation methods ignore, and therefore more accurately represent the uncertainty in our knowledge. Unfortunately, excessively wide intervals have sometimes led researchers to focus on the point estimate of the time of extinction and disregard the confidence interval [35,52]—a practice that may give an overly optimistic sense of the precision of the estimate.

So if second-generation methods are made obsolete by third-generation methods when data and information are plentiful, and give excessively wide intervals otherwise, what is the use of second-generation methods going forward? Confidence interval methods were historically developed for datasets from a single or composite section, but a promising use might be for datasets from global databases such as the Paleobiology Database (http://www.paleobiodb.org). Such datasets may have large enough sample sizes for second-generation methods to give reasonably precise intervals, but often lack the stratigraphic information to make use of third-generation methods. One such example is a study of the time of extinction of *Carcharocles megalodon*, the largest shark known [52]. This study used a dataset of 53 occurrences compiled from the Paleobiology Database and the primary literature. Despite the large sample size, the resulting confidence interval (using the second-generation method of Roberts & Solow [4]) was rather wide owing to poorly resolved ages of many of the fossil occurrences. Nonetheless, this study provides an example of the kind of dataset for which second-generation methods may be well-suited and third-generation methods are inapplicable.

Another future application of second-generation methods may be answering higher-order questions involving groups of taxa, such as whether several taxa went extinct simultaneously and when, how much time separated pulses of extinction, or how many pulses were involved in a mass extinction. Currently, most of the methods that have been developed for answering such questions have assumed uniform recovery [7–12,14,15,29]. Generalizing such methods to account for non-uniform recovery potential will be an important step in making them more geologically realistic.

## Authors' contributions

S.C.W. and C.R.M. wrote the paper.

## Competing interests

We have no competing interests to declare.

## Funding

No funding was obtained for this paper.

## Acknowledgements

We thank the editors for inviting us to contribute to this special feature. We are also grateful to S. Chang, S. M. Holland and A. R. Solow for their assistance, and to three anonymous referees for helpful reviews.

## Footnotes

An invited contribution to the special feature ‘Biology of extinction: inferring events, patterns and processes’.

- Received November 25, 2015.
- Accepted April 4, 2016.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.