## Abstract

Bayesian inference about the extinction of a species based on a record of its sightings requires the specification of a prior distribution for extinction time. Here, I critically review some specifications in the context of a specific model of the sighting record. The practical implication of the choice of prior distribution is illustrated through an application to the sighting record of the Caribbean monk seal.

## 1. Introduction

Understanding the timing of species extinctions is of interest to both palaeobiologists concerned with macroevolution and ecologists concerned with the fate of modern species. A variety of methods have been proposed for statistical inference about the extinction of a species based on a record of its sightings [1–3]. Recent work has focused on Bayesian methods [4–6]. The Bayesian approach can be appealing for both philosophical and practical reasons [7]. Among the latter is the fact that standard results for non-Bayesian inference (e.g. that the likelihood ratio statistic has an approximate *χ*^{2} distribution) do not hold in the case of an endpoint like extinction time [8]. As described below, Bayesian methods require the specification of a prior distribution for extinction time. There has been some debate in the literature over aspects of this specification and the purpose of this paper is to lay out the underlying statistical issues in one place. For concreteness, this discussion will be framed in the context of calculating extinction probability under a statistical model in which the pre-extinction sighting rate is constant over continuous time. However, the issues raised here also arise in other situations—e.g. when the sighting rate declines prior to extinction or when interest centres on estimating the time of extinction.

The remainder of this paper is organized in the following way. The next section presents the basic statistical model and Bayesian inference up to the specification of the prior distribution for extinction time. Some alternative choices for this prior distribution are then discussed and the practical implications of these choices are illustrated using the sighting record of the Caribbean monk seal (*Monachus tropicalis*). The final section contains some concluding remarks.

## 2. Statistical model

The situation described here is the continuous version of the discrete one laid out in [9] in a discussion of [5]. Suppose that during the observation period (0, *T*) sightings of a species occur at times *s* = (*s*_{1}, *s*_{2}, … , *s*_{n}). Note that the convention here is that time increases from a baseline in the past toward the present. The reverse convention, in which time increases from the present into the past, is used in palaeobiology. These sightings are assumed to follow a Poisson process with rate function

$$\lambda(t) = \begin{cases} \lambda, & 0 < t \le \tau \\ 0, & t > \tau \end{cases} \tag{2.1}$$

where *τ* is the unknown extinction time. That is, the sighting rate is constant before extinction and 0 after. By itself, the total number of sightings, as opposed to their timing, provides no information about extinction time and it is natural to eliminate the parameter *λ* by conditioning on *n*. It is a property of the Poisson process that, conditional on *n*, *s* represents a realization of *n* independent random variables uniformly distributed over the interval (0, *τ*).

Bayesian inference about *τ* is based on its conditional or posterior distribution given *s*. Here and below, I will abuse notation by using the same symbol to denote a random variable and its realized value. By Bayes' theorem, the posterior probability density function (pdf) of *τ* is
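$$p(\tau \mid s) = \frac{p(s \mid \tau)\,p(\tau)}{\int_0^\infty p(s \mid u)\,p(u)\,\mathrm{d}u} \tag{2.2}$$

where *p*(*s*|*τ*) is the likelihood of *s* given *τ* and *p*(*τ*) is the prior pdf of *τ*. For the model outlined above

$$p(s \mid \tau) = \begin{cases} 0, & \tau < s_{(n)} \\ \tau^{-n}, & s_{(n)} \le \tau \le T \\ T^{-n}, & \tau > T \end{cases} \tag{2.3}$$

where *s*_{(n)} is the most recent sighting time. Note that, for this model, the sighting record enters this likelihood only through *s*_{(n)}: in statistical terminology, *s*_{(n)} is a sufficient statistic. It follows from (2.2) and (2.3) that

$$p(\tau \mid s) = \frac{p(s \mid \tau)\,p(\tau)}{\int_{s_{(n)}}^{T} u^{-n}\,p(u)\,\mathrm{d}u + T^{-n}\,\mathrm{pr}(\tau > T)} \tag{2.4}$$

The term pr(*τ* > *T*) in the denominator of this expression is the prior probability that the species is not extinct by the end of the observation period. The posterior pdf in (2.4) provides the basis for Bayesian inference about extinction time.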
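As a minimal sketch of how the normalization in (2.4) might be carried out numerically, the following Python code builds the posterior for an arbitrary prior and checks that it integrates to 1. The function names, the Simpson's-rule integrator and the illustrative exponential prior with rate 0.01 per year are assumptions made for this example; the record values are those of the illustration in section 4.

```python
import math

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def posterior(prior_pdf, prior_surv_T, n, s_n, T):
    """Return (posterior pdf of tau, posterior mass on tau > T).

    prior_pdf    : callable giving the prior pdf p(tau)
    prior_surv_T : prior probability pr(tau > T)
    """
    integral = simpson(lambda t: t ** (-n) * prior_pdf(t), s_n, T)
    denom = integral + T ** (-n) * prior_surv_T

    def pdf(t):
        if t < s_n:
            return 0.0  # extinction cannot precede the last sighting
        likelihood = t ** (-n) if t <= T else T ** (-n)
        return likelihood * prior_pdf(t) / denom

    return pdf, T ** (-n) * prior_surv_T / denom

# Illustrative (assumed) prior: exponential with rate 0.01 per year.
rate = 0.01
n, s_n, T = 7, 44.0, 108.0

def exp_prior(t):
    return rate * math.exp(-rate * t)

pdf, mass_beyond_T = posterior(exp_prior, math.exp(-rate * T), n, s_n, T)
total = simpson(pdf, s_n, T) + mass_beyond_T
print(total)  # posterior mass on [s_n, T] plus mass beyond T
```

Only the prior pdf and pr(*τ* > *T*) need to be supplied, which mirrors the way the prior enters (2.4).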

## 3. Prior specification

To evaluate the posterior pdf *p*(*τ*|*s*) in (2.4), it is necessary to specify the prior pdf *p*(*τ*). This prior pdf encodes available knowledge about *τ* that is independent of *s*. This knowledge may come from theory, expert opinion or results for similar situations. For example, if the scientific assumption is made that the instantaneous risk of extinction is constant over time, then the conditional prior distribution for *τ* is exponential with pdf [5]
$$p(\tau \mid \theta) = \theta \exp(-\theta\tau), \qquad \tau > 0 \tag{3.1}$$

This pdf is conditional on the parameter *θ*, which is the reciprocal of the expected extinction time. More generally, the exponential model is reasonable if extinction risk is roughly constant over the observation period.

It is important to emphasize that (3.1) is a stochastic model of the lifetime of a species and not a representation of human uncertainty. In the Bayesian formulation, human uncertainty is reflected in a prior pdf *p*(*θ*) for the parameter of this distribution which can be combined with (3.1) to produce the unconditional prior pdf of *τ*
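$$p(\tau) = \int_0^\infty p(\tau \mid \theta)\,p(\theta)\,\mathrm{d}\theta \tag{3.2}$$

needed to evaluate *p*(*τ*|*s*). A flexible and convenient choice for *p*(*θ*) is the gamma pdf

$$p(\theta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\theta^{\alpha-1}\exp(-\beta\theta), \qquad \theta > 0 \tag{3.3}$$

in which case *p*(*τ*) is the Pareto pdf

$$p(\tau) = \frac{\alpha\beta^{\alpha}}{(\beta+\tau)^{\alpha+1}}, \qquad \tau > 0 \tag{3.4}$$

For this prior distribution, pr(*τ* > *T*) = [*β*/(*β* + *T*)]^{*α*} and the integral in the denominator of (2.4) must be evaluated numerically.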
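The step from the gamma prior on *θ* to the Pareto prior on *τ* can be checked numerically. The sketch below, which assumes a gamma pdf with shape *α* and rate *β*, integrates the exponential pdf against that gamma pdf and compares the result with the Pareto (Lomax) form *αβ*^{*α*}/(*β* + *τ*)^{*α*+1}. The hyperparameter values, the function names and the truncation point of the integral are illustrative assumptions only.

```python
import math

def simpson(f, a, b, m=4000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def gamma_pdf(theta, alpha, beta):
    """Gamma pdf with shape alpha and rate beta (assumed parametrization)."""
    return (beta ** alpha * theta ** (alpha - 1)
            * math.exp(-beta * theta) / math.gamma(alpha))

def pareto_pdf(tau, alpha, beta):
    """Closed-form marginal prior pdf of tau."""
    return alpha * beta ** alpha / (beta + tau) ** (alpha + 1)

def pareto_survival(t, alpha, beta):
    """Prior probability pr(tau > t) under the Pareto prior."""
    return (beta / (beta + t)) ** alpha

def mixture_pdf(tau, alpha, beta):
    """Integrate the exponential pdf over the gamma prior for theta.

    The integrand decays like exp(-(beta + tau) * theta), so the upper
    limit is truncated where that factor is negligible. alpha >= 1 keeps
    the integrand finite at theta = 0.
    """
    upper = 40.0 / (beta + tau)
    return simpson(
        lambda th: th * math.exp(-th * tau) * gamma_pdf(th, alpha, beta),
        0.0, upper)

alpha, beta = 2.0, 100.0   # illustrative hyperparameters (assumed)
tau, T = 50.0, 108.0
numeric = mixture_pdf(tau, alpha, beta)
closed = pareto_pdf(tau, alpha, beta)
print(numeric, closed)
print(pareto_survival(T, alpha, beta))
```

The near-Jeffreys case with *α* and *β* close to 0 is deliberately not used here, because the gamma pdf is then unbounded at *θ* = 0 and this simple quadrature rule would not apply.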

The problem now is to specify *α* and *β* which, in Bayesian terminology, are referred to as hyperparameters. It is useful here to distinguish between choices of these hyperparameters that are intended to reflect prior information and those that are intended to be neutral or non-informative. Two options in the first case are (i) the elicitation [10] and combination [11] of the subjective opinions of experts and (ii) parametric empirical Bayes methods [12], which essentially fit the hyperparameters using sighting data for similar species. The way in which these approaches are applied depends on the particular situation and, although both can be useful, I will not pursue them here.

In contrast with the informative case, the specification of non-informative hyperparameters can often be based on formal rules (see [13] for a review and critique). The most popular of these is the Jeffreys prior, which ensures that the results of a Bayesian analysis are invariant to alternative parametrizations of the model. For the exponential distribution, the Jeffreys prior corresponds to the limiting case *α* = *β* = 0, so that *p*(*θ*) ∝ 1/*θ*. For this choice, *p*(*τ*) ∝ 1/*τ*. This pdf is improper (i.e. its integral diverges) and it is not possible to find pr(*τ* > *T*), which is needed to evaluate (2.4). A standard approach in this situation is to approximate the Jeffreys prior by taking *α* and *β* close to 0.
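The impropriety can be seen from a short calculation. As a worked sketch, taking the Jeffreys prior for the rate of an exponential distribution to be proportional to 1/*θ* and combining it with the exponential pdf in (3.1) through (3.2):

```latex
p(\tau) \;\propto\; \int_0^\infty \theta e^{-\theta\tau}\,\frac{1}{\theta}\,\mathrm{d}\theta
        \;=\; \int_0^\infty e^{-\theta\tau}\,\mathrm{d}\theta
        \;=\; \frac{1}{\tau}, \qquad \tau > 0,
\qquad\text{and}\qquad
\int_0^\infty \frac{\mathrm{d}\tau}{\tau} \;=\; \infty .
```

Because the integral of 1/*τ* diverges at both 0 and infinity, no normalizing constant exists and neither pr(*τ* < *T*) nor pr(*τ* > *T*) is defined.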

In the context of the exponential model, Alroy [5] proposed a non-informative choice taking *θ* = log 2/*s*_{(n)}, ensuring that pr(*τ* > *s*_{(n)}) = 1/2. The sense in which this is non-informative is unclear. On the technical side, the dependence of *θ* on *s*_{(n)} violates the Bayesian principle that the prior distribution should be specified independently of the data used to update it. Briefly, Bayes' theorem in (2.2) relies on decomposing the joint pdf of *s* and *τ* into the product of the conditional pdf of *s* given *τ* and the unconditional pdf of *τ*. This decomposition is not maintained if the latter is replaced by a function of *s*.

This proposal was modified in [6] by taking *θ* = log 2/*T*, ensuring that pr(*τ* > *T*) = 1/2. The latter is a natural non-informative prior probability in the Bayesian version of testing for extinction, but ensuring it by fixing *θ* conflates the stochastic lifetime model with human uncertainty [14]. Indeed, specifying a single value for *θ* is equivalent to complete knowledge of the instantaneous extinction risk and is, in this sense, fully informative. For example, treating *θ* as known eliminates the possibility of learning about it from the sighting record. This is in contrast with the hierarchical specification outlined above in which the sighting record can be used to update *p*(*θ*).

For an exponential prior distribution for *τ* with fixed parameter *θ*, pr(*τ* > *T*) = exp(−*θT*) and the integral in the denominator of (2.4) involves the incomplete gamma function and is easily evaluated numerically.

A non-informative prior specification in the spirit of that described in [6] that is not connected to a scientific model of species lifetime but reflects pure human uncertainty is
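$$\mathrm{pr}(\tau > T) = \tfrac{1}{2} \tag{3.5}$$

and

$$p(\tau \mid \tau < T) = \frac{1}{T}, \qquad 0 < \tau < T \tag{3.6}$$

so that, given that extinction has occurred, all possible extinction times are equally likely. For this specification,

$$\int_{s_{(n)}}^{T} u^{-n}\,p(u)\,\mathrm{d}u = \frac{s_{(n)}^{1-n} - T^{1-n}}{2T(n-1)}, \qquad n > 1 \tag{3.7}$$

Another choice for *p*(*τ*|*τ* < *T*) is the truncated exponential

$$p(\tau \mid \tau < T) = \frac{\theta \exp(-\theta\tau)}{1 - \exp(-\theta T)}, \qquad 0 < \tau < T \tag{3.8}$$

but, as with (3.1), this would require specifying a prior distribution for *θ*.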
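Under the conditional uniform specification the unconditional prior pdf is 1/(2*T*) on (0, *T*), so the integral in the denominator of (2.4) can be done by hand. The sketch below is a check of that calculation under the stated model, not a reproduction of any published code: it compares the closed form with numerical integration and then forms the resulting posterior probability of extinction. The function names are hypothetical and the record values are those used in section 4.

```python
def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def uniform_integral_closed(n, s_n, T):
    """Integral of tau^(-n) / (2T) over [s_n, T]; requires n > 1."""
    return (s_n ** (1 - n) - T ** (1 - n)) / (2 * T * (n - 1))

def uniform_integral_numeric(n, s_n, T):
    """The same integral evaluated by quadrature."""
    return simpson(lambda t: t ** (-n) / (2 * T), s_n, T)

def posterior_extinction_uniform(n, s_n, T):
    """Posterior pr(tau < T | s) when pr(tau > T) = 1/2."""
    integral = uniform_integral_closed(n, s_n, T)
    return integral / (integral + 0.5 * T ** (-n))

n, s_n, T = 7, 44.0, 108.0
closed = uniform_integral_closed(n, s_n, T)
numeric = uniform_integral_numeric(n, s_n, T)
p_ext = posterior_extinction_uniform(n, s_n, T)
print(closed, numeric)
print(p_ext)
```

With pr(*τ* > *T*) = 1/2 the posterior odds of extinction reduce to [(*T*/*s*_{(n)})^{*n*−1} − 1]/(*n* − 1), which makes explicit how the posterior depends on the record only through *n* and the ratio *T*/*s*_{(n)}.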

## 4. Illustration

Since 1908, there have been seven confirmed sightings of the Caribbean monk seal. The latest of these was in 1952 and, to my knowledge, there has been none since. For this sighting record, *n* = 7, *s*_{(n)} = 44 and *T* = 108 (the year at writing being 2016). Here, I will focus on the posterior probability of extinction
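$$\mathrm{pr}(\tau < T \mid s) = \frac{\int_{s_{(n)}}^{T} u^{-n}\,p(u)\,\mathrm{d}u}{\int_{s_{(n)}}^{T} u^{-n}\,p(u)\,\mathrm{d}u + T^{-n}\,\mathrm{pr}(\tau > T)} \tag{4.1}$$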

The prior and posterior probabilities of extinction are reported in table 1 for four different specifications of *p*(*τ*) discussed in the previous section. In this case, as a consequence of the small number of sightings, the effect of the prior on the posterior dominates the effect of the data. If *s*_{(n)} remains at 44 but *n* is increased to 11 then, despite the substantial range of prior extinction probabilities, the posterior extinction probabilities are all at least 0.95. Although the relative influence of the data on the posterior will tend to increase with the number of sightings, the rate at which this occurs will depend on *s*_{(n)} and on the prior. For the latter, note that the effect of increasing *n* to 11 is much greater for the Pareto prior than for the other three specifications. It is also notable that the two cases for which the prior extinction probability is 0.5 have virtually identical posterior extinction probabilities. This underscores the fact that a seemingly innocuous representation of prior ignorance can have a strong influence on the posterior results when the number of sightings is small.
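A comparison of this kind can be sketched in a few lines of Python. The code below evaluates the prior and posterior probabilities of extinction for the four specifications discussed in section 3, for *n* = 7 and *n* = 11. It is an illustration under assumptions and is not claimed to reproduce table 1: in particular, the hyperparameters *α* = *β* = 0.01 used to approximate the Jeffreys prior are assumed values, as are the function names and the Simpson's-rule integrator.

```python
import math

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def posterior_extinction(prior_pdf, prior_surv_T, n, s_n, T):
    """Posterior probability of extinction, pr(tau < T | s)."""
    integral = simpson(lambda t: t ** (-n) * prior_pdf(t), s_n, T)
    return integral / (integral + T ** (-n) * prior_surv_T)

def make_priors(s_n, T, alpha=0.01, beta=0.01):
    """Return {label: (prior pdf, pr(tau > T))} for four specifications.

    alpha and beta are ASSUMED small values standing in for the
    approximation to the Jeffreys prior.
    """
    def exponential(theta):
        return (lambda t: theta * math.exp(-theta * t),
                math.exp(-theta * T))

    pareto = (lambda t: alpha * beta ** alpha / (beta + t) ** (alpha + 1),
              (beta / (beta + T)) ** alpha)
    uniform = (lambda t: 0.5 / T, 0.5)
    return {
        "Pareto (approx. Jeffreys)": pareto,
        "exponential, theta = log 2 / s_(n)": exponential(math.log(2) / s_n),
        "exponential, theta = log 2 / T": exponential(math.log(2) / T),
        "conditional uniform": uniform,
    }

s_n, T = 44.0, 108.0
results = {}
for label, (pdf, surv_T) in make_priors(s_n, T).items():
    prior_ext = 1.0 - surv_T
    post_7 = posterior_extinction(pdf, surv_T, 7, s_n, T)
    post_11 = posterior_extinction(pdf, surv_T, 11, s_n, T)
    results[label] = (prior_ext, post_7, post_11)
    print(f"{label:36s} prior={prior_ext:.3f} "
          f"n=7: {post_7:.3f}  n=11: {post_11:.3f}")
```

Changing `alpha` and `beta` in `make_priors` gives a quick sense of how sensitive the Pareto results are to the way the Jeffreys prior is approximated.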

## 5. Discussion

The main message of this paper is that, in Bayesian inference about extinction, it is important to think carefully about the specification of the prior distribution of extinction time. Of the specifications considered here, only the Jeffreys prior has a theoretical justification. On the practical side, it cannot be used directly and an approximation is needed. The prior in [5] not only lacks a clear justification, but its dependence on the sighting record means that it does not produce a true posterior distribution for extinction time. At first glance, the prior in [6] seems like a reasonable representation of prior ignorance but, as discussed above, it is actually strongly informative. The conditional uniform prior avoids this by separating the probability of extinction from the distribution of extinction time conditional on extinction, but is unconnected to a scientific model of extinction risk. As the results for the Caribbean monk seal with *n* = 11 suggest, the choice among these prior specifications can have little practical consequence as the number of sightings in the record increases. This underscores the premium on extending the sighting record, even to the extent of including sightings of questionable reliability, provided this is accounted for in the statistical model [15]. Of course, this is not always an option.

I have focused in this paper on prior specifications that are intended to be non-informative. This is appealing on the grounds of objectivity but, for a variety of reasons, there has been a move in the field of statistics away from non-informative priors. One good reason for this is that, in many cases, prior information is actually available. For example, in the case of the Caribbean monk seal, it is known that the reef fish that constituted its main prey were overfished [16]. Depending on the temporal pattern of this overfishing, this could militate against a prior distribution for extinction time that declines (or is flat) over the observation period. How information like this is encoded in a prior distribution is part of the bread and butter of applied Bayesian statistics.

## Competing interests

I declare that I have no competing interests.

## Funding

There are no funders to report.

## Acknowledgements

The helpful comments of four anonymous reviewers are acknowledged with gratitude.

## Footnotes

One contribution to the special feature ‘Biology of extinction: inferring events, patterns and processes’ edited by Barry Brook and John Alroy.

- Received January 31, 2016.
- Accepted May 10, 2016.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.