A multiple-site similarity measure

Ola H Diserud, Frode Ødegaard

Abstract

Similarity measures are among the most intuitive and common measures for comparing two or more sites, or samples, with respect to their species overlap. A restriction of similarity measures is that they are limited to pairwise comparisons even in a multiple-site study. This work presents a multiple-site similarity measure that makes use of information on species shared by more than two sites and avoids the problem of covariance between pairwise similarities in a multiple-site study. Further, we show that our multiple-site similarity measure is related to β-diversity measures such as Whittaker's β-diversity. Similarity measures can also be used as descriptors of effective specialization of insects to host species by measuring similarity from host observations. Finally, we show that multiple-site similarity and host specificity are two sides of the same coin.


1. Introduction

Understanding spatial patterns of species diversity is a crucial topic in ecology and conservation biology, for instance, when predicting species richness from local to regional scales (MacArthur 1965; Cornell 1985; Ricklefs 1987; Thomas 1990; Gering & Crist 2002). One approach to partition species diversity is to define α-diversity as within-habitat diversity, β-diversity as a measure of between-habitat diversity (within landscape) and γ-diversity as within-landscape diversity (Magurran 2004). The α- and γ-diversities measure inventory diversity (e.g. number of species), whereas the β-diversity describes differentiation diversity (the change in species composition between two or more habitats; Whittaker 1960; Koleff et al. 2003; Magurran 2004). There exists a wide variety of methods for measuring β-diversity, among which similarity measures are the simplest and the most commonly used for calculating β-diversity from abundance or presence/absence data (Wolda 1981; Koleff et al. 2003).

Most evaluations of similarity between multiple sites are based on the average, or plots, of pairwise similarities (e.g. Lennon et al. 2001; Vellend 2001; Condit et al. 2002; Basset et al. 2004). Information on the identity of species shared across more than two sites is not preserved, so average similarity across all sites does not tell us to what extent there is a change in shared species between pairs. This approach also ignores the problem of covariance between similarities, since some pairs must share the same site (Ødegaard et al. 2005). Pairwise comparison of neighbouring sites will suffice if the goal is to look at how species composition changes along a physical or environmental gradient, but if we view our sites as a random collection of samples from a larger region, such as an island or a landscape, a multiple-site similarity measure is required.

2. Material and methods

(a) Multiple-site similarity

All similarity indices represent variations over three parameters: species composition in each of two sites and the species shared between the two sites (Novotny & Weiblen 2005). The widely used Sørensen similarity index (Magurran 2004) measures similarity in species composition for two sites, A and B, by the equation

$$C_S = \frac{2\,ab}{a + b}, \tag{2.1}$$

where $a$ is the number of species found in site A; $b$ is the number of species in site B and $ab$ is the number of species shared by the two sites (a count, not a product).
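
As a minimal sketch of the two-site index, assuming the standard Sørensen form $C_S = 2\,ab/(a+b)$ and sites represented as Python sets of species labels (the function name `sorensen` and the example labels are illustrative choices, not part of the original text):

```python
def sorensen(site_a, site_b):
    """Sørensen similarity 2*ab / (a + b) for two collections of species labels."""
    a, b = set(site_a), set(site_b)
    total = len(a) + len(b)          # a + b: sum of the two species counts
    if total == 0:
        raise ValueError("both sites are empty, similarity is undefined")
    shared = len(a & b)              # ab: number of species found in both sites
    return 2 * shared / total


# Two hypothetical sites with four species each, two of them shared.
site_a = {"s1", "s2", "s3", "s4"}
site_b = {"s1", "s2", "s5", "s6"}
print(sorensen(site_a, site_b))      # 0.5
```

The index is 1 when the two species lists are identical and 0 when they have no species in common.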

For studies where more than two sites are evaluated, the overall similarity is calculated as the average of the pairwise similarities. As an illustration of the shortcomings of such an approach, we can look at two hypothetical cases. Let case 1 have three sites with four species in each: $[(s_1,s_2,s_3,s_4),\ (s_1,s_2,s_5,s_6),\ (s_1,s_2,s_7,s_8)]$, where $s_i$ is species number $i$. The similarity is the same for all pairs of sites, $C_S = 4/8 = 1/2$, with average similarity also equal to 1/2. Case 2 also has three sites with four species in each, but with a different distribution: $[(s_1,s_2,s_3,s_4),\ (s_1,s_2,s_5,s_6),\ (s_3,s_4,s_5,s_6)]$. The similarity is still $C_S = 1/2$ for all pairs, so the Sørensen similarity index does not ‘see’ the difference in species composition between the two cases. Using traditional similarity measures on assemblages with more than two sites, we will never do more than compare two sites at a time and thereby ignore ‘higher order similarities’.
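
The two hypothetical cases can be checked with a short sketch. It assumes the Sørensen form $2\,ab/(a+b)$ and encodes species $s_1, \ldots, s_8$ as the integers 1 to 8; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def sorensen(a, b):
    """Pairwise Sørensen similarity as an exact fraction."""
    return Fraction(2 * len(a & b), len(a) + len(b))


def mean_pairwise_sorensen(sites):
    """Average of the Sørensen index over all pairs of sites."""
    values = [sorensen(a, b) for a, b in combinations(sites, 2)]
    return sum(values) / len(values)


# Species s1..s8 are encoded as the integers 1..8.
case1 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {1, 2, 7, 8}]
case2 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {3, 4, 5, 6}]

print(mean_pairwise_sorensen(case1))  # 1/2
print(mean_pairwise_sorensen(case2))  # 1/2
```

Both cases give the same average even though case 1 has eight species in total and case 2 only six, which is the blind spot described above.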

We will now suggest a multiple-site similarity measure and start with the situation where we have three sites in a study. We follow the notation from equation (2.1), with $a$, $b$ and $c$ the numbers of species found in sites A, B and C, respectively, $ab$ the number of species shared by sites A and B, and so on, up to $abc$, the number of species found in all three sites. Extending the approach of the Sørensen similarity index, a foundation for a three-site similarity measure can be

$$\frac{ab + ac + bc - abc}{a + b + c}. \tag{2.2}$$

For each species, the numerator counts its site occurrences beyond the first, and the denominator gives the sum of species counts over all the sites. This expression will equal 2/3 if all species are shared by all sites, since a species can contribute at most two times in the numerator and three times in the denominator. The three-site similarity measure should therefore be

$$C_S^3 = \frac{3}{2}\,\frac{ab + ac + bc - abc}{a + b + c}$$

in order to be in the range 0–1, with 1 indicating complete similarity. The general multiple-site similarity measure for $T$ sites can be formulated in the same manner,

$$C_S^T = \frac{T}{T-1}\left(\frac{\sum_{i<j} a_{ij} - \sum_{i<j<k} a_{ijk} + \sum_{i<j<k<l} a_{ijkl} - \cdots}{\sum_{i} a_i}\right), \tag{2.3}$$

where $a_i$ is the number of species in site $A_i$, $i = 1, \ldots, T$; $a_{ij}$ is the number of species shared by sites $A_i$ and $A_j$; and $a_{ijk}$ is the number of species shared by sites $A_i$, $A_j$ and $A_k$, etc. With $T = 2$, we are back at the definition of the Sørensen similarity index (equation (2.1)). The total number of species in the $T$ sites can, by the inclusion–exclusion principle, be written as $S_T = \sum_{i} a_i - \sum_{i<j} a_{ij} + \sum_{i<j<k} a_{ijk} - \cdots$, simplifying the notation of our multiple-site similarity measure to

$$C_S^T = \frac{T}{T-1}\left(1 - \frac{S_T}{\sum_{i} a_i}\right). \tag{2.4}$$

For the two hypothetical cases discussed earlier, we get $C_S^3 = \tfrac{3}{2}\left(1 - \tfrac{8}{12}\right) = 1/2$ for case 1 and $C_S^3 = \tfrac{3}{2}\left(1 - \tfrac{6}{12}\right) = 3/4$ for case 2. Our multiple-site similarity measure evaluates the sites in case 2 as more similar than the sites in case 1, which is in agreement with the assumption that evenness in the number of site observations for the species should be valued more, i.e. the similarity measure increases with a more even distribution of site observations. For case 2, we also obtain a lower total number of species (γ-diversity), indicating a lower species turnover, hence a higher similarity.
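
The following sketch computes the multiple-site measure in two ways, under the assumption that it takes the form $\frac{T}{T-1}\bigl(1 - S_T / \sum_i a_i\bigr)$, with $S_T$ the total number of species, and equivalently $\frac{T}{T-1}$ times the alternating sum of shared-species counts divided by $\sum_i a_i$. Function names and the integer encoding of species are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def multisite_similarity(sites):
    """T/(T-1) * (1 - S_T / sum(a_i)) for a list of species sets."""
    sites = [set(s) for s in sites]
    T = len(sites)
    if T < 2:
        raise ValueError("need at least two sites")
    counts = sum(len(s) for s in sites)      # sum of a_i over all sites
    if counts == 0:
        raise ValueError("all sites are empty")
    richness = len(set().union(*sites))      # S_T: total number of species
    return Fraction(T, T - 1) * (1 - Fraction(richness, counts))


def multisite_similarity_shared(sites):
    """Same quantity from the alternating sum of shared-species counts."""
    sites = [set(s) for s in sites]
    T = len(sites)
    numerator = 0
    for k in range(2, T + 1):
        sign = 1 if k % 2 == 0 else -1       # + for pairs, - for triples, ...
        for group in combinations(sites, k):
            numerator += sign * len(set.intersection(*group))
    return Fraction(T, T - 1) * Fraction(numerator, sum(len(s) for s in sites))


case1 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {1, 2, 7, 8}]
case2 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {3, 4, 5, 6}]

print(multisite_similarity(case1), multisite_similarity_shared(case1))  # 1/2 1/2
print(multisite_similarity(case2), multisite_similarity_shared(case2))  # 3/4 3/4
```

The first form only needs the per-site species counts and the pooled species list, so it avoids enumerating all $2^T - T - 1$ site combinations that the second form visits.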

Both cases 1 and 2 have covariance 0 between pairwise similarities, since all similarities are equal to 1/2. With $T = 3$, all pairs of similarities must necessarily be dependent, since they all share one site. The effect of covariance between pairwise similarities on average similarity will depend on the sign and magnitude of the covariance, as well as the proportion of independent pairwise similarities (Ødegaard et al. 2005). To illustrate one possible effect of covariance, let case 3 also have three sites with four species in each: $[(s_1,s_2,s_3,s_4),\ (s_1,s_2,s_3,s_4),\ (s_4,s_5,s_6,s_7)]$. Here, the covariance between pairwise similarities is negative. The average similarity is still 1/2, but now $C_S^3 = \tfrac{3}{2}\left(1 - \tfrac{7}{12}\right) = 5/8$.
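
Case 3 can be worked through in the same way. The sketch below assumes the Sørensen form $2\,ab/(a+b)$ for pairs and the form $\frac{T}{T-1}\bigl(1 - S_T/\sum_i a_i\bigr)$ for the multiple-site measure; it lists the individual pairwise values rather than estimating a covariance, since the text does not specify how that covariance is computed.

```python
from fractions import Fraction
from itertools import combinations


def sorensen(a, b):
    """Pairwise Sørensen similarity as an exact fraction."""
    return Fraction(2 * len(a & b), len(a) + len(b))


def multisite_similarity(sites):
    """T/(T-1) * (1 - S_T / sum(a_i)) for a list of species sets."""
    T = len(sites)
    counts = sum(len(s) for s in sites)
    richness = len(set().union(*sites))
    return Fraction(T, T - 1) * (1 - Fraction(richness, counts))


# Case 3: sites A and B are identical, site C shares only species s4 with them.
case3 = [{1, 2, 3, 4}, {1, 2, 3, 4}, {4, 5, 6, 7}]

pairwise = [sorensen(a, b) for a, b in combinations(case3, 2)]
mean_pairwise = sum(pairwise) / len(pairwise)

print([str(p) for p in pairwise])    # ['1', '1/4', '1/4']
print(mean_pairwise)                 # 1/2
print(multisite_similarity(case3))   # 5/8
```

The pairwise values are now very uneven (1, 1/4, 1/4), yet their average is the same 1/2 as in cases 1 and 2, while the multiple-site value differs from both.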

(b) Multiple-site similarity versus β-diversity and host specificity

β-diversity is essentially also a measure of how similar sites are in terms of the variety of species found in them. A high similarity indicates that there are few species differences between sites, yielding low β-diversity values. One of the most straightforward measures of β-diversity is Whittaker's (1972) measure, $\beta_W = S_T/\bar{a}$, where $S_T$ is the total number of species; and $\bar{a} = \frac{1}{T}\sum_{i} a_i$ is the average species richness for the $T$ sites. The link between Sørensen's similarity measure for two sites and β-diversity measures is well known (Koleff et al. 2003). The relation between our multiple-site similarity and Whittaker's $\beta_W$ is simply

$$C_S^T = \frac{T - \beta_W}{T - 1}. \tag{2.5}$$

If all sites contain the same species, both $C_S^T$ and $\beta_W$ will equal 1. If no sites share species, $C_S^T = 0$ and $\beta_W = T$, indicating that the total number of species $S_T$ is just the product $T\bar{a}$.
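
A short numerical check of this relation, as a sketch: it assumes Whittaker's measure is the total species count divided by the mean per-site richness, and that the multiple-site similarity equals $(T - \beta_W)/(T - 1)$. The function names are illustrative.

```python
from fractions import Fraction


def whittaker_beta(sites):
    """Total species richness divided by mean per-site richness."""
    T = len(sites)
    richness = len(set().union(*sites))               # S_T
    mean_richness = Fraction(sum(len(s) for s in sites), T)
    return richness / mean_richness


def similarity_from_beta(beta_w, T):
    """Multiple-site similarity obtained as (T - beta_W) / (T - 1)."""
    return (T - beta_w) / (T - 1)


case1 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {1, 2, 7, 8}]
case2 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {3, 4, 5, 6}]

for sites in (case1, case2):
    beta = whittaker_beta(sites)
    print(beta, similarity_from_beta(beta, len(sites)))
# 2 1/2
# 3/2 3/4
```

The two boundary cases behave as described: identical sites give $\beta_W = 1$ and similarity 1, while sites with no shared species give $\beta_W = T$ and similarity 0.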

If, instead of species-sites data, we are studying host observations of, for example, phytophagous insect species on host plant species, the comparison of species compositions on different host plants can be performed by both similarity and host-specificity measures. The host specificity calculated from trophic interactions is defined as $F_T = S_T/(T\bar{a})$ (Ødegaard et al. 2000; Novotny et al. 2002), where $S_T$ is the total number of insect species found on $T$ host plant species; $\bar{a}$ is the average number of insect species associated with each host plant species; and $T$ is the number of host plant species in the study. The product $T\bar{a}$ is thereby the total number of host observations. Host specificity views all host plant species simultaneously and can be considered a ‘multiple host dissimilarity measure’. The link between our multiple-site similarity measure and host specificity is

$$C_S^T = \frac{T}{T-1}\left(1 - F_T\right). \tag{2.6}$$

Note also that $F_T = \beta_W/T$. If all species are shared by all hosts, the host specificity is $1/T$ and the multiple-site similarity equals 1. With no species overlap, host specificity equals 1 and similarity becomes 0. If we regard our first two hypothetical cases as host observations of insect species on three different host species, we get host specificities 2/3 and 1/2, respectively. Case 1 has more monophagous species; therefore, it should also have higher host specificity.
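
The host-specificity values quoted for the two cases can be reproduced with a small sketch. It assumes host specificity is the total number of insect species divided by the total number of host observations, and that similarity follows as $\frac{T}{T-1}(1 - F_T)$; hosts are represented as sets of insect species, and the names are illustrative.

```python
from fractions import Fraction


def host_specificity(hosts):
    """Total insect species divided by the total number of host observations."""
    total_species = len(set().union(*hosts))              # S_T
    host_observations = sum(len(h) for h in hosts)        # T times mean richness
    return Fraction(total_species, host_observations)


def similarity_from_specificity(f_t, T):
    """Multiple-site similarity obtained as T/(T-1) * (1 - F_T)."""
    return Fraction(T, T - 1) * (1 - f_t)


# The two hypothetical cases, read as insect species on three host plants.
case1 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {1, 2, 7, 8}]
case2 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {3, 4, 5, 6}]

for hosts in (case1, case2):
    f_t = host_specificity(hosts)
    print(f_t, similarity_from_specificity(f_t, len(hosts)))
# 2/3 1/2
# 1/2 3/4
```

In case 1, six of the eight insect species occur on a single host, whereas in case 2 every species occurs on exactly two hosts, which is why case 1 has the higher host specificity.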

3. Discussion

The proposed similarity measure for multiple sites $C_S^T$ (equation (2.4)) provides a more relevant index for species' spatial distribution. Instead of calculating the average over a set of dependent pairwise similarities, we make use of information on the identity of species shared across more than two sites. For a given number of sites $T$, $C_S^T$ decreases with increasing number of ‘rare’ species, i.e. species observed in only one or a few sites. Conversely, $C_S^T$ increases with increasing number of species observed in several sites. The multiple-site similarity measure can be regarded as a linear function of Whittaker's β-diversity and host specificity, thereby inheriting their statistical properties.

Similarity measures are generally believed to be underestimates (e.g. Lande 1996), i.e. true similarity between sites is biased downwards when estimated from random samples. This is often illustrated by simulating random samples from the same community with true similarity equal to 1. But for situations where true similarity is less than 1, the direction of the possible bias will depend on species abundance distributions within sites as well as species turnover between sites, especially whether rare species are shared between sites or not. For example, assume a larger region where some species are abundant at all sites, and for each site, some additional rare species are present at this site only. Small sample sizes from each site may include the dominant species only, thereby overestimating similarity between sites even if species richness at each site and in total are underestimated. In tropical forests, calculations of similarity tend to be underestimated owing to the dominance of rare species in the species pool (Mawdsley 1996; Stork 1997; Ødegaard 2006) and small sample sizes (Chao et al. 2000).
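
The overestimation scenario can be illustrated with a deliberately simple, deterministic sketch. All numbers are hypothetical: three sites share two dominant species and each holds three rare species found nowhere else, and the "small samples" are assumed to detect the dominant species only. The multiple-site measure is assumed to take the form $\frac{T}{T-1}\bigl(1 - S_T/\sum_i a_i\bigr)$.

```python
from fractions import Fraction


def multisite_similarity(sites):
    """T/(T-1) * (1 - S_T / sum(a_i)) for a list of species sets."""
    T = len(sites)
    counts = sum(len(s) for s in sites)
    richness = len(set().union(*sites))
    return Fraction(T, T - 1) * (1 - Fraction(richness, counts))


# Hypothetical "true" communities: d1, d2 dominant everywhere, r* rare and local.
true_sites = [
    {"d1", "d2", "r1", "r2", "r3"},
    {"d1", "d2", "r4", "r5", "r6"},
    {"d1", "d2", "r7", "r8", "r9"},
]

# Hypothetical small samples that pick up the dominant species only.
small_samples = [{"d1", "d2"}, {"d1", "d2"}, {"d1", "d2"}]

print(multisite_similarity(true_sites))     # 2/5
print(multisite_similarity(small_samples))  # 1
```

Here the samples underestimate richness at every site and in total (2 instead of 5 and 11 species), yet the estimated similarity is 1 against a true value of 2/5, so the bias is upwards in this constructed case.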

The multiple-site similarity measure has been based on presence/absence data, but this approach can be modified to handle abundance data as well. The abundance-based similarity measure used as the foundation for the modification should be chosen so that the effects of small sample sizes and varying sampling efforts are minimized (e.g. Chao et al. 2006).

In many studies, applying pairwise similarities between multiple sites may be an appropriate approach, typically when we want to evaluate species turnover along an environmental gradient. However, when the sites are viewed as a random set of observations from a region, evaluating overall similarity from the average of the pairwise similarities can be misleading. The pairwise similarities are not all independent, since each site is included in $T-1$ pairs, and they ignore information on species shared among more than two sites. As our multiple-site similarity measure $C_S^T$ solves these matters, it is more consistent with multiple-site β-diversity. We have shown that the parameters multiple-site similarity $C_S^T$, Whittaker's β-diversity $\beta_W$ (Whittaker 1972) and host specificity $F_T$ (considering the hosts as sites) measure the same characteristics of community structure, with simple transformations based on the number of sites, or hosts, $T$.

Footnotes

    • Received September 7, 2006.
    • Accepted September 27, 2006.

References
