Royal Society Publishing

Are men better than women at acoustic size judgements?

Benjamin D. Charlton, Anna M. Taylor, David Reby


Formants are important phonetic elements of human speech that are also used by humans and non-human mammals to assess the body size of potential mates and rivals. As a consequence, it has been suggested that formant perception, which is crucial for speech perception, may have evolved through sexual selection. Somewhat surprisingly, though, no previous studies have examined whether sexes differ in their ability to use formants for size evaluation. Here, we investigated whether men and women differ in their ability to use the formant frequency spacing of synthetic vocal stimuli to make auditory size judgements over a wide range of fundamental frequencies (the main determinant of vocal pitch). Our results reveal that men are significantly better than women at comparing the apparent size of stimuli, and that lower pitch improves the ability of both men and women to perform these acoustic size judgements. These findings constitute the first demonstration of a sex difference in formant perception, and lend support to the idea that acoustic size normalization, a crucial prerequisite for speech perception, may have been sexually selected through male competition. We also provide the first evidence that vocalizations with relatively low pitch improve the perception of size-related formant information.

1. Introduction

Vocal tract resonances or ‘formants’ are the key acoustic parameters underlying the phonetic diversity of human speech (for an overview, see [1]). However, they can also provide non-linguistic information about potentially important biosocial dimensions of speakers. In particular, lower and more closely spaced formant frequencies are indicative of larger speakers [2,3] because vocal tract length and body size are correlated in humans [4] and longer vocal tracts produce lower formants [5]. Hence, the entire formant pattern is scaled down or up in larger and smaller speakers, respectively. In addition, because physical size often determines the outcome of competitive interactions, the use of formants for assessing body size from vocal signals may have been important in our ancestors for reliably assessing the quality or competitiveness of potential mates and/or rivals [1]. Indeed, recent studies have shown that formant spacing is a reliable cue to body size in several non-human mammal species that can play a functional role in female mate choice and male–male competition (reviewed in [6]). Other recent work [7–10] has shown that human listeners rate speakers with lower formants as sounding larger, more dominant, masculine and attractive.

This body of work not only suggests that formants are used by humans and non-human mammals to assess potential mates and rivals but also indicates that formant perception, which is crucial for speech perception [1,5,11,12], may have evolved through sexual selection. Furthermore, whereas men appear to use formants to judge the physical dominance of potential rivals [8], formants are not consistently found to predict women's attractiveness ratings of men's voices [2,9]. As a result, we may expect men to have more acute perception of size-related formant information in vocal signals. Surprisingly though, no previous studies have investigated whether men and women actually differ in their ability to use formants to make auditory size judgements.

The primary aim of the current study was to investigate whether men and women differ in their ability to make relative size judgements using small differences in the formant spacing of synthetic stimuli representing different-sized animals. We predicted that listeners would rate stimuli with lower formants as coming from larger animals (humans or otherwise), and that male listeners would be better than female listeners at this task. We also examined comparison performance over a wide range of fundamental frequencies (the main determinant of vocal pitch, hereafter F0). Formant perception in human speech is compromised at higher F0s (e.g. [11]), presumably because formant peaks become poorly resolved as the density of harmonics sampling the formant envelope decreases below a certain threshold [13]. Thus, based on the assumption that lower F0 improves the perceptual salience of formants, we predicted that the ability to categorize the apparent size of the vocal stimuli would improve as F0 decreases.

2. Material and methods

(a) Subjects

The study was conducted at the Bader International Study Centre, East Sussex, UK. A total of 55 college undergraduates (18 males and 37 females) completed the experiment. Participants were aged between 17 and 20 years. All participants gave informed consent.

(b) Stimuli

We synthesized a set of vocal stimuli representing different-sized animals using the Praat v. 5.1.32 DSP package, following the principles of the source-filter theory of voice production [5]. The stimuli consisted of a 1 s long harmonic complex tone (the ‘source’) combined with a formantGrid pattern (the ‘filter’) with equally spaced formants so that it approximates an idealized uniform straight tube (or an unperturbed vocal tract). The formant pattern consisted of 10 formants with an overall formant spacing of 1100 Hz (corresponding to a vocal tract length of 15.9 cm), which falls within the typical human range ([5]; for more details see the electronic supplementary material). The stimuli were arranged in matched pairs so that stimuli with the original formant pattern (baseline condition) were followed 0.5 s later by stimuli that had been rescaled by shifting all of the formants up or down by 1–5% (figure 1). Stimulus pairs were created with F0s of 10, 20, 40, 80, 160 and 320 Hz. These F0 values encompass the F0 range of the human speaking voice [5] and allowed us to test the ability of listeners to detect small differences in apparent size across a wide range of F0s (examples of the stimuli are provided as electronic supplementary material).
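The correspondence between formant spacing and vocal tract length quoted above can be checked with a short calculation. The sketch below assumes the standard uniform-tube (quarter-wave resonator) model, in which formant i lies at (2i − 1)c/4L and the formant spacing is c/2L, together with a speed of sound of 350 m/s. Neither the speed of sound nor the exact formant positions are stated in the text, and the function names are purely illustrative.

```python
# Uniform-tube sketch: formant spacing <-> vocal tract length.
# Assumptions (not stated in the text): speed of sound c = 350 m/s and
# formants placed at odd multiples of half the spacing (quarter-wave model).

SPEED_OF_SOUND = 350.0  # m/s, assumed value


def vocal_tract_length_cm(formant_spacing_hz, c=SPEED_OF_SOUND):
    """Return the tube length in cm for a given formant spacing (dF = c / 2L)."""
    return 100.0 * c / (2.0 * formant_spacing_hz)


def tube_formants(formant_spacing_hz, n_formants=10):
    """Return formant frequencies F_i = (2i - 1) * dF / 2 for i = 1..n."""
    return [(2 * i - 1) * formant_spacing_hz / 2.0
            for i in range(1, n_formants + 1)]


def rescale(formants, percent):
    """Shift every formant up (+) or down (-) by the given percentage."""
    factor = 1.0 + percent / 100.0
    return [f * factor for f in formants]


baseline = tube_formants(1100.0)
print(round(vocal_tract_length_cm(1100.0), 1))  # 15.9
print(baseline[:4])                             # [550.0, 1650.0, 2750.0, 3850.0]
print(rescale(baseline, 5)[:2])                 # first two formants, shifted up 5%
```

Under these assumptions, shifting all formants up by 5% is equivalent to shortening the tube from 15.9 cm to roughly 15.2 cm, which illustrates how small the apparent size differences in the task were.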

Figure 1.

(a–f) Spectrograms to illustrate the experimental stimuli at the six different pitch classes (spectrogram settings: window length = 0.05 s, frequency step = 20 Hz, dynamic range = 40 dB, Gaussian window shape). In each example, the formants are shifted up by 5% in the second presentation. The formants are labelled F1–F4.

(c) Experimental procedure

The stimuli were presented through JVC HA-S360 professional headphones (London, UK) at a comfortable pre-set volume. Participants were informed that they would hear pairs of audio stimuli representing two different animals, and that their task was to decide which one sounded ‘larger’ by clicking on the appropriate button on the computer screen. Each participant received 60 unique stimulus pairs representing the six pitch classes (10–320 Hz) with the formants shifted up or down 1–5%. Custom-written software in Python v. 2.6 was used to randomize stimulus presentation and collect responses, and a generalized linear model fitted with maximum likelihood estimation was used to examine variation in listeners’ size categorization performance (see the electronic supplementary material for further details).
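The design described above (six pitch classes, formants shifted up or down by 1–5%, 60 unique pairs) can be enumerated as in the sketch below. It is not the authors' software: it assumes the rescaling was applied in whole-percentage steps (1, 2, 3, 4 and 5%), which is consistent with 6 × 5 × 2 = 60 pairs but is not stated explicitly, and the function names, seed argument and ‘first’/‘second’ response labels are hypothetical.

```python
import itertools
import random

F0_VALUES_HZ = [10, 20, 40, 80, 160, 320]  # six pitch classes from the text
SHIFT_PERCENTS = [1, 2, 3, 4, 5]           # assumed whole-percentage steps
DIRECTIONS = [-1, +1]                      # formants shifted down or up


def build_trials(seed=None):
    """Return the 60 unique (F0, signed shift %) pairs in random order."""
    trials = [
        (f0, direction * shift)
        for f0, shift, direction in itertools.product(
            F0_VALUES_HZ, SHIFT_PERCENTS, DIRECTIONS
        )
    ]
    random.Random(seed).shuffle(trials)
    return trials


def correct_answer(signed_shift_percent):
    """Lower formants imply a larger source, so a downward shift makes the
    second (rescaled) stimulus the 'larger' one; otherwise the first
    (baseline) stimulus is the larger one."""
    return "second" if signed_shift_percent < 0 else "first"


trials = build_trials(seed=1)
print(len(trials), len(set(trials)))  # 60 60
```

Scoring a response then amounts to comparing the button clicked with `correct_answer` for that trial, which yields the binary correct/incorrect outcome analysed in the model.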

3. Results

Male participants were significantly better at classifying the apparent size of stimuli than female participants (Wald test, p = 0.034) (figure 2a). In addition, significant main effects of formant rescaling (Wald test, p < 0.001) and pitch (Wald test, p = 0.017) on the proportion of correct size judgements made by listeners were revealed: in particular, listeners were better at categorizing low-pitched stimuli according to their apparent size than they were at categorizing high-pitched stimuli (figure 2b), and size categorization performance increased steadily as the difference in formant rescaling between the baseline and test stimulus increased from 1 to 5% (figure 2c). No statistically significant interaction effects were observed (Wald tests; gender × pitch: p = 0.945; gender × formant condition: p = 0.700; pitch × formant condition: p = 0.385; gender × pitch × formant condition: p = 0.934).

Figure 2.

(a) Proportion of correct classifications ± s.e.m. made by male and female participants, (b) relationship between the proportion of correct classifications and stimulus pitch, and (c) proportion of correct classifications ± s.e.m. for the different formant rescaling conditions.

4. Discussion

We found that men were significantly better than women at using small differences in the formant spacing of synthetic vocal stimuli to make relative size judgements. This sex difference was consistent for shifts in apparent size of 1–5% and across a wide range of F0s (from 10 to 320 Hz), as indicated by the absence of significant interaction effects. The fact that untrained men are better than women at spontaneously using the formant structure of vocal stimuli to correctly compare their apparent size is consistent with studies showing that women are more reliant on voice pitch than formants when they rate the attractiveness of male voices [9], whereas men tend to use formant spacing for dominance attributions [8]. While men also appear to be better at perceiving temporal and tonal contrasts in speech and non-speech sounds [14,15], to our knowledge, our results represent the first demonstration of a sex difference involving human formant perception.

Furthermore, the ability to perceive formant frequency spacing is crucial for the perception of speech sounds because the human auditory system needs to normalize the size-related formant variation in speech sounds produced by differently sized speakers with different vocal tract lengths, in order to retrieve the phonetic information encoded in the relative, rather than absolute position of formant frequencies [12]. This ‘size normalization’ appears to be applied to all sounds at a relatively early stage in auditory processing [16], suggesting that humans have dedicated perceptual mechanisms for automatically processing size-related formant information. Our results show that this ability is more developed in men, and support the idea that sexual selection might have played a role in the evolution of this key prerequisite of speech perception [1]. Future studies could aim to reveal whether sex differences in the auditory processing of size-related information in vocal signals also exist at a neurological level.

In addition, we have shown that the ability of human listeners to classify the apparent size of synthetic non-speech sounds varying only in their formant spacing is greater in stimuli with low F0. Low F0 vocalizations are predicted to be particularly well suited for highlighting formants because the dense harmonic spacing should allow the formant peaks to be more clearly resolved [13]. Furthermore, ‘pulsatile’ vocalizations, in which the individual glottal pulses are heard as separate events, should be ideal for the auditory discrimination of formant frequencies because they have no perceivable pitch and each discrete pulse contains energy across a broad frequency range, making it likely that formant-related information is emphasized. Interestingly, the vocal repertoires of several animal species include vocalizations characterized by very low F0 that may function to increase the salience of formant-related information [17,18]. Our results provide the first empirical support for the prediction that lowering F0 does indeed improve the perception of size-related formant information.
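The harmonic-density argument can be made concrete with the stimulus values given in the methods. Harmonics of a periodic source lie at integer multiples of F0, so the average number of harmonics falling within one formant spacing is simply the spacing divided by F0. The sketch below is an illustrative calculation rather than an analysis from the study; the function name is hypothetical, and no particular resolution threshold is assumed.

```python
FORMANT_SPACING_HZ = 1100.0                 # baseline formant spacing from the methods
F0_VALUES_HZ = [10, 20, 40, 80, 160, 320]   # six pitch classes from the methods


def harmonics_per_formant_interval(f0_hz, spacing_hz=FORMANT_SPACING_HZ):
    """Average number of harmonics falling within one formant spacing."""
    return spacing_hz / f0_hz


for f0 in F0_VALUES_HZ:
    # Fewer harmonics per interval means a sparser sampling of the
    # formant envelope and less well-defined formant peaks.
    print(f0, round(harmonics_per_formant_interval(f0), 1))
```

With the 1100 Hz spacing used here, the density falls from 110 harmonics per formant interval at an F0 of 10 Hz to about 3.4 at 320 Hz.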

We suggest that future studies investigate whether sex differences in the processing of size-related formant information exist in non-human mammals, and examine whether the sex difference we have reported in human listeners is specific to human voice-like sounds or generalizes to other resonant sources. Finally, it is also important to note that the sex difference in size discrimination we report in the current study could be innate or acquired or both. Hence, while it is compatible with the hypothesis that men rely on size assessment more than women, it does not conclusively demonstrate that these abilities arose through sexual selection. For example, it is possible that males learn to cue on size-related information in vocal signals more than females because this information is more important to them during their everyday social interactions. There may also be key differences across cultures, particularly in societies where gender roles differ markedly. Thus, future studies that examine the effects of training and personality, as well as social and cultural factors on the development of human auditory size discrimination, are also warranted.


A Leverhulme Trust Early Career Fellowship awarded to Benjamin D. Charlton financially supported this work. The University of Sussex Research Ethics Committee approved the study (BC0312).

  • Received March 25, 2013.
  • Accepted May 8, 2013.

