The recombinational environment influences patterns of molecular evolution through the effects of Hill–Robertson interference. Here, we examine genome-wide patterns of gene expression with respect to recombinational environment in Drosophila melanogaster. We find that regions of the genome lacking crossing over exhibit elevated levels of expression, and this is most pronounced for genes on the entirely non-crossing over fourth chromosome. We find no evidence for differences in the patterns of gene expression between regions of high, intermediate and low crossover frequencies. These results suggest that, in the absence of crossing over, selection to maintain control of expression may be compromised, perhaps due to the accumulation of deleterious mutations in regulatory regions. Alternatively, higher gene expression may be evolving to compensate for defective protein products or reduced translational efficiency.
1. Introduction
The recombinational environment is known to influence patterns of molecular evolution through the effects of Hill–Robertson interference (Hill & Robertson 1966; Felsenstein 1974), such that regions of the genome with no recombination or low rates of recombination experience a reduced effective population size (Ne). This leads to reduced rates of adaptive evolution and a lower efficacy of selection against deleterious mutations (Kimura 1983). In Drosophila, several studies have examined the effects of recombination on patterns of evolution, and showed that linkage impedes the ability of natural selection both to incorporate beneficial mutations and remove deleterious ones (Betancourt & Presgraves 2002; Marais et al. 2004; Presgraves 2005; Bartolomé & Charlesworth 2006; Bachtrog et al. 2008; Larracuente et al. 2008).
In a genome-wide comparison of Drosophila melanogaster and Drosophila yakuba, Haddrill et al. (2007) found that regions of the genome that completely lacked crossing over exhibited elevated rates of non-synonymous and long intron divergence, as well as reduced codon usage bias, and attributed this to a severe reduction in the efficacy of selection, particularly on the fourth (dot) chromosome, which lacks crossing over under normal conditions (Ashburner et al. 2005). Consistent with previous studies (Betancourt & Presgraves 2002; Marais et al. 2004), Haddrill et al. (2007) also noted a negative relationship between the rate of non-synonymous divergence (measured using both the codon-based measure dN (Yang 1997) and the nucleotide-based measure KA (Comeron 1995); see Bierne & Eyre-Walker (2003) and Haddrill et al. (2007) for discussions of the two measures) and the level of codon usage bias (measured by Fop, the frequency of optimal codons).
There is, however, a well-established positive correlation between Fop for a gene and its level of expression (Duret & Mouchiroud 1999; Marais et al. 2001, 2004) and a negative correlation between the level of expression and non-synonymous divergence (Marais et al. 2004; Larracuente et al. 2008). This raises an alternative explanation for the results of Haddrill et al. (2007). If the genes found in the non-crossover regions of the Drosophila genome are expressed at a very low level, the elevated levels of non-synonymous divergence and low codon usage bias observed on the D. melanogaster/D. yakuba fourth chromosome (and other non-crossover regions) could simply be explained by the relationships between gene expression, non-synonymous divergence and codon usage bias.
Here, we report a re-analysis of the dataset analysed by Haddrill et al. (2007) combined with genome-wide expression data. Surprisingly, we find that expression levels are elevated in non-crossing over regions, and this pattern is particularly strong for the fourth chromosome genes. Consistent with Haddrill et al. (2007), we find no evidence for differences in gene expression between regions of high, intermediate and low crossing over. The effects of gene expression cannot, therefore, explain the elevated levels of non-synonymous divergence and low codon usage bias in the non-crossover regions of the Drosophila genome.
2. Material and methods
The Haddrill et al. (2007) dataset was combined with genome-wide expressed sequence tag (EST) data from D. melanogaster, downloaded from the NCBI UniGene database (http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=&db=unigene), which contains counts of the number of ESTs associated with each gene in a number of different tissue types. We used the total number of EST counts across all tissue types as the measure of gene expression. We were able to extract expression profiles for 7398 of the 7612 genes in the Haddrill et al. (2007) dataset, including 3752, 2477, 1082 and 87 genes in regions of high, intermediate, low and no crossing over, respectively, thereby including all non-crossover genes analysed by Haddrill et al. (2007). The non-crossover category was further subdivided into 67 fourth chromosome and 20 non-fourth chromosome genes. We refer to crossing over rather than recombination, because there is evidence that gene conversion occurs in the regions of the D. melanogaster genome with very low or zero frequencies of crossing over (Langley et al. 2000; Jensen et al. 2002; Gay et al. 2007).
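As an illustration only, the expression measure described above (total EST count across all tissue types) can be sketched as follows. The record layout, gene names, tissue names and counts are hypothetical and do not reflect the actual UniGene file format or the real data.

```python
from collections import defaultdict

# Hypothetical records of (gene, tissue, EST count); illustrative values only.
records = [
    ("geneA", "head", 12),
    ("geneA", "testis", 3),
    ("geneA", "embryo", 5),
    ("geneB", "head", 0),
    ("geneB", "ovary", 7),
]

def total_est_counts(rows):
    """Sum EST counts over all tissue types for each gene."""
    totals = defaultdict(int)
    for gene, _tissue, count in rows:
        totals[gene] += count
    return dict(totals)

totals = total_est_counts(records)
```

Summing over tissues gives a single value per gene, which is then compared between crossover categories.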
3. Results
The recombinational environment had a highly significant effect on the levels of gene expression (Kruskal–Wallis test: H=69.12, p<10⁻⁵), and this effect was entirely due to the non-crossover regions, since no significant differences in expression levels were found between high, intermediate and low crossover regions (H=2.28, p=0.31). Figure 1 shows that non-crossover regions exhibit elevated levels of expression compared with the rest of the genome, with the median EST count almost four times higher for the non-crossing over region (median=77) than for the high (20), intermediate (20) and low (21) crossing over regions.
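The comparison of expression across crossover categories can be sketched with a plain-Python Kruskal–Wallis H statistic. This is a minimal sketch under stated assumptions: the standard tie correction is applied, the EST counts are invented for illustration, and the function names are hypothetical. The text does not say how the published analysis was implemented beyond the use of R.

```python
from collections import Counter
from statistics import median

def midranks(values):
    """Rank values from 1, giving tied values the mean of the ranks they span."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with tie correction (undefined if all values are equal)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction

# Hypothetical EST counts per crossover category (not the real data).
categories = {
    "high": [18, 20, 25],
    "intermediate": [15, 20, 22],
    "low": [19, 21, 30],
    "none": [60, 77, 95],
}
medians = {name: median(counts) for name, counts in categories.items()}
h_all = kruskal_wallis_h(list(categories.values()))
```

H is compared with a chi-squared distribution with k−1 degrees of freedom, where k is the number of categories. Repeating the call without the "none" group corresponds to the second test reported above.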
Since Haddrill et al. (2007) found differences in non-synonymous divergence between genes on the fourth chromosome and other non-crossover genes, we examined the level of expression in these groups separately. Figure 1 shows that within the non-crossing over genes, non-fourth chromosome genes show levels of expression intermediate between the fourth chromosome and the rest of the genome (median EST counts: non-fourth chromosome=40, fourth chromosome=93), and are significantly different from both these groups (Wilcoxon rank-sum test: non-fourth versus fourth, W=893.5, p=0.024; non-fourth versus high, intermediate and low regions combined, W=53480.5, p=0.037). This is consistent with the results of Haddrill et al. (2007) for intron divergence and codon usage bias in non-fourth chromosome genes, which were intermediate between the fourth chromosome genes and the rest of the genome.
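The pairwise comparisons above involve independent groups of genes of unequal size, which corresponds to the rank-sum (Mann–Whitney) form of the Wilcoxon test. The sketch below assumes that form and the convention used by R's wilcox.test, in which W is the rank sum of the first sample minus its minimum possible value. The function names and data are hypothetical.

```python
def midranks(values):
    """Rank values from 1, giving tied values the mean of the ranks they span."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]

def rank_sum_w(x, y):
    """W statistic: rank sum of x in the pooled sample minus n_x(n_x + 1)/2."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2

# Hypothetical EST counts for two groups of genes (not the real data).
non_fourth = [12, 40, 40, 55]
fourth = [40, 80, 93, 120, 150]
w = rank_sum_w(non_fourth, fourth)
```

W ranges from 0 to the product of the two sample sizes. Tied counts receive averaged ranks, which is how half-integer values such as the reported W=893.5 can arise.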
Marais et al. (2004) and Larracuente et al. (2008) reported a negative relationship between non-synonymous divergence and expression level, and we see the same relationship here (Spearman's rank correlation, Rs, between EST count and KA, with 95% confidence intervals obtained by bootstrapping across genes: high crossover region Rs=−0.40 (−0.43 to −0.37); intermediate crossover region Rs=−0.42 (−0.45 to −0.39); low crossover region Rs=−0.36 (−0.41 to −0.31); and non-crossover region Rs=−0.24 (−0.42 to −0.03)). However, within the non-crossover region, this relationship is significantly different from zero for the non-fourth chromosome genes (Rs=−0.58 (−0.82 to −0.17)) but not for the fourth chromosome genes (Rs=−0.19 (−0.41 to 0.05)); the differences between the categories are not themselves significant. A positive relationship between expression level and the codon usage bias measure, Fop, has also been reported in Drosophila (Duret & Mouchiroud 1999; Marais et al. 2001, 2004). Consistent with this, we find a significantly positive relationship between EST count and Fop for all crossing over regions (high crossover region Rs=0.33 (0.31 to 0.37); intermediate crossover region Rs=0.32 (0.28 to 0.35); low crossover region Rs=0.16 (0.10 to 0.22)), but not for any of the non-crossover regions (all non-crossover genes Rs=−0.16 (−0.38 to 0.06); fourth chromosome genes Rs=−0.02 (−0.28 to 0.24); non-fourth chromosome genes Rs=−0.11 (−0.48 to 0.30)).
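A correlation with a bootstrap interval of the kind reported above can be sketched as follows. The percentile method, the number of resamples, the seed and the data are all assumptions made for illustration. The text states only that confidence intervals were obtained by bootstrapping across genes.

```python
import random

def midranks(values):
    """Rank values from 1, giving tied values the mean of the ranks they span."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(x, y):
    """Spearman's Rs: Pearson correlation of the midranks."""
    return pearson(midranks(x), midranks(y))

def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Rs, resampling genes with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # Rs is undefined when a resample is constant
        stats.append(spearman(xs, ys))
    stats.sort()
    lower = stats[int(alpha / 2 * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper

# Hypothetical EST counts and KA values for ten genes (not the real data).
est = [5, 12, 20, 33, 47, 60, 75, 90, 120, 150]
ka = [0.09, 0.07, 0.08, 0.05, 0.06, 0.03, 0.04, 0.02, 0.03, 0.01]
rs = spearman(est, ka)
ci_low, ci_high = bootstrap_ci(est, ka)
```

Under this approach a relationship is treated as significantly different from zero when the interval excludes zero, which matches how the intervals above are read.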
4. Discussion
The main conclusion of this study is that the elevated level of non-synonymous divergence previously observed in the non-crossing over regions of the Drosophila genome (Haddrill et al. 2007) cannot simply be explained by the established negative correlation between non-synonymous divergence and expression level. Genes in the non-crossover regions actually show higher levels of expression than the rest of the genome, opposite to what is expected on this basis. Within the non-crossover regions, the fourth chromosome genes show the most extreme elevations in expression, with median EST counts more than double those in other non-crossover regions. This pattern of the fourth chromosome showing the most extreme deviations from the rest of the genome is consistent with previous findings, and may reflect enhanced Hill–Robertson effects on a larger region of zero recombination than elsewhere, and/or a long history of no crossing over on the fourth chromosome, which may not apply to other non-crossover genes (Haddrill et al. 2007).
Since Haddrill et al. (2007) concluded that the efficacy of selection was severely reduced in the non-crossing over regions, especially on the fourth chromosome, these results may suggest that selection has lost control of expression in the non-crossover regions, which would imply that these genes are being expressed at a higher level than optimal. If these regions are accumulating deleterious mutations, these mutations are likely to affect sequences important for regulating expression patterns as well as coding regions. Indeed, Haddrill et al. (2007) found that divergence in long introns was close to that of short introns in non-crossover regions, despite evidence for strong selective constraints in long introns in the rest of the genome (Bergman & Kreitman 2001; Andolfatto 2005; Haddrill et al. 2005; Halligan & Keightley 2006). Introns are likely to contain regulatory elements important for the control of expression (Casillas et al. 2007), and evidence from the non-recombining neo-Y chromosome of Drosophila miranda suggests that regulatory mutations that have accumulated on this chromosome have increased expression levels, as well as decreased them (Bachtrog 2006).
In addition to this, the genome-wide relationships observed between non-synonymous divergence/Fop and expression level may have broken down in the non-crossover regions, further suggesting that, while selection normally maintains tight control over expression patterns, this regulation appears to have deteriorated in the absence of crossing over. However, relaxation of selective constraints might be expected to increase the variance in expression level between genes, and there is some evidence for this (Rifkin et al. 2005). After transformation of our data to a log scale, however, there is no statistical support for higher variance in expression level in the non-crossover regions (data not shown), suggesting that other mechanisms may be influencing the patterns observed.
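The log-scale comparison of variances mentioned above can be sketched as below. The text does not state which test was used, so this sketch only computes the sample variances and their ratio. The log(count + 1) offset for zero counts, the function name and the data are assumptions.

```python
import math
import statistics

def log_scale_variance(counts):
    """Sample variance of log(count + 1); the +1 offset is an assumption for zeros."""
    return statistics.variance([math.log(c + 1) for c in counts])

# Hypothetical EST counts for two groups of genes (not the real data).
crossover = [5, 20, 21, 80, 300]
non_crossover = [30, 77, 93, 150, 400]
variance_ratio = log_scale_variance(non_crossover) / log_scale_variance(crossover)
```

A ratio well above one would point to greater between-gene variance in the non-crossover group. Whether any difference is statistically supported would need a formal test, which is not specified here.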
Alternatively, it is possible that the effects observed are a result of the evolution of a form of dosage compensation. If genes in non-crossover regions are accumulating deleterious mutations that reduce the function of their proteins, selection may favour mutations that increase expression to compensate for this, similar to what has been proposed for the evolution of X and Y chromosomes (e.g. Charlesworth & Charlesworth 2000; Bachtrog 2006). Similarly, if the presence of unpreferred codons in the fourth chromosome and other non-crossover genes results in a reduction in translational efficiency or accuracy, the higher expression level observed may represent a means of compensating for this. Interestingly, it has been reported that the protein ‘Painting of fourth’ (POF) exclusively binds to the fourth chromosome in Drosophila (Larsson et al. 2004). The extent of POF binding to fourth chromosome genes is correlated with their transcription levels, and mutational knockouts of POF lead to reduced gene expression on the fourth chromosome (Johansson et al. 2007). This suggests that POF is involved in chromosome 4-specific gene regulation, perhaps even having a dosage compensation-like role.
We are very grateful to Daniel Halligan for assistance with the statistical analysis package R, Beatriz Vicoso for assistance with the EST database, Andrea Betancourt for drawing our attention to POF and two anonymous reviewers for useful comments on the manuscript. P.R.H. was supported by a grant from NERC to B.C., F.M.W. by a Marie Curie Early Stage Fellowship and B.C. by the Royal Society.
- Received July 8, 2008.
- Accepted August 20, 2008.
- © 2008 The Royal Society