We know that G+C content increases the thermostability of nucleic acids, so we would expect prokaryotes that live at higher temperatures (i.e., those with a higher optimal growth temperature, or Topt) to have a higher G+C content in their genomes. Whether the evidence supports this has been the subject of some quite heated exchanges in the literature. There are two broad issues here. First, genomic G+C content will be associated with gene/protein content, which is in turn associated with the functions that an organism needs to perform. Second, closely related organisms will share patterns in genomic G+C content and Topt simply because of their phylogenetic relationships. It is quite likely that all these factors obscure the underlying relationship between genomic G+C content and Topt. In fact, when you plot genomic G+C content against Topt in bacteria, there is no apparent relationship. Of course, I am not saying anything new: others have tried to uncover this relationship, either by restricting comparisons to certain closely related groups or by applying multivariate methods.
A few years ago, I tried to tackle this with a talented intern, Norbert Kopocz (you can see the presentation of his results here). We performed a simple phylogenetic comparative analysis by computing the ancestral G+C content and the ancestral Topt at each node on a tree using squared-change parsimony. Then we correlated the change in G+C content (ΔGC) against the change in Topt (ΔTopt) along each branch. By doing this, we analyzed whether an increase (or decrease) in G+C content is associated with an increase (or decrease) in Topt, regardless of the starting values of G+C and Topt. Our analyses indicated that this relationship is statistically significant, although the proportion of variation explained is low. (By the way, at the time I was reasonably sure that this ancestral reconstruction approach was related to the independent contrasts method, and I have just found a paper that seems to demonstrate this. However, the paper doesn't seem to resolve the apparent inflation in the amount of data: ancestral state reconstruction obtains correlations based on data for the 2(n-1) branches on a tree, whereas independent contrasts has only n-1 comparisons.)
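To make the procedure concrete, here is a minimal Python sketch of the idea, not the code we actually used. It makes several assumptions: the four-tip tree and all tip values are invented for illustration, branch lengths are ignored (unweighted squared-change parsimony), and the ancestral values are found by simple iterative relaxation, repeatedly setting each internal node to the mean of its neighbours, which minimizes the sum of squared changes over all branches. The function names are my own.

```python
from math import sqrt

def squared_change_parsimony(children, tip_values, max_iter=10_000, tol=1e-12):
    """Estimate internal-node values that minimize the sum of squared
    changes over all branches (unweighted: branch lengths are ignored)."""
    parent = {kid: node for node, kids in children.items() for kid in kids}
    values = dict(tip_values)
    start = sum(tip_values.values()) / len(tip_values)
    for node in children:                 # initialise internal nodes
        values[node] = start
    for _ in range(max_iter):
        largest_shift = 0.0
        for node, kids in children.items():
            # Neighbours are the children plus the parent (the root has none).
            neighbours = list(kids) + ([parent[node]] if node in parent else [])
            # The mean of the neighbours minimizes the squared changes
            # on the branches touching this node.
            new_value = sum(values[k] for k in neighbours) / len(neighbours)
            largest_shift = max(largest_shift, abs(new_value - values[node]))
            values[node] = new_value
        if largest_shift < tol:
            break
    return values

def branch_changes(children, values):
    """Change in the trait (descendant minus ancestor) along every branch."""
    return [values[kid] - values[node]
            for node, kids in children.items() for kid in kids]

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical rooted tree with n = 4 tips: ((A,B),(C,D)).
tree = {"root": ["n1", "n2"], "n1": ["A", "B"], "n2": ["C", "D"]}
# Invented tip data: genomic G+C fraction and Topt in degrees Celsius.
gc_tips = {"A": 0.40, "B": 0.45, "C": 0.60, "D": 0.65}
topt_tips = {"A": 30.0, "B": 37.0, "C": 70.0, "D": 80.0}

gc_nodes = squared_change_parsimony(tree, gc_tips)
topt_nodes = squared_change_parsimony(tree, topt_tips)

d_gc = branch_changes(tree, gc_nodes)      # ΔGC on each branch
d_topt = branch_changes(tree, topt_nodes)  # ΔTopt on each branch
r = pearson_r(d_gc, d_topt)

print(f"branches: {len(d_gc)}, r(ΔGC, ΔTopt) = {r:.3f}")
```

Note that the toy tree yields 2(n-1) = 6 branch changes from only 4 tips. These changes share reconstructed ancestors, so they are not independent observations, and a standard significance test on `r` should be treated with caution.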
Nowadays there are, of course, much better ways of doing this analysis, especially given the number of full-length genomes available. We would not need to reconstruct ancestral genomic G+C content from the values of this trait at the tips of the tree. Instead, we could reconstruct ancestral genes (maybe even large parts of the genomes?) and obtain the ancestral genomic G+C content at the nodes directly from those sequences. Perhaps there are even better ways of doing the analysis: ways that build in the correlation between ΔGC and ΔTopt as a model parameter in a phylogenomic analysis. We could then test whether this correlation is significantly different from zero.