Distance-based estimation of evolutionary parameters

These days, people routinely use Bayesian inference to estimate evolutionary parameters. MrBayes and BEAST are extremely popular packages, and deservedly so.  But there is no getting round the fact that these analyses take time.  So, what if we used distance-based methods to perform the same kinds of analyses?  Distance-based methods tend to be a lot faster, although the variances of the estimates are usually larger.  But perhaps for a given dataset — especially a large one, with long sequences — the sampling variance may be negligible.

So — here’s an example of what we might do.  Imagine that we are trying to work out parameters associated with a relaxed clock model.  Here is a plausible algorithm:

1.  Build a neighbor-joining tree.

2.  Root the tree by finding the point such that the variance of distances between the root and all tips is minimized.

3. For a tree with n tips, there will be at most n-1 branches that need to be lengthened or shortened to ensure that all tips terminate at exactly the same distance from the root: a rooted binary tree has n-1 internal nodes, and at each one it is enough to adjust one of the two descendant branches. Find these branches, and calculate the multipliers that modify their lengths.
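To make step 2 concrete, here is a rough Python sketch of one way to find the variance-minimizing root. It is a brute-force illustration rather than a recommended implementation. It assumes the unrooted tree is stored as a nested dict of non-negative branch lengths (neighbor-joining can return negative lengths, which would need handling first), and the names `adj`, `tip_distances` and `min_variance_root` are made up for this example. For each branch, the variance of the root-to-tip distances is a quadratic in the root position, so the best position on that branch has a closed form.

```python
from statistics import pvariance

def tip_distances(adj, start, blocked):
    """Distances from `start` to every tip reachable without stepping into `blocked`."""
    dists = []
    stack = [(start, blocked, 0.0)]
    while stack:
        node, parent, d = stack.pop()
        onward = [(nb, w) for nb, w in adj[node].items() if nb != parent]
        if not onward:
            dists.append(d)  # nowhere left to go, so this node is a tip
        for nb, w in onward:
            stack.append((nb, node, d + w))
    return dists

def min_variance_root(adj):
    """Return (variance, u, v, x): root sits x along branch u-v, measured from u."""
    best = None
    seen = set()
    for u in adj:
        for v, length in adj[u].items():
            edge = frozenset((u, v))
            if edge in seen:
                continue  # each branch only needs to be tried once
            seen.add(edge)
            a = tip_distances(adj, u, v)  # tips on u's side
            b = tip_distances(adj, v, u)  # tips on v's side
            # Root-to-tip distance is c_i + s_i * x for a root x away from u.
            c = a + [d + length for d in b]
            s = [1.0] * len(a) + [-1.0] * len(b)
            c_bar = sum(c) / len(c)
            s_bar = sum(s) / len(s)
            cov = sum((ci - c_bar) * (si - s_bar) for ci, si in zip(c, s))
            var_s = sum((si - s_bar) ** 2 for si in s)
            # Minimum of the quadratic, clamped so the root stays on the branch.
            x = min(max(-cov / var_s, 0.0), length)
            var = pvariance([ci + si * x for ci, si in zip(c, s)])
            if best is None or var < best[0]:
                best = (var, u, v, x)
    return best

# Toy unrooted tree: tips A, B hang off X; tips C, D hang off Y.
adj = {
    "A": {"X": 1.0}, "B": {"X": 1.0},
    "C": {"Y": 2.0}, "D": {"Y": 2.0},
    "X": {"A": 1.0, "B": 1.0, "Y": 3.0},
    "Y": {"C": 2.0, "D": 2.0, "X": 3.0},
}
variance, u, v, x = min_variance_root(adj)
```

On this toy tree the root lands on the X to Y branch, 2.0 from X, where every tip is 3.0 from the root and the variance is zero. The search revisits the whole tree for every branch, so it is quadratic in the number of tips; that is fine for a sketch but not for a large dataset.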

There are many ways to do (3), of course; perhaps the easiest may be some kind of stepwise approach.
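As one possible stepwise approach, here is a minimal Python sketch that works from the tips towards the root and, at each internal node, stretches whichever descendant branches fall short so that all sides reach the same height. It is only one choice among many: it lengthens branches and never shortens them, it assumes a rooted tree with strictly positive branch lengths, and the `children` dict layout and the function name `make_ultrametric` are invented for this illustration.

```python
def make_ultrametric(children, root):
    """Stretch at most one child branch per binary node so all tips are equidistant.

    `children` maps each internal node to a list of (child, branch_length) pairs;
    tips do not appear as keys. Returns (new_lengths, multipliers), both keyed by
    the child node at the lower end of the branch.
    """
    new_lengths = {}
    multipliers = {}

    def height(node):
        # Height of the (already adjusted) subtree below `node`; tips have height 0.
        if node not in children:
            return 0.0
        below = [(child, length, height(child)) for child, length in children[node]]
        target = max(length + h for _, length, h in below)
        for child, length, h in below:
            new_len = target - h
            new_lengths[child] = new_len
            if abs(new_len - length) > 1e-12:
                # Only branches that actually change get a multiplier.
                multipliers[child] = new_len / length
        return target

    total_height = height(root)
    return new_lengths, multipliers, total_height

# Toy rooted tree: ((A:1,B:2)AB:1,(C:1.5,D:1.5)CD:2)root
children = {
    "root": [("AB", 1.0), ("CD", 2.0)],
    "AB": [("A", 1.0), ("B", 2.0)],
    "CD": [("C", 1.5), ("D", 1.5)],
}
new_lengths, multipliers, total_height = make_ultrametric(children, "root")
```

For this four-tip example only two branches change: the branch to A is doubled and the branch to AB is multiplied by 1.5, leaving every tip 3.5 from the root. That is within the n-1 = 3 limit from step 3. The multipliers are the quantities of interest here, since they mark the branches where a rate change is being proposed.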

By the way, this is not the way that “standard” relaxed clock models work. With your typical relaxed clock model, you have a distribution of rates and/or an inheritance model of rates.  The model above instead tries to identify specific branches where rates speed up or slow down.  From an evolutionary perspective, this amounts to saying that some lineages encountered environmental situations that led to rapid (or, for a slow-down, reduced) acquisition of substitutions.

There are other things we can do with distance-based methods.  The original skyline plots, for instance, did not use Bayesian methods.  The beauty of the Bayesian skyline plot is that it gives a smooth representation of the population trajectory. But can we get the same smoothness by bootstrapping our distance-based trees?
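To show what the bootstrapping step might look like, here is a small Python sketch that resamples alignment columns with replacement and recomputes a pairwise distance matrix for each replicate. Each matrix would then go through tree building and a skyline estimate, which are not shown. The choice of the Jukes-Cantor distance is an assumption made for brevity (any distance could be swapped in), the sequences are assumed to be aligned and gap-free, and the function names are hypothetical.

```python
import math
import random

def jc69_distance(seq1, seq2):
    """Jukes-Cantor distance between two aligned, gap-free sequences."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def bootstrap_distance_matrices(alignment, n_reps, rng):
    """Return one pairwise distance dict per bootstrap replicate.

    `alignment` maps sequence names to equal-length strings; columns are
    resampled with replacement, the same columns for every sequence.
    """
    names = sorted(alignment)
    n_sites = len(alignment[names[0]])
    matrices = []
    for _ in range(n_reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = {name: [alignment[name][c] for c in cols] for name in names}
        matrices.append({
            (a, b): jc69_distance(resampled[a], resampled[b])
            for i, a in enumerate(names)
            for b in names[i + 1:]
        })
    return matrices

# Tiny made-up alignment, purely for illustration.
alignment = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAC", "s3": "ACGTTCGTAA"}
matrices = bootstrap_distance_matrices(alignment, n_reps=100, rng=random.Random(1))
```

Whether averaging skyline estimates over such replicates actually recovers the smoothness of the Bayesian version is exactly the open question; the sketch only shows that generating the replicates is cheap.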