Modelling changes between cellular compartments on a genealogy

We know that viruses like HIV can move in and out of different cellular and/or systemic compartments. For instance, a few years ago, Perelson et al. developed a three-compartment model to account for the decay in viral loads during antiretroviral therapy. Each compartment corresponds to a different type of cell, each with a different generation time. The proportion of viruses produced by each compartment is also different.

It seems that we should be able to use a phylogenetic tree to derive equivalent estimates of the relative generation times of these cellular compartments and the proportions of each compartment. How do we do this? Well, we can assume that for all viruses, the intrinsic mutation rate per round of replication remains the same; however, a virus that is produced by a cell with a longer generation time goes through fewer rounds of replication per unit of time, and so will have a lower observed rate of mutation. So, imagine a simple case in which there are only two cellular compartments, and viruses switch stochastically between one and the other. If close to 100% of the viruses come from one compartment, then almost all lineages will have the same observed rate. Thus, we would expect that a phylogeny of the viruses will look as though it agrees with a molecular clock. As we increase the proportion of viruses produced by the other compartment, we can expect to see greater deviation from the molecular clock. On the other hand, as viruses move between compartments more rapidly, each lineage spends time in both compartments, the overall rate along all lineages can be approximated by a weighted average of the two rates, and we return to a tree that looks clock-like. So there is a sweet spot where neither compartment dominates and the rate of movement between compartments is sufficiently low, so that we are able to estimate the relative proportions and generation times of the different compartments based on the degree of deviation from the molecular clock.
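To make the intuition above concrete, here is a minimal simulation sketch. It is not a full genealogy: it assumes independent lineages of equal duration, each switching between two compartments A and B as a two-state continuous-time Markov chain, and it tracks expected substitutions only (no Poisson noise). The function names, the switching rates switch_ab and switch_ba, and the default per-compartment rates are all hypothetical choices made for illustration. The coefficient of variation of the lineage lengths is used as a rough stand-in for "deviation from the molecular clock".

```python
import random
import statistics


def lineage_distance(total_time, rate_a, rate_b, switch_ab, switch_ba, rng):
    """Expected substitutions along one lineage that hops between A and B.

    switch_ab and switch_ba are the (strictly positive) rates of leaving
    compartment A and compartment B respectively.
    """
    # Draw the starting compartment from the stationary distribution.
    in_a = rng.random() < switch_ba / (switch_ab + switch_ba)
    remaining = total_time
    distance = 0.0
    while remaining > 0.0:
        # Exponential waiting time until the lineage leaves its current
        # compartment, truncated at the end of the lineage.
        leave_rate = switch_ab if in_a else switch_ba
        dwell = min(rng.expovariate(leave_rate), remaining)
        distance += dwell * (rate_a if in_a else rate_b)
        remaining -= dwell
        in_a = not in_a
    return distance


def clock_deviation(switch_ab, switch_ba, n_lineages=2000, total_time=1.0,
                    rate_a=1.0, rate_b=0.2, seed=1):
    """Coefficient of variation of lineage lengths (0 means perfectly clock-like)."""
    rng = random.Random(seed)
    distances = [
        lineage_distance(total_time, rate_a, rate_b, switch_ab, switch_ba, rng)
        for _ in range(n_lineages)
    ]
    return statistics.pstdev(distances) / statistics.mean(distances)


if __name__ == "__main__":
    scenarios = {
        "almost everything in A": (0.01, 1.0),
        "balanced, slow switching": (0.5, 0.5),
        "balanced, fast switching": (200.0, 200.0),
    }
    for label, (q_ab, q_ba) in scenarios.items():
        print(f"{label}: CV = {clock_deviation(q_ab, q_ba):.3f}")
```

Under these assumptions, the balanced, slow-switching scenario should show the largest spread in lineage lengths, while the other two should look much more clock-like. A real analysis would of course have to work on a shared genealogy, where lineages are correlated through common ancestry.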

One can imagine that a simple solution may be to treat this as an example of the structured coalescent and incorporate migration.  But it turns out to be not so simple for two reasons.  First, when we sample the viral sequences to build our phylogeny (or more precisely our genealogy), we don’t know which compartment/cell these viruses were last in. In other words, we don’t know the “demes” or “areas” from which these viruses were sampled. Second, the observed substitution rates change depending on which “deme” or “area” the virus finds itself in.

So this model of cellular compartments and differing generation times/mutation rates provides a mechanistic explanation for relaxed clock models when these are applied to viruses. It also suggests that we should be able to develop a covarion-type analysis to figure out the relative proportions of the different compartments and their generation times. But how?
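One possible starting point, offered as a sketch rather than an answer, is to write down what the two-compartment case predicts for a single lineage. Assume (hypothetically) that a lineage switches from compartment A to B at rate $q_{AB}$ and back at rate $q_{BA}$, that the observed substitution rates in the two compartments are $\mu_A$ and $\mu_B$, and that the lineage starts at the stationary distribution of the switching process. None of these symbols appear in the discussion above; they are introduced here only for illustration. Then the long-run proportions and the weighted-average rate are

$$
\pi_A = \frac{q_{BA}}{q_{AB} + q_{BA}}, \qquad \pi_B = 1 - \pi_A, \qquad \bar{\mu} = \pi_A \mu_A + \pi_B \mu_B ,
$$

and if $D_T$ is the expected number of substitutions accumulated along a lineage of duration $T$, its variance across lineages is

$$
\mathrm{Var}(D_T) = 2\,\pi_A \pi_B\,(\mu_A - \mu_B)^2 \left[ \frac{T}{\kappa} - \frac{1 - e^{-\kappa T}}{\kappa^2} \right], \qquad \kappa = q_{AB} + q_{BA} .
$$

The variance vanishes when $\pi_A$ approaches 0 or 1 (one compartment dominates) and when $\kappa$ becomes large (fast switching), which matches the two clock-like extremes described earlier. Whether the proportions and the rates can be separately identified from a real genealogy, where lineages share ancestry and the compartments are unobserved, is exactly the open question.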