Mathematics seminar: Novel tools to analyse pathogen phylogenies
Inexpensive sequencing means that we now have access to considerable information about the genomes of pathogens. Typically, genetic data are studied either by analysing pairwise distances between sequences directly, or by comparing branch lengths in an inferred phylogenetic tree to a relevant null model. But for some pathogens, these methods may not be ideal.
In tuberculosis, for example, reconstructing transmission trees is hampered by long periods of infectiousness before, and possibly even after, detection, and by potentially long latent periods, among other factors. At the same time, existing methods make little use of the structural properties of phylogenetic trees, yet the sheer number of possible structures suggests that they may contain relevant information. Here we explore how the shapes of pathogen phylogenies are affected by the kinds of outbreaks from which they are derived. We use topological summary statistics to compare one outbreak to another. We find that the structures show a clear signal of the extent of 'super-spreading' (interpreted generously), and we develop a computational classifier to encode the relevant associations. We explore whether this kind of approach might be useful in real time. Time permitting, in the second part of the talk, we describe methods to detect positive selection, analogous to 'super-spreading', in the broader context of a very large phylogenetic tree of sparsely sampled isolates.
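As a rough illustration of what a topological summary statistic can look like, the sketch below computes three standard tree-shape measures for a rooted binary tree: the Sackin index (sum of leaf depths), the Colless imbalance (sum over internal nodes of the difference in leaf counts between the two subtrees), and the number of cherries (internal nodes whose two children are both leaves). The abstract does not say which statistics the talk uses, so the choice of these three, the nested-tuple tree encoding, and the function name `shape_stats` are all assumptions made for illustration.

```python
# Illustrative sketch only: the abstract does not specify which statistics
# are used. A tree is encoded (hypothetically) as nested 2-tuples; anything
# that is not a tuple is treated as a leaf label.

def shape_stats(tree):
    """Return (n_leaves, sackin, colless, cherries) for a rooted binary tree."""
    if not isinstance(tree, tuple):
        # A single leaf: depth 0, no imbalance, no cherries.
        return 1, 0, 0, 0
    left, right = tree
    n_l, sackin_l, colless_l, cherries_l = shape_stats(left)
    n_r, sackin_r, colless_r, cherries_r = shape_stats(right)
    n = n_l + n_r
    # Every leaf below this node sits one step deeper than in its subtree.
    sackin = sackin_l + sackin_r + n
    # Imbalance contributed by this node is the difference in subtree sizes.
    colless = colless_l + colless_r + abs(n_l - n_r)
    # A node with exactly two leaves beneath it is a cherry.
    cherries = cherries_l + cherries_r + (1 if n == 2 else 0)
    return n, sackin, colless, cherries


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

print(shape_stats(balanced))     # (4, 8, 0, 2)
print(shape_stats(caterpillar))  # (4, 9, 3, 1)
```

The two four-leaf trees have the same number of tips but different shapes, and the statistics separate them. Vectors of this kind, computed for each outbreak's phylogeny, are one plausible form of input to a classifier.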