Modern-day SIV viral diversity generated by extensive recombination and cross-species transmission

Citation: Bell SM, Bedford T (2017) Modern-day SIV viral diversity generated by extensive recombination and cross-species transmission. PLoS Pathog 13(7): e1006466. https://doi.org/10.1371/journal.ppat.1006466

Editor: Guido Silvestri, Emory University, UNITED STATES

Received: January 19, 2017; Accepted: June 12, 2017; Published: July 3, 2017

Copyright: © 2017 Bell, Bedford. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All datasets, config files, documentation, and scripts used in this analysis or to generate figures are available publicly at <https://github.com/blab/siv-cst>.

Funding: TB is a Pew Biomedical Scholar and his work is supported by National Institutes of Health award R35 GM119774-01. SB is a National Science Foundation Graduate Research Fellow and her work is supported by grant DGE-1256082. Her work is also supported by a training grant from the National Cancer Institute of the National Institutes of Health, award number T32CA080416; a Vassar Fellowship for Graduate Study; and an ARCS foundation fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

As demonstrated by the recent epidemics of EBOV and MERS, and by the global HIV pandemic, viral cross-species transmissions (CST) can be devastating [1,2]. As such, understanding the propensity and ability of viral pathogens to cross the species barrier is of vital public health importance. Of particular interest are transmissions that not only “spillover” into a single individual of a new host species, but that result in a virus actually establishing a sustained chain of transmission and becoming endemic in the new host population (“host switching”) [3].

HIV is the product of not just one successful host switch, but a long chain of host switch events [4,5]. There are two human immunodeficiency viruses, HIV-2 and HIV-1. HIV-2 arose from multiple cross-species transmissions of SIVsmm (simian immunodeficiency virus, sooty mangabey) from sooty mangabeys to humans [6–8]. HIV-1 is the result of four independent cross-species transmissions from chimpanzees and gorillas. Specifically, SIVcpz was transmitted directly from chimpanzees to humans twice; one of these transmissions generated HIV-1 group M, which is the primary cause of the human pandemic [9]. SIVcpz was also transmitted once to gorillas, generating SIVgor [10], which was in turn transmitted twice to humans [11].

Looking further back, SIVcpz itself was also generated by lentiviral host switching and recombination. Based on the SIV sequences available at the time, early studies identified SIVmon/-mus/-gsn (which infect mona, mustached, and greater spot-nosed monkeys, respectively) and SIVrcm (which infects red-capped mangabeys) as probable donors [12]. Functional analysis of accessory genes from these putative parental lineages indicates that the specific donors and genomic locations of these recombination event(s) were crucial for enabling what became SIVcpz to cross the high species barrier and establish an endemic lineage in hominids [13].

The complex evolutionary history that produced HIV illustrates how strongly natural history shapes modern-day viral diversity. Although the history leading to HIV is well detailed, broader questions regarding cross-species transmission of primate lentiviruses remain [14]. With over 45 known extant primate lentiviruses, each endemic to a specific host species [4,5,15], we sought to characterize the history of viral transmission between these species as precisely as a limited sample of modern-day viruses allows.

Here, we used phylogenetic inference to reconstruct the evolutionary history of primate lentiviral recombination and cross-species transmission. We assembled datasets from publicly available lentiviral genome sequences and conducted discrete trait analyses to infer rates of transmission between primate hosts. We find evidence for extensive interlineage recombination and identify many novel host switches that occurred during the evolutionary history of lentiviruses. We also find that specific lentiviral lineages exhibit a broad range of abilities to cross the species barrier. Finally, we examined the origins of each region of the SIVcpz genome in greater detail than previous studies, yielding a more nuanced understanding of its origins.

Results

There have been at least 13 interlineage recombination events that confuse the lentiviral phylogeny

In order to reconstruct the lentiviral phylogeny, we first had to address the issue of recombination, which is frequent among lentiviruses [16]. In the context of studying cross-species transmission, this is both a challenge and a valuable tool. Evidence of recombination between viral lineages endemic to different hosts is also evidence that at one point in time, viruses from those two lineages were in the same animal (i.e., a cross-species transmission event must have occurred in order to generate the observed recombinant virus). However, this process also results in portions of the viral genome having independent evolutionary—and phylogenetic—histories.

To address the reticulate evolutionary history of SIVs, we set out to identify the extent and nature of recombination between lentiviral lineages. Extensive sequence divergence between lineages undermines site-based methods for estimating linkage (S1 Fig). Topology-based measures of recombination, however, can “borrow” information across nearby sites and are effective for this dataset. We therefore used GARD, a phylogenetic method implemented in the HyPhy package that identifies recombination breakpoints and groups the segments between them by shared ancestry [17]. For this analysis, we used a version of the SIV compendium alignment from the Los Alamos National Lab (LANL), modified slightly to reduce the overrepresentation of HIV sequences (N = 64, see Methods) [18]. Because each virus lineage has only a few sequences in this alignment, these inferences refer to inter-lineage recombination, not the rampant intra-lineage recombination common among lentiviruses.

GARD identified 13 locations along the genome with strong evidence of inter-lineage recombination (Fig 1). Here, support for each model is assessed via the Akaike Information Criterion (AIC). Because AIC is measured on a log-likelihood scale, a difference (ΔAIC) of 10 between two models indicates that one model is $e^{10/2} \approx 148$ times as likely as the other [19]. In our case, ΔAIC values ranged from 154 to 436 for each included breakpoint, indicating that these breakpoints are strongly supported by the underlying tree likelihoods. The 14 resulting segments ranged in length from 351 to 2316 bases; to build reliable phylogenies, we omitted two of the less supported breakpoints from downstream analyses, yielding 12 segments ranging in length from 606–2316 bases. We found no evidence of linkage between non-neighboring segments (S2 Fig). While it has been previously appreciated that several lineages of SIV are recombinant products (e.g., [12,20]), the 13 breakpoints identified here provide evidence that there have been at least 13 inter-lineage recombination events during the evolution of SIVs. Identifying these recombination breakpoints allowed us to construct a putatively valid phylogeny for each genome segment that shares an internally cohesive evolutionary history.
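As a quick check on the AIC arithmetic above, the short Python sketch below converts a delta-AIC (ΔAIC) into the factor by which the better-supported model is more likely. It relies only on the standard definition AIC = 2k − 2 ln L, so a ΔAIC of d corresponds to a parameter-penalized likelihood ratio of exp(d/2). The function name `relative_likelihood` is illustrative and is not part of GARD or HyPhy.

```python
import math


def relative_likelihood(delta_aic: float) -> float:
    """Factor by which the lower-AIC model is more likely than the other.

    AIC = 2k - 2 ln(L), so a difference of delta_aic between two models
    corresponds to a parameter-penalized likelihood ratio of exp(delta_aic / 2).
    """
    if delta_aic < 0:
        raise ValueError("pass the non-negative AIC advantage of the better model")
    return math.exp(delta_aic / 2.0)


# The worked example from the text: a delta-AIC of 10 gives roughly a 148-fold ratio.
print(round(relative_likelihood(10), 1))  # 148.4

# The weakest retained breakpoint had delta-AIC = 154, i.e. a ratio of e^77.
print(f"{relative_likelihood(154):.2e}")
```

Because the ratio grows exponentially with ΔAIC, the reported range of 154 to 436 corresponds to overwhelming support for every retained breakpoint.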


Fig 1. Inferred interlineage recombination breakpoints and supporting tree topologies.

The SIV LANL compendium, slightly modified to reduce overrepresentation of HIV, was analyzed with GARD to identify 13 recombination breakpoints across the genome (dashed lines in B; numbering follows the accepted HXB2 reference genome, accession K03455, illustrated). Two of these breakpoints were omitted from further analyses because they created extremely short fragments (< 500 bases; gray dashes in B). Using the 11 remaining breakpoints, we split the compendium alignment into segments and built a maximum likelihood tree for each segment, displayed in A. Each viral sequence is color-coded by host species, and its phylogenetic position is traced between trees. Heuristically, straight, horizontal colored lines indicate congruent topological positions between trees (likely not a recombinant sequence); criss-crossing colored lines indicate incongruent topological positions between trees (likely a recombinant sequence).

https://doi.org/10.1371/journal.ppat.1006466.g001

Most primate lentiviruses were acquired by cross-species transmissions

We then looked for phylogenetic evidence of cross-species transmission in the tree topologies of each of the 12 genomic segments. For this and all further analyses, we constructed a dataset from all publicly available primate lentivirus sequences, which we curated and subsampled by host and virus lineage to ensure an equitable distribution of data (see Methods). This primary dataset consists of virus sequences from the 24 primate hosts with sufficient data available (5–25 sequences per viral lineage, N = 423, S3 Fig). Alignments used the fixed compendium alignment as a template (see Methods).
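The exact curation and subsampling procedure is described in Methods and is not reproduced here. As a rough illustration only, the Python sketch below caps the number of sequences per viral lineage and drops lineages with too few sequences, using a fixed random seed so a draw can be regenerated. The thresholds (5 and 25) are assumptions chosen to echo the per-lineage range reported above, and the function name `subsample_by_lineage` and the lineage labels are hypothetical.

```python
import random
from collections import defaultdict


def subsample_by_lineage(records, min_per_lineage=5, max_per_lineage=25, seed=0):
    """Return {lineage: [sequence ids]} after capped, reproducible subsampling.

    records: iterable of (sequence_id, lineage) pairs.
    Lineages with fewer than min_per_lineage sequences are dropped; lineages
    with more than max_per_lineage are randomly downsampled to that cap.
    """
    by_lineage = defaultdict(list)
    for seq_id, lineage in records:
        by_lineage[lineage].append(seq_id)

    rng = random.Random(seed)  # fixed seed so a given draw can be regenerated
    kept = {}
    for lineage in sorted(by_lineage):
        ids = sorted(by_lineage[lineage])
        if len(ids) < min_per_lineage:
            continue  # too little data to place this lineage reliably
        if len(ids) > max_per_lineage:
            ids = sorted(rng.sample(ids, max_per_lineage))
        kept[lineage] = ids
    return kept


# Hypothetical input: one oversampled lineage, one sparse lineage, one in range.
records = [(f"a_{i}", "lineage_A") for i in range(300)]
records += [(f"b_{i}", "lineage_B") for i in range(3)]
records += [(f"c_{i}", "lineage_C") for i in range(12)]
draw = subsample_by_lineage(records)
print({lineage: len(ids) for lineage, ids in draw.items()})  # {'lineage_A': 25, 'lineage_C': 12}
```

Changing `seed` produces an independent draw of the same design, which is the kind of replicate sampling used to check robustness later in this section.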

In phylogenetic trees of viral sequences, cross-species transmission appears as a mismatch between the host of a virus and the host of that virus’s ancestor. To identify this pattern and estimate how frequently each pair of hosts has exchanged lentiviruses, we used established methods for modeling the evolution of discrete traits, as implemented in BEAST [21,22]. In our case, the host of each viral sample was modeled as a discrete trait. This is analogous to treating the “host state” of each viral sample as an extra column in an alignment, and inferring the rate of transition between all pairs of host states along with ancestral host states across the phylogeny. This approach is similar to common phylogeographic approaches that model movement of viruses across discrete spatial regions [23] and has previously been applied to modeling discrete host state in the case of rabies virus [21]. Here, we took a fully Bayesian approach and sought the posterior distribution across phylogenetic trees, host transition rates, and ancestral host states. We integrated across model parameters using Markov chain Monte Carlo (MCMC). The resulting model provides phylogenetic trees for each segment annotated with ancestral host states alongside inferred transition rates.
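To make the “host as an extra alignment column” analogy concrete, the Python sketch below computes the posterior probability of each root host state on a fixed toy tree, using Felsenstein's pruning algorithm under an equal-rates Mk model. This is a deliberate simplification and not the model used here: the analysis described above estimated separate rates between host pairs and integrated over trees with MCMC in BEAST, whereas this sketch fixes the tree, branch lengths, and a single rate to hypothetical values and assumes a uniform prior on the root state.

```python
import math

HOSTS = ["mona", "talapoin", "mustached"]  # hypothetical state space for the toy example


def mk_transition(k, rate, t):
    """Transition probabilities after time t under an equal-rates Mk model.

    Q has off-diagonal entries `rate` and diagonal -(k - 1) * rate, which gives
    P_ii(t) = 1/k + (k - 1)/k * exp(-k * rate * t) and
    P_ij(t) = 1/k - 1/k * exp(-k * rate * t) for i != j.
    """
    decay = math.exp(-k * rate * t)
    same = 1.0 / k + (k - 1) / k * decay
    diff = 1.0 / k - decay / k
    return [[same if i == j else diff for j in range(k)] for i in range(k)]


def partial_likelihoods(node, k, rate):
    """Felsenstein pruning: entry s is P(observed tip hosts below node | node host = s).

    A node is either ("tip", state_index) or ("internal", [(child, branch_length), ...]).
    """
    kind, payload = node
    if kind == "tip":
        return [1.0 if s == payload else 0.0 for s in range(k)]
    result = [1.0] * k
    for child, branch_length in payload:
        child_lik = partial_likelihoods(child, k, rate)
        p = mk_transition(k, rate, branch_length)
        for s in range(k):
            result[s] *= sum(p[s][c] * child_lik[c] for c in range(k))
    return result


def root_host_posterior(tree, k, rate):
    """Posterior over the root host, assuming a uniform prior on root states."""
    lik = partial_likelihoods(tree, k, rate)
    total = sum(lik)
    return [value / total for value in lik]


# Toy tree: two mona-monkey tips and one talapoin tip joined at the root.
tree = ("internal", [
    (("tip", 0), 0.1),
    (("tip", 0), 0.1),
    (("tip", 1), 0.3),
])
posterior = root_host_posterior(tree, k=len(HOSTS), rate=1.0)
for host, prob in zip(HOSTS, posterior):
    print(f"{host:10s} {prob:.3f}")
```

In this toy case most of the posterior mass falls on a mona-monkey root, so the talapoin tip implies a host change along its branch; the full model aggregates such changes across a posterior sample of trees to estimate transition rates.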

Fig 2 shows reconstructed phylogenies for 3 segments along with inferred ancestral host states. Trees are color coded by known host state at the tips, and inferred host state at internal nodes/branches; color saturation indicates the level of certainty for each ancestral host assignment. A visual example of how the model identifies cross-species transmissions can be seen in the SIVmon/SIVtal clade, whose viruses infect mona and talapoin monkeys, respectively (starred in Fig 2A–2C). Due to the phylogenetic placement of the SIVmon tips, the internal node at the base of this clade is red, indicating that the host of the ancestral virus was most likely a mona monkey. This contrasts with the host state of the samples isolated from talapoin monkeys (tips in green). These changes in the host state across the tree are what inform our estimates of the rate of transmission between host pairs. In total, the support for each possible transmission derives from two sources: (i) whether the transmission is supported across the posterior distribution of phylogenies for a particular segment, and (ii) whether this holds for multiple genomic segments.


Fig 2. Lentiviral phylogenies highlighting the mosaic origins of SIVcpz and illustrating how CST is inferred from the phylogenies.

(A–C) Bayesian maximum clade credibility (MCC) trees are displayed for segments 2 (gagA), 6 (int and vifB), and 9 (envC) of the main dataset (N = 423). Tips are color coded by known host species; internal nodes and branches are colored by inferred host species, with saturation indicating the confidence of these assignments. Monophyletic clades of viruses from the same lineage are collapsed, with the triangle width proportional to the number of represented sequences. An example of likely cross-species transmission is starred in each tree, where the host state at the internal node (red / mona monkeys) is incongruent with the descendant tips’ known host state (green / talapoin monkeys), providing evidence for a transmission from mona monkeys to talapoin monkeys. Another example of cross-species transmission of a recombinant virus among African green monkeys is marked with a dagger. (D) The genome map of SIVcpz, with breakpoints used for the discrete trait analysis, is color coded and labeled by the most likely ancestral host for each segment of the genome.

https://doi.org/10.1371/journal.ppat.1006466.g002

Notably, the tree topologies are substantially different between segments, which emphasizes both the extent of recombination and the different evolutionary forces that have shaped the phylogenies of individual portions of the genome. In all segments’ trees, we also see frequent changes in the host state between internal nodes (illustrated as changes in color going up the tree), suggestive of frequent ancient cross-species transmissions. On average, primate lentiviruses switch hosts once every 6.25 substitutions per site per lineage across the SIV phylogeny.

The cross-species transmission events inferred by the model are illustrated in Fig 3, with raw rates and Bayes factors (BF) in S4 Fig. As shown, the model recovers nearly all host pairs with previously documented CST events (to our knowledge) [7,9,11,12,24,25], with the exception of the putative CST from sooty mangabeys to sabaeus monkeys reported in [20] (see Discussion). Importantly, we also identify 14 novel cross-species transmission events with strong statistical support (cutoff of BF ≥ 10.0). Each of these transmissions is clearly and robustly supported by the tree topologies (all 12 trees are illustrated in S5 Fig).
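The Fig 3 caption notes that actual rates are rates multiplied by indicators. In BEAST discrete trait analyses of this kind, a common way to score each host-pair indicator is a Bayes factor: the posterior odds that the indicator is switched on divided by its prior odds. The Python sketch below applies that formula and the cutoffs used in this section (BF ≥ 3 and BF ≥ 10). The prior inclusion probability depends on how the indicator prior was configured and is treated here as an input assumption; the function names and example frequencies are hypothetical.

```python
import math


def bayes_factor(posterior_on: float, prior_on: float) -> float:
    """Posterior odds / prior odds that a host-pair rate indicator is 'on'.

    posterior_on: fraction of MCMC samples in which the indicator equals 1.
    prior_on: prior probability that the indicator equals 1 (an assumption
              here; it depends on how the indicator prior was configured).
    """
    if not 0.0 < prior_on < 1.0:
        raise ValueError("prior_on must lie strictly between 0 and 1")
    if posterior_on >= 1.0:
        return math.inf  # indicator on in every sample: the BF is unbounded
    posterior_odds = posterior_on / (1.0 - posterior_on)
    prior_odds = prior_on / (1.0 - prior_on)
    return posterior_odds / prior_odds


def support_level(bf: float) -> str:
    """Apply the cutoffs used in the text: BF >= 10 strong, BF >= 3 supported."""
    if bf >= 10.0:
        return "strong"
    if bf >= 3.0:
        return "supported"
    return "not supported"


# Hypothetical indicator frequencies with an assumed prior inclusion probability of 0.1.
for posterior_on in (0.1, 0.5, 0.6):
    bf = bayes_factor(posterior_on, prior_on=0.1)
    print(posterior_on, round(bf, 2), support_level(bf))
```

Because the Bayes factor divides out the prior odds, an indicator that is on no more often than the prior expects scores BF = 1, regardless of the prior chosen.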


Fig 3. Network of inferred CSTs of primate lentiviruses.

The phylogeny of the host species’ mitochondrial genomes forms the outer circle. Arrows represent transmission events inferred by the model with Bayes factor (BF) ≥ 3.0; black arrows have BF ≥ 10, and the opacity of gray arrows is scaled for BF between 3.0 and 10.0. Arrow width indicates the rate of transmission (actual rates = rates × indicators). Circle sizes represent network centrality scores for each host. Transmissions from chimps to humans; chimps to gorillas; gorillas to humans; sooty mangabeys to humans; sabaeus to tantalus; and vervets to baboons have been previously documented. To our knowledge, all other transmissions illustrated are novel identifications.

https://doi.org/10.1371/journal.ppat.1006466.g003

To control for sampling effects, we repeated the analysis with a supplemental dataset built with fewer hosts and more sequences per host (15 host species, subsampled to 16–40 sequences per viral lineage, N = 510), and found qualitatively similar results (S6–S8 Figs). When directly comparing the average indicator values, host state transition rates, and BF values between analyses, the results from the main and supplemental datasets are strongly correlated, indicating robust quantitative agreement between the two analyses (S9 Fig). We also see similar agreement between replicates when we regenerate the main dataset via independent sampling draws (see online repository).

Taken together, these results represent a far more extensive pattern of CST among primate lentiviruses than previously described [4,5]; nearly every primate clade has at least one inbound, robustly supported viral transmission from another clade. We thus conclude that the majority of lentiviruses have arisen from a process of host switching, followed by a combination of intraclade host switches and host-virus coevolution.

Primate hosts may serve as either sources or sinks of SIVs

While most SIVs are the product of ancient recombination and host switching, these host switches are not uniformly distributed. When we assess the network centrality of each host (Figs 3, S7 and S11, shown as node size), we find a broad range of values, indicating that some hosts act as sources in the SIV transmission network and others act as sinks. From this, we infer that some viruses have had either greater opportunity or greater ability to cross the species barrier than others.
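The centrality score shown as node size is not defined in this section. A simpler way to see the source/sink distinction is net flow: the summed rate of outbound transmissions from a host minus the summed rate of inbound transmissions. The Python sketch below computes that quantity from a weighted edge list. It is a swapped-in illustration, not the centrality measure used in the figures, and the host labels and rates are hypothetical.

```python
from collections import defaultdict


def net_flow(edges):
    """Summed outbound rate minus summed inbound rate for every host.

    edges: iterable of (donor_host, recipient_host, rate). Positive scores
    indicate net sources of virus in the network; negative scores, net sinks.
    """
    outbound = defaultdict(float)
    inbound = defaultdict(float)
    for donor, recipient, rate in edges:
        outbound[donor] += rate
        inbound[recipient] += rate
    hosts = set(outbound) | set(inbound)
    return {host: outbound[host] - inbound[host] for host in sorted(hosts)}


# Hypothetical network: host_A exports virus to two hosts, host_C only receives.
edges = [("host_A", "host_B", 1.0), ("host_A", "host_C", 0.5), ("host_B", "host_C", 0.25)]
print(net_flow(edges))  # {'host_A': 1.5, 'host_B': -0.75, 'host_C': -0.75}
```

Net flow separates exporters from importers directly, whereas a centrality score may rank a host highly for being well connected in either direction; which is more informative depends on the question being asked.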

In particular, the SIVs from the four closely related species of African green monkeys (SIVsab, SIVtan, SIVver, SIVgrv; collectively, SIVagm) appear to exchange viruses with other host species frequently (Fig 3, 12 o’clock). Examples of SIVagm CST events can be seen in the tree topologies from gag, prot, reverse transcriptase (RT), and vif (Figs 2 and S5, segments 2, 3, 4, and 6). Here, SIVtan isolates reported by Ayouba et al. [26] (denoted with ✝ in Fig 2) clearly cluster with SIVsab, in a distant part of the phylogeny from the rest of the SIVagm viruses (including the majority of SIVtan isolates). For all other segments, however, the SIVagms cluster together. We thus concur with the conclusion of Ayouba et al. that these samples represent a recent spillover of SIVsab from sabaeus monkeys to tantalus monkeys, and the model appropriately identifies this transmission.

By contrast, previous studies of the lentiviral phylogeny have noted that SIVcol is typically the outgroup to other viral lineages, and have hypothesized that this may implicate SIVcol as the “original” primate lentivirus [27]. We find this hypothesis plausible, but the evidence remains inconclusive. For the majority of genomic segments, we also observe SIVcol as the clear outgroup (Figs 2 and S5). For portions of gag/pol (segments 3 and 4) and some of the accessory genes (segment 7), however, there is no clear outgroup: many other lineages of SIV are as closely related to SIVcol as they are to each other. Nevertheless, with the occasional exception of single heterologous taxa with poorly supported placement, SIVcol remains a monophyletic clade (N = 16) and does not intercalate within the genetic diversity of any other lineage in our dataset. Based on these collective tree topologies, our model does not identify strong evidence for any specific transmissions out of colobus monkeys, and identifies only a single, weakly supported inbound transmission (likely noise in the model caused by the fact that red-capped mangabeys are the marginally supported root host state; see below). This is consistent with previous findings that the colobines, which are in a different genus from most of the Cercopithecus primates in our dataset, have a unique variant of the APOBEC3G gene, which is known to restrict lentiviral infection and is speculated to be a barrier to cross-species transmission [27]. These observations generally support the idea that SIVcol has maintained a specific relationship with its host over evolutionary time.

Additionally, while most host species carry only one lineage of SIV, mandrills and mustached monkeys carry two and three lineages of SIV, respectively [28–31]. In agreement with these previous studies, SIVmnd-1 and SIVmnd-2 do not always cluster together in the phylogeny; the same is true for SIVmus-1, SIVmus-2, and SIVmus-3, indicating that each of these viral lineages likely has a unique origin. This stands in stark contrast to baboons, which have only been infected by an SIV via a single documented spillover event [24].

Collectively, these examples demonstrate that the nature of the host-virus relationship is highly variable for primate lentiviruses, with some viruses switching hosts often while others putatively maintain strict host specificity. Likewise, while some hosts have acquired multiple SIV lineages, most are infected by only one SIV, or do not have an endemic SIV.

SIVcpz, the precursor to HIV-1, has a mosaic origin with unknown segments

Unlike SIVcol, SIVcpz appears to be the product of multiple CSTs and recombination events. SIVcpz actually encompasses two viral lineages: SIVcpzPtt infects chimpanzees of the subspecies Pan troglodytes troglodytes, and SIVcpzPts infects chimpanzees of the subspecies Pan troglodytes schweinfurthii [32]. Two additional chimpanzee subspecies have not been found to harbor an SIV despite extensive surveys, suggesting that SIVcpzPtt was acquired after chimpanzee subspeciation [25]. Both this previous work and our own results support the hypothesis that SIVcpz was later transmitted from one chimpanzee subspecies to the other, and SIVcpzPtt is the only SIVcpz lineage that has crossed into humans. Given the shared ancestry of the two lineages of SIVcpz, we use “SIVcpz” to refer specifically to SIVcpzPtt.

Based on the lentiviral sequences available in 2003, Bailes et al. [12] suggested that the SIVcpz genome is a recombinant of just two parental lineages. SIVrcm (which infects red-capped mangabeys) was identified as the 5’ donor, and an SIV from the SIVmon/-mus/-gsn clade (which infect primates in the Cercopithecus genus) was identified as the 3’ donor. Since that investigation, many new lentiviruses have been discovered and sequenced. Incorporating these new data, we find evidence that the previous two-donor hypothesis may be incomplete.

The tree topologies from env in the 3’ end of the genome (segments 8–11) support the previous hypothesis [12] that this region came from a virus in the SIVmon/-mus/-gsn clade. These viruses form a clear sister clade to SIVcpz with high posterior support (Fig 2C and 2D). We find strong evidence for transmissions from mona monkeys (SIVmon) to mustached monkeys (SIVmus), and from mustached monkeys to greater spot-nosed monkeys (SIVgsn) (see Discussion of potential coevolution below). We also find more evidence in support of a transmission from mona monkeys to chimpanzees than from the other two potential donors, but more sampling is required to firmly resolve which of these viruses was the original donor of the 3’ end of SIVcpz.

We find phylogenetic evidence to support the previous hypothesis [12,13] that the integrase and vif genes of SIVcpz (segments 4–6) originated from SIVrcm; however, we find equally strong evidence to support the competing hypothesis that this region came from SIVmnd-2, which infects mandrills (Fig 2B and 2D). In these portions of the genome, SIVmnd-2 and SIVrcm together form a clear sister clade to SIVcpz. The vpr gene, in segment 7, is also closely related to both SIVrcm and SIVmnd-2, but this sister clade also contains SIVsmm from sooty mangabeys. Notably, we infer a transmission from red-capped mangabeys to mandrills, but we cannot determine whether this portion of the SIVcpz genome was acquired directly from SIVrcm or from SIVmnd-2.

Interestingly, we find no evidence to support either SIVrcm/mnd-2 or SIVmon/mus/gsn as the donor for the 5’-most end of the genome (segments 1–5), including the 5’ LTR, gag, and RT genes. This is also true for the 3’ LTR (segment 12). SIVcpz lacks a clear sister clade or ancestor in this region, and SIVrcm groups in a distant clade; we therefore find no evidence that an ancestor of extant SIVrcm was the parental lineage of SIVcpz in the 5’-most end of the viral genome, as previously proposed (Fig 2A and 2D). This may support the possibility of a third parental lineage, or one of several other plausible scenarios (discussed below).

Discussion

Limitations and strengths of the model

Additional sampling is required to fully resolve the history of CST among lentiviruses.

In addition to the 14 strongly supported novel transmissions (BF ≥ 10) described above, we also find substantial evidence for 8 additional possible novel transmissions, but with lower support (BF ≥ 3) (Figs 3 and S4). These transmissions are more difficult to assess, because many of them are inferred from just a few “outlier” tips that group apart from the majority of viral samples from the same lineage. In each case, the tips’ phylogenetic position is strongly supported, and the primary literature associated with the collection of each of these “outlier” samples clearly specifies the host metadata. However, due to the limited number of lentiviral sequences available for some hosts, we are unable to control for sampling effects for some of these lower-certainty transmissions. We report them here because it is unclear whether these outliers result from unidentified separate endemic lineages, one-time spillovers from other hosts, or species misidentification during sample collection. It is also important to note that while some of these less-supported transmissions may be sampling artifacts, many may be real and less supported simply because the requisite data are unavailable for some genome segments.

Ultimately, far more extensive sampling of primate lentiviruses, specifically more full-length sequences from undersampled lineages, is required to resolve these instances. We included only sequences at least 500 bases long, but each taxon may contribute more informative sites to some segments than to others. When splitting the master alignment along breakpoints, we removed from each segment any taxon that had no informative bases. However, each segment contained between 0 and 13 (mean: 3.6) taxa that had some informative sites but were very short (<100 informative sites). Statistically, these short taxa contribute little information and are placed in each segment’s topology with high uncertainty. This phylogenetic uncertainty is then propagated forward to the discrete traits model, so these short taxa should not meaningfully influence our results. Notably, though, removing them creates substantial technical challenges: this “pruned” dataset mixes poorly and yields rather divergent results (S10–S12 Figs). Given the high congruence between results from independent sampling replicates of the main dataset and from an alternative sampling scheme, we believe this reflects a technical issue rather than true differences. However, this issue further emphasizes the importance of additional sampling in fully resolving the natural history of SIVs.
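The segment-level filtering described above can be sketched in Python as follows. The sketch assumes that an “informative” base is an unambiguous nucleotide (A, C, G, or T) rather than a gap or ambiguity code, and that breakpoints are given as 0-based alignment column indices; both are assumptions, as are the function names and the toy alignment. Taxa with no informative bases are dropped from a segment, while very short taxa are kept and merely reported, matching the main analysis rather than the “pruned” one.

```python
INFORMATIVE_BASES = set("ACGT")  # assumption: gaps and ambiguity codes are uninformative


def split_alignment(alignment, breakpoints):
    """Split {taxon: aligned sequence} into segments at 0-based column breakpoints."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    bounds = [0, *sorted(breakpoints), length]
    return [
        {taxon: seq[start:end] for taxon, seq in alignment.items()}
        for start, end in zip(bounds[:-1], bounds[1:])
    ]


def informative_sites(seq):
    """Count unambiguous nucleotides in one taxon's slice of a segment."""
    return sum(1 for base in seq.upper() if base in INFORMATIVE_BASES)


def filter_segment(segment, short_threshold=100):
    """Drop taxa with no informative bases; also report the kept taxa that are very short."""
    kept, short = {}, []
    for taxon, seq in segment.items():
        n = informative_sites(seq)
        if n == 0:
            continue
        kept[taxon] = seq
        if n < short_threshold:
            short.append(taxon)
    return kept, short


# Hypothetical 8-column alignment split at column 4 (thresholds shrunk to fit the toy data).
alignment = {"t1": "ACGT----", "t2": "----ACGT", "t3": "ACNNAC-T"}
seg1, seg2 = split_alignment(alignment, [4])
print(filter_segment(seg1, short_threshold=3))  # ({'t1': 'ACGT', 't3': 'ACNN'}, ['t3'])
print(filter_segment(seg2, short_threshold=4))  # ({'t2': 'ACGT', 't3': 'AC-T'}, ['t3'])
```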

Most lentiviruses were originally acquired by CST and have since coevolved with their hosts.

Some of these “noisier” transmission inferences, particularly within the same primate clade, may be the result of coevolution, i.e., viral lineages tracking host speciation. Within the model, a viral jump into the common ancestor of two extant primate species appears as a jump into one of the extant species, followed by a secondary jump between the two descendants. For example, the model infers a jump from mona monkeys into mustached monkeys, with a secondary jump from mustached monkeys into their sister species, red-tailed guenons (Fig 3). Comparing the virus and host phylogenies, we observe that this host tree bifurcation between mustached monkeys and red-tailed guenons is mirrored in the virus tree bifurcation between SIVmus and SIVasc for most segments of the viral genome (Figs 2 and 3). This heuristically suggests that the true natural history may be an ancient viral transmission from mona monkeys into the common ancestor of mustached monkeys and red-tailed guenons, followed by host/virus coevolution during primate speciation to yield SIVmus and SIVasc.

The possibility of virus/host coevolution means that while we also observe extensive host switching between primate clades, many of the observed jumps within a primate clade may be the result of host-virus coevolution. However, we also note that the species barrier is likely lower between closely related primates, making it challenging to rigorously disentangle coevolution vs. true host switches within a primate clade [33].

Cross-species transmission is driven by exposure and constrained by host genetic distance

Paleovirology, biogeography, and statistical models of lentiviral evolution estimate that primate lentiviruses share a common ancestor approximately 5–10 million years ago [27,34–36]. This, along with the putative viral coevolution during primate speciation, suggests that many of these transmissions were ancient, and have been acted on by selection for millions of years. Thus, given that the observed transmissions almost exclusively represent evolutionarily successful host switches, it is remarkable that lentiviruses have been able to repeatedly adapt to so many new host species. In the context of this vast evolutionary timescale, however, we conclude that while lentiviruses have a far more extensive history of host switching than previously understood, these events remain relatively rare overall.

As noted above, our results illustrate that some SIVs cross the species barrier more readily than others, and some primate host species become infected with new viral lineages more commonly than others. This is likely governed by both ecological and biological factors. Ecologically, frequency and form of exposure are likely key determinants of transmission [3], but these relationships can be difficult to describe statistically. For example, many primates are chronically exposed to many exogenous lentiviruses through predation [5,15,37]. Using log body mass ratios [38] as a proxy for predation, we do not see a statistically significant association between body mass ratio and non-zero transmission rate (Fig 4A, blue; p = 0.678, coef. 95% CI [-0.311, 0.202]). We believe the lack of signal is likely due to the imperfect proxy, although it is also possible that predator-prey relationships do not strongly structure the CST network. Geographic overlap and habitat similarity are also likely ecological determinants of SIV CST, but modern primate habitats likely differ substantially from those at the time these transmissions occurred.


Fig 4. Logistic regressions of body mass ratios (a proxy for predation) and host genetic distance on the probability of CST.

For each pair of host species, we (A) calculated the log ratio of their average body masses and (B) found the patristic genetic distance between them (from a maximum-likelihood tree of mtDNA). To investigate the association of these predictors with cross-species transmission, we treated transmission as a binary variable: 0 if the Bayes factor for the transmission (as inferred by the discrete traits model) was < 3.0, and 1 if it was ≥ 3.0. Each plot shows raw predictor data in gray; the quintiles of the predictor data in green; and the logistic regression and 95% CI in blue.

https://doi.org/10.1371/journal.ppat.1006466.g004

Biologically, increasing host genetic distance has a clear negative association with the probability of cross-species transmission (Fig 4B, blue; p < 0.001, coef. 95% CI [-7.633, -2.213]). Importantly, as already discussed, the strength of this association may be inflated by instances of lineage tracking (virus/host cospeciation). However, it is well established that increasing host genetic distance is associated with a higher species barrier [33]. As previously documented in the literature, we expect this is due to several factors, such as the divergence of host restriction factor genes, which are key components of the innate immune system (reviewed in [39]), and differences in host cell receptor phenotypes (e.g., [40–42]). Functional assays of these host phenotypes against panels of SIVs, while outside the scope of this study, will be important for further identifying the molecular bases of the species barriers that have led to the transmission patterns identified here.
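The regressions in Fig 4 model a binary transmission outcome as a function of a single pairwise predictor. The Python sketch below fits such a logistic regression by Newton-Raphson iteration on hypothetical host-pair data; it does not reproduce the values in Fig 4, omits confidence intervals and p-values, and the function names are illustrative. A negative fitted slope means that more distant host pairs are less likely to share a supported transmission.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Assumes the two outcome classes overlap in x; with perfectly separated
    data the maximum-likelihood slope is infinite and this will not converge.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood, X^T (y - p)
        h00 = h01 = h11 = 0.0    # information matrix X^T W X, with W = p (1 - p)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton update: beta <- beta + (X^T W X)^{-1} X^T (y - p)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict(b0, b1, xi):
    """Fitted probability of a supported transmission at predictor value xi."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


# Hypothetical host pairs: patristic distance and whether a transmission had BF >= 3.
distance = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]
transmitted = [1, 1, 0, 1, 0, 1, 0, 0]
b0, b1 = fit_logistic(distance, transmitted)
print(f"slope = {b1:.2f}")
print(f"P at 0.05: {predict(b0, b1, 0.05):.2f}; P at 0.40: {predict(b0, b1, 0.40):.2f}")
```

The same function applies unchanged to the log body mass ratio predictor; only the `x` values differ.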

Origins of HIV-1 and HIV-2

Epidemiological factors were key to the early spread of HIV.

Understanding the underlying dynamics of lentiviral CST provides important ecological context to the transmissions that generated the HIV pandemic. As discussed above, our results support a view of lentiviral cross-species transmission as a rare event. Notably, only two lentiviruses have crossed the high species barrier from Old World monkeys into hominids: SIVsmm and the recombinant SIVcpz. Both HIV-1 and HIV-2 have arisen in human populations in the last century [5,43]. While it is possible that this has occurred by chance, even without increased primate exposure or other risk factors, we nevertheless find it striking that humans would acquire two exogenous viruses within such a short evolutionary timespan.

Examining this phenomenon more closely, the history of HIV-2 is enlightening. HIV-2 has been acquired through at least 8 independent spillover events from sooty mangabeys [5]. Notably, 6 of these transmissions have resulted in only a single observed infection (spillovers) [7,8,44]; only 2 of these events have established sustained transmission chains and successfully switched hosts to become endemic human pathogens [45–47]. This pattern, as well as serology-based reports of other limited spillovers of SIVs into humans [28,48], suggests that there have been many isolated introductions of lentiviruses into humans over the past 200,000 years. However, these other viral exposures did not result in new endemic human pathogens, because of species-specific immune barriers, non-conducive epidemiological conditions, or a combination thereof. The rapid and repeated emergence of HIV-1 and HIV-2 is on a timescale more congruent with changes in epidemiological conditions than with mammalian evolution, perhaps emphasizing the importance of the concurrent changes in human population structure and urbanization in facilitating the early spread of the epidemic [43]. Significantly, though, this also highlights the importance of careful public health surveillance and interventions to prevent future epidemics of zoonotic viruses.

Evolutionary time obscures the identity of the “original” primate lentivirus.

Among primates, our results clearly illustrate that the vast majority of lentiviral lineages were originally acquired by cross-species transmission. It is intriguing to speculate as to which virus was the "original" source of all of these lineages. Because of its consistent position as the outgroup of primate lentiviral trees, it has been hypothesized [27] that SIVcol was this original lentivirus among primates. While SIVcol is certainly the most evolutionarily isolated extant lentivirus sampled to date, this does not definitively place it as the ancestral lentivirus. Alternative scenarios (also noted by [27,49]) include an extinct original lentiviral lineage (and/or primate host species) or an unsampled ancestral lentivirus. It is also plausible that another known extant lentivirus was the "original" lineage, but has diverged and/or recombined to such an extent that its origins are obscured.

Supporting information

S7 Fig. Network of inferred CSTs of primate lentiviruses (supplemental dataset).

The phylogeny of the host species' mitochondrial genomes forms the outer circle. Arrows with filled arrowheads represent transmission events inferred by the model with Bayes factor (BF) ≥ 3.0; black arrows have BF ≥ 10, with the opacity of gray arrows scaled for BF between 3.0 and 10.0. Transmissions with 2.0 ≤ BF < 3.0 have open arrowheads (see Discussion). The width of each arrow indicates the rate of transmission (actual rates = rates × indicators). Circle sizes represent network centrality scores for each host. Transmissions from chimps to humans; chimps to gorillas; gorillas to humans; sooty mangabeys to humans; sabaeus to tantalus; and vervets to baboons have been previously documented. To our knowledge, all other transmissions illustrated are novel identifications.

https://doi.org/10.1371/journal.ppat.1006466.s007

(TIF)
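The rate and Bayes factor conventions in S7 Fig can be made concrete with a small sketch. The example below is hypothetical and assumes a BSSVS-style discrete traits model in which each possible host-to-host rate has a 0/1 indicator sampled by MCMC. It computes the Bayes factor as the posterior odds over the prior odds of the indicator being 1 (the standard BSSVS form, with the prior inclusion probability supplied by the caller), the "actual rate" as the posterior mean of rate times indicator, and maps Bayes factors to the arrow styles described in the caption. The linear opacity scaling between BF 3.0 and 10.0 and all function names are assumptions, not details taken from the paper.

```python
import math


def bayes_factor(indicator_samples, prior_prob):
    """BF that a rate is non-zero, from posterior samples of its 0/1 indicator.

    BF = posterior odds / prior odds. prior_prob (the prior probability that
    the indicator is 1) must lie strictly between 0 and 1 and is an input
    here, not something this sketch derives from a model configuration."""
    p = sum(indicator_samples) / len(indicator_samples)
    if p >= 1.0:
        return math.inf  # indicator was on in every sample
    posterior_odds = p / (1.0 - p)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


def effective_rate(rate_samples, indicator_samples):
    """Posterior mean of rate * indicator (the caption's 'actual rate')."""
    products = [r * i for r, i in zip(rate_samples, indicator_samples)]
    return sum(products) / len(products)


def arrow_style(bf):
    """Map a Bayes factor to the arrow encoding described in S7 Fig."""
    if bf >= 10.0:
        return "black, filled"
    if bf >= 3.0:
        # Assumption: opacity rises linearly from 0 at BF 3.0 to 1 at BF 10.0
        return f"gray, filled, opacity={(bf - 3.0) / 7.0:.2f}"
    if bf >= 2.0:
        return "open arrowhead"
    return None  # below BF 2.0 the transmission is not drawn


# Hypothetical posterior samples for one host-to-host rate
rates = [2.0, 4.0, 1.0, 8.0]
indicators = [1, 1, 1, 0]
bf = bayes_factor(indicators, prior_prob=0.5)
print(bf, effective_rate(rates, indicators), arrow_style(bf))
```

For instance, an indicator that is 1 in three of four posterior samples, with a prior inclusion probability of 0.5, gives BF = 3.0 and would be drawn as the faintest gray arrow. In practice the prior inclusion probability depends on the prior placed on the number of non-zero rates, so it should be taken from the model configuration rather than assumed.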
