Why use phylogenetics

Specifically, all echolocating bats typically form a monophyletic group in morphological trees, suggesting a single origin of bat echolocation. But they tend to form a paraphyletic group in molecular trees 33, 34, 35, 36, suggesting the possibility of two origins of bat echolocation or one origin followed by a loss. In the original total evidence tree Supplementary Fig.

Note that using low-convergence morphological characters alone does not result in this new topology. For comparison, we generated 50 randomly subsampled data sets, each with 1, morphological and 2, molecular characters. Although 18 of them also yielded the same topology as in Fig.

Our analysis of comparably large numbers of morphological and molecular characters previously used in inferring the mammalian tree showed that morphological characters experienced more convergent evolution than molecular characters, confirming a long-held belief of the phylogenetics community. Nevertheless, we caution that our conclusion should be further scrutinized using additional data from additional groups of species, because it is currently based on only one, albeit very large, data set of one group of species.

There are three potential sources of error in our inference of convergence. First, use of a wrong species tree could bias our inference. But, as demonstrated, our results are robust to different species trees used. Second, our inference of convergence relies on ancestral state reconstruction by parsimony that may contain errors. But such errors should be comparable between the two types of characters.

Third, it was recently proposed that some inferred convergences may be caused by incomplete lineage sorting rather than genuine convergent changes. Similar to genuine convergence, apparent convergence owing to incomplete lineage sorting also confounds phylogenetic inference and thus need not be separated from our estimates of convergence.

Hence, the three potential errors do not affect our conclusion. Regarding the reason behind the higher convergence of morphological characters than molecular characters, our results do not support the common view that morphological characters are intrinsically more prone to convergence because they are more frequently subject to positive selection.

A likely explanation for this unexpected finding is that phylogeneticists have removed morphological characters that are subject to frequent positive selection (for example, body size and coat colour) from phylogenetic analysis, because such characters are known to lack reliable phylogenetic signals. As a result, the morphological characters used for phylogenetic inference have relatively low intrinsic propensities for convergence.

If most convergences of the morphological characters in the data analysed are not manifestations of repeated adaptations but pure chance, one wonders what morphological characters are responsible for the clustering of species with seemingly adaptive convergences in the morphological tree, such as the clade of the four ant- and termite-eaters: the nine-banded armadillo (Dasypus novemcinctus), collared anteater (Tamandua tetradactyla), Chinese pangolin (Manis pentadactyla), and aardvark (Orycteropus afer) Supplementary Fig.

We found that, even on the basis of the molecular tree, at most 14 morphological characters are inferred to have experienced convergence among the three lineages, and the actual number is likely much smaller because, for 13 of the 14 characters, convergence is but one of several equally parsimonious evolutionary scenarios.

However, none of the 14 characters are apparently related to ant- and termite-eating or are specific to these four species. For instance, the only character for which the sole parsimonious reconstruction indicates convergence among the three lineages describes the shape of the medial border of humerus trochlea. The humerus is a long bone in the arm or forelimb that runs from the shoulder to the elbow and trochlea refers to a grooved structure reminiscent of a pulley's wheel.

This character does not appear to be related to ant- and termite-eating. In fact, manatee (Trichechus manatus) and ring-tailed lemur (Lemur catta) also have the same state as the four ant- and termite-eating mammals for this character. These findings are consistent with our conclusion that most morphological convergences observed here are caused by chance rather than repeated adaptations.

Of course, we cannot exclude the possibility that a small number of morphological convergences observed in this data set are adaptive. Nevertheless, morphological characters experience more convergences than molecular characters because the former have far fewer states than the latter.

The low number of states per morphological character may be related to one or both of the following reasons 7. First, curating multistate morphological characters may be more subjective and error-prone, resulting in a reduced use of such characters in phylogenetics. Second, most morphological characters may have a small state space, rendering finding multistate characters difficult. Because of the higher prevalence of convergence among morphological characters than molecular characters and the rapid accumulation of molecular sequence data, we suggest that phylogenetic reconstruction should normally use only molecular data.

In the event that molecular data are inaccessible for some taxa such as fossils, one should consider using morphological characters with relatively large numbers of states to minimize convergence in phylogenetic analysis. Given a data set of morphological and molecular characters, we proposed a method to reconstruct more accurate total evidence trees by identifying and removing convergence-prone characters in the data set, and demonstrated its validity by computer simulation.

Homoplasy, which interferes with phylogenetic inference, also includes reversal in addition to convergence. While our study focuses on convergence, it is worth noting that convergence-prone characters are also expected to be reversal-prone if most convergences are chance events owing to the availability of only few states, as indicated by the present data.

Thus, in removing convergence-prone characters, we effectively also take out many reversal-prone characters; the success of our method may be in part attributable to this effect. Because our method relies on the assumption that characters that are convergence-prone in the quartets analysed are also convergence-prone in other species, it is not effective in removing characters that are convergence-prone in a few specific lineages such as those subject to adaptive convergence.

In principle, one could also downweight instead of removing convergence-prone characters, but the appropriate weights are unknown. Future studies can investigate how to acquire the best weights for improving phylogenetic accuracy. We showed that the original total evidence mammalian tree, in which all echolocating bats form a monophyly, is altered upon the removal of convergence-prone characters. The low-convergence tree shows a paraphyly of echolocating bats, identical to the recently published genome-based bat phylogeny. Assuming that the genome-based tree is correct, our results demonstrated the utility of our method in actual phylogenetic inference with the total evidence approach.

Besides, our low-convergence tree also supports the monophyly of pangolin (Manis pentadactyla) and carnivores Supplementary Fig. As shown by our computer simulation, although removing convergence-prone characters improves phylogenetic accuracy, low-convergence trees may still contain errors. Identifying and removing convergence-prone characters is by no means a panacea for phylogenetics.

While rapidly accumulating genome sequences will eventually dwarf the morphological data of any extant species, morphological data will remain useful in phylogenetic analysis that needs to contain fossils, whose value to understanding evolution is indispensable. For this reason, understanding and remedying convergence, which is more prevalent in morphological than molecular characters, will remain an important task in phylogenetics.

Of course, morphological characters that can be studied in fossils do not represent a random sample of all morphological characters. Whether this nonrandomness will bias phylogenetic inference 43 is also worth investigation.

The original data set is composed of 4, morphological characters and 11, amino acid sites. It includes 86 species, with 40 fossil taxa having only morphological characters and 46 extant species having both types of characters. We focused on extant species in this study because they have both types of characters for comparison. The morphological tree, molecular tree, and total evidence tree (that is, based on both types of characters) built using the parsimony method were provided by the original study see Supplementary Fig.

We removed all parsimony-uninformative characters for the 46 extant species. A parsimony-informative character has at least two states, each represented by at least two taxa. Parsimony trees of the 46 extant species based on the remaining 3, morphological characters or 5, amino acid sites agree with those based on all characters of the same types. Ancestral states of each parsimony-informative character were inferred for all interior nodes in the morphological tree, molecular tree or total evidence tree by parsimony using Mesquite V.
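The definition of a parsimony-informative character given above can be sketched as a short filter. This is only an illustration: the function name and the "?" symbol for a missing state are assumptions, not taken from the original study.

```python
from collections import Counter

MISSING = "?"  # assumed symbol for a missing state; not specified in the text


def is_parsimony_informative(states):
    """Return True if at least two states are each observed in at least two taxa."""
    counts = Counter(s for s in states if s != MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


# One state per taxon for a single (hypothetical) character.
print(is_parsimony_informative(["A", "A", "B", "B"]))  # two states, each in two taxa
print(is_parsimony_informative(["A", "A", "A", "B"]))  # state B occurs only once
```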

Missing extant states of a character were inferred simultaneously during the inference by parsimony, such that no additional changes are required due to the missing state assignment. Mesquite also output the number of states appearing in the 46 extant species for each character and the number of changes each character experienced along the entire tree.
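As an illustration of parsimony reconstruction with missing states, the sketch below uses the down-pass of the Fitch algorithm for unordered characters. It is a swapped-in textbook procedure, not necessarily what Mesquite does internally. The nested-tuple tree format, the "?" symbol and the function name are assumptions. A missing tip is treated as compatible with every state, so it never forces an extra change.

```python
def fitch(tree, tip_states, all_states):
    """Fitch down-pass on a rooted binary tree given as nested tuples.

    Returns (possible state set at this node, minimal number of changes below it).
    """
    if isinstance(tree, str):  # a tip, identified by its name
        state = tip_states[tree]
        # A missing state ("?") is compatible with every state.
        return (set(all_states) if state == "?" else {state}), 0
    left, right = tree
    left_set, left_changes = fitch(left, tip_states, all_states)
    right_set, right_changes = fitch(right, tip_states, all_states)
    common = left_set & right_set
    if common:  # the children agree on at least one state: no change needed here
        return common, left_changes + right_changes
    # The children disagree: one change is required on a descending branch.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical four-taxon example.
tree = ((("a", "b"), "c"), "d")
root_set, n_changes = fitch(tree, {"a": "0", "b": "?", "c": "0", "d": "1"}, {"0", "1"})
print(root_set, n_changes)
```

Only the down-pass is shown; assigning a single state to each interior node would need an additional up-pass.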

An independent branch pair refers to two branches that are not ancestral to each other and contain no common node.

For example, let the starting and end states of one branch (node 1 to node 3) be X1 and X3, and let those of another branch (node 2 to node 4) be X2 and X4, respectively. These two branches form an independent branch pair if (i) the four nodes are all distinct from one another, (ii) node 3 is not on the path from the tree root to node 4 and (iii) node 4 is not on the path from the tree root to node 3.
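The three conditions can be checked directly once the tree is stored as a child-to-parent map. The sketch below assumes such a map (with the root mapped to None) and represents a branch as a (start node, end node) tuple. The node names and function names are hypothetical.

```python
def path_to_root(node, parent):
    """Return the nodes from `node` up to the root, both included."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def is_independent_pair(branch_a, branch_b, parent):
    """Check conditions (i) to (iii) for branches (node 1, node 3) and (node 2, node 4)."""
    n1, n3 = branch_a
    n2, n4 = branch_b
    if len({n1, n2, n3, n4}) < 4:        # (i) all four nodes are distinct
        return False
    if n3 in path_to_root(n4, parent):   # (ii) node 3 is not ancestral to node 4
        return False
    if n4 in path_to_root(n3, parent):   # (iii) node 4 is not ancestral to node 3
        return False
    return True


# Hypothetical tree: R -> (A, B), A -> (a1, a2), B -> (b1, b2), a1 -> (x, y)
parent = {"R": None, "A": "R", "B": "R", "a1": "A", "a2": "A",
          "b1": "B", "b2": "B", "x": "a1", "y": "a1"}
print(is_independent_pair(("A", "a1"), ("B", "b1"), parent))
print(is_independent_pair(("R", "A"), ("a1", "x"), parent))
```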

This definition includes both parallel and convergent changes previously defined. Thus, once ancestral states are inferred, we know whether a character experiences convergence, divergence, or neither for a branch pair. For a character, the consistency index (ci) is the smallest number of changes required to explain the observed states by any tree (Min) divided by the minimal number of changes required by the tree under evaluation (Obs).
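A minimal sketch of the consistency index follows. It assumes unordered characters, for which Min equals the number of observed states minus one. Obs is supplied by the caller, for example from a parsimony reconstruction on the tree under evaluation. The function names and the "?" symbol for missing data are assumptions.

```python
def min_changes(states):
    """Min: fewest changes on any tree for an unordered character."""
    observed = {s for s in states if s != "?"}  # "?" is an assumed missing-data symbol
    return len(observed) - 1


def consistency_index(states, obs_changes):
    """ci = Min / Obs, where Obs is the number of changes on the evaluated tree."""
    return min_changes(states) / obs_changes


# Hypothetical character with three states that needs four changes on the tree.
print(consistency_index(list("AABBC"), obs_changes=4))
```

A value of 1 means the character fits the tree without homoplasy; lower values indicate more homoplasy.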

The rescaled consistency index (rc) equals the consistency index multiplied by the retention index (ri). Values of ri and ci were calculated by Mesquite. For example, in the branch pair leading to wolf and aardvark, we inferred We obtained a two-tailed P value of 0. For the same branch pair, we inferred 9. The obtained two-tailed P value from Fisher's exact test is 3. There were two branch pairs with no convergence and no divergence for molecular characters under the morphological tree, and three such branch pairs under either the molecular tree or the total evidence tree.
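For readers unfamiliar with Fisher's exact test, the sketch below computes a two-tailed P value for a 2 x 2 table using only the standard library. It assumes the table holds convergence and divergence counts for the two character types in one branch pair. The counts in the example are hypothetical, because the original values are not recoverable from the text.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at least as unlikely as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: rows are character types, columns are convergence and divergence.
print(fisher_exact_two_tailed(8, 2, 1, 5))
```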

Hence, they were excluded from the analysis and corresponding figures. Because branch pairs or quartets are not independent from one another, simple parametric statistic tests cannot be used.

We thus used a bootstrap method to test the null hypothesis that per character convergence is lower for morphological characters than molecular characters. First, we generated one bootstrap sample containing the same number of both morphological and molecular characters as in the original data.

This fraction is an estimate of the probability that the null hypothesis is correct, hence is the P value of this bootstrap test.

Four extant taxa Y1, Y2, Y3 and Y4 are selected if they satisfy the following conditions: (i) Y1 and Y2 form a monophyletic group in exclusion of Y3 and Y4 in both the morphological and molecular trees of all extant taxa examined; (ii) Y3 and Y4 form a monophyletic group in exclusion of Y1 and Y2 in both the morphological and molecular trees; and (iii) the root of this four-species tree is located on the internal branch in both the morphological and molecular trees.

Mapping a parsimony-informative character onto this quartet tree, we say that the character shows a convergence if the states of Y1, Y2, Y3, Y4 are (A, B, A, B) or (A, B, B, A), where A and B are two observed states of the character in the four species. By contrast, we say that the character shows a consistency if (A, A, B, B) is observed.
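The quartet rule can be written as a small classifier. This is a sketch under the assumption that the four states are passed in the order Y1, Y2, Y3, Y4 and that any other pattern counts as neither. The function name and return labels are hypothetical.

```python
def classify_quartet(y1, y2, y3, y4):
    """Classify one character on the quartet ((Y1, Y2), (Y3, Y4))."""
    if len({y1, y2, y3, y4}) != 2:
        return "neither"           # the rule only covers patterns with two states
    if y1 == y2 and y3 == y4:
        return "consistency"       # pattern A, A, B, B
    if y1 != y2 and y3 != y4:
        return "convergence"       # pattern A, B, A, B or A, B, B, A
    return "neither"               # for example A, A, A, B


for pattern in ["ABAB", "ABBA", "AABB", "AAAB"]:
    print(pattern, classify_quartet(*pattern))
```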

Statistical tests followed those in the whole-tree analysis, except that quartets replaced branch pairs. There were quartets with zero convergence and zero consistency for molecular characters. Morphological and molecular characters are divided into bins according to the number of states. Finally, this ratio is averaged across bins, weighted by the number of morphological characters in each bin.

The evolution of morphological and molecular characters was simulated according to Markov processes, based on the tree topology and branch lengths of the nucleotide maximum likelihood tree from the original study 25 Supplementary Fig. The Newick format of the tree is In the simulated evolution, a model equivalent to the Jukes-Cantor model (assuming equal equilibrium frequencies of all states and equal exchange rates among all states) was used. For each morphological character, its number of states N is a randomly drawn number from the empirical distribution of the number of states in the original morphological data Fig.

The relative evolutionary rate r of the character is randomly drawn according to a Pearson correlation of 0. Specifically, we draw a random variable n' from the empirical distribution of the number of states and compute. We then normalize r such that the mean r from all characters equals 1. The character evolution then starts from a random initial state at the tree root and evolves by a Markov chain along tree branches.
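The Markov evolution along the tree can be sketched as follows. The sketch assumes branch lengths are measured in expected changes per character, so that under an equal-rate N-state model the probability of ending a branch in the starting state is 1/N + (1 - 1/N) exp(-N r t / (N - 1)). The tree format (name, branch length, children) and the function names are hypothetical, and the rate r is taken as given rather than drawn as in the study. End states are sampled directly, so individual substitution steps are not recorded.

```python
import math
import random


def evolve_branch(state, n_states, rate, length, rng):
    """Evolve one character along a branch under an equal-rate N-state model."""
    decay = math.exp(-n_states / (n_states - 1) * rate * length)
    p_same = 1 / n_states + (1 - 1 / n_states) * decay
    if rng.random() < p_same:
        return state
    # Otherwise every other state is equally likely.
    return rng.choice([s for s in range(n_states) if s != state])


def simulate_character(tree, n_states, rate, rng):
    """Return {tip name: state} for a tree of nested (name, length, children) tuples."""
    tips = {}

    def walk(node, state):
        name, length, children = node
        state = evolve_branch(state, n_states, rate, length, rng)
        if not children:
            tips[name] = state
        for child in children:
            walk(child, state)

    walk(tree, rng.randrange(n_states))  # random initial state at the root
    return tips


# Hypothetical three-taxon tree; the root has branch length 0.
tree = ("root", 0.0, [("ab", 0.1, [("a", 0.2, []), ("b", 0.2, [])]), ("c", 0.4, [])])
print(simulate_character(tree, n_states=3, rate=1.0, rng=random.Random(1)))
```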

Molecular characters were similarly simulated. Fifty simulations were conducted, each composed of 20, morphological characters and 40, molecular characters. The number of states used to generate each character and the number of substitution steps in evolution were recorded for downstream analysis. Quartet analysis based on a randomly picked simulation showed that the properties of these characters resemble those of the real data.

Because the evolutionary models of morphological characters have not been well established, model-based tree inference is not used here. Instead, we inferred maximum parsimony trees using PAUP4. When analysing the real data, 1, replicated heuristic searches were performed with parameters from the original study. All fossil taxa were included when morphological characters were used in the inference.

Consensus trees were derived when multiple equally parsimonious topologies were found, with a strict collapse of branches and equal weights of all topologies.
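A strict consensus keeps only the groupings found in every equally parsimonious tree. The sketch below illustrates this for rooted trees written as nested tuples, representing each clade as a frozenset of tip names. It is an illustration of the idea, not PAUP's implementation, and the function names are assumptions.

```python
def clades(tree):
    """Return the set of clades (frozensets of tip names) in a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


def strict_consensus_clades(trees):
    """Keep only the clades that are present in every tree."""
    return set.intersection(*(clades(t) for t in trees))


# Two hypothetical equally parsimonious trees that disagree inside (a, b, c).
t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
for clade in sorted(strict_consensus_clades([t1, t2]), key=len):
    print(sorted(clade))
```

Clades that drop out of the intersection correspond to the collapsed branches of the consensus tree.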

In the analysis of simulated characters, 5, replications were used instead of 1, Bootstrap tests were conducted in PAUP with 1, replicates unless otherwise mentioned. Bootstrap values were calculated and mapped by custom Python scripts; equal weights were given to all equally parsimonious trees resulting from each bootstrapped data set. As a control, we randomly drew 1, morphological and 2, molecular characters from all 9, characters and conducted a phylogenetic analysis.
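The bootstrap steps described above can be sketched in two parts: resampling characters with replacement, and computing the support of a clade when each bootstrapped data set may yield several equally parsimonious trees. This is not the authors' script. The data layout (a dict of taxon to state sequence, and one set of clades per tree) and the function names are assumptions, and the tree search itself is left to external software.

```python
import random


def resample_columns(matrix, rng):
    """Draw characters (columns) with replacement from a {taxon: states} matrix."""
    n_chars = len(next(iter(matrix.values())))
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: [row[i] for i in cols] for taxon, row in matrix.items()}


def bootstrap_support(clade, replicates):
    """Support for `clade` across bootstrap replicates.

    `replicates` is a list with one entry per bootstrapped data set; each entry
    lists the clade sets of all equally parsimonious trees from that data set.
    Every tree within a replicate gets the same weight, 1 / (number of trees).
    """
    total = 0.0
    for trees in replicates:
        total += sum(1 for clade_set in trees if clade in clade_set) / len(trees)
    return total / len(replicates)


# Hypothetical example: the second replicate has two equally parsimonious trees.
ab, ac = frozenset("ab"), frozenset("ac")
print(bootstrap_support(ab, [[{ab}], [{ab}, {ac}]]))  # (1 + 0.5) / 2 = 0.75
```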

This control was repeated 50 times.

How to cite this article: Zou, Z. Morphological and molecular convergences in mammalian phylogenetics.

Against this background, what is the potential role of phylogenetics in enhancing our understanding of emergence and spread of drug resistance? First, who are the main transmitters of drug resistance, and are they receiving antiretroviral therapy (ART)? Second, what is the contribution of transmission during acute infection to the spread of drug resistance?

Third, what is the persistence of drug-resistant virus strains within the population? Finally, as pre-exposure prophylaxis (PrEP) becomes widespread, can we identify the emergence and transmission of resistant strains from patients who are infected while receiving PrEP?

HIV rapidly accumulates genetic variation because of its short generation time and high mutation rate.

Phylogenetic inference methods use this variation to reconstruct phylogenies (phylogenetic trees) from contemporary sequencing data. The root of the tree represents the ancestral lineage, and the tips correspond to the virus sequences at the moment of sampling.

Going from the root to the tips corresponds to moving forward in time. When a lineage splits (speciation), it is represented as a branching node of the phylogeny. When the sampling is dense, such a split can be interpreted as a virus transmission infecting a new individual, and the whole tree is an approximation of the transmission tree [ 15 ].

To assess the robustness of the reconstructed tree, the support values on its branches can be calculated using statistical methods, such as bootstrap [ 16 ]. These values tend to decrease when going back in history, from tips to the root. To remove the uncertain data from the study, genetic clusters are often used instead of the whole tree. Such clusters correspond to the well-supported subtrees that contain sequences closely related to each other and distant from the rest of the tree (see [ 17 ] for an overview of genetic clustering methods).
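One common way to make this definition operational is to pick the largest subtrees whose branch support is above a threshold and whose most distant pair of tips is within a maximum distance. The sketch below illustrates that idea only. The two thresholds, the dict-based tree format and the function names are assumptions and do not come from any of the cited methods. Internal nodes are assumed to have at least two children.

```python
def subtree_stats(node):
    """Return (tip names, height, diameter) of the subtree below `node`.

    Height is the longest root-to-tip distance within the subtree and
    diameter is the longest tip-to-tip distance.
    """
    if not node["children"]:
        return [node["name"]], 0.0, 0.0
    tips, heights, diameter = [], [], 0.0
    for child in node["children"]:
        c_tips, c_height, c_diameter = subtree_stats(child)
        tips += c_tips
        heights.append(c_height + child["length"])
        diameter = max(diameter, c_diameter)
    heights.sort(reverse=True)
    # The longest path through this node joins its two deepest child subtrees.
    return tips, heights[0], max(diameter, heights[0] + heights[1])


def pick_clusters(node, min_support, max_diameter):
    """Return the largest well-supported, tightly related subtrees as lists of tips."""
    if not node["children"]:
        return []
    tips, _, diameter = subtree_stats(node)
    if node.get("support", 0) >= min_support and diameter <= max_diameter:
        return [tips]
    return [cluster for child in node["children"]
            for cluster in pick_clusters(child, min_support, max_diameter)]


def tip(name, length):
    return {"name": name, "length": length, "children": []}


# Hypothetical tree: a and b are close and well supported, c is distant.
tree = {"name": "root", "length": 0.0, "support": 50, "children": [
    {"name": "X", "length": 0.1, "support": 95,
     "children": [tip("a", 0.01), tip("b", 0.01)]},
    tip("c", 0.2),
]}
print(pick_clusters(tree, min_support=90, max_diameter=0.05))
```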

A cluster of sequences that also share a common trait value (eg, geographic location, risk group, presence of a given drug resistance mutation [DRM]) is called a phylotype [ 18 ]. The branch lengths in genetic clusters are typically short, and therefore a cluster can be interpreted as representing a recent outbreak, as, for example, when a virus acquires a DRM under drug-selective pressure and the patient starts transmitting the resistant virus.

The root of the cluster would correspond to the first transmission event. Viral phylodynamics is defined as the study of how epidemiological, immunological, and evolutionary processes act and potentially interact to shape viral phylogenies [ 19 , 20 ].

Phylodynamic methods have been used to estimate the parameters shaping the emergence of drug resistance and the spread of resistant viruses, such as the persistence time of DRMs in the untreated population. Wensing et al [ 21 ] used phylogenetic reconstruction and genetic clustering to study the persistence of DRMs in HIV-infected, treatment-naive patients from 19 countries across Europe.

They found a significant difference in the level of baseline resistance between recently infected patients The origin of transmitted drug resistance (TDR) has been addressed by several groups. Yerly et al [ 22 ] reconstructed HIV transmission clusters in Geneva using phylogenetic analysis, showing that newly diagnosed HIV infections are a significant source of onward transmission, notably of resistant strains. The same conclusion was reached by Lewis et al [ 24 ] in a study including approximately patients from London, predominantly men who have sex with men, using a similar transmission-cluster-based approach.

Mourad et al used a parsimony-based approach [ 27 ] to extract phylotypes of sequences, the most recent common ancestor of which was bearing a DRM that is still shared by the majority of the sequences in the phylotype. The simplicity of the method makes it computationally very efficient.

Moreover, reversion to wild type occurred at a low frequency, and drug-independent reservoirs of resistance have persisted for up to 13 years. These conclusions are very close to those of Drescher et al [ 28 ], who studied the transmission of resistances among men who have sex with men in the Swiss HIV Cohort.

Their method was different, because they did not reconstruct the ancestral resistance status of the sequences, but they also extracted well-supported transmission clusters from a large sequence phylogeny and searched for the potential sources of the resistances observed in these clusters. The discrepancy between the results obtained by Mourad et al [ 26 ] and Drescher et al [ 28 ] and those obtained by Audelin et al [ 23 ] and Lewis et al [ 24 ] is most likely attributable to the size of the data sets, ranging from approximately [ 24 ]; published in to approximately [ 26 ]; published in When the proportion of missing data is high, it is not possible to determine the origin of the transmission for isolated drug-naive patients harboring DRMs.

In summary, we argue for building phylogenetics into a more detailed epidemiological surveillance of HIV drug resistance. With an ever-reducing cost of genetic sequencing, there is a move to generate full-length HIV sequences [ 29 ].

This has the capacity to increase the phylogenetic resolution, owing to longer sequences. Through a large simulated data set, we have shown that the accuracy of trees was nearly proportional to the length of sequences, with gag-pol-env data sets showing best performance compared with the partial pol sequences commonly created through drug resistance testing [ 30 ].

An added advantage of extended sequencing is the ability to capture integrase inhibitor resistance. Care must be taken in the sampling frame in the context of HIV prevalence, to produce realistic estimates. This will facilitate a better understanding of the drivers of resistance spread, the source of transmitted resistance, and how this is changing over time in the face of ARV drug rollout.

Supplement sponsorship. Potential conflicts of interest. All authors: No reported conflicts. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.

Conservation: Phylogenetics can help to inform conservation policy when conservation biologists have to make tough decisions about which species they try to prevent from becoming extinct.

Bioinformatics and computing: Many of the algorithms developed for phylogenetics have been used to develop software in other fields.

Coming soon…?

With the advent of newer, faster sequencing technologies, it is now possible to take a sequencing machine out to the field and sequence species of interest in situ.

Phylogenetics is needed to add biological meaning to the data.


