Monday, April 23, 2018

A (wal)nut to crack – what a network tells you that no tree can

In this post, I will show a network that I generated some time ago to illustrate a point: morphological data should be used to infer networks rather than trees, especially when the goal is to place fossils in a modern-day phylogenetic framework.

In 2007, Manos et al. (Systematic Biology 56:412–430) published an interesting phylogenetic study that provided a phylogenetic framework to place some enigmatic fossils of the Juglandaceae, the walnut family. Following my preferred procedure (presumably without realizing it), they recruited a palaeobotanical expert to erect a morphological partition.

Given the high quality of the matrix, this is an ideal example to demonstrate the utility of networks in (palaeo)phylogenetic research and to discuss the question of potential ancestor-descendant relationships, and their poor representation in trees (especially cladograms). Phylogenetic relationships within modern Juglandaceae are relatively well resolved. Rhoiptelea, a relict genus found in the mountains of northern Vietnam and south-western China, is sister to the remainder of the family; it is now placed in subfamily Rhoipteleoideae, but was traditionally treated as its own family. Rhoiptelea is a living fossil: flowers with fitting in-situ pollen and seeds have been found in the Late Cretaceous (Heřmanová et al. 2011, IJPS 172: 285–293; cryptically named Budvaricarpus serialis, the "Serial Budvarseed", because one is not allowed to use a modern-day genus for naming an 85–90-million-year-old angiosperm, even when it looks the same). The remainder of the Juglandaceae falls into two main clades, recognized as subfamilies:
  1. the Juglandoideae: the walnuts (Juglans) and their closest relatives, namely the (eastern) North American-East Asian disjunct genus Carya, the Eurasian relict genus Pterocarya (mainly Transcaucasia, East Asia), and the monotypic genera Cyclocarya and Platycarya.
  2. the Engelhardioideae: a group of tropical-subtropical, mostly relict genera, comprising Alfaroa + Oreomunnea in the equatorial regions of the New World, the South-East Asian-Malesian genus Engelhardia, and the probably monotypic Alfaropsis, widespread in China (sometimes still included in Engelhardia, e.g. in the current Flora of China, despite unambiguous molecular and morphological evidence).
Juglandaceae produce (winged) seeds and pollen that are relatively easy to identify. They are well-known and very common companions of palaeontologists during much of the Cenozoic, especially the (today geographically very restricted) Engelhardioideae. But in addition to the modern genera, the family includes some very interesting, unique fossils — the idea is to place these in a phylogenetic framework.

Results of the study of Manos et al. (2007).
Arrows indicate the position of the fossils. a) A majority-rule consensus cladogram (50% cut-off) based on the morphological partition; b) the total evidence counterpart.

As can be seen from the above trees (taken from the paper), morphology reflects some of the molecular phylogenetic relationships: the Juglandoideae are supported as a clade, as are most genera (except for Engelhardia and Oreomunnea). Two fossils, Pal(a)eoplatycarya and Platycarya americana, were resolved as sister taxa to their modern counterpart, Platycarya strobilacea; and the two enigmatic fossils Polyptera (the "many-winged one") and Cruciptera (the "cross-winged one") could be associated with the Juglandoideae. The total evidence approach indicated that Cruciptera is part of the "crown-group" Juglandoideae, in contrast to Polyptera, which appears at a more "basal" (root-proximal) position in this subclade. A further fossil, Pal(a)eooreomunnea, could not be resolved with certainty (it was placed as sister to all Juglandoideae in the total evidence tree). As its name, literally the "Ancient Oreomunnea", indicates, we would have expected it to group with the Engelhardioideae, which form a clade in the total evidence tree.

This is okay as far as it goes, but beyond potential sister relationships, these cladograms show very little. When I place a fossil such as Cruciptera in the phylogeny, I would like to know whether it is more closely related to Juglans, Pterocarya or Cyclocarya. Is it an early sister lineage of all of these, or even a precursor? Cladograms cannot answer such questions.

The persistent issue of pseudo-clades

It has been pointed out in earlier posts that clades/grades are not necessarily synonyms of Hennig's concepts of monophyly and paraphyly, mainly because of convergent evolution creating data splits that are incongruent with the true tree. Parsimony-based analyses are especially vulnerable, because each change represents a step to be optimized.

One alternative method to place fossils in a (molecular-based) phylogenetic framework is the evolutionary placement algorithm (EPA; Berger & Stamatakis 2010, AICCSA conference paper). This switches to a probabilistic framework and queries each fossil on its own, using its morphological partition while keeping the molecular-based tree as the framework.

Summarized result of the evolutionary placement algorithm as implemented in RAxML.
Each number represents the probability of joining the fossil to the respective branch, using maximum likelihood as the optimality criterion.

This gives the above tree as the result for the walnut data set. Palaeooreomunnea is now unambiguously linked to one of the two included species of Oreomunnea, O. mexicana. Cruciptera is associated (again unambiguously) with Cyclocarya. Furthermore, not only are Palaeoplatycarya and the extinct North American Platycarya placed as relatives of the modern-day Platycarya, but so is Polyptera. According to the original analysis, Polyptera is instead the first-branching member of the remainder of the Juglandoideae, i.e. all genera except Platycarya.
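The per-branch numbers in such a placement summary can be read as normalised likelihood weights. The following is a minimal sketch of that normalisation, assuming one log-likelihood per candidate branch is available; the branch names and log-likelihood values are invented for illustration and are not taken from the walnut analysis.

```python
import math

def likelihood_weights(branch_log_likelihoods):
    """Turn per-branch log-likelihoods into weights that sum to 1."""
    best = max(branch_log_likelihoods.values())
    # Subtract the best score before exponentiating to avoid numerical underflow.
    relative = {b: math.exp(lnl - best) for b, lnl in branch_log_likelihoods.items()}
    total = sum(relative.values())
    return {b: r / total for b, r in relative.items()}

# Hypothetical log-likelihoods for one fossil attached to three candidate branches.
example = {"branch_A": -1234.5, "branch_B": -1236.5, "branch_C": -1250.0}
weights = likelihood_weights(example)

for branch, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{branch}: {weight:.3f}")
```

A weight close to 1 on a single branch corresponds to what is called an unambiguous placement above; weights spread over several branches would indicate a fossil whose position the data cannot pin down.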

And the network shows us why

The most important problem with morphological data sets is that their signals are complex, and usually not very tree-like. Hence, whenever we optimize fossils along a tree (either by directly analyzing the morphological data or by some form of total evidence approach), the analysis has to fit in this odd little OTU at all costs, even when it means collapsing an entire clade. Simultaneous optimisation of two or more fossils triggers further branching artifacts, and may decrease branch support, because the fossils have no molecular data to compensate for any branch attraction that conflicts with the actual phylogeny.
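One simple way to see what "not very tree-like" means is to test pairs of characters for compatibility. The sketch below assumes binary characters and uses the classic four-combination test: two binary characters can both fit the same tree without homoplasy only if fewer than four state combinations occur across the taxa. The character strings are invented toy data, and real morphological matrices usually contain multistate characters that need a more general test.

```python
def compatible(char_a, char_b, missing="?"):
    """Return True if two binary characters can fit one tree without homoplasy."""
    seen = set()
    for a, b in zip(char_a, char_b):
        # Taxa with a missing score in either character are uninformative here.
        if a == missing or b == missing:
            continue
        seen.add((a, b))
    # All four state pairs present means the two splits cross each other.
    return len(seen) < 4

# Toy characters scored for five hypothetical taxa (one symbol per taxon).
char_1 = "00111"
char_2 = "01011"
char_3 = "00011"

print(compatible(char_1, char_2))  # these two characters conflict
print(compatible(char_1, char_3))  # these two can share one tree
```

A tree has to sacrifice one of two conflicting characters, whereas a network can display both splits side by side as a box.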

Let's take Polyptera as an example. If we de-root the trees, the original total evidence placement and the ML-EPA are not that different from each other: Polyptera is just moved by one node. An easily inferred Neighbour-net, which is not 1-dimensional like a phylogenetic tree, but 2-dimensional, shows the reason why (using only the morphological data partition).
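A Neighbour-net is computed from a pairwise distance matrix. As a minimal sketch of that input step, the code below assumes a simple mean character difference (the proportion of differing states among characters scored in both taxa); this distance is an assumption for illustration, not necessarily the one used for the network shown here, and the taxon names and character strings are hypothetical.

```python
def mean_character_distance(row_a, row_b, missing=("?", "-")):
    """Proportion of differing characters among those scored in both taxa."""
    compared = 0
    differing = 0
    for a, b in zip(row_a, row_b):
        if a in missing or b in missing:
            continue  # skip characters not scored in both taxa
        compared += 1
        differing += a != b
    return differing / compared if compared else float("nan")

def distance_matrix(matrix):
    """Pairwise distances for a dict mapping taxon name to character string."""
    taxa = list(matrix)
    return {(s, t): mean_character_distance(matrix[s], matrix[t])
            for s in taxa for t in taxa}

# Hypothetical toy matrix: two modern taxa and one incompletely scored fossil.
toy = {
    "taxon_A": "00110",
    "taxon_B": "11110",
    "fossil_X": "01?10",
}
distances = distance_matrix(toy)
print(distances[("taxon_A", "taxon_B")])
print(distances[("fossil_X", "taxon_A")], distances[("fossil_X", "taxon_B")])
```

In this toy case the fossil is equally distant from both modern taxa, which is the kind of signal that a tree has to resolve one way or the other, while a network can show it as it is.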

The Neighbour-net based on Manos et al.'s morphological data partition.
Numbers at branches represent nonparametric bootstrap support (least-squares and maximum parsimony criteria) and Bayesian posterior probabilities.

  • We can see that Polyptera has a unique morphology (it shows the longest terminal edge of all fossils), making it equally similar to Platycarya and the remaining Juglandoideae: Juglans, Pterocarya, Cyclocarya, and Carya (Annamocarya is a not-widely-accepted Chinese genus, genetically indistinct from other East Asian Carya). This explains its instability in tree-based reconstructions. Assuming that Rhoiptelea points to the actual root, one could use the relatively high branch support values as an argument to say that Polyptera evolved after Platycarya split from the remainder of the Juglandoideae. But the network shows that the signal is not that straightforward, and Polyptera may just be a third lineage within the Juglandoideae (note the short orange edge bundle in contrast to the large red and green ones). A crucial question to check, also regarding the ML-EPA result, is whether the orange-edge clade (including Polyptera) is supported by uniquely shared characters and is not just a tree-branching artifact caused by the distinctness of the Platycarya group. Being substantially distinct (genetically and morphologically) from the remainder of the Juglandoideae, the members of the Platycarya group must be placed as sister taxa. Being a fossil, Polyptera is not that distinct and is hence placed in the Juglandoideae core clade. Distance-based and parsimony methods are more vulnerable to long-branch attraction (or short-branch culling) than is ML; and Bayesian analysis optimizes to the tree that best fits all signals in the data (compatible or not).
  • Cruciptera is more similar to Cyclocarya and Pterocarya than to Juglans, and represents a more primitive (ancestral) form. Based on the position of Cyclocarya and Pterocarya, we can directly conclude that they are morphologically less derived than Juglans, their sister taxon. Hence, one should be careful about interpreting Cruciptera as a precursor of e.g. Pterocarya. One would have to go back into the matrix and assess which characters differentiate within this part of the graph, in order to decide whether the similarity between them is a genuine representation of shared (common) origin, and not just due to symplesiomorphies.
  • The fossil counterparts of modern-day Platycarya span a quite prominent box-like structure in the network, but the blue edge has little support from tree-based analyses. A simple explanation would be that these two are more ancient members of the Platycarya lineage, and are less derived than their modern counterpart and the other Juglandoideae.
  • Palaeooreomunnea is placed as one would expect for an ancestral form of the Engelhardioideae. It is clearly closer to the New World pair Alfaroa and Oreomunnea than to the Old World Alfaropsis and Engelhardia.

Data & software for EPA

The data matrix that I used for the ML-EPA, the Neighbour-net and the competing branch support analyses can be found in the supplementary information of the original paper.

EPA has been implemented in RAxML since version 7 and is usually used to place short environmental sequence reads (Berger et al. 2011, Syst. Biol. 60:291–302). For a published application of EPA to place fossils, see e.g. Bomfleur et al. 2015, BMC Evol. Biol. 15:126.