How I Learned to Stop Worrying and Love the Cladogram

Halichoeres · July 10, 2015, 02:00:01 PM

Based on this topic, I thought people might like to know why cladograms are so prone to changing, so here's a primer on how cladistics works in a conceptual sense and some of its limitations. Please feel free to ask any questions or to point out errors.

The cladogram is one of the most important conceptual tools in evolutionary biology. A cladogram is our representation of phylogeny, that is, a drawing of our best estimate of the tree of life. Tree diagrams have a history as old as the idea of natural selection: the only illustration in On the Origin of Species is a schematic tree.

"Cladogram" literally means "branch drawing," and it is a topological representation. What that means is that only the connections matter, so we read it along the branches (for many of you, this is review, but it's a common mistake that I think is worth addressing). So for example, the following two phylogenies are identical, even though they might look different at first glance.

[Figure: two drawings of the same phylogeny, from Baum et al., 2005, Science]

Many people interpret the tree on the right to mean that frogs are closer to mice than to lizards, but if you trace the branches, you can see that lizards and mice are equally closely related to frogs. If that seems odd, take a moment to slowly walk along the branches mentally, to convince yourself that it is true.
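If walking the branches in your head is awkward, the same check can be done mechanically. The sketch below is only an illustration: the two nested tuples are my assumption about what such a pair of drawings could look like (the original figure is not reproduced here), and the helper names are made up for this example. It compares the trees while ignoring left-to-right order, then finds the smallest clade that contains frogs plus each of the other two animals.

```python
# Two drawings of one hypothetical tree; only the left-to-right order differs.
left = ("fish", ("frog", ("lizard", ("mouse", "human"))))
right = ("fish", ("frog", (("mouse", "human"), "lizard")))


def clade(tree):
    """Order-free form of a nested-tuple tree (a tip is just its name)."""
    if isinstance(tree, str):
        return tree
    return frozenset(clade(child) for child in tree)


def tips(tree):
    """Set of all tip names under a node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def smallest_clade(tree, a, b):
    """Tips of the smallest clade that contains both a and b."""
    if not isinstance(tree, str):
        for child in tree:
            if {a, b} <= tips(child):
                return smallest_clade(child, a, b)
    return tips(tree)


# Same connections, so the two drawings are the same tree.
print(clade(left) == clade(right))

# Frog + lizard and frog + mouse share the very same common ancestor.
print(smallest_clade(right, "frog", "lizard"))
print(smallest_clade(right, "frog", "mouse"))
```

Both calls to `smallest_clade` return the same group of tips, which is another way of saying that lizards and mice are equally closely related to frogs.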

I said before that the cladogram is only the drawing; the pattern that it represents is the phylogeny, or the real evolutionary history of organisms. People often use the two interchangeably, but you should be aware that many systematists (people who study the interrelationships of organisms) make the distinction. The consequences of using them interchangeably are pretty small though, so don't stress about it. The tree we draw is always a hypothesis, and therefore it can always be refuted by new data. This is one fundamental reason that cladograms change, but it isn't the only one. Other reasons that they change have to do with how they are constructed. So how do we make the trees in the first place?

Building trees
Our view of the tree of life used to be just expert opinion, which was often correct but nowhere close to always. Occasionally people still do it this way; we facetiously call the trees they make "expertograms." In 1950, though, a German entomologist named Willi Hennig tried to introduce some objectivity into the process, and phylogenetic systematics was born. His idea was to just count up similarities and differences in a matrix and let that tell you who is most closely related, rather than just seizing on one particular trait and saying that it held the real answer, which was often the case before (and still is among people not trained in systematics). The process was refined by several American and British biologists and mathematicians, and produced a formal criterion called parsimony. Parsimony is an instantiation of Occam's Razor, which essentially says, 'don't add unnecessary hypotheses.' When you trace the evolution of traits along a proposed tree, each change in a trait represents its own hypothesis. Parsimony forces you to select the tree that requires the fewest possible changes, that is, the fewest ad hoc hypotheses.
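To make "fewest possible changes" concrete, here is a minimal sketch of how the counting can be done. It uses Fitch's algorithm, which is one standard way to count changes for unordered characters; that is my choice for illustration, not something the history above commits to. The taxa A to D, the outgroup D, and the 0/1 matrix are all invented for the example.

```python
# Hypothetical character matrix: one string of 0/1 states per taxon.
matrix = {
    "A": "1100",
    "B": "1101",
    "C": "0110",
    "D": "0000",  # assumed outgroup, used to root the trees
}

# The three possible arrangements of A, B and C, each rooted with D.
candidates = {
    "((A,B),C)": ((("A", "B"), "C"), "D"),
    "((A,C),B)": ((("A", "C"), "B"), "D"),
    "((B,C),A)": ((("B", "C"), "A"), "D"),
}


def fitch(tree, states):
    """Return (possible ancestral states, number of changes) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two branches can agree, so no extra change is needed here.
        return shared, left_changes + right_changes
    # No shared state: one change must have happened below this node.
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, matrix):
    """Total number of changes over all characters on a given tree."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch(tree, column)[1]
    return total


lengths = {name: tree_length(tree, candidates[name]) if False else tree_length(tree, matrix)
           for name, tree in candidates.items()}
best = min(lengths, key=lengths.get)
print(lengths)
print("most parsimonious:", best)
```

With this made-up matrix, only the first character distinguishes the candidates, so the tree that groups A with B needs one fewer change than the other two and is the one parsimony selects.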

So who's still with me? Parsimony has some problems. It used to be used for every kind of data and uses a matrix like this:

Aaen AGAGGTCCTACCTGCCCAGTGATTTAATTAAACGGCCGCGGTATTTTGACCGTGCAAAGGTAGCGCAATC
Abez AGAGGTCCTACCTGCCCAGTGATTTTATTAAACGGCCGCGGTATTTTGACCGTGCAAAGGTAGCGCAATC
Amex AGAGGTCCTACCTGCCCAGTGATTTAATTAAACGGCCGCGGTATTTTGACCGTGCAAAGGTAGCGCAATC

If you look closely, you can see that there are a few differences among these gene sequences (this is only a small fragment of the 16S locus for three species of fish). We could just count the differences, which is just about what parsimony does. The thing is that some characters are more likely to change than others. We have pretty good ideas about how this works for molecules: with DNA data, for example, we know that third codon positions are much more likely to mutate (well, strictly speaking, mutations at these sites are only more likely to go to fixation). We have good estimates of mutation rates for different positions in different kinds of genes in different kinds of organisms, and so now when we build a tree from DNA data, we program computers to decide which data should be assigned greater importance. It's still objective, and it reflects what we know about DNA evolution. We call this kind of approach model-based, or more specifically maximum likelihood or Bayesian inference depending on the particulars.
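For anyone who would rather not squint at the alignment, the "just count the differences" step can be sketched in a few lines. This only compares the fragment exactly as it is printed above; the function name is made up for the example, and a real analysis would use the whole locus and a proper alignment.

```python
from itertools import combinations

# The 16S fragment as printed in the post, one row per species label.
alignment = {
    "Aaen": "AGAGGTCCTACCTGCCCAGTGATTTAATTAAACGGCCGCGGTATTTTGACCGTGCAAAGGTAGCGCAATC",
    "Abez": "AGAGGTCCTACCTGCCCAGTGATTTTATTAAACGGCCGCGGTATTTTGACCGTGCAAAGGTAGCGCAATC",
    "Amex": "AGAGGTCCTACCTGCCCAGTGATTTAATTAAACGGCCGCGGTATTTTGACCGTGCAAAGGTAGCGCAATC",
}


def differing_sites(seq_a, seq_b):
    """1-based positions at which two aligned sequences disagree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [pos for pos, (x, y) in enumerate(zip(seq_a, seq_b), start=1) if x != y]


for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
    sites = differing_sites(seq_a, seq_b)
    print(f"{name_a} vs {name_b}: {len(sites)} difference(s) at {sites}")
```

In the short stretch shown here, the only mismatch is at position 26, where Abez has a T and the other two have an A. Every mismatch counts the same in this tally, which is exactly the simplification that model-based methods try to improve on.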

It turns out that, mathematically, parsimony is a special case of maximum likelihood that we call over-parameterized. Likelihood approaches stop adding rate estimates to your model when the improvement achieved by adding more estimates reaches a point of diminishing returns. Parsimony always adds the maximum number of rate and branch length estimates. Why does this matter for the tree? Think of it as making a life-sized map of the world: it has a lot of information, but it's pretty useless. If you're unsure what the world looked like in the first place, trying to build a life-sized map of the world would give you something that was the size of the world but completely incorrect. This is what happens to parsimony. Simulation studies show that when you add more character data, it can actively mislead you. I believe this is the main reason that trees change so much. This is a weird result and deserves more attention. So the question we're all wondering about is:

How do more characters make worse trees?
First, what's a character? It's any trait. Any trait at all. What that trait looks like in a particular taxon is called its character state. Parsimony compares character states and finds the tree that makes the fewest state changes as you trace the tree. But you still have to tell it the characters and character states. For example, most mammals have incisors. You could decide that one column of your matrix is whether an animal has incisors or not. Then another column might be 8 incisors, like us, versus 10+ incisors, like an opossum. Another column might be whether the incisors are shaped like ours, or pointy like a lion's, or tusks like an elephant. Each of these is a perfectly valid trait to consider. The problem is that with parsimony, when you start having more characters than taxa, you start looking at different tiny aspects of the same characters more than once. I've just counted the incisors three times. Is that valid or not? Who knows? There is no answer to that for morphological characters.

DNA analysis never uses parsimony anymore (or at least, it shouldn't) because we have all these models of DNA evolution that let us better estimate changes. There are no good models yet for how morphology changes, although some people are trying to develop them. So for now, when we study animals whose DNA we don't have access to, we have to use morphology. And that means we have to use parsimony. And as I discussed above, parsimony gives you high confidence in wrong trees when you have too many characters. This is probably because you're accidentally counting things twice. This is particularly easy to do with fossils because we have a limited number of trait types preserved for all taxa. It's a problem because parsimony assumes that all characters are independent—if you have two or three tooth characters, that assumption is slightly violated, but if you have 35, it's seriously violated. So when I see a dinosaur tree that uses 600 characters, I assume that much of the tree is wrong because they are artificially inflating the number of useful characters and egregiously violating the assumptions of parsimony.
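Here is a toy illustration of how double-counting can flip a result. Everything in it is hypothetical: four taxa called A, B, C and D, and a handful of invented 0/1 characters. It leans on a known shortcut for exactly four taxa with two-state characters: the parsimony-preferred unrooted tree is the one whose internal split is backed by the most informative characters.

```python
from collections import Counter


def supported_split(column):
    """Which grouping of taxa A, B, C, D a two-state character supports, if any."""
    a, b, c, d = column
    if a == b != c == d:
        return "AB|CD"
    if a == c != b == d:
        return "AC|BD"
    if a == d != b == c:
        return "AD|BC"
    return None  # uninformative for four taxa


def tally(columns):
    """Count the characters supporting each split."""
    return Counter(s for s in map(supported_split, columns) if s is not None)


# Each string gives the states of one character in the order A, B, C, D.
independent = ["1100", "1100", "1010"]

# Suppose the last character is a tooth trait, and we then score
# three more aspects of the same teeth that all vary together with it.
redundant_tooth_scores = ["1010", "1010", "1010"]

before = tally(independent)
after = tally(independent + redundant_tooth_scores)

print("before:", dict(before), "->", before.most_common(1)[0][0])
print("after: ", dict(after), "->", after.most_common(1)[0][0])
```

Two independent characters outvote one, so A groups with B at first. After the same underlying trait is scored three more times, A groups with C instead, and with what looks like stronger support, even though no new independent evidence was added.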

One other reason that trees change is that all of the mathematics of tree-building only allow you to find sister-group relationships. The math literally will not let you find mothers. The fossil record almost certainly contains at least some direct ancestors of living animals, but we can never recover them that way because the math won't let us. So hypothetically, if you had one mother species and two daughter species, you could recover the following trees:

but not the correct tree:

(Sharp-eyed readers will note that a third incorrect tree is possible, wherein daughter species 1 forms a clade with the mother species, to the exclusion of daughter species 2.) In the true tree, daughter species occupy tips and the mother species occupies a node. The trees we build have nodes; however, they are not occupied by any of the taxa under study, but hypothetical common ancestors of the tips. All of our study taxa are required by the algorithm to occupy tips (this is true of both parsimony and model-based approaches).
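The full set of trees an algorithm can return for this case is small enough to list. The sketch below writes them in Newick notation (nested parentheses), using placeholder names for the three species; the function name is made up for the example.

```python
from itertools import combinations

taxa = ["mother", "daughter1", "daughter2"]


def rooted_three_taxon_trees(taxa):
    """All rooted, fully branching trees for three tips, as Newick strings."""
    trees = []
    for pair in combinations(taxa, 2):
        # The taxon left out of the pair is sister to that pair.
        (outsider,) = [t for t in taxa if t not in pair]
        trees.append(f"(({pair[0]},{pair[1]}),{outsider});")
    return trees


for newick in rooted_three_taxon_trees(taxa):
    print(newick)
```

There are exactly three such trees, and in every one of them the mother species sits at a tip. None of the parentheses can carry the mother's name at a node, which is the limitation described above.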

Finally, new taxa are often weird—they present genuinely novel combinations of character states. Even if we had good models of morphological evolution, and even if we could reconstruct mother lineages, our trees will always be at least a little wrong just because of the things that we don't yet know were out there.

All of these things combined mean that our knowledge of the tree of life will always be incomplete. But the fact that more and more fossil finds confirm what existing trees tell us is encouraging. Like Newton, we are getting to be less wrong than we were before. But as long as we have to use parsimony to construct trees for fossil taxa, and especially if we double-count anatomical features when we construct those trees, the trees will be sensitive to the addition of new taxa with new combinations of character states.

Dinoguy2 · July 10, 2015, 04:10:49 PM

Great post! I just want to mention that, correct me if I'm wrong, this tree can also be viewed as correct, just incompletely labelled. The mother is shown as a sister group, but it is possible for a sister group to ALSO be a common ancestor. So, for example, look at this tree from the new Wendiceratops paper:

If I understand correctly, based on this tree it is possible that Einiosaurus is the common ancestor of Achelousaurus and Pachyrhinosaurus, and Achelousaurus is the common ancestor of all Pachyrhinosaurus, and they simply evolved one into the other over time. Because cladograms cannot show a direct lineage like that, they come out as successive sister groups when in fact it's entirely possible for them to have a grandmother-mother-daughter relationship. By the same token, based on this tree Wendiceratops could have been the direct ancestor of Sinoceratops, etc.

Halichoeres · July 10, 2015, 05:08:15 PM

Quote from: Dinoguy2 on July 10, 2015, 04:10:49 PM

Great post! I just want to mention that, correct me if I'm wrong, this tree can also be viewed as correct, just incompletely labelled. The mother is shown as a sister group, but it is possible for a sister group to ALSO be a common ancestor. So, for example, look at this tree from the new Wendiceratops paper:

If I understand correctly, based on this tree it is possible that Einiosaurus is the common ancestor of Achelousaurus and Pachyrhinosaurus, and Achelousaurus is the common ancestor of all Pachyrhinosaurus, and they simply evolved one into the other over time. Because cladograms cannot show a direct lineage like that, they come out as successive sister groups when in fact it's entirely possible for them to have a grandmother-mother-daughter relationship. By the same token, based on this tree Wendiceratops could have been the direct ancestor of Sinoceratops, etc.

Well, sort of! If we were drawing the cladogram non-algorithmically, we could draw it with mother species at the nodes, which is where they belong in my example. The nodes are explicitly interpreted as inferred ancestors, after all. But all current phylogeny reconstruction methods are some type of clustering algorithm, so they can join things into groups but are incapable of finding that one gave rise to the other. So the tree in the Wendiceratops paper plainly shows Wendiceratops as sister to Sinoceratops. Similarly, it shows Achelousaurus as sister to Pachyrhinosaurus. A paleontologist might look at this result, and, armed with the knowledge that the reconstruction method is incapable of recovering ancestor-descendant relationships, interpret the sister relationship as evidence for direct ancestry (you see this the most with trees of hominids). The tree doesn't say one is descended from the other, but we, the interpreters of the tree, can say that the sister relationship is consistent with a hypothesis of direct ancestry.

I think that the Wendiceratops+Sinoceratops clade in particular is to be taken as provisional for another reason: long-branch attraction. Under parsimony, isolated taxa on long branches with many changes can tend to glue themselves to things that aren't in fact their closest relative, because of a phenomenon called homoplasy. Homoplasy is basically just ambiguity in the tree, and can result from either coincidence or very similar selection pressures in non-sister taxa; basically, it is just two things having similar traits for any reason other than common ancestry. An infamous example of this is a paper that found that guinea pigs were sister to rabbits, rather than related to other rodents such as mice and squirrels. Long-branch attraction is usually fixable with denser taxon sampling, but with fossils that's harder to do. In this case, Sinoceratops is on a very long branch, and I suspect that new taxa will play hob with the tree shown here.

DinoLord · July 10, 2015, 05:20:50 PM

This is a great thread! Personally I've always enjoyed seeing how different interpretations of cladistics vary over time with different publications and new discoveries.

paleoferroequine · July 13, 2015, 09:57:52 PM

Thanks for posting this. It helps quite a bit. There is an awful lot of ceratopsid material missing in China to get from Wendiceratops in N.A. to Sinoceratops in China three to four million years later. I realize they are not directly related; there is a 10-million-year gap between Zuniceratops and everything else. Hopefully more taxa will be recovered in China and elsewhere.

Halichoeres · July 14, 2015, 05:22:55 PM

Glad to be of service! The late Cretaceous radiation of ceratopsians really is perplexing and to my mind suggests runaway sexual selection. That alone would make morphological homoplasy in skull features very likely, if preferences for secondary sexual characteristics were shared among different lineages.

paleoferroequine · July 14, 2015, 06:43:54 PM

Quote from: Halichoeres on July 14, 2015, 05:22:55 PM
Glad to be of service! The late Cretaceous radiation of ceratopsians really is perplexing and to my mind suggests runaway sexual selection. That alone would make morphological homoplasy in skull features very likely, if preferences for secondary sexual characteristics were shared among different lineages.

A good example would be the chasmosaurine Medusaceratops from the Judith River Formation of Montana and the centrosaurine Albertaceratops from the Oldman Formation of Alberta. Very similar morphology, same general area and time.
