Using Molecular Evidence in Classification Revision Notes for OCR A-Level Biology A

Using Molecular Evidence in Classification

Development of molecular phylogeny

Until the mid-twentieth century, taxonomists classified organisms using data from anatomy, morphology, behaviour, physiology and cell structure. The development of new technologies enabled scientists to study macromolecules, particularly the sequences of amino acids in proteins and nucleotides in DNA. This led to the emergence of molecular phylogeny – a branch of biology that uses molecular data to determine evolutionary relationships between organisms.

infoNote

Molecular evidence provides independent verification of classification systems based on structural features, and often reveals relationships that are not apparent from morphology alone. For example, molecular data shows that fungi are more closely related to animals than to plants, which is not obvious from their structure or growth patterns.

Using antibodies to compare species

Early molecular phylogeny research (beginning in 1904) used antibodies to detect similarities between proteins in different species. This method exploits the specificity of antibody-antigen binding.

Methodology

The technique involves several steps:

Blood is taken from a species (e.g., humans) and processed to obtain serum by removing cells and preventing clotting
This serum is injected into a test animal such as a rabbit or mouse
After approximately one week, the test animal produces an immune response – B cells specific to the foreign proteins divide to form plasma cells
These plasma cells secrete antibodies against the injected proteins
Blood is taken from the test animal and processed to create an anti-serum containing these antibodies
The anti-serum is then mixed with serum from various species

Interpretation of results

When the anti-serum is mixed with serum from different species, the degree of precipitation indicates the similarity between proteins:

Higher precipitation = greater similarity between proteins
Lower precipitation = fewer shared proteins or more divergent protein structures

lightbulbExample

Worked Example: Interpreting Antibody Precipitation Results

When rabbit anti-serum produced against human plasma proteins was tested:

Human serum: $100\%$ precipitation
Chimpanzee serum: $97\%$ precipitation
Cow serum: $10\%$ precipitation

This indicates that chimpanzees and humans share approximately $97\%$ of their plasma proteins, while humans and cattle share only about $10\%$ . These results reflect evolutionary relationships – humans and chimpanzees diverged from a common ancestor more recently than either diverged from cattle.

Protein sequencing

An alternative approach involves directly comparing the amino acid sequences of specific proteins. Cytochrome c has been particularly useful for this purpose because it:

Occurs in many organisms where it plays a key role in respiration
Is located in the inner membranes of mitochondria in eukaryotes
Consists of a single polypeptide of approximately $100$ amino acids
Has a conserved structure across diverse species

Reading amino acid sequences

To interpret protein sequences, you need to understand both the three-letter and one-letter codes for amino acids. These codes are standardized across biology. The table below also shows the DNA codons (on the coding strand) that specify each amino acid.

Constructing phylogenetic trees from protein data

Researchers align amino acid sequences from multiple species and compare them position by position, identifying similarities and differences. The number of differences indicates evolutionary distance – species with fewer differences shared a more recent common ancestor.

infoNote

The branch lengths in phylogenetic trees are drawn proportional to the number of sequence differences. This produces a visual representation of evolutionary relationships.

This phylogenetic tree based on cytochrome c sequences clearly reveals the three main kingdoms of eukaryotes: fungi, animals and plants. Such molecular trees generally agree closely with classification systems based on morphology and anatomy, providing independent evidence for common descent and evolutionary relationships.

DNA hybridisation

DNA molecules consist of two complementary strands held together by hydrogen bonds between base pairs. DNA hybridisation exploits this structure to compare genetic similarity between species.

Principle

Heating DNA to approximately $90°C$ breaks the hydrogen bonds between bases, separating the two strands
When single-stranded DNA from two different species is mixed and allowed to cool, the strands attempt to bond together
However, genetic differences between species mean the match is imperfect
More imperfect matches have weaker hydrogen bonding and require less heat to separate the strands
Closer matches require more heat to break the bonds

Applications

infoNote

DNA hybridisation provided early evidence that chimpanzees and humans are more genetically similar to each other than either is to gorillas or orangutans. This molecular evidence supported earlier antibody-based research showing similar patterns.

DNA sequencing

DNA sequencing has become the most powerful method for determining evolutionary relationships. It provides more detailed and comprehensive data than either antibody comparisons or protein sequencing.

Advantages over protein sequencing

DNA sequencing offers several key advantages:

chatImportant

More detailed information: Two species may have identical amino acid sequences in a protein, but different DNA sequences coding for that protein. This is because the genetic code is degenerate – most amino acids are specified by two or more different codons. For example, leucine has six different codons (TTA, TTG, CTA, CTC, CTG, CTT). Comparing DNA sequences reveals these hidden differences.

Detection of mutations: DNA sequencing can identify:

Silent mutations: changes to base pairs that produce the same amino acid due to code degeneracy
Neutral mutations: changes that alter the amino acid but not the functional properties (e.g., substituting one non-polar amino acid for another)
Functional changes that have become fixed in populations

Greater accessibility: Modern sequencing technology makes it easier to sequence DNA than proteins, and databases contain extensive sequence information for comparison.

Stability: DNA is more stable than protein, allowing sequencing of genetic material from extinct organisms such as:

Mammoths (extinct approximately $4500$ years ago)
Neanderthals (remains dated to $38000$ years ago)

Choice of genes for sequencing

Different genes evolve at different rates, making them suitable for answering different evolutionary questions:

infoNote

Slowly changing genes (e.g., ribosomal RNA genes):

Useful for determining relationships between higher taxonomic groups (domains, kingdoms, phyla)
rRNA genes are particularly valuable because they change slowly and are found across all life forms

Rapidly changing genes:

Help identify relationships between closely related species
Useful for studying recent speciation events
Can distinguish between isolated populations of the same species

Mitochondrial DNA:

Does not contain non-coding sequences within genes, making alignment easier
Not combined with histone proteins ('naked' DNA)
Present in multiple copies per cell
Has a higher mutation rate, useful for studying closely related organisms
Inherited maternally with relatively little change
Extensively sequenced for many species

Housekeeping genes:

Code for proteins required for fundamental cellular processes common to all organisms
Provide useful comparative data across diverse species

Non-coding DNA sequences

In eukaryotes, much nuclear DNA does not code for proteins. These non-coding sequences have various functions or may have no known function. They are relatively free to accumulate mutations without affecting protein structure, and therefore provide valuable information about evolutionary relationships and genetic drift.

Case study: marsupials and placentals

Researchers compared rRNA gene sequences from mitochondria in eight mammalian species – four marsupials (including the extinct thylacine) and four placentals. The table shows nucleotide positions 29-41 in aligned sequences:

lightbulbExample

Worked Example: Interpreting Sequence Alignment Data

Key observations from this data:

All eight species share approximately $50\%$ of nucleotides at these positions – far higher than would occur by chance, indicating common ancestry
Some positions are identical across all species (e.g., positions 29, 30, 33, 35, 36, 39, 40, 41 are highly conserved)
Position 31 shows clear separation: marsupials have A, placentals have G
Position 38 shows variation among marsupials (T or C) but is consistently G in placentals
Position 32 varies among placentals but is consistently T in marsupials

These patterns help identify which species are more closely related and support the separation of marsupials and placentals into distinct groups.

Gene barcoding

DNA sequencing enables species identification through gene barcoding – using specific gene sequences as unique identifiers for species. Researchers:

Sequence particular genes from identified specimens
Publish these sequences as 'barcodes'
Deposit barcodes in databases for public access
Use barcodes to identify unknown specimens

infoNote

This barcode shows the first $193$ bases in the gene for mitochondrial cytochrome oxidase from the fungus Armillaria gallica. Each nucleotide (A, T, C, G) is represented by a different colour. Barcoding has revealed that fruiting bodies of this species found many miles apart can belong to the same underground mycelium network.

Databases for molecular phylogeny

Molecular sequence data is stored in publicly accessible online databases, enabling researchers worldwide to share and compare information:

The Tree of Life (http://tolweb.org)

Provides information on biodiversity, characteristics of organism groups and evolutionary history

NCBI Taxonomy Database (www.ncbi.nlm.nih.gov/taxonomy)

Contains information on approximately $10\%$ of described species
Linked to DNA and protein sequence databases
Part of the National Center for Biotechnology Information

Animal Diversity Web (http://animaldiversity.ummz.umich.edu)

Curated by University of Michigan
Contains data on animal natural history, distribution, classification and conservation

The National Herbarium of The Netherlands (http://vstbol.leidenuniv.nl)

Largest single plant database
Records of over $2$ million specimens

Encyclopedia of Life (EoL) (http://eol.org)

Aims to provide a web page for every species
Aggregates information from museums, scientific societies and experts
Single, easy-to-use online portal

infoNote

These databases allow researchers to:

Compare new sequences with existing data
Identify species from molecular evidence
Study evolutionary relationships across the tree of life
Access information without geographical or institutional barriers

Classification systems and molecular evidence

Molecular evidence has transformed how biologists classify organisms:

Confirms traditional classifications: Molecular phylogenetic trees often agree with classification systems based on morphology and anatomy, providing independent verification of evolutionary relationships.

Reveals hidden relationships: Molecular data can show connections not apparent from structure alone, such as the closer relationship between fungi and animals than between fungi and plants.

Enables comparison of similar species: DNA sequencing can distinguish between closely related species that lack obvious structural differences.

chatImportant

Dynamic and evolving: Classification systems are continuously updated as new molecular evidence emerges. Both the discovery of new species and reinterpretation of relationships between known species mean that taxonomic groupings and names change over time.

Case study: Australasian cicadas

Researchers investigated whether cicadas migrated to New Zealand from Australia by sequencing mitochondrial DNA from $14$ species across Australia, New Zealand and New Caledonia.

lightbulbExample

Worked Example: Interpreting a Phylogenetic Tree

The phylogenetic tree shows that:

Eight cicada species in New Zealand form two distinct groups
One group (species 1-6) is closely related to species from New Caledonia (species 7-8), suggesting possible migration via New Caledonia
Another group (species 11-12) is more closely related to Australian species, suggesting direct migration from Australia
Species 13-14 (Australian species) are most distantly related to all others

This molecular evidence supports the hypothesis that New Zealand cicadas originated from two separate migration events from Australia, occurring within the last $11.6$ million years.

Remember!

bookmarkSummary

Key Points to Remember:

Molecular phylogeny uses DNA and protein sequences to determine evolutionary relationships, providing independent evidence that complements traditional classification based on morphology
Antibody-based methods provided early molecular evidence by measuring protein similarity through precipitation reactions; greater precipitation indicates closer evolutionary relationships
DNA sequencing advantages: reveals silent and neutral mutations hidden in protein sequences, provides more detailed data due to the degenerate genetic code, and enables analysis of extinct organisms
Gene choice matters: slowly evolving genes (like rRNA) are useful for comparing distantly related groups, while rapidly evolving genes help distinguish closely related species; mitochondrial DNA is particularly useful due to its structure and inheritance pattern
Classification is dynamic: molecular evidence continuously refines our understanding of evolutionary relationships, meaning taxonomic groupings and species names change as new data emerges

Using Molecular Evidence in Classification (OCR A-Level Biology A): Revision Notes