
User:Manudouz/sandbox/NJ

From Wikipedia, the free encyclopedia

Working example

Neighbor joining with 5 taxa. In this case 2 neighbor joining steps give a tree with fully resolved topology. The branches of the resulting tree are labeled with their lengths.

The working example is based on a JC69 (Jukes–Cantor) genetic distance matrix computed from the 5S ribosomal RNA sequence alignment of five bacteria: Bacillus subtilis (a), Bacillus stearothermophilus (b), Lactobacillus viridescens (c), Acholeplasma modicum (d), and Micrococcus luteus (e).[1][2]

First step

  • First clustering

Let us assume that we have five elements and the following matrix of pairwise distances between them:

      a     b     c     d     e      r
a     0    17    21    31    23    30.7
b    17     0    30    34    21    34.0
c    21    30     0    28    39    39.3
d    31    34    28     0    43    45.3
e    23    21    39    43     0    42.0

(The last column, r, is each row sum divided by n − 2 = 3.)

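For each element $i$, we calculate $r_i$, the sum of its distances to all other elements divided by $n - 2$:

$$r_i = \frac{1}{n-2}\sum_{k=1}^{n} d(i,k) \qquad \text{(where } n = 5\text{)}$$

For example:

$$r_a = \frac{17 + 21 + 31 + 23}{5 - 2} = \frac{92}{3} \approx 30.7$$

and so on for $b$, $c$, $d$ and $e$, giving the values 30.7, 34.0, 39.3, 45.3 and 42.0 listed with the matrix above.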
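As a quick check of these numbers, the minimal Python sketch below recomputes $r_i$ for every row of the matrix above. The function name `row_divergence` is ours (hypothetical), not part of any library.

```python
# First-step distance matrix (taxa a to e), copied from the table above.
taxa = ["a", "b", "c", "d", "e"]
D = [
    [0, 17, 21, 31, 23],
    [17, 0, 30, 34, 21],
    [21, 30, 0, 28, 39],
    [31, 34, 28, 0, 43],
    [23, 21, 39, 43, 0],
]


def row_divergence(D):
    """Return r_i = (sum over k of D[i][k]) / (n - 2) for every row i.

    The diagonal entries are zero, so summing the whole row is the same
    as summing over k != i.
    """
    n = len(D)
    return [sum(row) / (n - 2) for row in D]


r = row_divergence(D)
for name, value in zip(taxa, r):
    print(f"r_{name} = {value:.1f}")
```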


  • First joining

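We calculate the values of the $Q$ matrix:

$$Q(i,j) = d(i,j) - r_i - r_j \qquad (i \neq j)$$

where $r_i = \frac{1}{n-2}\sum_{k} d(i,k)$. For example, for elements $a$ and $b$:

$$Q(a,b) = 17 - 30.7 - 34.0 = -47.7$$

We obtain the following values for the $Q$ matrix (the diagonal elements of the matrix are not used and are not shown here):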

        a       b       c       d       e
a       ·    −47.7   −49.0   −45.0   −49.7
b    −47.7      ·    −43.3   −45.3   −55.0
c    −49.0   −43.3      ·    −56.7   −42.3
d    −45.0   −45.3   −56.7      ·    −44.3
e    −49.7   −55.0   −42.3   −44.3      ·

In the example above, $Q(c,d) = -56.7$. This is the smallest value of $Q$, so we join elements $c$ and $d$.
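The following sketch recomputes every off-diagonal $Q$ value from the distance matrix with the formula used for this table and reports the pair with the smallest one. It assumes a plain list-of-lists matrix; the helper name `q_values` is hypothetical.

```python
import itertools

taxa = ["a", "b", "c", "d", "e"]
D = [
    [0, 17, 21, 31, 23],
    [17, 0, 30, 34, 21],
    [21, 30, 0, 28, 39],
    [31, 34, 28, 0, 43],
    [23, 21, 39, 43, 0],
]


def q_values(D):
    """Q(i, j) = d(i, j) - r_i - r_j for every pair i < j,
    with r_i = (row sum of i) / (n - 2)."""
    n = len(D)
    r = [sum(row) / (n - 2) for row in D]
    return {(i, j): D[i][j] - r[i] - r[j]
            for i, j in itertools.combinations(range(n), 2)}


Q = q_values(D)
# min() keeps the first pair it meets if several share the smallest value.
(i, j), q_min = min(Q.items(), key=lambda item: item[1])
print(f"Q({taxa[i]},{taxa[j]}) = {q_min:.1f}")  # Q(c,d) = -56.7
```

Because `min` returns the first minimal item, ties would be resolved in row-major order of the pairs; in this table the minimum is unique.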

  • First branch length estimation

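Let $u$ denote the new node, and let $f$ and $g$ be the two elements joined in this step. By equation (2), above, the branches joining $f$ and $g$ to $u$ then have lengths:

$$\delta(f,u) = \frac{1}{2}d(f,g) + \frac{1}{2(n-2)}\left[\sum_{k=1}^{n} d(f,k) - \sum_{k=1}^{n} d(g,k)\right]$$

$$\delta(g,u) = d(f,g) - \delta(f,u)$$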
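A minimal sketch of this branch-length formula, assuming a plain list-of-lists matrix; `branch_lengths` is a hypothetical helper, not a library function. For illustration it is applied to the 4 × 4 matrix of the first distance matrix update below and the pair $u$, $c$, which gives the lengths 3 and 4 used in the second step.

```python
def branch_lengths(D, f, g):
    """Lengths of the branches from f and g to their new node u.

    delta(f, u) = d(f, g) / 2 + (S_f - S_g) / (2 (n - 2)), S_i = row sum of i
    delta(g, u) = d(f, g) - delta(f, u)
    """
    n = len(D)
    s_f, s_g = sum(D[f]), sum(D[g])
    delta_f = D[f][g] / 2 + (s_f - s_g) / (2 * (n - 2))
    return delta_f, D[f][g] - delta_f


# Rows/columns u, c, d, e of the first distance matrix update.
D = [[0, 7, 7, 6], [7, 0, 8, 7], [7, 8, 0, 3], [6, 7, 3, 0]]
print(branch_lengths(D, 0, 1))  # (3.0, 4.0)
```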

  • First distance matrix update

We then update the initial distance matrix into a new distance matrix (see below), reduced in size by one row and one column because the two joined elements $f$ and $g$ are replaced by their neighbor $u$. Using equation (3) above, we compute the distance from $u$ to each of the other nodes $k$ besides $f$ and $g$:

$$d(u,k) = \frac{1}{2}\left[d(f,k) + d(g,k) - d(f,g)\right]$$

The resulting distance matrix is:

u c d e
u 0 7 7 6
c 7 0 8 7
d 7 8 0 3
e 6 7 3 0

In the matrix above, the entries in row and column $u$ are the newly calculated distances, whereas the other values are not affected by the matrix update, as they correspond to distances between elements not involved in the first joining of taxa.
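The update rule can be sketched the same way. The helper `update_distances` is hypothetical; it places the new node first, mirroring the layout of the tables in this example. Applied to the matrix above with $u$ and $c$ joined into $v$, it returns the $v$, $d$, $e$ matrix of the second step.

```python
def update_distances(D, labels, f, g, new_label):
    """Replace rows/columns f and g of D by a single new node.

    d(u, k) = (d(f, k) + d(g, k) - d(f, g)) / 2 for every remaining node k;
    distances between the remaining nodes are copied unchanged.
    """
    keep = [k for k in range(len(D)) if k not in (f, g)]
    new_row = [(D[f][k] + D[g][k] - D[f][g]) / 2 for k in keep]
    new_D = [[0.0] + new_row]
    for pos, k in enumerate(keep):
        new_D.append([new_row[pos]] + [D[k][m] for m in keep])
    return new_D, [new_label] + [labels[k] for k in keep]


labels = ["u", "c", "d", "e"]
D = [[0, 7, 7, 6], [7, 0, 8, 7], [7, 8, 0, 3], [6, 7, 3, 0]]
new_D, new_labels = update_distances(D, labels, 0, 1, "v")
print(new_labels)  # ['v', 'd', 'e']
print(new_D)
```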

Second step

  • Second joining

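The corresponding $Q$ matrix is given below. In this step it is computed in the scaled form $Q(i,j) = (n-2)\,d(i,j) - \sum_k d(i,k) - \sum_k d(j,k)$ with $n = 4$, which is $n - 2$ times the form used in the first step (where each row sum is first divided by $n - 2$), so it selects the same pair. For example, $Q(u,c) = 2 \cdot 7 - 20 - 22 = -28$: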

      u     c     d     e
u     ·   −28   −24   −24
c   −28     ·   −24   −24
d   −24   −24     ·   −28
e   −24   −24   −28     ·

We may choose either to join $u$ and $c$, or to join $d$ and $e$; both pairs have the minimal value $Q = -28$, and either choice leads to the same result. For concreteness, let us join $u$ and $c$ and call the new node $v$.
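The second-step table can be reproduced with a short sketch (the helper name `q_scaled` is ours) that also lists every pair reaching the minimum. The tie is not a coincidence: with four nodes $i, j, k, l$, expanding the row sums gives $Q(i,j) = -[d(i,k) + d(i,l) + d(j,k) + d(j,l)]$, which is also $Q(k,l)$, so complementary pairs always share the same value.

```python
import itertools

# Matrix of the first distance matrix update (rows/columns u, c, d, e).
labels = ["u", "c", "d", "e"]
D = [[0, 7, 7, 6], [7, 0, 8, 7], [7, 8, 0, 3], [6, 7, 3, 0]]


def q_scaled(D):
    """Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) for i < j."""
    n = len(D)
    s = [sum(row) for row in D]
    return {(i, j): (n - 2) * D[i][j] - s[i] - s[j]
            for i, j in itertools.combinations(range(n), 2)}


Q = q_scaled(D)
best = min(Q.values())
ties = [(labels[i], labels[j]) for (i, j), q in Q.items() if q == best]
print(best, ties)  # -28 [('u', 'c'), ('d', 'e')]
```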

  • Second branch length estimation

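The lengths of the branches joining $u$ and $c$ to $v$ can be calculated (here $n = 4$, and the row sums of $u$ and $c$ are 20 and 22):

$$\delta(u,v) = \frac{1}{2}d(u,c) + \frac{1}{2(4-2)}\left[\sum_k d(u,k) - \sum_k d(c,k)\right] = \frac{7}{2} + \frac{20 - 22}{4} = 3$$

$$\delta(c,v) = d(u,c) - \delta(u,v) = 7 - 3 = 4$$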

Joining these elements and calculating the branch lengths lets us draw the neighbor joining tree, as shown in the figure.

  • Second distance matrix update

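The updated distance matrix for the remaining three nodes, $v$, $d$ and $e$, is now computed:

$$d(v,d) = \frac{1}{2}\left[d(u,d) + d(c,d) - d(u,c)\right] = \frac{7 + 8 - 7}{2} = 4$$

$$d(v,e) = \frac{1}{2}\left[d(u,e) + d(c,e) - d(u,c)\right] = \frac{6 + 7 - 7}{2} = 3$$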

v d e
v 0 4 3
d 4 0 3
e 3 3 0

Final step

The tree topology is fully resolved at this point. However, for clarity, we can calculate the $Q$ matrix. For example:

$$Q(v,d) = (3-2)\,d(v,d) - \sum_k d(v,k) - \sum_k d(d,k) = 4 - 7 - 7 = -10$$

      v     d     e
v     ·   −10   −10
d   −10     ·   −10
e   −10   −10     ·
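A short derivation (ours, not part of the original text) shows why all three values coincide. With $n = 3$ we have $n - 2 = 1$, and for the three nodes $i$, $j$, $k$ the row sums are $d(i,j) + d(i,k)$ and $d(i,j) + d(j,k)$, so

$$Q(i,j) = d(i,j) - \bigl[d(i,j) + d(i,k)\bigr] - \bigl[d(i,j) + d(j,k)\bigr] = -\bigl[d(i,j) + d(i,k) + d(j,k)\bigr],$$

which does not depend on the chosen pair. Here it gives $-(4 + 3 + 3) = -10$ for every pair.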

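For concreteness, let us join $v$ and $d$ and call the last node $w$. The lengths of the three remaining branches can be calculated:

$$\delta(v,w) = \frac{1}{2}d(v,d) + \frac{1}{2(3-2)}\left[\sum_k d(v,k) - \sum_k d(d,k)\right] = \frac{4}{2} + \frac{7 - 7}{2} = 2$$

$$\delta(d,w) = d(v,d) - \delta(v,w) = 4 - 2 = 2$$

$$\delta(e,w) = d(v,e) - \delta(v,w) = 3 - 2 = 1$$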

The neighbor joining tree is now complete, as shown in the figure.
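The joining, branch-length and update steps can be strung together into a compact loop. This is a sketch under our own assumptions (list-of-lists input, the scaled $Q$ form, ties broken by taking the first minimal pair, hypothetical function name `neighbor_joining`), not a reference implementation. Started from the 4 × 4 matrix of the first distance matrix update, it makes the same choices as the text ($u$ with $c$, then $v$ with $d$) and reproduces the branch lengths of the second and final steps.

```python
import itertools


def neighbor_joining(D, labels, new_names):
    """Minimal neighbor joining sketch.

    D         -- symmetric distance matrix (list of lists, zero diagonal)
    labels    -- names of the rows of D
    new_names -- iterator supplying names for the new internal nodes
    Returns a list of (child, new_node, branch_length) edges.
    """
    assert len(D) >= 3, "need at least three taxa"
    D = [[float(x) for x in row] for row in D]
    labels = list(labels)
    edges = []
    while len(D) > 3:
        n = len(D)
        s = [sum(row) for row in D]
        # Smallest Q(i, j) = (n - 2) d(i, j) - s_i - s_j; ties go to the first pair.
        f, g = min(itertools.combinations(range(n), 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - s[p[0]] - s[p[1]])
        u = next(new_names)
        delta_f = D[f][g] / 2 + (s[f] - s[g]) / (2 * (n - 2))
        edges.append((labels[f], u, delta_f))
        edges.append((labels[g], u, D[f][g] - delta_f))
        # Distance from u to each remaining node k: (d(f,k) + d(g,k) - d(f,g)) / 2.
        keep = [k for k in range(n) if k not in (f, g)]
        new_row = [(D[f][k] + D[g][k] - D[f][g]) / 2 for k in keep]
        D = [[0.0] + new_row] + [
            [new_row[a]] + [D[k][m] for m in keep] for a, k in enumerate(keep)
        ]
        labels = [u] + [labels[k] for k in keep]
    # Three nodes left: every Q value is equal, so join the first two
    # and attach the third one to the same new node.
    w = next(new_names)
    delta_0 = D[0][1] / 2 + (sum(D[0]) - sum(D[1])) / 2
    edges.append((labels[0], w, delta_0))
    edges.append((labels[1], w, D[0][1] - delta_0))
    edges.append((labels[2], w, D[0][2] - delta_0))
    return edges


labels = ["u", "c", "d", "e"]
D = [[0, 7, 7, 6], [7, 0, 8, 7], [7, 8, 0, 3], [6, 7, 3, 0]]
for edge in neighbor_joining(D, labels, iter(["v", "w"])):
    print(edge)
```

Because the tie-breaking rule picks the first minimal pair, it matches the "for concreteness" choices made in the text; a different rule could join $d$ and $e$ first, which the text notes leads to the same result.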

Conclusion: additive distances

This example represents an idealized case: if we move from any taxon to any other along the branches of the tree and sum the lengths of the branches traversed, the result equals the distance between those taxa in the input distance matrix. For example, going from $c$ to $e$ we have $\delta(c,v) + \delta(v,w) + \delta(w,e) = 4 + 2 + 1 = 7$. A distance matrix whose distances agree in this way with some tree is said to be 'additive', a property which is rare in practice. Nonetheless, given an additive distance matrix as input, neighbor joining is guaranteed to find the tree whose distances between taxa agree with it.
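The additivity property can be checked mechanically. The sketch below (helpers `path_lengths` and `agrees_with_tree` are hypothetical) sums branch lengths along tree paths and compares them with a distance matrix. It uses the branch lengths from the second and final steps, treats $u$ as a leaf, and compares against the 4 × 4 matrix of the first distance matrix update.

```python
from collections import defaultdict

# Branch lengths from the second and final steps: (node, node, length).
EDGES = [("u", "v", 3), ("c", "v", 4), ("v", "w", 2), ("d", "w", 2), ("e", "w", 1)]


def path_lengths(edges, source):
    """Return the summed branch length from source to every node of the tree."""
    adjacency = defaultdict(list)
    for x, y, length in edges:
        adjacency[x].append((y, length))
        adjacency[y].append((x, length))
    dist = {source: 0}
    stack = [source]
    while stack:  # depth-first walk; in a tree each node is reached once
        node = stack.pop()
        for neighbor, length in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + length
                stack.append(neighbor)
    return dist


def agrees_with_tree(D, labels, edges):
    """True if every D[i][j] equals the path length between labels i and j."""
    return all(
        path_lengths(edges, x)[y] == D[i][j]
        for i, x in enumerate(labels)
        for j, y in enumerate(labels)
    )


labels = ["u", "c", "d", "e"]
D = [[0, 7, 7, 6], [7, 0, 8, 7], [7, 8, 0, 3], [6, 7, 3, 0]]
print(agrees_with_tree(D, labels, EDGES))  # True
```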

  1. Erdmann VA, Wolters J (1986). "Collection of published 5S, 5.8S and 4.5S ribosomal RNA sequences". Nucleic Acids Research. 14 (Suppl): r1–59. PMC 341310. PMID 2422630.
  2. Olsen GJ (1988). "Phylogenetic analysis using ribosomal RNA". Methods in Enzymology. 164: 793–812. PMID 3241556.