Talk:Tarjan's strongly connected components algorithm

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

Pseudocode is not good[edit]

In this reference web.cecs.pdx.edu we see that TarjanDfs is called not only for first node v0, but all nodes that are not visited. This should be fixed, I am not expert, but here is suggestion.

index = 0                       // DFS node number counter 
S = empty                       // An empty stack of nodes
for each vertex v
 if v.index is undefined
  tarjan(v)                      // Start a DFS at the non-visited node

--211.25.51.200 (talk) 07:05, 23 April 2008 (UTC)[reply]

  • This has been corrected in the article. No longer an issue.MattGiuca (talk) 05:38, 8 February 2011 (UTC)[reply]

Bug in second subcase[edit]

Seems to me that the second update case is wrong:

Instead of

   if (v'.index is undefined)  // Was successor v' visited? 
     tarjan(v')                // Recurse
     v.lowlink = min(v.lowlink, v'.lowlink)
   elseif (v' in S)            // Is v' on the stack?
     v.lowlink = min(v.lowlink, v'.lowlink)

shouldn't it be

   if (v'.index is undefined)  // Was successor v' visited? 
     tarjan(v')                // Recurse
     v.lowlink = min(v.lowlink, v'.lowlink)
   elseif (v' in S)            // Is v' on the stack?
     v.lowlink = min(v.lowlink, v'.index)

--128.9.216.61 (talk) 19:11, 6 January 2009 (UTC)[reply]

Both versions are correct, since v'.lowlink is the same as v'.index for the root of a strongly connected component. Therefore, I suggest to revert this change to the previous version, where the update is uniform in both branches. —Preceding unsigned comment added by 192.35.241.121 (talk) 17:55, 4 August 2009 (UTC)[reply]

I agree that both versions give equivalent results. Note that v'.index is not always the same as v'.lowlink (as v' is not always a root). But it does always get the same result, as you have for both approaches identified another node with a smaller index than this one which is in the same SCC, and set your lowlink to the index of another node in the same SCC (it doesn't matter which, as long as the root and only the root node's lowlink is equal to its index). In any case, I consulted Tarjan's original paper, and he uses index in the latter branch (as 128.9.216.61 changed it to) -- even though I think this is poorer style, we should stick with the original algorithm. Not an issue.MattGiuca (talk) 06:22, 8 February 2011 (UTC)[reply]
I noticed this quirk and wanted to comment. While I agree that strictly speaking the algorithm is correct, if someone wants to make a very tiny change in the end use of the algorithm this is a crippling bug. SCC's are a disjoint set, so a useful operation is to check whether two arbitrary nodes are in the same SCC. If you change it to v.lowlink, then I believe that two nodes are in the same SCC iff they have the same lowlink value. Ultimately the goal should be to inform the reader about this algorithm; should we restrict our information based on the original publication? Surely improvements or knowledge discovered since original publication are worth mentioning? I think this is worth mentioning in the notes section.Nirf (talk) 00:23, 1 February 2013 (UTC)[reply]
The first is correct. The second leads to a bug where a node that is higher up in the chain pointing to an already established set of connected components to not be included as part of the stack pop. The following graph shows this error. 2 should be include as a strongly connected component but it is not because lowlink of 2 is 1 and index of 5 is 3; however lowlink of 5 is 0. In the first case 1 is the min leading to 2 being poped by it self. second case 2 gets a lowlink of 0 and pops all the elements off the stack
   input:
   0->1;0->2
   1->3;1->4
   2->5
   4->5
   5->6;5->7;5->0
   6->4
   7->5
   Output:
   3
   2
   014567
No, I just tried out this case; both versions of code give the same output. MaigoAkisame (talk) 17:35, 4 January 2016 (UTC)[reply]
This was changed once more, and I reverted the change. If someone wants to have w.lowlink in that line they should provide a REVIEWED reference proving that it works, and indicate that the original differed. Someone looking for Tarjan's algorithm on Wikipedia is likely looking for his algorithm - not some other algorithm that may or may not be correct. Assuming the modified algorithm was correct and faster (neither is clear) - how would you compare them: stating that Tarjan's algorithm is better than Tarjan's algorithm?
The 2015 paper linked from the Complexity section does basically v.lowlink=min(v.lowlink,w.lowlink) in Algorithm 3 PEA_FIND_SCC2(V,E). Perhaps we could have a section discussing variations to the algorithm? --Thomasda (talk) 10:20, 27 May 2020 (UTC)[reply]

195.190.84.155 (talk) 12:04, 11 April 2017 (UTC)[reply]

Use v.lowlink = min(v.lowlink, w.index) give lowlink a more meaningful definition: it is the node with the smallest index v can reach with only one back edge. If min(v.lowlink, w.lowlink) is used, the lowlink will loss its meaning (for clarity, let's call this definition lowlink'). If w is a parent of v in the DFS tree, its lowlink' is not finalized, and could still change. This means you can not just define v.lowlink' as the smallest index v can reach, because that is not necessarily true.
For example:
   Input:
   0->1
   1->2
   2->1
   1->3
   3->0
If 2 is reached before 3, its lowlink' will be 1, which is not the lowest node it can reach (which is 0, because 2->1->3->0). 82.34.54.1 (talk) 19:38, 21 August 2018 (UTC)[reply]

Algorithm is still broken[edit]

There are two issues :

1. The conditions on the branches (below) are strictly opposite:

    if (v'.index is undefined)                    // Was successor v' visited?
        tarjan(v')                                // Recurse
        v.lowlink = min(v.lowlink, v'.lowlink)
    else if (v' is in S)                          // Was successor v' in stack S? 
        v.lowlink = min(v.lowlink, v'.index )     //v' is in stack but it isn't in the dfs tree

that is because the only region where we modify .index and push into S is the following:

  v.index = index                                 // Set the depth index for v
  v.lowlink = index
  index = index + 1
  S.push(v)

Which prooves that outside this region the following holds:

bool(v'.index is undefined) != bool(v' is in S)

Therefore in *any* case exactly one branch will be executed and therefore the whole block could be simplyfied by:

    if (v'.index is undefined)                    // Was successor v' visited?
        tarjan(v')                                // Recurse
    v.lowlink = min(v.lowlink, v'.lowlink)
  • The above analysis is incorrect. You are assuming that nothing ever gets popped off the stack. Note that nodes which are identified as SCC roots are popped off the stack during the algorithm. Therefore, once a complete SCC has been identified, all of the nodes in that SCC are a) already visited (have an index), and b) not on the stack. Thus, there is a third (implicit) "else" branch which does nothing -- that is for the case where a node's successor is already in an SCC which is already complete, and therefore it cannot possibly include the current node, so it is safe to ignore it. Do not make the above change. —MattGiuca (talk) 06:30, 8 February 2011 (UTC)[reply]

2. The SCC printed at the end is incorrect because algorithm never pops out nodes. Imagine the SCC is found for the last sucessor of v. In that case *every* sucessor will be printed, which is clearly not correct. Anonymous - 18:16, 16 june 2010 —Preceding unsigned comment added by 145.248.195.1 (talk)

  • Again, this misses the point that nodes are popped during the algorithm, not at the end. There is not just a single SCC identified; this identifies all of them. This does correctly pop nodes once an SCC is found (and as far as I can tell did also at the date of the comment). —MattGiuca (talk) 06:30, 8 February 2011 (UTC)[reply]

Algorithm fixed[edit]

I've fixed the algorithm, it should make much more sense now. I hope there aren't any more bugs. 70.68.114.213 (talk) 02:40, 31 January 2009 (UTC)[reply]

Node with no successor[edit]

Isn't there an issue for node that have no successor ? In that case, the main procedure can be seen like :

  v.index = index                 // Set the depth index for v
  v.lowlink = index
  index = index + 1
  S.push(v)                       // Push v on the stack
  if (v.lowlink == v.index)       // Is v the root of an SCC?
    print "SCC:"
    repeat
      v' = S.pop
      print v'
    until (v' == v)

(The "forall (v, v') loop is not executed). It results in such a node to be always reported as strongly connected root, while in my understanding it can't be. —Preceding unsigned comment added by 90.80.39.42 (talk) 10:50, 10 December 2009 (UTC)[reply]

If v'.index undefined then call Tarjan, then v'.index = v'.lowlink = 0; they cannot change since there will not be any lowlink < 0. So the first vertex will always result to belong to the strongly connected component. The algorithm is not fixed :-(

Ok, I'll work on it and I'll give you the right pseudocode. You make me waste my time. Jimifiki (talk) 12:17, 21 March 2010 (UTC) —Preceding unsigned comment added by 89.226.101.99 (talk) 11:26, 21 March 2010 (UTC)[reply]

  • The algorithm is correct in this regard. A node without a successor is always a SCC root. By definition, it forms its own single-node SCC, because it is not part of any loops. (In other words, it is like a leaf of a tree.) As for "the first vertex will always result to belong to the strongly connected component" -- yes, this is also true. All vertices belong to some SCC, it is just a matter of which. The first vertex is always the root of some SCC. The root of an SCC is defined as the node with the lowest index in the SCC, so the node with index 0 will always be the root of whichever SCC it ends up inside of. None of the above comments are issues with the algorithmMattGiuca (talk) 06:35, 8 February 2011 (UTC)[reply]

Algorithm Fixed[edit]

Ok, I worked on it!! In my opinion the pseudocode is wrong. The right order when calling Tarjan is:

v.index = index;
index = index + 1;
v.lowindex = index;

then the law to update v.lowindex is

v.lowindex <- min(v'.lowindex, v'.index, v.lowindex);

What you have as result is that

  1. All the vertices such that v.index >= v.lowindex belong to the Scc,
  2. All the vertices such that v.index = v.lowindex are root of a Scc
  3. All the vertices such that v.index < v.lowindex belong to the tree part of the graph.

I hope my modifications respect the original Tarjan algorithm, please proove all my claims before copying them on the main page.

Remark that my modifications answer to the Node-without-successor problem above mentioned :-)

  • I don't see why you should set v.lowindex to v.index + 1. (By lowindex do you mean lowlink?) The lowlink is the index of the lowest-known root in the current node's SCC, which is initially assumed to be this node, not the next node. It is the case that "All the vertices such that v.index = v.lowlink are root of a Scc", but it is not the case that "All the vertices such that v.index >= v.lowlink belong to the Scc" -- rather, I would say that all the vertices with v.index > v.lowlink belong to some SCC, and are not the root of it. There *are* no vertices with v.index < v.lowlink -- the min checks ensure that no vertex has a lowlink greater than its own index. There is also no "tree part of the graph" -- every node belongs to an SCC (though nodes which are not involved in cycles form a single-node SCC). Therefore these modifications are not useful. Not an issueMattGiuca (talk) 06:58, 8 February 2011 (UTC)[reply]

Counterexample[edit]

node size : 12
map : 
0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 
0 0 0 0 0 0 0 0 0 0 1 0 
0 1 1 0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0 0
1 0 1 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0

In this case the whole graph is strongly connected

But the result of algorithm is not —Preceding unsigned comment added by 119.202.92.81 (talk) 07:58, 23 October 2010 (UTC)[reply]

  • What output did the algorithm give you? I ported the algorithm to Python and ran it on precisely this input, assuming your adjacency matrix rows are sources and columns are sinks, and naming each vertex after its row/column ID starting from 0. The output I got was a single SCC: "6 3 11 4 7 2 8 1 10 5 9 0". I don't think this is a valid counterexample. Not an issueMattGiuca (talk) 05:34, 8 February 2011 (UTC)[reply]

Status of the algorithm[edit]

After seeing the large number of bug discussions on this page, I thought it would be prudent to put up a dispute-section template on that page. I intend to address each of the above concerns. Can anybody who is more of an expert in the field check that the algorithm is correct so we can remove the box? I have ported it to Python and it seems to be working okay. —MattGiuca (talk) 05:37, 8 February 2011 (UTC)[reply]

  • Well, I just went through every one of the above claims and added a comment there. I don't think any of them present a problem for the algorithm in its current state. However, there are a number of small differences between the algorithm as it currently stands on Wikipedia, and the original Tarjan paper (see below). I don't think any of them actually change the behaviour of the algorithm though.
    • Tarjan surrounds the "was successor on the stack" clause with "ELSE IF NUMBER (w) < NUMBER (v) DO", which in our algorithm would be written as "else if v'.number < v.number". I don't see a reason for this, though. By definition, all nodes which already have an index will have one less than the current node, so I think we can leave this out.
      This is because Tarjan in his paper states that he will only consider (i) edges that are in the search tree, (ii) edges that connect back to some search tree ancestor (fronds), and (iii) edges between sibling branches in the search tree (cross-links); the fourth class of edges that go to an already visited direct descendant are to be ignored, and that is what this test achieves. But it is indeed a pointless test, as the min in that fourth case is just the current lowlink anyway. 130.243.94.123 (talk) 13:47, 9 December 2015 (UTC)[reply]
    • Tarjan's "pop" loop is "WHILE w on top of point stack satisfies NUMBER (w) >= NUMBER (v) DO", which in our algorithm would be written as "while v'.index >= v.index". First, it is a while loop rather than a do-while loop (it is required to succeed on the first iteration). Second, it succeeds for v' == v (hence the do-while loop is not needed). Third, it fails if a node is encountered with an index smaller than the current node. I think you could reason that this accomplishes the same thing as popping all nodes up to and including the current node: the nodes on the stack should always be sorted by index, so popping until a node is found with a smaller index than the current is the same as popping until the current node is found.
    • Therefore, I would consider the Wikipedia version to be functionally equivalent to the original. I will therefore remove the "dispute" box I put up earlier.

For reference, the exact algorithm as listed in Tarjan, Depth-first search and linear graph algorithms:

BEGIN
    INTEGER i;
    PROCEDURE STRONGCONNECT (v);
        BEGIN
            LOWLINK (v) := NUMBER (v) := i := i + 1;
            put v on stack of points;
            FOR w in the adjacency list of v DO
                BEGIN
                    IF w is not yet numbered THEN
                        BEGIN comment (v, w) is a tree arc;
                            STRONGCONNECT (w);
                            LOWLINK (v) := min (LOWLINK (v), LOWLINK (w));
                        END
                    ELSE IF NUMBER (w) < NUMBER (v) DO
                        BEGIN comment (v, w) is a frond or cross-link;
                            if w is on stack of points THEN
                                LOWLINK (v) := min (LOWLINK (v), NUMBER (w));
                        END;
                END;
            If (LOWLINK (v) = NUMBER (v)) THEN
                BEGIN comment v is the root of a component;
                    start new strongly connected component;
                    WHILE w on top of point stack satisfies NUMBER (w) >= NUMBER (v) DO
                        delete w from point stack and put w in current component
                END;
        END;
    i := 0;
    empty stack of points;
    FOR w a vertex IF w is not yet numbered THEN STRONGCONNECT (w);
END;

MattGiuca (talk) 10:36, 8 February 2011 (UTC)[reply]

Where to store "lowlink"[edit]

As I see, we read the parameter "lowlink" only from child nodes and current node and we write it only to current node. Therefore, we can have this parameter as a local variable and return it from "strongconnect". I. e. we can store "lowlink" on the stack for recursive calls (= DFS algorithm stack). In such a way, this algorithm is connected to path-based strong component algorithm. From the stack of lowlinks we can compute the P stack. --Beroal (talk) 14:55, 24 June 2013 (UTC)[reply]

Constant time check for a node (vertex) on the stack[edit]

A recent edit, 01:55, 15 February 2015‎, attempts a technique to determine in constant time if a node is on the stack. The cited original paper by Tarjan mentions, "Testing to see if a vertex is on the point stack can be done in a fixed time if a Boolean array is kept which answers this question for each vertex." The recent edit however, introduces an "isRemoved" value per node, which is the opposite sense of "on the stack." The algorithm may work with this change, but it's difficult to see. It does not "answer the question for each vertex" as Tarjan suggested because the value is meaningless until the node is visited. Much more clear and following Tarjan's suggestion would be an "onStack" value. Further, if all onStack values were initialized to false at the start of the algorithm, then it would be clear that all nodes have valid values. —Fifteen Minutes Of Fame (talk) 04:26, 27 February 2015 (UTC)[reply]

The pseudocode didn't work for me[edit]

I've carefully implemented the pseudocode and it produced incorrect results for me. I found a link above (web.cecs.pdx.edu) that helped me out. It seems that an extra condition is missing here:

      else if (w.onStack) then
        // Successor w is in stack S and hence in the current SCC
        v.lowlink  := min(v.lowlink, w.index)
      end if

This version worked for me:

      else if (w.index < v.index and w.onStack) then
        // Successor w is in stack S and hence in the current SCC
        v.lowlink  := min(v.lowlink, w.index)
      end if

--Kalmankeri (talk) 15:15, 10 March 2015 (UTC)[reply]

Well that would bring the pseudocode back to Tarjan's original. MattGiuca considered this above under Status of the algorithm and reasoned that v, just numbered, would have to be greater than any existing number. But while numbering v happens before looping over successors, it does seems that a recursive call to strongconnect on one successor of v could number a later successor of v. I'm inclined to agree that this looks like a problem but a test case might show it clearly. Do you have an example of a graph that shows this failure? —Fifteen Minutes Of Fame (talk) 15:56, 28 March 2015 (UTC)[reply]
v.lowlink can't be bigger than v.index, so if w.index > v.index nothing will happen anyway. --Igor Yalovecky (talk) 15:52, 3 September 2022 (UTC)[reply]

Relation to Kosaraju's algorithm[edit]

Thinking about the matter a bit more, I find it highly unlikely that Tarjan's algorithm can be derived from Kosaraju's algorithm. Unless anyone can come up with a source for the claim

Although proposed earlier, [Tarjan's algorithm] can be seen as an improved version of Kosaraju's algorithm

then it should be removed. It could simply have been meant as a claim that Tarjan's algorithm is superior to Kosaraju's algorithm, but in that case this needs to be clearer, and such a statement probably shouldn't appear that early in the article. 130.243.94.123 (talk) 13:58, 9 December 2015 (UTC)[reply]

Is there a minimal set of test cases somewhere?[edit]

A set of test cases should include a simple root, an SCC root (no incoming edges), interior SCCs (with and without multiple incoming and or outgoing edges, and a terminal SCC (no outgoing edges). — Preceding unsigned comment added by Encyclopedant (talkcontribs) 03:54, 23 April 2020 (UTC)[reply]

Year of the invention?[edit]

It was published by Tarjan in 1972, but I am pretty sure it was known to him and other before that. Just hard to pin point it. 2A02:168:2000:5B:CC4D:BB9A:938:B537 (talk) 03:59, 25 June 2020 (UTC)[reply]

Links to implementations[edit]

Previously this article listed a number of external links to implementations in various programming languages. Last time in this revision: https://en.wikipedia.org/w/index.php?title=Tarjan%27s_strongly_connected_components_algorithm&direction=prev&oldid=898857841 It is hard to find these on the internet. I know how hard it was to develop the PHP version, for example. Could we either reinstate these links or have a related page about implementations of this Tarjan's algorithm? Or perhaps there are better ideas? — Preceding unsigned comment added by Vacilandois (talkcontribs) 17:29, 6 November 2020 (UTC)[reply]

Rosetta Code has become established as a site for "implementations in various programming languages". I have added a link to their site, which hopefully will not be deleted. I suggest you contribute your implementation there. --Mathnerd314159 (talk) 17:47, 25 February 2022 (UTC)[reply]
I was looking at your code links today, and it struck me that it did not look like the Tarjan algorithm. I think it might be the path-based strong component algorithm, as there is no "point_stack" in Tarjan, like there appears to be in your code. 50.170.44.210 (talk) 21:00, 9 December 2022 (UTC)[reply]

...and pseudocode is still buggy[edit]

For a graph with a single vertex with no edges (i.e. adjacency matrix 0 -> []), fails at w := S.pop(), because stack is empty at that moment. 87.241.189.187 (talk) 04:07, 6 December 2022 (UTC)[reply]

It shouldn't fail, since the single vertex is pushed at line 16 of the algorithm before it is popped. PoleoMenta (talk) 14:25, 21 February 2024 (UTC)[reply]