Talk:Vector clock

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

German version[edit]

It'd be great if somebody who speaks German could try and synchronise the two articles, the German one seems much more complete.

attempt for translation[edit]

Probably not the best possible translation (I was strugling a bit with some of the technical terms), but please find below an attempt to translate the german version. Please feel free to improve the wording and merge/move the text below to the actual article.



A vector clock is a software component (or a protocol) to assign unique timestamps to messages. Therefore it is a logical clock that allows to put the events in a distributed system in a causal order based on their time stamps (bring into order). Additionally it allows to identify the concurrency of events. It represents the generalisation of Lamport-Clocks which also adheres to the strong consistency criteria of clocks. Vector clocks have been developed independently by different scientists, in particular Colin J. Fridge, Friedemann Mattern and Frank Bernhard Schmuck.


Example for a system of vector clocks

Content 1 Mechanics 2 Partial order 2.1 Concurrency 3 Literature

Mechanics The approach to operate vectors clocks is as follows: Similar to the Lamport-Clock every process keeps a counter which is incremented with every event (especially when sending or receiving messages). Unlike Lampert-Clocks the clock of every process consists not only of a single counter but of a vector (or an array / associative list) of counters: every keeps track of the counter value of all other processes (to the extent these are known). The current value of the clock is attached to every emitted message. For all events, only the own counter of the process is incremented. When a message is received, the element-wise maximum is used to build the new value of the clock.

The following pseudo code shows the implementation of a vector clock when sending a message: clock[PID] = clock[PID]+1; timestamp = clock; send(message, timestamp); The PID is the predefined unique designator of the process – such as the process ID, the IP adress or a combination of both. Fields for which no value was received with prior messages, the value is set to be null.

The following pseudo code show the implementation when receiving a message: (message, timestamp)= receive(); clock[PID]= clock[PID]+1;

for (process P) do begin

   clock[P]= max(clock[P],timestamp[P]);

end; Partial Order In order to determine which messages (or which events) are causally dependent from others, a relation of partial order is defined over the values of vector clocks. An event A is the cause of event B when the counter of every field of A is less or equal to the respective field of B. Additionally at least on field of A must be less than the respective field of B. [… formula …] As the implication is valid in both directions for vector clocks, they fulfill the strong consistency criteria of clocks. An implementation of the relation of partial order stated above in pseude code is: procedure

procedure is_cause_of(A,B):

   at_least_one_element_is_strictly_smaller = NO;
   for (process P) do begin
       if ( A[P] > B[P] ) then return NO;
       if ( A[P] < B[P] ) then at_least_one_element_is_strictly_smaller := YES;
   end;
    
   return at_least_one_element_is_strictly_smaller;

end procedure; Concurrency If neither A-->B nor B-->A is the case, then the procedure above will yield „NO“ for both is_cause_of(A,B) is_cause_of(B,A) and are thuse concurrent – also written as A || B. It is the key advantage of vector clocks compared to Lamport-clocks they it is possible to determine which events occured concurrent to which others. This is caused by the strong consistency criteria of clocks. It needs to be taken into account, that in contrary to the cause relation, concurrent is not transitive. Literature C. J. Fidge, Timestamps in message-passing systems that preserve the partial ordering, In K. Raymond, editor, Proc. of the 11th Australian Computer Science Conference (ACSC'88), pp. 56–66, February 1988. Reinhard Schwarz und Friedemann Mattern, Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail (PDF; 278 kB), in Distributed Computing, Nr. 3 Vol. 7, Springer 1994. — Preceding unsigned comment added by 82.217.24.101 (talk) 21:32, 27 January 2015 (UTC)[reply]

Lamport Timestamps article says "[timestamps] provide a starting point for the more advanced Vector clock method."[edit]

but article doesn't say in what way these are more advanced, nor what there advantages are relative to plain timestamps. Could someone elaborate? Thanks 92.29.175.136 (talk) 19:24, 10 January 2010 (UTC)[reply]

+1 please elaborate on that point —Preceding unsigned comment added by 90.80.39.42 (talk) 12:45, 5 October 2010 (UTC)[reply]

Lamport Timestamps seem to provide you with an ordered sequence that will tell you in which order the system processing the messages/events has done so. What the Lamport Timestamps will not tell you is the causal chain that resulted in the interaction from multiple systems. This is what I understand the Vector clock advantage to be: It gives you a causal sequence of messages/events accross several systems, instead of only having the sequence of events it emitted. Strictly speaking, neither algorithm actually provides a causal chain, but rather temporal order. 109.193.70.243 (talk)

Request[edit]

What does the first line of sending a logical clock mean? Is that the current process's clock that says how many events its seen from every other process? — Preceding unsigned comment added by 130.58.68.214 (talk) 17:28, 31 October 2014 (UTC)[reply]

Could someone please rephrase Each time a process receives a message, it increments its own logical clock in the vector by one and updates each element in its vector by taking the maximum of the value in its own vector clock and the value in the vector in the received message (for every element). - I find it confusing and perhaps in conflict with how I interpret the SVG diagram. Alecmuffett (talk) 21:11, 1 January 2013 (UTC)[reply]

It is probably better to say "every time it receives a message, it increments the field of its own process ID (its own logical clock) in the vector by one ..."

What do the boxes in the diagram mean?[edit]

Each box in the figure contains three values (sometimes blank). What do those values actually mean? If the caption to the figure explained that, I think it could aid understanding. Thanks! 103.1.70.133 (talk) 01:22, 27 July 2013 (UTC)[reply]

The box is the vector clock per process — Preceding unsigned comment added by 96.245.110.81 (talk) 00:06, 27 March 2014 (UTC)[reply]

Logical not binds stronger than "less than"[edit]

By convention it is the case that unary operators bind stronger than binary operators, thus where it says VC(x) < VC(y) <=> ~VC(y) < VC(x), it reads as if the author tried to negate a vector of integers to the utter amusement of the public. 185.110.110.45 (talk) 15:25, 9 October 2017 (UTC)[reply]

A compatible approach[edit]

May I bring my posts to your attention: DAGs, Ledgers and Distributed Ledger Technology and from Vector Clock to Distributed Ledger.
I would be thankful for your feedback. Especially, whether it is worth the effort to put the idea in a formal paper.

--Enis.olgac (talk) 08:14, 9 December 2018 (UTC)Enis Olgac[reply]

Recent revert of Bloom clocks[edit]

I don't understand, what is wrong with my edit? Sure, the original edit was perhaps self-promotional, but I completely rewrote it. At this point we would delete half of the content on Wikipedia because it is quoting people who are not objective. Mathnerd314159 (talk) 17:04, 17 June 2023 (UTC)[reply]

@Mathnerd314159: Thank you for your contribution, and I apologize for the revert! My intention is not to start an edit war. Let's bring the discussion here. A few data points:

  • If you check the history, you will see that the reference is added by a user Lumramabaja (talk · contribs) -- sharing the name of the sole author of the paper. A strong red flag and indicator of a self-promotional addition.
  • This paper is posted only an arXiv, not in a reputable conference in systems or data structures.
  • With only 6 citations, this isn't a very established reference, and there are thousands of papers about vector clocks -- is this really among the top 4-5 most important references to mention here?

I don't doubt that Bloom Clocks are an interesting research contribution -- but the question is what are the most canonical references. Which requires a deep/detailed literature search. Caleb Stanford (talk) 17:04, 17 June 2023 (UTC)[reply]

In reverse order:
First, the section is a list of alternative versions of vector clocks. All of the clocks in that section are somewhat rare and under-referenced. That is the point - in the absence of a reliable "Handbook of vector clocks" reference or similar which sorts through the various alternatives and picks out the good ones, the list on Wikipedia is most useful when it is as complete as possible. As far as "thousands of papers about vector clocks", I guess there are a number of variants (matrix clocks, plausible clocks, resettable clocks) , but I only see a few pages of Google Scholar results (~70 results) before it starts getting off-topic. If there really was that much to say about vector clocks, I think this would be a much more popular article. But if you actually did find and add enough material to bring this article up from the 1.3k bytes it is now to something sizable like 40k or 50k I suppose you would have a point.
As far as ArXiv, it is defined as containing self-published material, but self-published materials is allowed under WP:SPS so long as it is a subject matter expert. I guess from his blog, he is a master's student, maybe borderline, but the citations are peer-reviewed and actually treat the material quite substantially. For example this conference paper (author version) was reviewed by the conference editors and essentially restates the whole Bloom clock construction, and Ajay Kshemkalyani is a full professor at UIC.
As far as history, are you saying that experts are not allowed to edit articles (as seems to be a common criticism of Wikipedia, WP:SCUM)? WP:SELFCITE says an expert citing material they have written or published is "allowed within reason". Mathnerd314159 (talk) 17:50, 17 June 2023 (UTC)[reply]
@Caleb Stanford: Mathnerd314159 (talk) 20:14, 17 June 2023 (UTC)[reply]
Thanks for weighing in here. The state of the article itself isn't so relevant -- the state of the literature is. And that state is (as you found above) there are are a lot of papers related to vector clocks! To prove your point, what would be needed from you would be a convincing argument that these specific four papers are the canonical references on follow up work to vector clocks.
An author of a new, non-peer-reviewed result editing to add their own paper is absolutely an abuse of the site. If everyone did that, most research-related pages would be full of self-promotion of every researcher's latest paper (and in fact, this is a huge problem...) As to the specific policies, I think you're cherry-picking a bit. By my reading it violates all of WP:SELFCITE, WP:PROMO, and WP:SPS due to the specific clauses in those policies. Caleb Stanford (talk) 04:39, 18 June 2023 (UTC)[reply]
I don't think you are right that I need a "convincing argument that these specific four papers are the canonical references". Per WP:NPOV the simple goal is to present "all the significant views that have been published by reliable sources on a topic." These have been published, they're reliable, therefore they should be presented. It's really that simple. I guess you will say that I am cherry-picking the policy again, but I really don't know what to say since it's literally the first paragraph. Regarding the other policies, there are plenty of arXiV sources on Wikipedia, see [1], I don't know what your "reading" of WP:SPS is but it's clearly not accurate.
As far as the SELFCITE/PROMO, they don't really apply anymore because I'm the one advocating the edit (and I'm not connected to the author), I just was pointing out that as a general principle self-citation is allowed (that was why I didn't delete it many months ago). Mathnerd314159 (talk) 19:29, 19 June 2023 (UTC)[reply]
> the simple goal is to present "all the significant views that have been published by reliable sources on a topic."
I agree with this goal but I am not sure whether the paper meets the bar for "significant". By canonical I mean that the reference is significant enough to list on a general encyclopedia article about vector clocks.
> These have been published, they're reliable, therefore they should be presented.
You omitted the "significant" criteria.
For "reliable", WP:SPS clearly requires the article to be "produced by an established subject-matter expert" and though the article is interesting, the author does not appear to be an established subject-matter expert.
There might be a policy that allows us to use the Kshemkalyani articles to justify including the self-published source despite literally speaking violating this clause.
There are only 6 citations which is a bit borderline.
> I'm the one advocating the edit
I agree. Therefore, the argument about the original bad-faith edit is not relevant.
We're probably not converging here -- if you still disagree, please solicit WP:3O. Caleb Stanford (talk) 04:12, 21 June 2023 (UTC)[reply]
I thought about wp:3o but I don't think we really need it, because the way I see it is that we're coming into agreement. You've agreed that the Kshemkalyani articles are reliable. So presumably simply removing the arxiv cite and replacing it with the Kshemkalyani 2020 paper would satisfy your concerns with regard to sourcing. I generally include primary sources under wp:aboutself, just because secondary sources often get dates and citations wrong and having a direct link in the article is handy, but I suppose it could be left out in the interest of sticking to secondary sources. (sorry about forgetting about aboutself, i just sort remember the gist of the guidelines and what's allowed and not really which guideline allows what)
So the next question is whether it is worth including the discussion of Bloom clocks at all. I argued previously that the article is short (1.3k bytes of readable prose) because there is simply not much to say about vector clocks, so padding it out with any kind of relevant information is desirable. Maybe you disagree, but in that case the whole section or at least the item on chain clocks should probably be deleted, as the other papers also have relatively few citations (41,9,26). It's worth noting wp:not paper that there are not really any considerations of significance beyond notability of the general topic and article length. I am not sure what exactly "significant" refers to in the npov policy but Wikipedia:WHATSIGCOV and WP:SIGNIFICANT imply that, as long as there are reliable sources on the topic, it is generally reasonable to include. It seems from perusing the npov policy that its concerns about undue weight and minority viewpoints are mainly about proven false theories or theories that question basic tenets of science, such as pseudoscience, rather than theories (or in this case new data structures) that are on the bleeding edge and have insufficient information to determine usefulness or importance. Mathnerd314159 (talk) 16:57, 21 June 2023 (UTC)[reply]
I'm not 100% in agreement, but perhaps it's OK for now. I guess a final question for you: about how many bullet points do you think should be listed under "Other mechanisms"? Basically, my concern re "significance" (which I also do not know the formal qualification for, only got a lot of red flags from this particular case :) ) is that we should make this a canonical and useful list of follow-up works, not just a big list of papers which have proposed alternative ideas (WP:NOTDIR). Caleb Stanford (talk) 22:35, 21 June 2023 (UTC)[reply]
I guess you are asking about selection criteria, WP:CSC? I was aiming for a "short, complete list of every item that is verifiably a member of the group." I think by the time there are more than about 10 or 20, some CS professor will write a review of the various vector clock variants, and then the whole list can just be replaced with a summary of that article, "there are more than 10 vector variants. Variant A is good for application A, variant B is useless" and so on. Mathnerd314159 (talk) 00:21, 22 June 2023 (UTC)[reply]