Talk:Skip list

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.Computer scienceWikipedia:WikiProject Computer scienceTemplate:WikiProject Computer scienceComputer science articles

Mid

This article has been rated as Mid-importance on the project's importance scale.

Things you can help WikiProject Computer science with:

Here are some tasks awaiting attention:

Article requests :
- Requested articles/Applied arts and sciences/Computer science, computing, and Internet
Cleanup :
- Computer science articles needing attention
- Computer science articles needing expert attention
Copyedit :
- Computing
Expand :
- Computer science
Infobox :
- Computer science articles without infoboxes
Maintain :
- Timeline of computing 2020–present
Photo :
- Find pictures for the biographies of computer scientists (see List of computer scientists)
- Computing articles needing images
Stubs :
- Computer science stubs
Unreferenced :
- WikiProject Computer science/Unreferenced BLPs
Project-related :
- Tag all relevant articles in Category:Computer science and sub-categories with {{WikiProject Computer science}}

Adversarial User[edit]

Why is the author concerned about 'adversarial users' messing with his data structures? Security through obscurity is not very secure anyway. Doesn't this discussion of security of data structures belong in a separate article?

The "adversarial user" is a hypothetic user (think, evil demon), that knows exactly the order in which to execute operations on a data structure that would incur the worst possible run time of the execution of those operations. It is a common technique in algorithm analysis to think of your algorithm working against such an adversarial user. Sancho McCann 22:00, 10 February 2007 (UTC)[reply]

See Adversary_(online_algorithm) and Competitive_analysis_(online_algorithm). Sancho McCann 01:53, 11 February 2007 (UTC)[reply]

Adversarial User Revisited[edit]

NO! The author is not discussing data security here and has made no mention of data security. The author is discussing the possibility that a user might, accidentally or otherwise, create a worst-case scenario for the data structure's performance. The "adversarial user" is a mnemonic mechanism for the programmer to keep in mind as he programs or designs an algorithm. This is a performance issue and not a security issue as you have misconstrued. Note "worst possible run-time" in the article text.

The above links didn't really help understand why randomizing the indexing would have any beneficial effects. As far as I can see, the randomization only serves to eliminate predictable performance, instead of affecting worst-case complexity. That is to say, the point of the randomization seems to be to vary the worst-case use case such that this data structure is not always inferior to a competing one given that use case, at the cost of also varying best-case use case performance. Did I misunderstand something, or is the point really to just flatten the performance variance probabilistically? 82.203.161.17 (talk) 15:11, 4 October 2007 (UTC)[reply]

Having now read the original article, it seems that that is indeed the case. However, the original motivation for random assignment of the level j nodes is avoiding a costly "rebalancing" step, and the fact that no set input sequence consistently produces worst-case behavior is just a byproduct. 82.203.161.17 (talk) 12:27, 6 February 2008 (UTC)[reply]

Adversarial User Revisited, Part II[edit]

We're still touting the advantages of randomness to combat adversarial users. I think Wikipedia is a perfect example: how many vandals spend their time making minor edits to an article to stealthily vandalize it, and how many just blank the page? Ensuring that the underlying data structure is obscure seems rather pointless when said 'adversarial' user can simply remove the entire list, or fill it with meaningless values. There are other reasons for randomization (covered above); I believe the article should be rewritten to de-emphasize or remove the 'adversarial user' explanations. (As an aside, inserting randomness does not improve the worst possible case—there is still some sequence of operations that will cripple the data structure—so it seems primarily a security-through-obscurity measure, not a performance "guarantee".) – 74 15:35, 12 February 2009 (UTC)[reply]

Other people have already explained that this reasoning has nothing to do with security through obscurity, but rather making sure that there is no worst case for performance in order to avoid this worst case ever happening accidentally. By making sure that your data structures would behave efficiently even in when faced with a user that knows exactly which operations to perform to display the worst case, you make sure that it is impossible to accidentally write an access pattern that would exhibit poor performance. As others have said, this is a standard technique in data structure analysis. Please don't sabotage this article just because you don't understand it. Yogi de (talk) 01:47, 3 August 2009 (UTC)[reply]

Style[edit]

Wikipedia's manual of style has a section on writing in mathematics articles that I think applies here (see Wikipedia:Manual_of_Style_(mathematics)#Writing_style_in_mathematics). If we could remove the conversational style and have the article read more like an encyclopedia rather than a textbook, it would be an improvement to this article. Sancho McCann 22:02, 10 February 2007 (UTC)[reply]

I attempted to remove the conversational tone from the article 192.17.161.202 23:38, 17 October 2007 (UTC)[reply]

ICS 23[edit]

The part about Jacobson could use a rewrite.

Skip Lists with two pointers[edit]

I wonder if it's possible to implement these cute Skip Lists using only TWO fields to navigate in it, i.e. "below" and "next", instead of the usually suggested FOUR including "above" and "before"...? However, I'm not sure how to handle the tower construction if one can't get levels up and nodes before a particular node. How about removing a node? Again, all the tower nodes need to be isolated, isn't that so? How do we do this, using only a "next" and a "below" fields? Could any expert comment on this?

Epsilon1 20:34, 1 April 2007 (UTC)[reply]

Read the original paper; N pointers are required for each node, all pointing "forward" different amounts in the list. No "backward" or "up" pointers are used or needed. If for some reason one wanted to traverse backwards in the list, "backward" pointers would solve that problem. However, since this behavior is not usually needed, it's a lot of storage to use on something that won't be used a lot.

Yes, the original paper (William Pugh 1990) never used "backward" or "up" pointers. However, it claims that "Skip lists are also very space efficient. They can easily be configured to require an average of 1 1/3 pointers per element (or even less)".

But that paper gives an example where the occasional node requires 16 pointers; the simplest way to implement that is to give every node a fixed 16 pointers and make the unused pointers point to NIL. Each node uses an average of slightly less than 2 pointers out of those 16.

Is it possible to make a data structure something like a skip list, except each node has a fixed size with exactly 2 pointers? (If not, how about exactly 3 pointers?). I.e., is it possible to take the stack of 16 pointers that, in a skip list, are all placed in a single "tall" node, and somehow spread those 16 pointers over the following nodes -- "short" nodes that, in a skip list, would have only 1 pointer? --68.0.124.33 (talk) 12:35, 23 March 2009 (UTC)[reply]

You could modify the structure such that the furthest pointer appears in the first node, the second furthest in the second node, etc. Then, instead of walking down the list of advance pointers in one node, you'd walk down the list of nodes checking the advance pointer for each. Your performance would suffer since this is effectively indirect addressing the advance pointers, and modification of the list would become more complex. I suspect the increased simplicity of the node structure is overshadowed by the decreased performance and increased complexity of modification. – 74 13:39, 23 March 2009 (UTC)[reply]

Removed a reference that disputes the quote in the History section[edit]

I removed the linked page because it is based on a bad implementation and bad information. Nowhere does it say what the skip average was and why it would use a height of 24 nodes or how these were allocated. A height of 24 for 6 million entries is ridiculous. For that many items, a height of 8 is more than sufficient. Depending on the allocation scheme, this will greatly reduce the extra memory and reduce the amount of comparisons done. I've actually tested skip lists extensively and can tell you that in C++ at least, skip lists are exactly on par and occasionally faster with the map template for speed and use about the same amount of memory. Skip lists memory usage and speed can be tweaked according to your need and can reproduce a map quite easily. -- V —Preceding unsigned comment added by 74.106.242.11 (talk) 01:17, 27 September 2007 (UTC)[reply]

The paper "Skip Lists: A Probabilistic Alternative to Balanced Trees" by William Pugh 1990 says "If p = 1/2, using MaxLevel = 16 is appropriate for data structures containing up to 2^16 elements." It seems reasonable to me to extrapolate that perhaps MaxLevel=23 would be appropriate for nearly 2^23 elements. But "V" claims a lower height of 8 is adequate for nearly 2^23 entries. So do you use a different constant "p" value, or a varying "p" value, or is Pugh excessively conservative? --68.0.124.33 (talk) 13:01, 23 March 2009 (UTC)[reply]

Yeah, p=1/2 should never be used. You're wasting time and space for nothing. p=1/8 is just as fast, actually slightly faster, and requires less memory at 8 levels. If you need an extra ~5-10% in performance, I'd use p=1/4 or something like that which gives 12 levels. Think about it, with 24 levels at p=1/2, you need to look ahead 1 or 2 nodes before going down. For two levels, 2 to 4 steps. With 12 levels (half as many as before) at p=1/4, you need to look ahead 1 to 4 nodes before going down. p=1/4 has a smaller lower bound. But for p=1/8 (8 levels), you'd have 1 to 8 steps vs. 3 to 6 steps. Average case is still slightly better for p=1/8, so it's about the same. It gets worse for higher p. But there's no reason to use p=1/2. -- V 99.251.236.178 (talk) 18:35, 18 October 2010 (UTC)[reply]

Incomprehensible lead section[edit]

I am a very intelligent native speaker of the English language, and the lead section of this article is completely incomprehensible to me. Highly technical language may be appropriate in later sections of an article on a highly technical subject, but the lead section of an article on ANY subject should be comprehensible to a person who is not an expert in the field, without having to follow links to other articles that may be no more comprehensible than it is. This one fails completely. I am adding a well deserved {{Technical}} to this article, with particular emphasis on the lead section.--Jim10701 (talk) 09:25, 26 October 2011 (UTC)[reply]

The picture doesn't do it for you?99.94.144.123 (talk) 00:53, 19 December 2011 (UTC)David[reply]

Deterministic fast links[edit]

The article is concentrated on skip list with randomized fast links. Does it make sense to add some analysis of list with deterministic logic for fast links? I.e. case where for each node the topmost link points to a node in the middle between current one and the tail. The next one points to the middle between current and above fast link, etc.? Such structure provides more consistent and predictable result in data access. — Preceding unsigned comment added by Azemerov (talk • contribs) 19:35, 28 February 2012 (UTC)[reply]

The issue with such a construction is that you need to add balancing mechanisms to make it work when you insert. It has been done, but is still fairly complicated. — Preceding unsigned comment added by Redmjoel (talk • contribs) 00:20, 14 May 2013 (UTC)[reply]

Gif animation[edit]

I've made an animation to show the process of element inserting to skip list. Do you have any comments for animation? Do you have any objections not to include animation to article? Artyom Kalinin (talk) 06:32, 10 December 2013 (UTC)[reply]

Artyom, сould you replace the Russian word "уровни" by "levels"? -- Andrew Krizhanovsky (talk) 20:22, 20 December 2013 (UTC)[reply]

Fixed. Artyom Kalinin (talk) 17:45, 21 December 2013 (UTC)[reply]

Indexable skip list diagram[edit]

Shouldn't the widths of the head nodes in the diagram for indexable skip list be 0 instead of 1? (Or not set at all?) Gvanrossum (talk) 16:27, 31 March 2014 (UTC)[reply]

Is skiplist a probabilistic data structure?[edit]

Though the paper claims it is a probabilistic data structure, it actually isn't! A probabilistic data structure does not keep the data itself and therefore when queried, replies will have some error rate. A skip-list, on the other hand, uses a random function for the structure itself but all data is kept and since there is no data loss, there is no error-rate. I suggest removing the probabilistic tag! Kigelim (talk) 21:33, 6 January 2020 (UTC)[reply]

I agree that most of the data structures listed in Category:Probabilistic data structures are "lossy", while skip lists keep all the data.

I feel that the answer to your question, "Is skiplist a probabilistic data structure?", depends on the definition of "probabilistic data structure".

Do you know of a reference we could use in Wikipedia that defines "probabilistic data structure"?

Perhaps Talk:Randomized algorithm#probabilistic algorithm is a better place for this discussion. --DavidCary (talk) 23:00, 28 December 2020 (UTC)[reply]

Implementation details needs to actually have implementation details[edit]

Currently the article only has information about 'search'. The only sentence about 'erase' and 'insert' is

"Insertions and deletions are implemented much like the corresponding linked-list operations, except that "tall" elements must be inserted into or deleted from more than one linked list."

Then the other 90% of that section is about the quasirandom variant, which is incomprehensible without an explanation of the standard random implementation. — Preceding unsigned comment added by 73.42.179.192 (talk) 03:47, 11 January 2021 (UTC)[reply]