Talk:Binary search algorithm/GA1

GA Review

Reviewer: David Eppstein (talk · contribs) 05:32, 27 March 2016 (UTC)

Reviewing. But this is likely to be a quick fail unless something is addressed relating to criterion 3(a) (all essential aspects of the topic must be covered). Namely: the article as it stands describes only a version of binary search that finds elements when they are already in the array, but it returns nothing useful when the query value is not in the array. But to my understanding, that is not a useful version of the algorithm. If what you want is only equality-based lookups, you would be better off with a hash table, or maybe a bitmap for sets with small ranges and no data associated with the values. What binary search is more useful for is something hash tables can't easily do: predecessor, successor, or nearest neighbor queries. Currently the article mentions nothing of this: nothing about using hashing instead of binary search when you need only exact matches, nothing about predecessor, successor, or nearest neighbor queries, and very little about how to choose between binary search of sorted arrays versus other alternatives (all I see is a uselessly-short warning that the cost of sorting may make a linear search a better choice in some unspecified circumstances). Without a more complete and accurate coverage of the types of abstract data type that binary search can be a good match for, there is no way this can pass 3(a). —David Eppstein (talk) 05:32, 27 March 2016 (UTC)

I've started to expand the article per your feedback. Esquivalience ^t 20:32, 27 March 2016 (UTC)

Done hash tables and approximate matches, on to bitmaps and binary search versus other searching algorithms. Esquivalience ^t 03:33, 28 March 2016 (UTC)

I certainly agree that the article should describe the special value of binary search as opposed to (among others) hash tables. However, what User:Esquivalience has mostly added is more detailed description of hash table algorithms, almost all of which is better covered in that article, and is completely irrelevant to binary search or the comparison of binary search to hash tables. I removed that material, but Esquivalience put it back.... Other editors' comments are welcome. --Macrakis (talk) 19:33, 28 March 2016 (UTC)

What I think the binary search article needs is not a description of hash tables, binary search trees, etc., but a comparison of the relative advantages and disadvantages of these solutions: what do you need to know to decide whether a sorted array with binary search is the right or wrong solution to your programming problem? —David Eppstein (talk) 20:01, 28 March 2016 (UTC)

@David Eppstein and Macrakis: I have removed most of the description of the individual data structures, instead focusing on comparing them to binary search of a sorted array of records. Esquivalience ^t 01:36, 29 March 2016 (UTC)

Second reading

Good article criteria:

1a. Good prose: some grammatical errors and run-on sentences detailed below.

1b. Complies with MOS: some issues with lead, figure sizing and section layout detailed below.

2a. References well formatted: mostly, but some are missing important data.

2b. The references used are reliable and cover all claims: There are a couple of self-published sources but they don't look problematic. However, there are many claims in the article that are missing sources.

2c. No original research: mostly true, but there are some statements in the article that are just wrong, possibly through editors adding OR.

2d. No copyvio or plagiarism: none found.

3a. Addresses the main aspects of the topic: Now much better, but we're still missing any discussion of how to use binary search to find non-exact matches.

3b. No unnecessary detail: I think the "Exclusive or inclusive bounds" section is problematic wrt this criterion.

4. Neutral: No major problems here.

5. Stable: Hmm. Although much of the editing was in response to my first round of review, there have been some major edit-revert cycles recently, both in response to my review and earlier.

6a. Images are properly licensed: yes.

6b. Images are relevant: yes, but some improvement in legibility is needed. Images have good captions: again, there is room for improvement.

Lead

The first and second paragraphs accurately summarize the first and second sections (algorithm and performance), but after that the remaining lead paragraph does a poor job of summarizing the rest of the article.

Expanded the lead to summarize the article. Esquivalience ^t 23:28, 7 April 2016 (UTC)

"that are stored contiguously": is there any other kind of array?

Removed. Esquivalience ^t 23:28, 7 April 2016 (UTC)

The image in the infobox has text that is too small to be legible. Why is the image scaled to below the default image size? And the use of an image size specified as an absolute number of pixels rather than as a relative scaling factor violates MOS:IMGSIZE.

Done. Esquivalience ^t 23:28, 7 April 2016 (UTC)

The image caption omits some key information (that the search key is 4) that would help make the image more understandable (once it is redrawn or resized to make the search key in the image legible).

The space complexity claim in the infobox is confusing or wrong. The working space for an iterative binary search is indeed constant. The working space for a recursive binary search is logarithmic. And both kinds of search use linear space for the sorted array itself. This should either be stated more clearly or the term omitted from the infobox.
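As an illustration of the working-space distinction, here is a minimal Python sketch of both forms. The function names and the depth counter are hypothetical additions for this example, not taken from the article. The iterative form keeps only two indices, while the recursive form adds one call frame per halving step.

```python
from typing import Optional, Sequence, Tuple


def search_iterative(a: Sequence[int], target: int) -> Optional[int]:
    """Iterative binary search: constant working space (two indices)."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if a[mid] < target:
            lo = mid + 1
        elif a[mid] > target:
            hi = mid - 1
        else:
            return mid
    return None


def search_recursive(
    a: Sequence[int], target: int, lo: int = 0,
    hi: Optional[int] = None, depth: int = 1,
) -> Tuple[Optional[int], int]:
    """Recursive binary search: returns (index or None, deepest call depth).

    Each recursive call adds a stack frame, so the working space grows
    logarithmically with len(a).
    """
    if hi is None:
        hi = len(a) - 1
    if lo > hi:
        return None, depth
    mid = lo + (hi - lo) // 2
    if a[mid] < target:
        return search_recursive(a, target, mid + 1, hi, depth + 1)
    if a[mid] > target:
        return search_recursive(a, target, lo, mid - 1, depth + 1)
    return mid, depth
```

Both functions still need the sorted array itself, which takes linear space regardless of which form is used.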

Algorithm

"This method can be described recursively or iteratively": this is now just a throwaway line with no source and no later expansion.

Removed. Esquivalience ^t 23:28, 7 April 2016 (UTC)

Why are we using 1-based array indexing when all modern programming languages use 0-based indexing?

Converted to 0-based indexing. Esquivalience ^t 23:28, 7 April 2016 (UTC)

What is a "logical procedure" and how does it differ from some other kind of procedure?

Likely some needless fluff; removed. Esquivalience ^t 23:28, 7 April 2016 (UTC)

The "inclusive or exclusive" variation subsection is far out of proportion to its small importance (just a coding detail). It also appears to be mostly unsourced.

Removed entirely. Esquivalience ^t 23:28, 7 April 2016 (UTC)

This algorithm still has the major major flaw that I already complained about: it only works for finding exact matches, a problem for which binary search is the wrong solution. There is still no description of how to use binary search to solve successor, predecessor, or nearest neighbor queries, nor even (except later hidden in the hash table subsection) any mention that it can do so.

Added a new section on approximate matching. Esquivalience ^t 23:28, 7 April 2016 (UTC)

Re the link on "place": see WP:SUBMARINE.

Removed link. Esquivalience ^t 23:28, 7 April 2016 (UTC)

Performance

Using wiki-formatting for the floor operation is very ugly. If you're going to use that operation, do it in <math>.

Converted all math to TeX. Esquivalience ^t 01:18, 8 April 2016 (UTC)

How is it possible that comparison tree is a redlink? Maybe we should instead link to Decision tree model, rather than writing here as if this model is unworthy of a link.

The use of comparison trees to analyze binary search needs a source.

Cited TAOCP §6.2 as well as Timothy 1997. Esquivalience ^t 01:33, 10 April 2016 (UTC)

The image caption could use more detail: what is the array being searched? And what is the search value?

The average case analysis needs a statement of assumptions (maybe, we are only searching for things that are in the array, and they are all equally likely?).

Done. Esquivalience ^t 01:33, 10 April 2016 (UTC)

The best case claim has no source.

Added ref. Esquivalience ^t 01:33, 10 April 2016 (UTC)

The sentence "In practice, for all but large n, ..." is too long, and should be split into multiple shorter sentences.

Split. Esquivalience ^t 01:33, 10 April 2016 (UTC)

Most importantly the average case analysis here is incorrect, because it is based on a different version of binary search, one that does the equality test second in each iteration. The version actually described here does the equality test first, so most iterations (all but the last one) take two full comparisons.

Fixed. Esquivalience ^t 01:33, 10 April 2016 (UTC)

What does small vs large mean here? What is the cutoff for which this optimization is worthwhile? 10? 100? 100000?

Knuth 1998 gives a value of 2⁶⁶; added. Esquivalience ^t 01:33, 10 April 2016 (UTC)

Fractional cascading is worth mentioning somewhere in this article, but is this section the right place for it? And it needs a source.

Moved to variations section.

Binary search versus other schemes: Thanks for adding this badly-needed section.

Binary search versus other schemes — Hash tables

Hash tables is plural, not matching the singular verb "is".

Fixed.

Re "the only information given on a failed search is that the key is not present in any record": but that's also true of the pseudocode for binary search given here!

Added approximate matching section to Binary search algorithm#Algorithm.

Some kinds of hash table (in particular cuckoo hashing) have constant worst case search time, not merely constant average case

Changed to "most implementations", mentioned perfect hashing as footnote.

"is retained": awkward passive wording.

Reworded.

Binary search versus other schemes — Binary search trees

"The main advantage of BSTs": here and later I think it would be less WP:TECHNICAL to just spell out binary search trees rather than using an acronym.

Done

The same sentence has yet another singular-plural mismatch.

Fixed.

The first paragraph needs a source.

Added (Sedgewick & Robert)

No mention of the space penalty for storing the items in a collection of separate tree node objects instead of a single array.

Added mention.

Where does the 1.386 constant factor come from??? Perhaps this is the expected time for binary search trees created by random insertion without rebalancing, but that's a very specific choice of assumptions that, if you're making them, should be stated explicitly. And it needs a source. But it would be better just to stick to generalities. There are many types of balanced binary trees, they generally have logarithmic search time and update times (not just the vague statement here about updates being slow), the constant factor in the search time is generally greater than one, but for weight-balanced trees it can be made arbitrarily close to one.

Replaced with "slightly worse", which conveys the message clearly enough.

The claim that traversing the nodes of a binary search tree takes linearithmic time is flat-out false. An in-order traversal is a linear-time operation.

Fixed. Esquivalience ^t 02:05, 10 April 2016 (UTC)
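To make the linear-time point concrete, here is a minimal Python sketch of an in-order traversal. The `Node` class and `insert` helper are hypothetical scaffolding for this example only. Every node is pushed onto and popped from the stack exactly once, so the traversal does O(n) work.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Plain (unbalanced) binary search tree insertion."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def in_order(root: Optional[Node]) -> Iterator[int]:
    """Yield keys in sorted order; each node is pushed and popped once."""
    stack: List[Node] = []
    node = root
    while stack or node is not None:
        # Walk as far left as possible, remembering the path.
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.key
        node = node.right
```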

Binary search versus other schemes — Linear search

What does "non-trivial" mean?

Likely some chaff; removed.

"amortized" is a technical term that needs an explanation.

Clarified using "spread".

For a single search, sorting + binary search is slower than not sorting + linear search, so I think more explanation is needed here.

The new wording should hopefully be clear enough that the cost of sorting needs to be spread over all searches.
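A rough worked example of spreading the sorting cost, under a deliberately simple cost model that is an assumption of this sketch and not taken from the article: sorting costs about n·log2(n) comparisons, each binary search about log2(n), and each successful linear search about (n + 1)/2 on average. The function names are hypothetical.

```python
import math


def linear_total(n: int, k: int) -> float:
    """Modelled cost of k linear searches on an unsorted array of n items."""
    return k * (n + 1) / 2


def sorted_total(n: int, k: int) -> float:
    """Modelled cost of sorting once, then doing k binary searches."""
    return n * math.log2(n) + k * math.log2(n)


def break_even_searches(n: int) -> int:
    """Smallest k for which sorting first is cheaper under this model."""
    saving_per_search = (n + 1) / 2 - math.log2(n)
    return math.floor(n * math.log2(n) / saving_per_search) + 1
```

Under this model, with n = 1024 a single search is far cheaper without sorting, and sorting only pays off after roughly twenty searches. Real break-even points depend on the sorting algorithm and on hardware effects that the model ignores.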

It might also be worth mentioning that linear search works for linked lists and dynamic arrays and that those structures allow significantly simpler addition and deletion of elements (constant time) than sorted arrays or even binary search trees.

Added mention.

The second half of the paragraph needs sources.

Added.

Variations — Uniform binary search

The second half of the first sentence has a singular-plural mismatch between a verb and its subject.

Fixed.

The second paragraph is completely opaque to me. Either expand and clarify or remove it.

It was referring to actually storing the widths in another array, but pretty obscure technique; removed.

Variations — Fibonaccian search

I don't see the point of including this section; Knuth may have thought it worthy of mention, but only as a curiosity rather than as a useful algorithm. There is no division cost to be saved, because there is no actual division operation in binary search (the apparent division is a division by the constant 2, which any reasonably intelligent optimizing compiler would replace with a shift operation).

What should be mentioned here instead is Fibonacci search technique, a related algorithm for finding maxima of unimodal functions.

(Response to both) I removed the claim that it reduces the cost of computing the midpoint and kept the rest (as the technique is tied to binary search), adding the original purpose of Fibonaccian search.

Variations — Exponential search

What is "the first element in index 2^j" supposed to mean? How about more clearly "the first element whose index is a power of two", and separating the descriptions of what we're looking for and how we look for it into two separate sentences.

Reworded.

The wording implies that the analysis is valid only for positions near the start of the array. This is false. The analysis is valid always; however, for positions greater than sqrt(n) the number of comparisons is larger than binary search.

Clarified.
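A minimal Python sketch of exponential search as discussed in this subsection, assuming a sorted list of comparable values. The function name is hypothetical, and the standard-library `bisect_left` stands in for the binary search step. The first loop finds the first power-of-two index whose element is not less than the target; the binary search is then confined to the range between the previous power of two and that index.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def exponential_search(a: Sequence[int], target: int) -> Optional[int]:
    """Return an index of target in sorted a, or None if it is absent."""
    if not a:
        return None
    # Double the bound until a[bound] >= target or the bound leaves the array.
    bound = 1
    while bound < len(a) and a[bound] < target:
        bound *= 2
    # The target, if present, lies between bound // 2 and bound inclusive.
    lo = bound // 2
    hi = min(bound + 1, len(a))
    i = bisect_left(a, target, lo, hi)
    if i < len(a) and a[i] == target:
        return i
    return None
```

The doubling loop runs about log2(i) times for a target at position i, which is why the technique pays off for targets near the start of the array.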

Why are we putting floor signs around the log? We don't do that elsewhere, and it looks ugly.

Moved floor signs.

Variations — Interpolation search

"arrays whose elements successively increase by a constant or close to constant amount": this is overly specific. What interpolation search actually assumes is that the array values are distributed nearly uniformly across their target range.

Replaced.

The first paragraph has no footnote. Even if the source is the same as the second paragraph the footnote should be repeated.

Added.

The average case number of comparisons is stated both very specifically (without the use of O-notation) and very vaguely ("about", no definition of n, and average case without specifying what it's an average over). The actual bound is at most log_2 log_2 n comparisons, in expectation, for sorted arrays of n real numbers drawn at random from the uniform distribution on an interval.

I believe n should be clear enough, but replaced with asymptotic notation.

There is a significant difference between the assumptions of interpolation search and of binary search that is not mentioned here, and should be: interpolation search requires that the array elements be numbers. Binary search on the other hand requires only that they be drawn from a totally ordered set.

Mentioned.
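A minimal Python sketch of interpolation search, assuming a sorted list of integers (the function name is hypothetical). It makes the numeric requirement visible: the probe position is computed by arithmetic on the key values themselves, which a merely totally ordered key type would not support.

```python
from typing import Optional, Sequence


def interpolation_search(a: Sequence[int], target: int) -> Optional[int]:
    """Return an index of target in sorted integer list a, or None."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= target <= a[hi]:
        if a[lo] == a[hi]:
            # All remaining keys are equal, and the loop condition already
            # guarantees that target equals them.
            return lo
        # Estimate the position by linear interpolation between the ends.
        pos = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] < target:
            lo = pos + 1
        elif a[pos] > target:
            hi = pos - 1
        else:
            return pos
    return None
```

The estimate is only good when the values are spread roughly uniformly over their range; on skewed data the probe can land far from the target and the search degrades.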

Variations — Equality test at end of program

The start of this subsection makes no sense. First of all, "dichotomic" is not making two choices, it is making a single choice between two options. Second, the algorithm as presented doesn't do that. We are saying "the algorithm chooses between two options, if you ignore the third option that it also chooses from", but why should we ignore part of the algorithm in making this classification. And does saying it's dichotomic really add to reader understanding? This has all the appearance of a wikilink awkwardly wedged into an article because some other article wanted to avoid being an orphan rather than to actually improve the article.

"dichotomic" seems irrelevant (this variation was not about the number of potential choices), so removed and reworded.

How does this save only half a comparison per iteration rather than a whole comparison? Note that in the example code, the equality test is done first, so it happens every iteration.

Fixed.

For all but large n: how large is large?

Specified in the performance section (2⁶⁶).

Don't start a paragraph with a pronoun.

How is returning the last index of a match rather than the first index of a match, in case of duplicate array elements, an advantage?

Non-duplicate successor queries may be sped up slightly.

History

The Inakibit-Anu lists regular numbers (the ones whose reciprocal has a finite sexagesimal representation), not all numbers. Also, as far as I can tell (e.g. from reading a related source "Ancient Babylonian Algorithms", Knuth 1972) the purpose of listing these numbers in sorted order is unknown and can only be inferred to be for the purposes of making search easier, so saying that the table was "sorted to allow for faster searching" overstates the case. And in any case since the search would be done by hand rather than by machine it would be likely to be some kind of interpolation search rather than binary search.

This would imply that the Babylonians were more clever than the programmers from 1946-60 (when binary search worked only for a limited number of arrays), so removed altogether.

The first sentence of the second paragraph is written very confusingly as a double negative.

Moot per above.

The formula for 2^n-1 is badly formatted. If you're going to use wikiformatted math instead of <math>, you need to put spaces around the minus sign, use a proper minus sign (−) instead of a hyphen, and prevent it from being wrapped. You might also consider using the {{math}} template to make the formatting more closely resemble <math> formatting.

Replaced with TeX.

Has there really been no history of this subject since 1971?

Added fractional cascading in 1986, but all the other papers I could find were apparently useless variations of binary search, one being binary search in a linked list.

Implementation issues

Don't use pronouns such as "it" at the start of a section, where there is no earlier noun for the pronoun to refer to.

No instance of the above appears in the section after I've addressed this section.

The description of the conditions under which an overflow occurs is in some ways too vague ("wanders toward the upper end"), but in some ways overly technical (why is the bound on last specified as a number divided by two instead of just a number?).

Reworded.

The only error described here is overflow. However, there are other errors (mostly related to handling two-way rather than three-way branching in each iteration, as mentioned earlier in the "Equality test at end of program" subsection) that are more important to avoid, and they should at least be mentioned.

Mentioned.

The Ruggieri source is claimed to support the fact that there are two issues with the median calculation: representability of first-1 and last+1, and overflow in the calculation of first+last. But in fact it supports only the second issue, and its alternative solution.

Cited Programming Pearls (which contains a whole chapter on binary search pitfalls)
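To illustrate the overflow in computing first + last: Python integers do not overflow, so this sketch simulates 32-bit signed arithmetic with a hypothetical helper, `wrap_int32`. The "safe" form shown is one commonly used alternative and is not necessarily the one proposed in the cited sources.

```python
def wrap_int32(x: int) -> int:
    """Wrap an integer to signed 32-bit two's complement (simulation only)."""
    return (x + 2**31) % 2**32 - 2**31


def midpoint_naive(first: int, last: int) -> int:
    """(first + last) / 2 with a simulated 32-bit sum.

    Once the sum exceeds 2**31 - 1 it wraps to a negative number, so the
    'midpoint' is negative. (Python's // floors where C or Java would
    truncate, but the sign is the point here.)
    """
    return wrap_int32(first + last) // 2


def midpoint_safe(first: int, last: int) -> int:
    """first + (last - first) / 2: no intermediate value exceeds last."""
    return first + (last - first) // 2
```

For example, with first = 2**30 and last = 2**30 + 2, both indices are representable in 32 bits, but their sum is not, so the naive midpoint comes out negative while the safe form gives 2**30 + 1.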

Language support

Only four of the ten entries here are sourced

Added refs.

Notes, See also

The section ordering violates MOS:LAYOUT.

Reordered.

References

Mostly consistently formatted in CS1.

Many notable authors are missing authorlinks (e.g. Stroustrup, Karlin, Mehlhorn, Fich, Tarjan).

Added some.

Similarly, notable journals such as SIAM Journal on Computing should be linked.

Linked.

References [17] and [18] appear to be the same thing (Knuth 6.2.2) and should be consolidated into one reference or properly differentiated from each other.

I believe that all the Knuth cites have section names on them.

Some references to books (e.g. Gast, Bentley) are missing specific page numbers; this needs to be fixed, as otherwise the reference fails WP:V.

Added.

The Wirth reference is missing any publication data.

Bottenbruch's 1962 paper was enough; removed Wirth ref.

Summary: Getting better, good enough to avoid a quick fail for now, but still quite a ways to go.

Deadline

That was all from April 3. Two weeks later, major components of this review remain unaddressed; the article is being improved, but slowly; at this rate it will take many weeks to converge, if it does at all. The GA review process is not really supposed to be used for long-term revisions like that, but more for things that can be handled quickly to bring the article into compliance. On the other hand, this is an important topic that I really would like to see covered at the GA level. So, to push things along, how about imposing a deadline: if something is done to handle all of the above comments by next weekend, I will proceed to another round of review, but otherwise we can leave this review as a fail, giving you as much time as you want to bring it up to par before nominating it again. —David Eppstein (talk) 18:27, 19 April 2016 (UTC)

@David Eppstein: Sorry for the delay, but I believe that I've addressed all the concerns above as of now (except maybe some missing author links, which I will try to find). Esquivalience ^t 01:08, 22 April 2016 (UTC)

Third reading

Left over from second reading

The lead now more or less summarizes the rest of the article but it still has major problems; see below.

All requested changes to the "Algorithm" section have been done

Performance section "The image caption could use more detail": not addressed

Done

Comparison section, "Replaced with slightly worse": the replacement sentence is very confusingly written. And "Clarified using spread": Not much of a clarification. At least link to an article on amortized analysis.

Reworded and linked to amortized analysis.

Variations: the suggested replacement of Knuth's uninteresting "Fibonaccian search" (a technique for doing the same thing as binary search using Fibonacci numbers instead of midpoint bisection) by "Fibonacci search" (a technique for finding extreme points of unimodal arrays by doing something resembling binary search, again using Fibonacci numbers) has been done only halfway, resulting in some strange hybrid that still uses one name and the Knuth reference but now mostly talks about the other algorithm that has a different name and is not what Knuth describes. And then it says that it's less efficient, which misses the point (it's not less efficient, for the purpose of finding extreme points).

Removed Knuth Fibonacci search so that it is purely about Kiefer's Fibonacci search.

History and implementation issues: all requests done.

Library support: more of these are now sourced, but not all of them. External links within the main article text are neither a valid way to source things nor allowed by MOS for any other reason.

Done

Duplication of refs 17 and 18 (now 14 and 15) still exists. Both are to Knuth section 6.4, without any more specific subsection or page numbers. They now have different section titles but how is that possible when they have the same section number?

For the {{Sfn}} citations, I meant to place the section number of the entire subchapter containing the specific subsection, then the subsection name itself after the section number. However, this was confusing, so made it clear that the section titles after the section numbers were subsections of a section. Also added subchapter titles.

Lead

"Exact searching is enough for many applications, most notably associative arrays. However, it is difficult or impossible to perform approximate search operations—important for some applications such as priority queues—on many of these data structures, such as finding the next-smallest, next-largest, and nearest element relative to an element, while binary search can perform such searches in logarithmic time or faster." This sort of detailed evaluation is out of place in the lead. And the suggestion that binary searches in sorted arrays are a good way of implementing associative arrays is wrong.

Condensed and fixed.

"variations of binary search that may run faster on some arrays or computers": speed is not the only reason for the variations.

Condensed altogether to just "variations of binary search" (as appropriate for lead).

"fractional cascading extends binary search to efficiently solve various complex searching problems in domains such as computational geometry": unnecessarily vague and technical. All fractional cascading does is to speed up searches for the same key in multiple sorted lists. How hard is that to say?

Condensed.

History: all requests done.

Algorithm

"Some implementations may place the comparison for equality at the end": this is ambiguous. Does it mean the end of each iteration or the end of the whole algorithm? (Both are possible and both are improvements.)

I changed the procedure so the equality check per iteration was done last altogether. This leaves only one possible variation that fits under this wording, but I reworded so it says "end of an algorithm".

"Particularly, binary search can be used to compute, relative to a element": No! Finding the successor or predecessor of an element can be done trivially without recourse to any kind of search (just add one or subtract one to its index). What is needed is to find the successor or predecessor of a search target, especially when that target is not an element.

Added rank queries (the number of elements below a certain value) and then provided the procedure for predecessor and successor queries.
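A minimal Python sketch of the queries under discussion, for a search target that need not be an element of the array. The function names are hypothetical, and taking "predecessor" and "successor" to mean the nearest strictly smaller and strictly larger elements is an assumption of this sketch.

```python
from typing import Optional, Sequence


def rank(a: Sequence[int], target: int) -> int:
    """Number of elements of sorted a that are strictly less than target."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def predecessor(a: Sequence[int], target: int) -> Optional[int]:
    """Largest element strictly less than target, or None."""
    r = rank(a, target)
    return a[r - 1] if r > 0 else None


def successor(a: Sequence[int], target: int) -> Optional[int]:
    """Smallest element strictly greater than target, or None."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return a[lo] if lo < len(a) else None
```

For the array [1, 3, 5, 7] and target 4, which is not an element, the rank is 2, the predecessor is 3 and the successor is 5. A nearest-neighbor query can then compare the two candidates.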

Range searches: the calculation of the number of elements is off by one, and needs some case analysis to handle the possibilities that the range endpoints are or are not elements of the array.

Fixed, and used rank queries instead of binary search for range searches.
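A minimal Python sketch of counting the elements in a closed range with two rank-style queries, here using the standard-library `bisect` module. The function name is hypothetical. Using `bisect_left` for the lower endpoint and `bisect_right` for the upper one avoids the off-by-one whether or not the endpoints are themselves elements of the array.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def count_in_range(a: Sequence[int], low: int, high: int) -> int:
    """Number of elements x of sorted a with low <= x <= high."""
    # bisect_left(a, low): count of elements strictly less than low.
    # bisect_right(a, high): count of elements less than or equal to high.
    return max(0, bisect_right(a, high) - bisect_left(a, low))
```

With a = [1, 3, 5, 7, 9], the ranges [3, 7] (both endpoints present) and [2, 8] (neither present) both contain three elements.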

Performance

"and vice versa" too brief a shorthand for something like "and symmetrically for the right subtree".

Reworded.

"this is equivalent to a binary search where the element is found or the search is terminated only after the search has reduced to one element": false. Some reductions to a single element may use fewer than the worst case number of comparisons.

Clarified.

"This is equivalent to a binary search that has reduced to two elements": again false. Are you using a source that made a simplifying assumption that the comparison tree has a complete last level? Because that's not true in general.

Fixed.

"No search algorithm that is based solely on comparisons can exhibit better time performance than binary search." Does this refer to the worst case number of comparisons, or some other calculation of performance?

In terms of the number of iterations; changed.

How compatible are the claims that the binary search algorithm described here is absolutely the best possible (in terms of the number of comparisons, presumably) and that the equality-at-the-end optimization that reduces the number of comparisons by a factor of two (but is slower for realistic input sizes because of the other overhead of an additional loop iteration) is not worth it? They seem to contradict each other to me.

Changed to iterations per above.

As long as we're mentioning the equality-at-the-end-of-the-search variation that saves one comparison per iteration but loses an iteration, shouldn't we mention the equality-after-inequality-within-each-iteration variation that saves half a comparison per iteration (on average) for no cost at all? And when Knuth compared equality at the end to the alternative, did he compare it to the alternative with 2 comparisons per iteration or to the one with (average) 1.5?

Knuth compared it to the variation that averages 1.5 comparisons per iteration.

The fractional cascading sentence could use a footnote (re-used from the later more detailed description).

A section link probably would serve the same purpose, but expanded it by a bit anyway.

Binary search versus other schemes

"the key is not present" should be "the target is not present".

Done

The paragraph break in the hashing subsection lands at an odd place.

Removed.

Footnote [b] is written as if these were the only ways to get an unbalanced tree. This is false.

Changed it to clarify that this only applies to perfectly unbalanced binary search trees.

"Balanced binary search trees attempt to balance their own nodes, improving the time performance of searches, but at a cost to insertion and deletion time.": this is wrong. Compared to maintaining a sorted array, insertion and deletion in balanced trees is much faster (logarithmic vs linear). And compared to an unbalanced tree, insertion and deletion in balanced trees is again faster (logarithmic vs whatever the height of the tree is). Also, the words of Yoda are appropriate here: "Do. Or do not. There is no try."

Fixed, and ~~latter advice is against quantum mechanics~~ do.

"a subset of the operations available to sorted arrays": Really? Which operation is not available?

Changed to just "the operations available to sorted arrays".

Variations

For exponential search, "element whose index is a power of two that exceeds the target value" is ambiguous. Does the index exceed the target, or does the element?

Clarified.

For interpolation search, in the sentence "It is most successful as a precursor to binary search, eliminating elements while the search space is large and switching to binary search afterwards.", what is the subject? The obvious guess is that it is the interpolation search algorithm, but this doesn't make sense, because that algorithm doesn't switch anything.

Removed altogether.

"attempts to calculate the position of the target value": Yoda again. What it does is to estimate the position.

Reworded.

"search space of the array": is search space some strange new synonym for length?

Changed to "length".

"time difference between interpolation and binary search does not compensate": circular reasoning, since the time difference is what we're trying to reason about. Maybe you mean the smaller number of comparisons made by interpolation search?

Clarified, and I meant the smaller number of iterations.

"It is most successful when": the most recently used noun was "small arrays". Presumably they are not the intended subject of this sentence. So use a noun, not a pronoun.

The sentence was removed altogether (see above).

"Equality test at end of program": This seems redundant with the paragraph about the same thing in the performance section. And it has nothing to do with whether you return the position of the first or last match, which can be done in either variation.

Removed.

History

The "not of length 2^n − 1 for any natural number n" wording is still a little awkward.

Reworded.

Implementation issues

"When Jon Bentley assigned it": assigned what? Swapping the pronoun here for the noun in the next clause would be less confusing.

Done.

You seem to be missing another more subtle overflow bug. When the initial R is the max representable integer, and the target value is larger than anything in the array, then the pseudocode given will eventually reach a state where L=R=maxint, compare the target to the value at that index, and try to set L to maxint+1. This will either cause an overflow exception or set L to something very small and keep going; in either case that's not the desired behavior.

Added.

Library support

See leftovers, above.

Sourced.

References

The Knuth references are all missing page numbers, important since some are clearly to subsets of the specified sections (unless the same section can have multiple titles).

[1], Williams: Formatted as a journal, but actually a conference, and it would be helpful to spell out the conference name.

[18], Beame&Fich: Link Journal of Computer and System Sciences

[25], Kiefer: Link Jack Kiefer (statistician)

[35], Lehmer: Another conference paper formatted as a journal paper.

[36], Bottenbruch: Should list the whole page range of the published paper (161–221), with a note at the end of the reference for the page number actually used as a source (Section 43, "Program for Binary Search", p. 214). Also add the doi, 10.1145/321119.321120.

[40], Pattis: Link Richard E. Pattis

[41], Bloch: Link Joshua Bloch

[42], Ruggieri: Link Information Processing Letters

All done.

Summary

The previous summary is still accurate: "Getting better, good enough to avoid a quick fail for now, but still quite a ways to go."

@David Eppstein: I (or at least believe that I) have responded (or acted in response) to all the comments above. Esquivalience ^t 03:43, 1 May 2016 (UTC)

Fourth reading

Quite a lot closer this time. I only found the following issues:

"Each iteration of the binary search algorithm defined above makes 1.5 comparisons": No, either 1 or 2. But for targets that are equally likely to match any of the keys, the average is 1.5/iteration.

Clarified.
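A small Python sketch that counts comparisons, assuming each iteration tests "less than" first, then "greater than", with equality as the remaining case. That ordering is an assumption based on the discussion above, and the function names are hypothetical. An iteration that moves right costs one comparison; one that moves left or finds the target costs two.

```python
from typing import Optional, Sequence, Tuple


def search_counting(
    a: Sequence[int], target: int
) -> Tuple[Optional[int], int, int]:
    """Return (index or None, iterations, element comparisons)."""
    lo, hi = 0, len(a) - 1
    iterations = comparisons = 0
    while lo <= hi:
        iterations += 1
        mid = lo + (hi - lo) // 2
        comparisons += 1              # first test: a[mid] < target
        if a[mid] < target:
            lo = mid + 1
            continue
        comparisons += 1              # second test: a[mid] > target
        if a[mid] > target:
            hi = mid - 1
        else:
            return mid, iterations, comparisons
    return None, iterations, comparisons


def average_comparisons_per_iteration(n: int) -> float:
    """Average over successful searches for every key of range(n)."""
    a = list(range(n))
    total_iterations = total_comparisons = 0
    for t in a:
        _, it, c = search_counting(a, t)
        total_iterations += it
        total_comparisons += c
    return total_comparisons / total_iterations
```

With every key equally likely, the ratio comes out close to 1.5 for large arrays, although no single iteration ever makes 1.5 comparisons.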

"binary search trees will most likely be imbalanced, resulting in slightly worse performance than binary search": I think this overstates the issue. "Imperfectly balanced", maybe?

Imperfectly balanced sounds just right; replaced.

"Binary search is faster than linear search for sorted arrays": unless they are very short.

Mentioned such, and added footnote with more details on this.

"A variation of the algorithm forgoes the lower and upper pointers or variables, instead storing the index of the middle element and the width of the search; i.e., the number of elements around the middle element that have not been eliminated yet.": This sentence is too long.

Shortened sentence.

Footnote 9: Harv error: link from #CITEREFGoldmanGoldman2007 doesn't point to any citation.

The right year was 2008; fixed.

—David Eppstein (talk) 05:24, 9 May 2016 (UTC)

@David Eppstein: Fixed above issues and added some more detail. Esquivalience ^t 01:26, 10 May 2016 (UTC)

I made some small copyedits today, after which I can't find anything to complain about. The prose is uniformly clear, the section structure makes sense, and obeys the section structuring criteria of MOS (WP:GACR #1). Everything is appropriately sourced, without original research or copyvios (#2). After all this revision, the article now adequately addresses all the main aspects of the topic, and doesn't go into excessive detail about any aspect (#3). Its evaluative components (e.g. comparisons with other data structures) are neutral, accurate, and based on reliable sources (#4). Although it has changed significantly since the start of the review, the disputes that were apparent at that time seem to have settled down, and there are no major disagreements still evident, so I think it is stable enough (#5). The article is illustrated appropriately (if a little minimally still), the images are appropriately licensed, and the image captions are accurate and informative (#6). I will go ahead and pass this now. —David Eppstein (talk) 22:51, 11 May 2016 (UTC)