## 7 Answers

Reset to default

# Trending sort

**Trending** sort is based off of the default sorting method — by highest score — but it boosts votes that have happened recently, helping to surface more up-to-date answers.

It falls back to sorting by highest score if no posts are trending.

This question was just asked again recently. Apart from Knuth’s quote that “Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky”, there is the staggering historical fact (see TAOCP, Volume 3, section 6.2.1) that binary search was first published in 1946 but the first published binary search *without bugs* was in 1962. And there is Bentley’s experience that when he assigned binary search in courses for professional programmers at places like Bell Labs and IBM and gave them two hours, everyone reported they had got it right, and on examining their code 90% of them had bugs — year after year.

Perhaps the basic reason why so many programmers make mistakes with binary search, besides Sturgeon’s Law, is that they are not careful enough: *Programming Pearls* quotes this as the “write your code, throw it over the wall, and have Quality Assurance or Testing deal with the bugs” approach. And there is a lot of scope for error. Not just the overflow error that several of the other answers here mention, but logical errors.

Below are some examples of binary-search errors. This is by no means exhaustive. (As Tolstoy writes in *Anna Karenina* — “All happy families are alike; every unhappy family is unhappy in its own way” — every incorrect binary search program is incorrect in its own way.)

## Pattis

The following Pascal code is taken from the paper *Textbook errors in binary searching* (1988) by Richard E Pattis. He looked at twenty textbooks and came up with this binary search (BTW, Pascal uses array indexes starting from 1):

```
PROCEDURE BinarySearch (A : anArray,
Size : anArraySize,
Key : INTEGER,
VAR Found : BOOLEAN;
VAR Index : anArrayIndex);
Var Low, High : anArrayIndex;
BEGIN
LOW := 1;
High := Size;
REPEAT
Index := (Low + High) DIV 2;
If Key < A[Index]
THEN High := Index - 1
ELSE Low := Index + 1
UNTIL (Low > High) OR (Key = A[Index]);
FOUND := (Low <= High)
END;
```

Looks ok? This has more than one error. Before reading further, see if you can find them all. You should be able to guess what the code does even if you’re seeing Pascal for the first time.

He describes five errors that many programs have, and in particular the above has:

**Error 1**: It doesn’t run in O(log n) time, where n = Size. In their enthusiasm for Proper Programming Practice, some programmers write binary search as a function/procedure, and pass it an array. (This is not specific to Pascal; imagine in C++ passing a vector by value rather than by reference.) This is Θ(n) time just to pass the array to the procedure, which defeats the whole purpose. Even worse, some authors apparently give a *recursive* binary search, which passes an array each time, giving a running time that is Θ(n log n). (This is not far-fetched; I’ve actually seen code like this.)

**Error 2**: It fails when size = 0. This may be ok. But depending on the intended application, the list/table being searched *may* shrink to 0, and it must be handled somewhere.

**Error 3**: It gives a wrong answer. Whenever the final iteration of the loop starts with Low=High (e.g. when Size=1), it sets Found:=False, even if `Key`

is in the array.

**Error 4**: It causes an error whenever `Key`

is less than the smallest element of the array. (After `Index`

becomes 1, it sets `High`

to 0, etc.; causes an out-of-bounds error.)

**Error 5**: It causes an error whenever `Key`

is greater than the largest element of the array. (After `Index`

becomes `Size`

, it sets `Low`

to Size+1 etc.; causes an out-of-bounds error.)

He also points out that some obvious ways to “fix” these errors turn out to be wrong as well. Real-life code also often has this property, when a programmer wrote something incorrect, found an error, and then “fixed” it until it *seemed* correct without thinking carefully enough.

Of the 20 textbooks he tried, only 5 had a correct binary search. In the remaining 15 (he says 16, ironically), he found 11 instances of Error 1, six instances of Error 2, two each of Errors 3 and 4, and one of Error 5. These numbers add up to much more than 15, because several of them had multiple errors.

## More examples

Binary search is used for more than just searching an array to see if it contains a value, so here is one more example for now. I may update this list as I think of more.

Suppose you have an increasing (non-decreasing) function f: R->R, and (because you want a root of f, say), you want to find the largest `t`

such that `f(t) < 0`

. See how many bugs you can find in the following:

```
float high = INF, low = 0;
while(high != low) {
float mid = (low + high)/2;
if(f(mid)>0) high=mid;
else low=mid;
}
printf("%f", high);
```

(Some: There may be no such t in [0,INF], if `f`

is 0 on an interval then this is wrong, never compare floating point numbers for equality, etc.)

## How to write binary search

I used to make several of those errors — the first few dozen times I wrote binary search (which was during programming contests with time pressure), about 30% of the time there was a bug somewhere — until I found the simple way to write it correctly. I haven’t got a binary search wrong since (as I recall). The trick is very simple:

Maintain an invariant.

Find/decide and make explicit some invariant property that your “low” and “high” variables satisfy throughout the loop: before, during and after. Make sure it is never violated. Of course you also need to think about the termination condition. This is explained in good detail in Chapter 4 of *Programming Pearls* which *derives* a binary search program from semi-formal methods.

For instance, to slightly abstract out the condition being examined, suppose you want to find the largest integer value `x`

for which some condition `poss(x)`

is true. Even this explicitness of problem definition is more than many programmers start with. (For instance, `poss(x)`

may be `a[x] ≤ v`

for some value `v`

; this is to find out how many elements in the sorted array `a`

are grater than `v`

, say.) Then, one way to write binary search is:

```
int lo=0, hi=n;
//INVARIANT: poss(lo) is true, poss(hi) is false
//Check and ensure invariant before starting binary search
assert(poss(lo)==true);
assert(poss(hi)==false);
while(hi-lo>1) {
int mid = lo + (hi-lo)/2;
if(poss(mid)) lo = mid;
else hi = mid;
}
printf("%d \n",lo);
```

You can add more assert statements and other checks, but the basic idea is that because you update `lo`

to `mid`

*only* when you know that `poss(mid)`

is true, you maintain the invariant that `poss(lo)`

is always true. Similarly, you set `hi`

to `mid`

only when `poss(mid)`

is false, so you maintain the invariant that `poss(hi)`

is always false. Think about the termination condition separately. (Note that when `hi-lo`

is 1, `mid`

is the same as `lo`

. So don’t write the loop as `while(hi>lo)`

, or you’ll have a infinite loop.) At the end of the loop, you’re guaranteed that `hi-lo`

is at most 1, and because your always maintained your invariant (`poss(lo)`

is true and `poss(hi)`

is false), it cannot be 0. Also, again because of your invariant, you know that `lo`

is the value to return/print/use. There are other ways to write binary search of course, but maintaining an invariant is a trick/discipline that always helps.

5

Here are some I can think of:

**Off-by-one errors**, when determining the boundary of the next interval**Handling of duplicate items**, if you are suppose to return the first equal item in the array but instead returned a subsequent equal item**Numerical underflows/overflows**when computing indices, with huge arrays**Recursive**implementation, a design choice you should consider*vs*non-recursive

Are these what you have in mind?

2

Read this. Java’s binary search implementation hid a bug for almost a decade before anybody found it.

The bug is integer overflow. It didn’t cause people problems because hardly anyone was searching big enough data structures.

0

One crucial reason why people can’t implement binary search correctly is that we **don’t characterize binary search well, it is a well-defined problem but usually one doesn’t define it well.**

One universal rule is learn from failure. Here, thinking about ‘invalid’ cases help clarifying the problem. What if input is empty? what if input contains duplicates? should I implement it by one conditional test or two test (plus an extra test for early termination) per iteration? and other technical issues such as numerical overflow/underflow in computing indices, and other tricks.

Errors that could be avoided by characterize the problem well is Off-by-one errors,and handling of duplicate items, as @Zach Scrivena pointed out.

Many people view binary search as find a target value given an sorted array.

That’s how binary how is being used, not binary search it per se.

**I’ll try to give a relative rigorous definition/formulation of binary search, and show one way to get off off-by-one error and duplicate issues,** by conforming to one specific approach, which, of course is not new.

```
# (my) definition of binary search:
input:
L: a 'partially sorted' array,
key: a function, take item in L as argument
prerequisite:
by 'partially sorted' I mean, if apply key function to all item of L, we get a
new array of bool, L_1, such that it can't be partitioned to two left, right blocks,
with all item in left being false, all item in right being true.
(left or/and right could be empty)
output:
the index of first item in right block of L_1 (as defined in prerequisite).
or equivalently, the index of first item in L such that key(item) == True
```

this definition naturally resolves the duplicate issue.

There are commonly two way to denote arrays, [] and [), I prefer the latter, an equivalence of [) approach is using (start, count) pair instead.

```
# Algorithm: binary search
# input: L: a 'partially sorted' array, key: a function, take item in L as argument
while L is not empty:
mid = left + (right - left)/2 # or mid = left + count/2
if key(mid item) is True:
recede right # if True, recede right
else:
forward left # if False, forward left
return left
```

Thus, if you make your **“if True, Recede (end)”** and **“if False, Forward (start) “** part right, you are almost done. I call it **the “FFTR rule” of binary search.** If are going to find the first item, as in the definition above, left will be start, however right will be start if you are going to find the last item.

I you conform to the [) fashion, then a possible implement is,

```
while left<right:
mid = left + (right - left)/2
if key(L(mid)) == True:
right = mid
else:
left = mid+1
return left
```

lets validate it further, by first show convergence, and then show correctness.

**convergence:**

```
whenever left == right, we exit loop (also true if being empty at the first)
so, in the loop, if denote,
diff = (right - left)/2,
lfstep = 1 + diff/2, 'lfstep' for 'left index forward step size'
rbstep = diff - diff/2, 'rbstep' for 'right index back (recede) step size'
it can be show that lfstep and rbstep are alway positive, so left and right
will be equal at last.
both lfstep and rbstep are asymptotically half of current subarray size, so it's
of logarithm time complexity.
```

**correctness:**

```
if the input array is empty:
return the left index;
else:
if key(mid item) is true:
"recede right"
so mid and all item after mid are all true, we can reduce the search range
to [left, mid), to validate it, there are two possible cases,
case 1:
mid is the first item such that key(item) is True, so there are no true items
in new search range [left, mid), so the test will always be false, and we
forward left at each iteration until search range is empty, that is
[finalleft,mid), since we return finalleft, and finalleft == mid, correctly done!
case 2:
there are item before mid such that key(item) is True,
in this case we just reduce to a new problem of smaller size
else:
"forward left"
mid and all item before mid is false, since we are to find the first true one,
we can safely reduce to new search range [mid+1, right) without change the result.
```

equivalent (start, count) version,

```
while count>0:
mid = start + count/2
if key(L[mid]) == True:
right = mid
else:
left = mid+1
return left
```

**to summarize**, if conforming to [) convention,

```
1. define your key function of your problem,
2. implement your binary search with "FFTR rule" --
"recede (end) if True ( key(item) == True) else forward (start)"
examples:
if to find a value target, return index or -1 if not found,
key = lambda x: x>=target,
if L[found_index] == target: return found_index
else: return -1
```

As for the early termination by an extra test, I don’t think it worth the pay, but you can try.

Failing to consider that when calculating the midpoint between two indices summing the high and low values may result in integer overflow.

1

The pipelining architecture of modern processors is much more suited to do linear searches than to do binary searches which have lots of decisions and branches.

Therefore a common **performance bug** and a **premature optimization** (you know, these are the root of all evil) is using binary search at all when a simple linear search would be faster and of course simpler.

Depending on the number of reads, even sorting the data at all can make things slower. A break-even point between linear and binary can easily be at 1000 elements for simple keys (e.g. 32 bit integers).

One bug which I encountered while implementing Binary Search is having integer range overflow while calculating the mid element. As we all know, the steps of a Binary Search implementation are:

- Find the Mid element.
- Check if:

- Target element is greater than the mid element, search from mid+1 to end index
- Target element is smaller than the mid element, search from start to mid-1 index
- Target element is equal to the mid element, return the index.

In all of the above steps, recalculation of mid element is taking place. Now, mid element is found out as:

mid = (start+end)/2

now, if the end index is too high(for a very big array), it might overflow the INT range.

So, a better way of doing it is:

mid = start + (end-start)/2

if you check carefully, ultimately, its giving us the same value, but we get rid of the overflow issue which we might encounter.

Also, as we know, binary search can be implemented on a sorted list only. Suppose in the problem it is stated that the input list is sorted, but not mentioned whether it’s in Ascending or Descending order. in that case, you might need to add an extra condition to check that.

63