Back to index

Binary Indexed Segment Tree

“Binary Indexed Tree” is a bad name for Fenwick trees.


I'll begin with a rundown of Fenwick trees and segment trees.

Fenwick trees

A Fenwick tree keeps track of prefix sums of an array of size n. You can update an element of this array, and get a prefix sum up to some index, both in O(log(n)) time.

The problem with explicitly storing every prefix sum is that any operation to change an index will have to update every prefix sum after this index. So, instead of storing the prefix sums, we will break up each prefix sum into the sum of a bunch of smaller ranges, and we store only the smaller ranges.

Traditionally, the way to do this is to break up a prefix sum of length i starting with the biggest range of length 2^something, and then break up the remaining range recursively.

 1: |1|
 2: | 2 |
 3: | 2 |1|
 4: |   4   |
 5: |   4   |1|
 6: |   4   | 2 |
 7: |   4   | 2 |1|
 8: |       8       |
 9: |       8       |1|
10: |       8       | 2 |
11: |       8       | 2 |1|
12: |       8       |   4   |
   ...
   i: |          2^something          | smaller prefix sum  |
    

This makes the actual range sums you have to store look like this:

...
|---------------|               ...
|-------|       |-------|       ...
|---|   |---|   |---|   |---|   ...
|-| |-| |-| |-| |-| |-| |-| |-| ...
    

We need to store the range sums somewhere, and to avoid pointers, we'll just store them contiguously in an array. The question then becomes, given one of these ranges, what index should it belong to. One neat observation is that when increasing the number of elements in the original array by 1, all of the ranges we need to store remain the same, and only one extra range has to be stored. Therefore, we should always add the extra range to the end of our array, giving rise to this indexing:

...
|----------------------8|                      ...
|----------4|           |---------12|          ...
|----2|     |----6|     |---10|     |---14|    ...
|-1|  |-3|  |-5|  |-7|  |-9|  |11|  |13|  |15| ...
    

We can then start to work out some more observations to figure out how to properly maintain this data structure for each update and query operation.

First, a range stored at index i in the Fenwick tree ends at the original array's index i. The question becomes, where does this range start? Well, remember that we broke up a prefix sum by repeatedly subtracting off ranges with length being the largest power of 2 we could fit. The last range to remain therefore has length being the smallest power of 2 in the binary representation of i. If you pull out your book of bit hacks, you'll find that isolating this least significant set bit can be done by i & -i. So, if we clear the least significant bit of i using i := i - (i & -i), we get the length of our range, which we can use to find the starting index.

This gives our method for calculating the prefix sum up to index i: Add the value at index i in the Fenwick tree, clear the least significant set bit of i, and keep repeating until i is 0.

Then comes the question of how to update an index i. We know that all ranges ending at an index before i will never contain i. Therefore, the first range we want to update is at index i in the Fenwick tree. Then, if we can always figure out the next smallest index we want to update, we'll be able to enumerate all of the ranges we want to update.

Let's consider the range stored at Fenwick tree index i + j. We want to update this range if the length of this range is greater than j. If j is less than the least significant set bit of i, then i + j is equal to i | j, which tells us that the least significant set bit of i + j is at most the least significant set bit of i, which in turn is less than j. So, if j is less than i & -i, the length of the range stored at index i + j is less than j, and we don't want to update this range.

Let's consider the next range, j = i & -i`. Then the least significant set bit of i + j is greater than the least significant set bit of i, which is equal to j. Therefore, length of the range stored at index i + (i & -i) is greater than i & -i, so it is a range we want to update. We have found our next smallest index to update, so wen can keep using i := i + (i & -i) to get all the indices we want to update in our Fenwick tree.

Segment trees, and an unorthodox indexing scheme

A Segment tree allows for O(log(n)) range queries and modifications over an array of size n. They support more operations than standard Fenwick trees; while you could do range sums by subtracting prefix sums, you couldn't do range minimums. The drawback is that segment trees take twice the memory than Fenwick trees.

A node in a segment tree stores the sum of the elements in some range. We break this range in two, and assign the left part to the left child, and the right part to the right child. The left and right parts recursively break apart their ranges, until they reach ranges of size 1.

To implement updating an element, you find the leaf node for the range containing just this singular element, modify it, then rebuild the ranges going back up to the root. To implement a range query, you sum up the appropriate ranges in the segment tree, which you can find either recursively or iterating from the bottom up. There's better resources on how to do this elsewhere, and the specifics aren't pertinent to the point I'm trying to make, so I'll leave them out. If you're interested in other people's better implementations/explanations of segment trees, there's This cp-algorithms article. and This Codeforces blog

Interesting fact: no matter how you break up the ranges, there will always be a total of 2 * n - 1 nodes in the subtree for a range of length n. We can prove this inductively. Let f(n) be the number of nodes in the subtree for a range of length n. Let f(1) = 1 be our base case. Then to show that f(n) = 2 * n - 1, we have to consider all ways to break up this range of length n. That is, we want to show that 2 * n - 1 = 1 + f(l) + f(r) for l,r>0, l + r = n. Using strong induction, 1 + f(l) + f(r) = 1 + 2 * l - 1 + 2 * r - 1 = 2 * k - 1. Therefore, there always is a total of 2 * n - 1 nodes in the subtree for a range of length n.

We have a few choices to make, which will affect the structure of our segment tree. I could describe the normal choices people make, but the point of this writeup is to talk about why I dislike the name “Binary Indexed Tree”. Let's do something else and make unorthodox choices.

First, notice that the choice of using powers of 2 as our range lengths in a Fenwick tree led to a neat similarity between Fenwick trees of increasing sizes, which we used to derive our indices. Since segment trees require 2 * n - 1 nodes, we can try to do a similar construction, except we add 2 segment tree nodes each time we increase n by 1.

|------------------------------8/9------------------------------|
|--------------4/5--------------|-------------12/13-------------|
|------2/3------|------6/7------|-----10/11-----|-----14/15-----|
|---1---|--2/3--|--4/5--|--6/7--|--8/9--|-10/11-|-12/13-|-14/15-|
    

That bottom row already is guaranteed one odd number in it, we may as well make them all odd. Let's also take a look at some binary representations of the indices.

|--------------------------1000-------------------------|
|------------0100-----------|------------1100-----------|
|-----0010----|-----0110----|-----1010----|-----1110----|
|-0001-|-0011-|-0101-|-0111-|-1001-|-1011-|-1101-|-1111-|
    

Let's try making some observations about this thing we've just created. First, the least significant set bit of a range's index is its length. Second, if we remove this least significant set bit (and shift the more significant bits down by one), we get the range's starting index. Since we have the range's starting index and its length, we know its end as well.

Let's concretely work out the parent/child relations. We can represent index i as a binary string

a100000000000
  '-b times-'
      
Then the starting index is
a00000000000
 '-b times-'
      
and the length is
100000000000
 '-b times-'
      
. The left child shares this starting index, but the length is halved. Therefore, the left child is at index
a01000000000
   '- b-1 -'
      
To get the right child, we add the half the length of the original index to the starting index, giving
a11000000000
   '- b-1 -'
      
So, with rmsb = i & -i, the left child is at (i ^ rmsb) | (rmsb >> 1), and the right child is at i | (irmsb >> 1).

Similar reasoning will give expressions for more tree traversal operations. Getting the parent is (i ^ rmsb) | (rmsb << 1). Getting the sibling is (i ^ (rmsb << 1)). Getting the successive/preceding range is i ± (rmsb << 1).

So, I coded up some C++ templated segment tree code based off of the Codeforces blog mentioned above, and gave it both the normal indexing/traversal method that the blog entry uses, and my funky binary-indexing-inspired method. I threw a bunch of random data and random range queries at both, and they agreed on all their answers, so I'm pretty confident in my janky hack's correctness.

Timing wise, on my machine using n = 10^6 and 10^7 random updates and range queries, the original code took ~10.3 seconds, while mine took ~16.6 seconds. I could try to analyze cache efficiency, but that'll probably lead to a discussion that could use its own article.

Conclusion

To summarize, here's a quote from Fenwick's original paper:

In recognition of the close relationship between the tree traversal algorithms and the binary representation of an element index, the name ‘binary indexed tree’ is proposed for the new structure.

I have made a segment tree where there is a close relationship between the tree traversal algorithms and the binary representation of an element index. Following the original reasoning behind the name “Binary Indexed Tree”, we now have two distinct data structures that could fall under this label. Thus, this name cannot unambiguously refer to Fenwick trees. I believe this is a fairly conclusive proof that “Binary Indexed Tree” is a bad name for Fenwick trees.

I'm personally going to call Fenwick trees “Prefix-sum trees”, since I think that's way more descriptive of their structure and capabilities.