Table of Contents#### Download Safari Books Online apps: Apple iOS | Android | BlackBerry

### 3-2. Rounding Up/Down to the Next Power of 2

#### Rounding Down

##### Figure 3-1. Greatest power of 2 less than or equal to x, branch-free.

##### Figure 3-2. Greatest power of 2 less than or equal to x, simple loops.

#### Rounding Up

##### Figure 3-3. Least power of 2 greater than or equal to x.

Entire Site

Free Trial

Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.

We define two functions that are similar to floor and ceiling, but which are directed roundings to the closest integral power of 2, rather than to the closest integer. Mathematically, they are defined by

The initial letters of the function names are intended to suggest “floor” and “ceiling.” Thus, flp2(x) is the greatest power of 2 that is ≤ x, and clp2(x) is the least power of 2 that is ≥ x. These definitions make sense even when x is not an integer (e.g., flp2(0.1) = 0.0625). The functions satisfy several relations analogous to those involving floor and ceiling, such as those shown below, where n is an integer.

Computationally, we deal only with the case in which x is an integer, and we take it to be unsigned, so the functions are well defined for all x. We require the value computed to be the arithmetically correct value modulo 2^{32} (that is, we take clp2(x) to be 0 for x > 2^{31}). The functions are tabulated below for a few values of x.

Functions flp2 and clp2 are connected by the relations shown below. These can be used to compute one from the other, subject to the indicated restrictions.

The round-up and round-down functions can be computed quite easily with the number of leading zeros instruction, as shown below. However, for these relations to hold for x = 0 and x > 2^{31}, the computer must have its shift instructions defined to produce 0 for shift amounts of -1, 32, and 63. Many machines (e.g., PowerPC) have “mod 64” shifts, which do this. In the case of -1, it is adequate if the machine shifts in the opposite direction (that is, a shift left of -1 becomes a shift right of 1).

Figure 3-1 illustrates a branch-free algorithm that might be useful if number of leading zeros is not available. This algorithm is based on right-propagating the leftmost 1-bit, and executes in 12 instructions.

unsigned flp2(unsigned x) { x = x | (x >> 1); x = x | (x >> 2); x = x | (x >> 4); x = x | (x >> 8); x = x | (x >>16); return x - (x >> 1); } |

Figure 3-2 shows two simple loops that compute the same function. All variables are unsigned integers. The loop on the right keeps turning off the rightmost 1-bit of x until x = 0, and then returns the previous value of x.

y = 0x80000000; do { while (y > x) y = x; y = y >> 1; x = x & (x - 1); return } while(x != 0); return y; |

The loop on the left executes in 4nlz(x) + 3 instructions. The loop on the right, for x ≠ 0, executes in 4pop(x) instructions,
^{[1]}
if the comparison to 0 is zero-cost.

^{[1]}pop(x) is the number of 1-bits in x.

The right-propagation trick yields a good algorithm for rounding up to the next power of 2. This algorithm, shown in Figure 3-3, is branch-free and runs in 12 instructions.

unsigned clp2(unsigned x) { x = x - 1; x = x | (x >> 1); x = x | (x >> 2); x = x | (x >> 4); x = x | (x >> 8); x = x | (x >>16); return x + 1; } |

An attempt to compute this with the obvious loop does not work out very well:

y = 1; while (y < x) // Unsigned comparison. y = 2*y; return y;

This code returns 1 for x = 0, which is probably not what you want, loops forever for x ≥ 2^{31}, and executes in 4n +3 instructions, where n is the power of 2 of the returned integer. Thus, it is slower than the branch-free code, in terms of instructions executed, for n ≥ 3 (x ≥ 8).