Bit-parallel algorithms are based on the fact that individual bits of numbers can be manipulated in parallel using bit operations. Thus, a way to design an efficient algorithm is to represent the steps of the algorithm so that they can be efficiently implemented using bit operations.

## Hamming Distances:

The Hamming distance hamming(a, b) between two strings a and b of equal length is the number of positions where the strings differ. For example, hamming(01101, 11001) = 2.

Consider the following problem: Given n bit strings, each of length k, calculate the minimum Hamming distance between two strings. For example, the answer for [00111, 01101, 11110] is 2, because

- • hamming(00111, 01101) = 2,
- hamming(00111, 11110) = 3, and
- hamming(01101, 11110) = 3.

A straightforward way to solve the problem is to go through all pairs of strings and calculate their Hamming distances, which yields an O(n2k) time algorithm. The following function calculates the distance between strings a and b:

int hamming(string a, string b) {

int d = 0;

for (int i = 0; i < k; i++) {

if (a[i] != b[i]) d++;

}

return d;

}

However, since the strings consist of bits, we can optimize the solution by storing the strings as integers and calculating distances using bit operations. In particular, if k ≤ 32, we can just store the strings as int values and use the following function to calculate distances:

int hamming(int a, int b) {

return __builtin_popcount(a^b);

}

In the above function, the xor operation constructs a string that has one bits in positions where a and b differ. Then, the number of one bits is calculated using the __builtin_popcount function.

A comparison of running times of the original algorithm and the bit-parallel algorithm on a modern computer. In this problem, the bit-parallel algorithm is about 20 times faster than the original algorithm.

## Counting Subgrids:

As another example, consider the following problem: Given an n × n grid whose each square is either black (1) or white (0), calculate the number of subgrids whose all corners are black. For example, two such subgrids in a grid.

The running times of the algorithms when calculating minimum Hamming distances of n bit strings of length k = 30

There is an O(n3) time algorithm for solving the problem: go through all O(n2) pairs of rows, and for each pair (a, b) calculate, in O(n) time, the number of columns that contain a black square in both rows a and b. The following code assumes that color[y][x] denotes the color in row y and column x:

int count = 0;

for (int i = 0; i < n; i++) {

if (color[a][i] == 1 && color[b][i] == 1) {

count++;

}

}

Then, after finding out that there are count columns where both squares are black, we can use the formula count(count − 1)/2 to calculate the number of subgrids whose first row is a and last row is b.

To create a bit-parallel algorithm, we represent each rowk as an n-bit bitset row[k] where one bits denote black squares. Then, we can calculate the number of columns where rows a and b both have black squares using an and operation and counting the number of one bits. This can be conveniently done as follows using bitset structures:

int count = (row[a]&row[b]).count();

A comparison of the original algorithm and the bit-parallel algorithm for different grid sizes. The comparison shows that the bit-parallel algorithm can be up to 30 times faster than the original algorithm.

## Reachability in Graphs:

Given a directed acyclic graph of n nodes, consider the problem of calculating for each node x a value reach(x): the number of nodes that can be reached from node x. For example, Fig. 8.2 shows a graph and its reach values.

The problem can be solved using dynamic programming in O(n2) time by constructing for each node a list of nodes that can be reached from it. Then, to create a bit-parallel algorithm, we represent each list as a bitset of n bits. This permits us to efficiently calculate the union of two such lists using an or operation. Assuming that reach is an array of bitset structures and the graph is stored as adjacency lists in adj, the calculation for node x can be done as follows:

reach[x][x] = 1;

for (auto u : adj[x]) {

reach[x] |= reach[u];

}

Some running times for the bit-parallel algorithm. In each test, the graph has n nodes and 2n random edges a → b where a < b. Note that the algorithm uses a great amount of memory for large values of n. In many contests,

the memory limit may be 512MB or lower.