In this section, we present some results concerning the practical efficiency of the data structures presented in this chapter. While time complexities are a great tool, they do not always tell the whole truth about the efficiency, so it is worthwhile to also do experiments with real implementations and data sets.
Set Versus Sorting:
Many problems can be solved using either sets or sorting. It is important to realize that algorithms that use sorting are usually much faster, even if this is not evident by just looking at the time complexities.
As an example, consider the problem of calculating the number of unique elements in a vector. One way to solve the problem is to add all the elements to a set and return the size of the set. Since it is not needed to maintain the order of the elements, we may use either a set or an unordered_set. Then, another way to solve the
problem is to first sort the vector and then go through its elements. It is easy to count the number of unique elements after sorting the vector.
Shows the results of an experiment where the above algorithms were tested using random vectors of int values. It turns out that the unordered_set
The results of an experiment where the number of unique elements in a vector was calculated. The first two algorithms insert the elements to a set structure, while the last algorithm sorts the vector and inspects consecutive elements
The results of an experiment where the most frequent value in a vector was determined. The two first algorithms use map structures, and the last algorithm uses an ordinary array
algorithm is about two times faster than the set algorithm, and the sorting algorithm is more than ten times faster than the set algorithm. Note that both the set algorithm and the sorting algorithm work in O(n log n) time; still the latter is much faster. The reason for this is that sorting is a simple operation, while the balanced binary search tree used in set is a complex data structure.
Map Versus Array:
Maps are convenient structures compared to arrays because any indices can be used, but they also have large constant factors. In our next experiment, we created a vector of n random integers between 1 and 106 and then determined the most frequent value by counting the number of each element. First, we used maps, but since the upper bound 106 is quite small, we were also able to use arrays.
Shows the results of the experiment. While unordered_map is about three times faster than map, an array is almost a hundred times faster. Thus, arrays should be used whenever possible instead of maps. Especially, note that while
unordered_map provides O(1) time operations, there are large constant factors hidden in the data structure.
The results of an experiment where elements were added and removed using a multiset and a priority queue
Priority Queue Versus Multiset:
Are priority queues really faster than multisets? To find out this, we conducted another experiment where we created two vectors of n random int numbers. First, we added all elements of the first vector to a data structure. Then, we went through the second vector and repeatedly removed the smallest element from the data structure and added the new element to it.
Shows the results of the experiment. It turns out that in this problem a priority queue is about five times faster than a multiset.