A set is a data structure that maintains a collection of elements. The basic operations of sets are element insertion, search, and removal. Sets are implemented so that all the above operations are efficient, which often allows us to improve on running times of algorithms using sets.

### Sets andMultisets:

The C++ standard library contains two set structures:

- set is based on a balanced binary search tree and its operations work in O(log n)

time. - unordered_set is based on a hash table and its operations work, on average,1

in O(1) time.

Both structures are efficient, and often either of them can be used. Since they are used in the same way, we focus on the set structure in the following examples.

The following code creates a set that contains integers and shows some of its operations. The function insert adds an element to the set, the function count returns the number of occurrences of an element in the set, and the function erase removes an element from the set.

The worst-case time complexity of the operations is O(n), but this is very unlikely to occur.

set<int> s;

s.insert(3);

s.insert(2);

s.insert(5);

cout << s.count(3) << “\n”; // 1

cout << s.count(4) << “\n”; // 0

s.erase(3);

s.insert(4);

cout << s.count(3) << “\n”; // 0

cout << s.count(4) << “\n”; // 1

An important property of sets is that all their elements are distinct. Thus, the function count always returns either 0 (the element is not in the set) or 1 (the element is in the set), and the function insert never adds an element to the set if it is already there. The following code illustrates this:

set<int> s;

s.insert(3);

s.insert(3);

s.insert(3);

cout << s.count(3) << “\n”; // 1

A set can be used mostly like a vector, but it is not possible to access the elements using the [] notation. The following code prints the number of elements in a set and then iterates through the elements:

cout << s.size() << “\n”;

for (auto x : s) {

cout << x << “\n”;

}

The function find(x) returns an iterator that points to an element whose value is x. However, if the set does not contain x, the iterator will be end().

auto it = s.find(x);

if (it == s.end()) {

// x is not found

}

Ordered Sets The main difference between the two C++ set structures is that set is ordered, while unordered_set is not. Thus, if we want to maintain the order of the elements, we have to use the set structure.

For example, consider the problem of finding the smallest and largest value in a set. To do this efficiently, we need to use the set structure. Since the elements are sorted, we can find the smallest and largest value as follows:

auto first = s.begin();

auto last = s.end(); last–;

cout << *first << ” ” << *last << “\n”;

Note that since end() points to an element after the last element, we have to decrease the iterator by one.

The set structure also provides the functions lower_bound(x) and upper_bound(x) that return an iterator to the smallest element in a set whose value is at least or larger than x, respectively. In both the functions, if the requested

element does not exist, the return value is end().

cout << *s.lower_bound(x) << “\n”;

cout << *s.upper_bound(x) << “\n”;

**Multisets** A multiset is a set that can have several copies of the same value. C++ has the structures multiset and unordered_multiset that resemble set and unordered_set. For example, the following code adds three copies of the value

5 to a multiset.

multiset<int> s;

s.insert(5);

s.insert(5);

s.insert(5);

cout << s.count(5) << “\n”; // 3

The function erase removes all copies of a value from a multiset:

s.erase(5);

cout << s.count(5) << “\n”; // 0

Often, only one value should be removed, which can be done as follows:

s.erase(s.find(5));

cout << s.count(5) << “\n”; // 2

Note that the functions count and erase have an additional O(k) factor where k is the number of elements counted/removed. In particular, it is not efficient to count the number of copies of a value in a multiset using the count function.

### Maps:

A map is a set that consists of key-value pairs. A map can also be seen as a generalized array. While the keys in an ordinary array are always consecutive integers 0, 1, . . . , n −1, where n is the size of the array, the keys in a map can be of any data type and they do not have to be consecutive values.

The C++ standard library contains two map structures that correspond to the set structures: map is based on a balanced binary search tree and accessing elements takes O(log n) time, while unordered_map uses hashing and accessing elements takes O(1) time on average.

The following code creates a map whose keys are strings and values are integers:

map<string,int> m;

m[“monkey”] = 4;

m[“banana”] = 3;

m[“harpsichord”] = 9;

cout << m[“banana”] << “\n”; // 3

If the value of a key is requested but the map does not contain it, the key is automatically added to the map with a default value. For example, in the following code, the key “aybabtu” with value 0 is added to the map.

map<string,int> m;

cout << m[“aybabtu”] << “\n”; // 0

The function count checks if a key exists in a map:

if (m.count(“aybabtu”)) {

// key exists

}

Then, the following code prints all keys and values in a map:

for (auto x : m) {

cout << x.first << ” ” << x.second << “\n”;

}

### Priority Queues:

A priority queue is a multiset that supports element insertion and, depending on the type of the queue, retrieval and removal of either the minimum or maximum element.

Insertion and removal take O(log n) time, and retrieval takes O(1) time. A priority queue is usually based on a heap structure, which is a special binary tree.

While a multiset provides all the operations of a priority queue and more, the benefit of using a priority queue is that it has smaller constant factors. Thus, if we only need to efficiently find minimum or maximum elements, it is a good idea to use a priority queue instead of a set or multiset.

By default, the elements in a C++ priority queue are sorted in decreasing order, and it is possible to find and remove the largest element in the queue.

The following code illustrates this:

priority_queue<int> q;

q.push(3);

q.push(5);

q.push(7);

q.push(2);

cout << q.top() << “\n”; // 7

q.pop();

cout << q.top() << “\n”; // 5

q.pop();

q.push(6);

cout << q.top() << “\n”; // 6

q.pop();

If we want to create a priority queue that supports finding and removing the smallest element, we can do it as follows:

priority_queue<int,vector<int>,greater<int>> q;

### Policy-Based Sets:

The g++ compiler also provides some data structures that are not part of the C++ standard library. Such structures are called policy-based structures. To use these structures, the following lines must be added to the code:

#include <ext/pb_ds/assoc_container.hpp>

using namespace __gnu_pbds;

After this, we can define a data structure indexed_set that is like set but can be indexed like an array. The definition for int values is as follows:

typedef tree<int,null_type,less<int>,rb_tree_tag,

tree_order_statistics_node_update> indexed_set;

Then, we can create a set as follows:

indexed_set s;

s.insert(2);

s.insert(3);

s.insert(7);

s.insert(9);

The speciality of this set is that we have access to the indices that the elements would have in a sorted array. The function find_by_order returns an iterator to the element at a given position:

auto x = s.find_by_order(2);

cout << *x << “\n”; // 7

Then, the function order_of_key returns the position of a given element:

cout << s.order_of_key(7) << “\n”; // 2

If the element does not appear in the set, we get the position that the element would have in the set:

cout << s.order_of_key(6) << “\n”; // 2

cout << s.order_of_key(8) << “\n”; // 3

Both the functions work in logarithmic time.