More interesting from the GPU context is a merge sort. And it’s really interesting to discuss how to map this efficiently to a GPU because it’s a great instance of a divide-and-conquer approach to the problem.

So here’s a tree that represents our merge sort, and what’s particularly interesting about mapping this to a GPU is that the problem starts off with tons of little tiny parallel problems, and then as the algorithm proceeds, we eventually end up with only 1 very large problem to solve, and that’s the final merge.

This is more challenging many of the algorithms we’ve discussed, where we have the luxury of being able to solve lots of little parallel problems independently throughout the whole computation.

So what we do at each stage of this tree is the same. The only operation that we need is merging 2 sorted lists together into 1 sorted list.

We begin with n items, which we treat as n sorted 1-element lists, then we run n over 2 merges to create n over 2, sorted 2-element lists. Then we run n over 4 merges to create n over 4, sorted 4-element lists, and so on.

Overall this is log n stages, and in each stage, we merge a total of n elements, so the overall number of operations that were complexity is in order of n log n.

This algorithmic exposes a lot of parallelism within each step, each individual merge: this merge, this merge, this merge, this merge, and so on can proceed in parallel.

Now the hard part about mapping this to the GPU is that the number and size of merge as we do on each step differs greatly between the first step and the last. So I’m going to talk about a GPU implementation that has 3 stages.

The first stage, this blue stage here, is merging a very large number of very short sequences. Because we have many, many tasks and each is very small, we might choose to assign 1 merge to 1 thread, which can perform each individual merge using a serial algorithm on that thread.

We might get better memory coalescing performance if we use shared memories as a staging area to read many input elements or store many output elements at the same time.

I’d say it’s more common, though, at the start of a large merge sort, to just sort a block of elements, say, 1024 within shared memory, and then start the merge sort with sorted chunks of size 1024.

In other words, if we’re actually doing this in practice, we’re probably going to use a different algorithm to do this blue stage here with lots of little tiny tasks and then use merge sort to do the last 2 stages here, and we’re going to see a good algorithm for an in block sort after we finish the discussion on merge sort.

```
MergeSort(arr[], l, r)
If r > l
1. Find the middle point to divide the array into two halves:
middle m = (l+r)/2
2. Call mergeSort for first half:
Call mergeSort(arr, l, m)
3. Call mergeSort for second half:
Call mergeSort(arr, m+1, r)
4. Merge the two halves sorted in step 2 and 3:
Call merge(arr, l, m, r)
```

```
// C++ program for Merge Sort
#include <iostream>
using namespace std;
// Merges two subarrays of arr[].
// First subarray is arr[l..m]
// Second subarray is arr[m+1..r]
void merge(int arr[], int l, int m, int r)
{
int n1 = m - l + 1;
int n2 = r - m;
// Create temp arrays
int L[n1], R[n2];
// Copy data to temp arrays L[] and R[]
for (int i = 0; i < n1; i++)
L[i] = arr[l + i];
for (int j = 0; j < n2; j++)
R[j] = arr[m + 1 + j];
// Merge the temp arrays back into arr[l..r]
// Initial index of first subarray
int i = 0;
// Initial index of second subarray
int j = 0;
// Initial index of merged subarray
int k = l;
while (i < n1 && j < n2) {
if (L[i] <= R[j]) {
arr[k] = L[i];
i++;
}
else {
arr[k] = R[j];
j++;
}
k++;
}
// Copy the remaining elements of
// L[], if there are any
while (i < n1) {
arr[k] = L[i];
i++;
k++;
}
while (j < n2) {
arr[k] = R[j];
j++;
k++;
}
}
void mergeSort(int arr[],int l,int r){
if(l>=r){
return;//returns recursively
}
int m = (l+r-1)/2;
mergeSort(arr,l,m);
mergeSort(arr,m+1,r);
merge(arr,l,m,r);
}
void printArray(int A[], int size)
{
for (int i = 0; i < size; i++)
cout << A[i] << " ";
}
int main()
{
int arr[] = { 12, 11, 13, 5, 6, 7 };
int arr_size = sizeof(arr) / sizeof(arr[0]);
cout << "Given array is \n";
printArray(arr, arr_size);
mergeSort(arr, 0, arr_size - 1);
cout << "\nSorted array is \n";
printArray(arr, arr_size);
return 0;
}
```

Now we move on to stage 2. Now we have lots of small sorted blocks, and we need to merge these small sorted blocks together. On the GPU for these intermediate merges, we would usually assign 1 merge to 1 thread block.

Now, the obvious way to merge 2 sorted sequences is a serial algorithm. So let’s take a little bit of a closer look at the algorithm that we choose to use here, and we’ll come back to this diagram a little bit later.

The obvious way to merge 2 sorted sequences is a serial algorithm, and here’s our little serial processor here. The input to this algorithm is 2 sorted sequences, and the output is 1 sorted sequence.

And so the algorithm is that the serial processor looks at the head of each one of the sorted sequences, chooses whichever element is smaller, outputs it on to the tail of the output sequence, and then advances the input sequence from which he took the previous element.

However, this would be a poor match for the GPU because this is a serial algorithm and it wouldn’t keep all of our hardware busy. So it’s instructive to look another way, a better way, a more parallel way to merge 2 sorted sequences.

## Merge Sort C++ STL

C++ offers in its STL library a merge() which is quite useful to merge sort two containers into a single container.

Sets are a type of associative containers in which each element has to be unique, because the value of the element identifies it. The value of the element cannot be modified once it is added to the set, though it is possible to remove and add the modified value of that element.

The values always remain in a sorted order. To insert a value into a set, we use the insert function. If that value already exists in the set, it won’t be added. If it doesn’t exist, it will be inserted and the container will be sorted.

Here, I am inserting a few elements into a set and printing the set using iterators. Just like vectors, we can get the first and the last iterator using begin and end function.

Let’s run this code, we can see that all the elements are unique and sorted. We can use the size, max_size, empty functions like we do in vectors. Like this.

Let’s run this code, we can see that the output is as expected. We can remove elements from the set using erase function. There are two ways of doing so, first by directly passing the value to be deleted. Like, here I am deleting 100.

We can also delete elements by passing its iterator. Like, here, I am deleting all elements from beginning till 30. Let’s run this code, we can see that first 100 is deleted, then all the values less than 30 are deleted. To clear the entire set, we can use the clear function.

Let’s run this code, we can see that the size of set is 0 now. To create a set that stores values in reversed, we can do it like this, now the set will work in the similar fashion just the only difference being the values will be stored in a reverse order.

Let’s run this code, we can see that the set is reversed sorted.

## Merge Sort C++ Vector

### Implementation

```
I am using C++ to implement the mergesort algorithm using vectors instead of arrays. I worked step by step from the original algorithm but when I compile, I get no output or error messages. I think there is an issue with my "merge" function but I cannot find it. I am new to sorting algorithms so if there are any fundamental misunderstandings or errors in my code please explain them to me.
#include <iostream>
#include <vector>
using namespace std;
void mergeSort(vector<int>& numvec, int left, int right){
if(left < right){
int middle = left + (right - left) / 2;
mergeSort(numvec, left, middle);
mergeSort(numvec, middle + 1, right);
merge(numvec, left, middle, right);
}
}
int merge(vector<int>& numvec , int left, int mid, int right){
int i, j, k, temp[right-left+1];
i = left;
j = right;
k = mid + 1;
while(i <= mid && j <= right){
if(numvec[i] < numvec[j])
{
temp[k] = numvec[i];
k++;
i++;
}
else
{temp[k] = numvec[j];
k++;
j++;
}
while(i <= mid){
temp[k] = numvec[i];
k++;
i++;
}
while( j <= right){
temp[k] = numvec[j];
}
for(i = left; i <= right; i++){
numvec[i] = temp[i - left];
}
}
}
```

*there is not need to create new space at each call of merge. std::vector<int> L(&array[0], &array[lo]); will actually create space*

## Merge Sort Complexity

**Merge Sort** is a stable **sort** which means that the same element in an array maintain their original positions with respect to each other. Overall time **complexity** of **Merge sort** is O(nLogn). It is more efficient as it is in worst case also the runtime is O(nlogn) The space **complexity** of **Merge sort** is O(n).

The run time complexity of merge sort so this is really the meat of merge sort so what’s the idea here the idea is you sort using your recursive call to merge sort.

you sort the first couple valve and you sort a second cup of L and then you merge the two together using merge so merge is the function that they send two sorted lists and it returns a sorted version of both of the firsts together step together and it’s sorted and it can do it.

Length of a 1 plus length of l2 time like we saw in lecture now the length of this is n over 2 and the length of this sent over to so the runtime of the code to merge here is open all of the length of L.

So merge this merge runs in o of n where n is the length about so if you break the list up into halves and then assume that the halves are sorted so you run merge sort to make sure that this house is worth in this half is sorted you get distorted versions then to run merge that’s oh that where n is the length of the list.

Now we’ll just say that rather than just saying that it’s all bad we’ll say assume that merge sort and so on takes eight and operations so we get to do that because we know that it runs in open time that means it performs a number of operations that’s proportional to K and.

## Merge Sort Algorithm Steps

**Merge Sort**

- Divide the unsorted list into sublists, each containing element.
- Take adjacent pairs of two singleton lists and
**merge**them to form a list of 2 elements. N. will now convert into lists of size 2. - Repeat the process till a single
**sorted**list of obtained.

We will see the recursive algorithm for merge sort. This algorithm usage divide and conquer technique. So let’s see what this algorithm is. First the list to be sorted is divided into two sublists of almost equal size.

Then the left sublist is sorted recursively using merge sort. Then the right sublist is recursively sorted using merge sort. After that the two sorted sublists are merged to get the final sorted list.

The terminating condition for a recursion is the sublist form should contain only one element. Because we can consider a one element sublist as sorted list. So now let’s take an example and see how this algorithm works. This is a list of 11 elements.

It is divided into these two sublists. This one contains 6 elements and this one contains 5 elements. These two sublists are to be sorted recursively using merge sort.

So this is divided into two. This is divided into two. Now here we get a sublist of only one element and we can consider it as a sorted sublist. So the recursion will stop here.

This is again divided into two. Here also the recursion stops and now these two sorted sublists are merged. So we will get this list. You can see that this list has become sorted. Now this sorted list and this sorted list both of them are merged.

So now we can see that this sublist has become sorted. Now we come to this sublist, this is divided, this is divided. These two sorted sublists are merged. Then this sorted list and this sorted list, are merged.

So we can see that this sublist has become sorted. Now these two sorted sublists are merged and we can see that this list has become sorted. Now we come to this sublist. This is divided into two. This is divided into two. This is divided.

These two sorted lists are merged, then this and this are merged, so this list becomes sorted. This is divided and merged. Now this sorted list and this sorted lists are merged. So we can see that this sublist has become sorted. Now these two sorted sublists are merged to get the final sorted list.