Divide-and-Conquer Sorting

nlogn Sorting

In this section we discuss two sorting algorithms that takes Θ(nlogn) time. Both use divide-and-conquer technique.

4. Merge sort
5. Quick sort

4. Merge sort

The merge sort algorithm recursively divides the given array into two (almost) equal parts, sorts each half and merges the sorted partition into one. The recursion stops once the partition is small enough (i.e has exactly one element) at which point the recursion rolls back and merging of the sorted lists happen. It is easy to infer that the recursion depth is logn. The algorithm is given below.

  // Algorithm: Merge sort

  mergesort(Input A[low..high]) {	// Initially low = 1, high = n
  	if (low == high)				// Array index starts from 1 to n
  	  return A[low];
  	mid = (low+high)/2;
  	B[1..mid] = mergsort(A[low..mid]);
  	C[1..mid] = mergesort(A[mid+1..high]);
  	return merge(B[0..mid], C[0..mid]);
  }

The following example demonstrates the working of merge sort for the example: {24, 13, 26, 1, 2, 27, 38, 15}

Divide

{24 13 26 1 2 27 38 15}

{24 13 26 1} {2 27 38 15}

{24 13} {26 1} {2 27} {38 15}

{24} {13} {26} {1} {2} {27} {38} {15}

Small enough to conquer now

Conquer (merge)

{13 24} {1 26} {2 27} {15 38}

{1 13 24 26} {2 15 27 38}

{1 2 13 15 24 26 27 38}

Since this is a recursive algorithm, we need to solve the recurrence equation. Before writing out the recurrence equation, we need to get a sense of how long does it take to merge two sorted lists. This can be done in linear time. This is achieved by using two pointers one pointing to the start of each sorted list and advancing them appropriately as the elements are compared.

To see that consider the sorted lists B[] = {1,13,24,26} and C[] = {2,15,27,28}. The pointed elements are denoted by bold.

B[] = {1,13,24,26} C[] = {2,15,27,28} 1 < 2, add 1 to A[], B-ptr++ A[] = {1}

B[] = {1,13,24,26} C[] = {2,15,27,28} 13 > 2, add 2 to A[], C-ptr++ A[] = {1,2}

B[] = {1,13,24,26} C[] = {2,15,27,28} 13 < 15, add 13 to A[], B-ptr++ A[] = {1,2,13}

B[] = {1,13,24,26} C[] = {2,15,27,28} 24 > 15, add 14 to A[], C-ptr++ A[] = {1,2,13,15}

B[] = {1,13,24,26} C[] = {2,15,27,28} 24 < 27, add 24 to A[], B-ptr++ A[] = {1,2,13,15,24}

B[] = {1,13,24,26} C[] = {2,15,27,28} 26 < 27, add 26 to A[], B-ptr++ A[] = {1,2,13,15,24,26}

B[] = {1,13,24,26} C[] = {2,15,27,28} B-ptr completes, add 27, 28 to A[] A[] = {1,2,13,15,24,26,27,28}

The number of comparisons done is at most n-1 where n is the total number of elements in both sorted lists. If all the elements in one list is less than any element in another list, n/2 comparisons are enough. Either way it takes O(n) comparisons. In addition, it takes linear time to add each element to the A[]. i.e. 2n operations. Therefore, the time taken to merge can be treated as n ignoring the constants.

The algorithm for merge is given as follows. Note, array index starts from 1.

	merge(B[1..n/2],C[1..n/2]) {
		i = 1; j = 1; k = 1;
		while (i <= n/2 or j <= n/2) {
			if (B[i] <= C[j])
				A[i+j-1] = B[i]; i++;
			else
				A[i+j-1] = C[j]; j++;
		}
		if (i > n/2) 
			copy C[j..n/2] to A[i+j-1..n];
		else
			copy B[i..n/2] to A[i+j-1..n]; 
	}

Analysis of merge sort

(a) Running Time

The time complexity is given by the following recurrence equation.

T(1) = 1

T(n) = 2T(n/2) + n

Applying Master's method,

₂

Intuitively, it takes logn steps to divide a given array of n elements into n single element array. The merging takes again logn steps and at each level at most n-1 comparisons are done. Thus, the time complexity is Θ(nlogn). In the above example, the number of elements is 8 and therefore the recursion depth is log8 = 3.

Is there a benefit if input array A[] is already sorted?

No. Divide takes logn operations. Merge takes (n/2)* logn and hence the recurrence equation remains the same.

(b) Memory (space) requirement

The merging process requires another array of size n at each step. i.e. merge 2 sorted lists of n/2 size, 4 sorted lists of n/4 size, ... (upto logn steps). Hence the space complexity is given by

S(n) = Θ(nlogn)

However, by managing the indices appropriately and using one temporary array of size n, merging can be done with 2n space. Thus, space complexity can be reduced to Θ(n).

(c) In-place or not

No. Merge sort is not in-place. The merging step requires an extra 'n' space to combine the sorted lists. It is quite tedious to merge without an extra array.

(d) Stable or not

Merge sort is stable. While merging, the if (B[i] <= C[j]) condition ensures that stability is maintained.

Note: Though merge sort is Θ(nlogn) it is hardly ever used for main memory sorts, since merging and the additional work spent in copying to the temporary array and back at each stage has the effect of slowing down the algorithm considerably.

5. Quick sort

Quick sort is the fastest known sorting algorithm in practice. Its average running time is Θ(nlogn). Like merge sort it uses the divide-and-conquer technique, but in a slight different way. It is very fast due to a very tight and highly optimized inner loop.

In quick sort, a pivot is chosen and all the elements that are less than the pivot are moved to left of pivot while the rest of the elements moved to right of pivot. The quick sort is recursively invoked on the left and right partitions. Whenever a partition contains 0 or 1 element, the recursion stops. The algorithm for quick sort is given below.

	// Algorithm: Quick Sort

	quicksort(Input A[1..n]) {
		if (A.size <= 1)
			return A[];
		pivot = A[1];		// Pivot is the first element		
		for (i = 2; i <= n; i++) {
			if (A[i] < pivot)
				Move A[i] to L[]
			else 
				Move A[i] to R[]
		}
		return ( quicksort(L[]), A[1], quicksort(R[]) );
	}

The following example demonstrates the working of the algorithm for {65,13,81,92,43,31,57,26,75,0}.

{13,43,31,26,57,0} {65} {81,92,75}

{0} {13} {43,31,26,57}

{75} {81} {92}

{31,26} {43} {57}

{26} {31} {}

Divide is done. Now conquer by merging the lists in left-to-right order.

{0,13,26,31,43,57,65,75,81,92}

In contrast to merge sort, quick sort does the comparison workduring the 'Divide' phase rather than the 'Conquer' phase. In the 'Divide' phase, the pivot is compared with every other element to decide which partition it has to go.

Some questions arise with respect to the algorithm.

(1) How is the pivot chosen?

The pivot can be any element in the array and can be chosen in an arbitrary manner. It could be the first, last or any element of the list. It could be chosen in a random manner too. However, if we know the nature of the list, a better choice can be made.

(2) Does the choice of pivot affect the efficiency of the algorithm?

Yes. If the pivot chosen happens to be the median, the left and right partitions will be of almost equal size. Choosing the median as the pivot all the way through the recursion results in recursion depth of about logn. On the other hand, if the pivot chosen happens to be the smallest or the largest element of the list althrough, one of the partition will be empty and the other containing n-1 elements. Thus recursion depth will be n-1.

(3) How many comparisions needs to be done overall?

The number of comparisons depends on recursion depth. In the best case, when the recursion depth is logn, n*logn comparisons needs be done. In the worst case, when recursion depth is n-1, n*n comparions are needed.

The running time of the algorithm for the best case is given by the recurrence equation:

_bestcase

The running time of the algorithm for the worst case is given by the recurrence equation:

_worstcase

Here a = 1, b = 1 and f(n) = n. Therefore,

_worstcase

(4) What should be the size of left and right partitions?

Since we don't know in advance how many elements are going to be smaller than (or larger than) the pivot, each partition should be of n-1 size to take care of the worst case. Hence, 2n space needs to be reserved. This results in the worst case space complexity of Θ(n²).

(5) Can quick sort be done in place?

Yes. It is possible to implement in-place quick sort. The following example demonstrates this.

Step 1: Bring the pivot to the first position. In our case, the first element was chosen as the pivot. Hence this step was not necessary. Let's say the element at k^th position is chosen as the pivot, swapping A[1] and A[k] can bring it to the first position.

Step 2: Find the leftmost element greater than pivot and rightmost element smaller than pivot. We will use 2 pointers: left and right. The use of these pointers are as follows:

The left pointer initially points to the second element and successively moved towards right and stops at the position that contains the element greater than pivot.
The right pointer initially points to the last element and successively moved towards left and stops at the position that contains the element smaller than pivot.

Similarly,

Step 3: Exchange the elements at the left and right position. Increment left. Decrement right.

Now exchange A[left] and A[right]. Increment left. Decrement right.

Repeat Step 2 and Step 3 until left == right.

If you observe the state of the array, you can now observe the pivot at the first position, followed by elements smaller than pivot which is further followed by elements greater than pivot.

Step 4: Bring pivot to the middle by exchanging A[1] with A[left]. This gives the rearranged array with left partition, pivot and right partition.

57,13,0,26,43,31

92,75,81

The algorithm sketch is given below.

	// Algorithm: partition

	partition(A[1..n], pivotposition) {

	  swap(A[1],A[pivotposition]);		// Step 1

	  while (true) {					// Steps 2 and 3
		for (left = 2; A[left] < A[1]; left++)
		  ; 							// do nothing
		  if (left == right) break;		// break out of the while loop

		for (right = n; A[right]) > A[1]; right--)
		  ; 							// do nothing
		  if (left == right) break;		// break out of the while loop

			swap(A[left], A[right]); left++; right--;
		}
		swap(A[1], A[left])				// Step 4
	}

(6) Is quick sort stable?

No. It is not. To see this consider the list {3 4a 7 9 2 4b 8}.

Let's say our Partitioning strategy is to move all elements '<=' pivot to left.
Suppose we picked 4b as the pivot at any point in time,we would have
{2 3 4a} {4b} {7 8 9} FINE
However if we picked 4a as the pivot at any point in time,we would have
{2 3 4b} {4a} {7 8 9} Breaks stability
Note that changing the condition from '<=' to '<' does not help since picking 4a as pivot would work fine now, but,picking 4b as pivot would not. Hence, Quick-Sort is Stable

(7) Why is quick sort nlogn then?