Sorting
Sorting is a fundamental requirement in many applications, since it makes further processing easier. For example, binary search is possible only on a sorted list. In this and the next few lectures we will discuss eight different ways to sort. The sorting methods are characterized by their (i) running time, (ii) space requirement, (iii) whether the sort is in-place, and (iv) whether it is stable.
Time and space complexity have already been introduced. An algorithm is called in-place if it sorts without using additional space; O(1) extra space is allowed, but the extra space requirement must not grow with the input size.
An algorithm is stable if it preserves the relative order of equal elements. For example, if the input array is {2a, 3, 1, 2b}, a stable algorithm must guarantee that 2a remains ahead of 2b in the sorted array.
Sorting requires comparison of elements, and hence the time complexity is a measure of the number of comparisons required. The elements to be sorted are usually stored in an array, and the size of the array is the input size. Conventionally, the input array is denoted A[0..n-1]. For simplicity, we assume that each element occupies O(1) space. This need not always be true: the elements could be strings of size m, or objects of a user-defined class. The techniques discussed here are generic.
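As an aside, this genericity is visible in practice through comparison callbacks. For example, C's standard qsort sorts elements of any type via a user-supplied comparator; in the sketch below (the names array is purely illustrative), each comparison of strings of length m costs O(m) rather than O(1):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for C strings. qsort only sees opaque bytes, so any element
   type can be sorted through such a callback. */
static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

int main(void) {
    const char *names[] = {"carol", "alice", "bob"};
    size_t n = sizeof names / sizeof names[0];
    qsort(names, n, sizeof names[0], cmp_str);
    for (size_t i = 0; i < n; i++)
        printf("%s\n", names[i]);   /* prints alice, bob, carol */
    return 0;
}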
Quadratic-time Sorting Methods
In this section we discuss three sorting algorithms that take Θ(n²) time.
- 1. Bubble sort
- 2. Selection sort
- 3. Insertion sort
1. Bubble sort
Bubble sort is a brute-force approach. It takes n-1 iterations to sort. In each iteration, successive elements are compared and swapped if necessary. At the end of the first iteration, the largest element moves to the last position. At the end of the second iteration, the second largest element moves to the second last position, and so on. The algorithm is given below.
// Algorithm: Bubblesort
bubblesort(Input A[0..n-1]) {
    for (i = 0; i < n-1; i++)
        for (j = 0; j < n-1-i; j++)
            if (A[j] > A[j+1])
                swap(A[j], A[j+1]);
}
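For concreteness, one possible C rendering of the pseudocode is sketched below (the demo array in main is illustrative):

#include <stdio.h>

/* Swap two integers through pointers. */
static void swap(int *x, int *y) {
    int temp = *x;
    *x = *y;
    *y = temp;
}

/* Bubble sort: after pass i, the i+1 largest elements are in their
   final positions at the right end of the array. */
void bubblesort(int A[], int n) {
    for (int i = 0; i < n - 1; i++)
        for (int j = 0; j < n - 1 - i; j++)
            if (A[j] > A[j + 1])      /* strict > keeps equal elements in order */
                swap(&A[j], &A[j + 1]);
}

int main(void) {
    int A[] = {50, 20, 30, 10, 40};
    int n = sizeof A / sizeof A[0];
    bubblesort(A, n);
    for (int i = 0; i < n; i++)
        printf("%d ", A[i]);          /* prints: 10 20 30 40 50 */
    printf("\n");
    return 0;
}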
The following example demonstrates the sorting of the list {50,20,30,10,40}.
Iteration 1: {50,20,30,10,40} → {20,50,30,10,40} → {20,30,50,10,40} → {20,30,10,50,40} → {20,30,10,40,50} (n-1 comparisons + at most n-1 swaps)
Iteration 2: {20,30,10,40,50} → {20,30,10,40,50} → {20,10,30,40,50} → {20,10,30,40,50} (n-2 comparisons + at most n-2 swaps)
Iteration 3: {20,10,30,40,50} → {10,20,30,40,50} → {10,20,30,40,50} (n-3 comparisons + at most n-3 swaps)
Iteration 4: {10,20,30,40,50} → {10,20,30,40,50} (n-4 comparisons + at most n-4 swaps)
Analysis of bubble sort
(a) Running Time
In each iteration, n-1-i comparisons and at most as many swaps are done. Since i ranges from 0 to n-2, (n-1) + (n-2) + ... + 1 comparisons, and in the worst case as many swaps (one per inversion), are done. So the worst-case time complexity is determined as:
- T(n) = 2(n-1) + 2(n-2) + ... + 2(1) = 2[n(n-1)/2] = Θ(n²)
Note that even if fewer swaps are performed, the time complexity does not improve, since the n-1-i comparisons are necessary anyway.
Is there a benefit if input array A[] is already sorted?
No. The running time cannot be improved, since even in the best case the number of comparisons remains the same:
- T_best(n) = (n-1) + (n-2) + ... + 1 = n(n-1)/2 = Θ(n²)
(b) Memory requirement
No extra space is required to sort. Of course, we need a 'temp' storage to swap two elements, but 'temp' takes O(1) space and does not grow with the input size n.
- S(n) = n+1 = Θ(n)
(c) In-place or not
Since no extra space/memory is required, the algorithm is in-place.
(d) Stable or not
Bubble sort is stable. The if (A[j] > A[j+1]) condition ensures that no swap happens when successive entries are equal; a swap is done only if an element is strictly greater than the next element.
Note that if > is replaced by >= in the condition, the algorithm becomes unstable. Since it can be implemented in a stable manner, bubble sort is considered stable.
Optimization
If no swaps are performed during an iteration, we know that the list is sorted. In that case, we can break out of the loop, abandoning the remaining iterations. While this optimization does not improve the worst-case running time, it can speed up the algorithm in some fortunate cases. For instance, if the input list is already sorted, the algorithm terminates after the first iteration, resulting in Θ(n) running time. However, such cases are rare.
// Algorithm: ImprovedBubblesort
improvedbubblesort(Input A[0..n-1]) {
    for (i = 0; i < n-1; i++) {
        flag = false;              // reset at the start of every iteration
        for (j = 0; j < n-1-i; j++) {
            if (A[j] > A[j+1]) {
                swap(A[j], A[j+1]);
                flag = true;       // a swap was done during the iteration
            }
        }
        if (!flag)                 // if flag still remains false at the end
            break;                 // of the iteration, break out
    }
}
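A possible C version of the optimized variant is sketched below; the flag is a bool named swapped here, which is only a naming choice:

#include <stdbool.h>

/* Early-exit bubble sort: a full pass with no swaps proves the array is
   already sorted, so the remaining passes can be abandoned. On a sorted
   input this does a single pass, i.e., Θ(n) time. */
void improved_bubblesort(int A[], int n) {
    for (int i = 0; i < n - 1; i++) {
        bool swapped = false;
        for (int j = 0; j < n - 1 - i; j++) {
            if (A[j] > A[j + 1]) {
                int temp = A[j];
                A[j] = A[j + 1];
                A[j + 1] = temp;
                swapped = true;
            }
        }
        if (!swapped)
            break;     /* no inversion left; the array is sorted */
    }
}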
Exercise: Implement a recursive version of bubble sort. The outer loop can be replaced by recursive code. Extract the recurrence equation and solve it (e.g., by repeated substitution) to determine the time complexity.
2. Selection Sort
Selection sort employs a slightly different strategy. It traverses the array, finds the largest element, and exchanges it with the element in the last position. In the second iteration, it moves the second largest element to the second last position, and so on. As the iterations proceed, the sorted part of the array (on the right) grows while the unsorted part (on the left) shrinks. The algorithm is given below.
// Algorithm: SelectionSort
selectionsort(Input A[0..n-1]) {
    for (i = 0; i < n-1; i++) {
        max = A[0]; maxpos = 0;
        lastpos = n-1-i;
        for (j = 1; j <= lastpos; j++) {
            if (A[j] > max) {
                max = A[j];
                maxpos = j;
            }
        }
        swap(A[maxpos], A[lastpos]);
    }
}
The algorithm finishes in n-1 iterations: each of the n-1 largest elements gets into its respective position, and the remaining (smallest) element automatically ends up in the first position. The inner loop determines the maximum element and its position so that it can be swapped into place.
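The pseudocode translates to C as sketched below; this version tracks only maxpos, since the maximum value itself is just A[maxpos]:

/* Selection sort: each pass scans A[0..lastpos] for the maximum and
   swaps it into position lastpos. Exactly one swap per pass. */
void selectionsort(int A[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int lastpos = n - 1 - i;
        int maxpos = 0;                 /* position of the largest so far */
        for (int j = 1; j <= lastpos; j++)
            if (A[j] > A[maxpos])
                maxpos = j;
        int temp = A[maxpos];           /* swap the maximum into place */
        A[maxpos] = A[lastpos];
        A[lastpos] = temp;
    }
}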
The following example demonstrates the working of the algorithm for the input array {50,30,40,10,20}.
Initial array: {50,30,40,10,20}
Iteration i = 0: max = 50, maxpos = 0, lastpos = 4; swap → {20,30,40,10,50}
Iteration i = 1: max = 40, maxpos = 2, lastpos = 3; swap → {20,30,10,40,50}
Iteration i = 2: max = 30, maxpos = 1, lastpos = 2; swap → {20,10,30,40,50}
Iteration i = 3: max = 20, maxpos = 0, lastpos = 1; swap → {10,20,30,40,50}
Analysis of Selection Sort
(a) Running Time
In each iteration, n-1-i comparisons and exactly one swap are done. The running time is given by
- T(n) = [(n-1) + 1] + [(n-2) + 1] + ... + [1 + 1] = n + (n-1) + ... + 2 = [n(n+1)/2] - 1 = Θ(n²)
Even if the input array is sorted, n-1-i comparisons have to be made in each iteration, although swapping is unnecessary. Hence, it would not affect the time complexity in a significant way; it would still be Θ(n²).
(b) Memory requirement
Needs no extra space other than the array itself. Hence, S(n) = Θ(n).
(c) In-place or not
In-place, for the above-mentioned reason.
(d) Stable or not
Selection sort, as given above, is not stable. The swap can move the element at lastpos past elements equal to it, changing their relative order. For example, for the input {2a, 2b, 1}, the first iteration swaps 2a with 1, producing {1, 2b, 2a}, and no later iteration restores the original order of 2a and 2b. A stable variant inserts the selected element into its position by shifting the intervening elements, instead of swapping, at the cost of extra data movement.
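One way to see the instability concretely is to sort records that carry equal keys with distinct tags. The sketch below (the struct rec type and the tags are illustrative, not part of the lecture's code) applies the selection sort above to such records:

#include <stdio.h>

/* A record with a sort key and a tag recording the original order. */
struct rec { int key; char tag; };

int main(void) {
    struct rec A[] = {{2, 'a'}, {2, 'b'}, {1, '-'}};
    int n = 3;
    for (int i = 0; i < n - 1; i++) {          /* selection sort on key */
        int lastpos = n - 1 - i, maxpos = 0;
        for (int j = 1; j <= lastpos; j++)
            if (A[j].key > A[maxpos].key)
                maxpos = j;
        struct rec temp = A[maxpos];
        A[maxpos] = A[lastpos];
        A[lastpos] = temp;
    }
    for (int i = 0; i < n; i++)
        printf("%d%c ", A[i].key, A[i].tag);
    printf("\n");   /* prints: 1- 2b 2a -- the equal keys are reordered */
    return 0;
}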
Note: Unlike bubble sort, it is not possible to short-circuit the number of iterations if the list is already sorted.
Selection sort is preferred over bubble sort since the number of swaps required is at most n-1, as against up to n(n-1)/2 swaps for bubble sort. For this reason, it typically runs faster when moving elements is expensive.
3. Insertion Sort
Insertion sort works based on the following strategy: pick each element from the second position to the last, compare it against the previous elements one by one, and insert it at the appropriate position. The algorithm is given below.
// Algorithm: InsertionSort
insertionsort(Input A[0..n-1]) {
    for (i = 1; i < n; i++) {
        temp = A[i];
        for (j = i-1; j >= 0; j--) {
            if (temp > A[j])    // time to insert; break out of inner loop
                break;
        }
        RightShiftByOnePosition(A[j+1..i-1]);  // shift one place to the right
        A[j+1] = temp;          // insert at the vacated position
    }
}
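A C rendering is sketched below. It fuses the right shift into the backward scan instead of calling a separate helper; note that its loop condition A[j] > temp corresponds to the stable variant discussed under (d) below:

/* Insertion sort: grow a sorted prefix A[0..i-1] and insert A[i] into it.
   The condition A[j] > temp stops at the first element <= temp, so equal
   elements are never moved past each other. */
void insertionsort(int A[], int n) {
    for (int i = 1; i < n; i++) {
        int temp = A[i];
        int j = i - 1;
        while (j >= 0 && A[j] > temp) {
            A[j + 1] = A[j];    /* shift right by one position */
            j--;
        }
        A[j + 1] = temp;        /* insert into the vacated slot */
    }
}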
The following example demonstrates the working of insertion sort for the array {5,2,4,6,1,3}.
Initial array: {5,2,4,6,1,3}
Iteration i = 1: temp = 2; 2 < 5; right shift 5; A[0] = 2 → {2,5,4,6,1,3}
Iteration i = 2: temp = 4; 4 < 5, 4 > 2; right shift 5; A[1] = 4 → {2,4,5,6,1,3}
Iteration i = 3: temp = 6; 6 > 5; no shift → {2,4,5,6,1,3}
Iteration i = 4: temp = 1; 1 < 6, 1 < 5, 1 < 4, 1 < 2; right shift 2, 4, 5, 6; A[0] = 1 → {1,2,4,5,6,3}
Iteration i = 5: temp = 3; 3 < 6, 3 < 5, 3 < 4, 3 > 2; right shift 4, 5, 6; A[2] = 3 → {1,2,3,4,5,6}
Observe that after i iterations, the first i+1 elements are in sorted order. However, the first i+1 elements need not necessarily be the smallest i+1 items.
Analysis of Insertion Sort
(a) Running Time
In iteration i, at most i comparisons and at most as many right-shift operations are needed. Hence, the worst-case time complexity is given by
- T(n) = 2[1 + 2 + ... + (n-1)] = n(n-1) = Θ(n²)
Is there an advantage if the input list is sorted?
Yes. If the input list is sorted, the very first comparison breaks out of the inner loop, and no shifts need to be done in any iteration. Hence, the running time is n-1 comparisons + 0 shifts.
- T_best(n) = 1 + 1 + ... + 1 = n-1 = Θ(n)
In general, insertion sort is a good choice when the input list is nearly sorted. In scenarios where a list dynamically grows and shrinks, insertion sort is useful for maintaining the sorted order with minimal work, as sketched below.
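Maintaining a sorted list under insertions amounts to running one iteration of insertion sort per new element. The helper below (sorted_insert is a hypothetical name, not from the lecture) sketches this:

/* Insert x into the sorted array A[0..n-1], which must have room for one
   more element; returns the new length. This is exactly one iteration of
   insertion sort: O(n) shifts per insertion in the worst case, O(1) when
   x belongs near the end. */
int sorted_insert(int A[], int n, int x) {
    int j = n - 1;
    while (j >= 0 && A[j] > x) {    /* shift larger elements right */
        A[j + 1] = A[j];
        j--;
    }
    A[j + 1] = x;
    return n + 1;
}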
In practical settings, insertion sort often outperforms asymptotically faster algorithms on small inputs, which is why many library sorts fall back to it for small subarrays.
(b) Memory requirement
The entire sorting can be done without additional space. Hence, S(n) = Θ(n).
(c) In-place or not
In-place, for the above-mentioned reason.
(d) Stable or not
As given, the break condition if (temp > A[j]) does not make the algorithm stable: when temp equals A[j], the scan continues and shifts A[j] to the right, moving temp ahead of an equal element. Changing the condition to if (temp >= A[j]) ensures that no element is moved past another unless it is strictly smaller, which makes the sort stable.
Exercises:
1. Will the use of binary search to identify the position to insert help in improving the time complexity? Justify.
2. Does the use of a linked list along with binary search improve the time complexity? Also discuss the practical implications.