
Selection

The selection problem is concerned with finding the kth ranked element in a given array A of size n. Selection is trivial if the array is already sorted: accessing the element at index k gives the kth ranked element instantly. We assume that the array is not already sorted.

  • k = 1: Minimum
  • k = 2: 2nd Minimum
  • k = n/2: Median
  • k = n-1: 2nd Maximum
  • k = n: Max element
  • In general, k could be any value between 1 and n

In Why algorithm design matters? we already saw how to use the divide-and-conquer technique to find the max and 2nd max element. We found that although divide-and-conquer did not help in determining the max any faster than the brute-force approach (of scanning left to right), it did help in reducing the time taken to find the 2nd max, i.e. from 2n - 3 to n + log n - 1 comparisons. Finding the min and 2nd min is analogous.

Finding max and min

Consider the problem of finding both the max and the min. As noted previously, the max takes n-1 comparisons. However, this information cannot be used to find the min faster (unlike determining the 2nd max). Hence, it takes n-2 comparisons again to find the min among the remaining elements, making it 2n-3 comparisons overall. Is it possible to do better?

Yes. 2n-3 can be reduced to 1.5n-2. Although both are linear, the latter is better than the former.

Key idea: Divide the array recursively until the size is 2 (base case). At this stage, a single comparison determines both the max and the min. From this point onwards, as the recursion unwinds, the max of the left subarray is compared with the max of the right subarray, and likewise for the mins. Whichever is bigger (or smaller) becomes the new max (or min). Thus, combining two subarrays takes exactly two comparisons, regardless of their size.

           {5     3     8     4     1     7     2     6}          n = 8
                   /                           \
                  /                             \
        {5    3    8    4}               {1    7    2    6}       n = 4
           /          \                     /          \
          /            \                   /            \
      {5   3}        {8   4}           {1   7}        {2   6}     n = 2 (base case)
    -----------------------------------------------------------
      (5 < 3)?       (8 < 4)?          (1 < 7)?       (2 < 6)?    4 * 1 = 4 comparisons
    max=5,min=3    max=8,min=4       max=7,min=1    max=6,min=2
            \         /                    \         /
             \       /                      \       /
         (5 < 8)?  (3 < 4)?              (7 < 6)?  (1 < 2)?       2 * 2 = 4 comparisons
          max=8     min=3                 max=7     min=1
                         \               /
                          \             /
                        (8 < 7)?  (3 < 1)?                        1 * 2 = 2 comparisons
                         max=8     min=1

Total number of comparisons made is 4 + 4 + 2 = 10 = 1.5(8) - 2, i.e. 1.5n - 2.
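
The scheme above can be written as a short recursive function. This is a minimal sketch (the name max_min is my own); it also handles a size-1 subarray, which arises when n is not a power of 2:

```python
def max_min(a, lo, hi):
    """Return (max, min) of a[lo..hi] via divide-and-conquer."""
    if lo == hi:                  # size 1: the element is both max and min
        return a[lo], a[lo]
    if hi - lo == 1:              # base case: size 2, one comparison
        return (a[hi], a[lo]) if a[lo] < a[hi] else (a[lo], a[hi])
    mid = (lo + hi) // 2
    lmax, lmin = max_min(a, lo, mid)
    rmax, rmin = max_min(a, mid + 1, hi)
    # two comparisons to combine, regardless of subarray size
    return (lmax if lmax > rmax else rmax,
            lmin if lmin < rmin else rmin)
```

For the example array, max_min([5, 3, 8, 4, 1, 7, 2, 6], 0, 7) returns (8, 1).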

Finding the median

To find the max, a single comparison at each step always makes progress. This works because the max has a special property: it is larger than every other element of the array. The median has no such property; it is smaller than one half of the elements and larger than the other half, so simple pairwise comparisons will not suffice. We need a different strategy. Hence, we solve the general problem of finding the kth ranked element and use it for the median too.

Finding the kth ranked element

A brute-force approach to find the kth element is to sort the array and pick the element at index k. This takes O(n log n) time. But wait! Do we really need to sort the entire array? Can't we stop after some iterations? If we were to do so, which sorting algorithms would be suitable?
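
The brute-force baseline is a one-liner (the helper name kth_by_sorting is illustrative, using 1-indexed ranks as in this article):

```python
def kth_by_sorting(a, k):
    """Brute force: sort everything, then index. O(n log n) regardless of k."""
    return sorted(a)[k - 1]   # rank k is 1-indexed, so subtract 1
```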

Bubble sort & Selection sort: After each iteration of the outer loop, one more element reaches its final sorted position. If the passes are arranged so that the smallest remaining element moves to the front (selection sort does exactly this), the algorithm can be stopped after k iterations and the element at the kth position returned. This results in O(kn) time complexity. Since k can be any value from 1 to n, it is still O(n²) in the worst case.
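
A sketch of this early-stopping idea, using selection-sort style passes (the function name is my own):

```python
def kth_smallest_selection(a, k):
    """kth smallest (1-indexed) by stopping selection sort after k passes: O(kn)."""
    a = list(a)                       # work on a copy
    for i in range(k):                # only k passes, not n
        m = i
        for j in range(i + 1, len(a)):
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]       # place the (i+1)th smallest at index i
    return a[k - 1]
```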

Insertion sort: After each iteration of the outer loop, insertion sort ensures that the first i+1 elements are in a sorted state. However, these need not be the smallest i+1 elements of the array. Hence, the iterations cannot be stopped until the entire sorting completes. The time complexity is O(n²).

Merge sort: In merge sort there is no guarantee that the kth element reaches its final position after a certain number of iterations. The entire sorting has to complete before the kth element can be picked, resulting in a time complexity of O(n log n).

Heap sort: A heap can be built in O(n) time. Since the max element is at the root of a max-heap, each extraction removes the largest remaining element, so heap sort can be stopped after n-k extractions, at which point the root is the kth element. Symmetrically, a min-heap popped k times yields it in O(n + k log n) time, making heaps the most suitable among the algorithms discussed thus far.
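
The min-heap variant is a few lines with Python's heapq module: heapify runs in O(n), and each of the k-1 discarding pops costs O(log n):

```python
import heapq

def kth_smallest_heap(a, k):
    """kth smallest (1-indexed) via a min-heap: O(n + k log n)."""
    h = list(a)
    heapq.heapify(h)                  # O(n) bottom-up heap construction
    for _ in range(k - 1):
        heapq.heappop(h)              # discard the k-1 smaller elements
    return h[0]                       # the root is now the kth smallest
```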

Quick sort: It is worthwhile to explore quick sort since it has some almost magical properties. Firstly, it has some 'randomness' in its way of working. Secondly, the size of the partitions indicates where we need to search.

  • Let's say that after the first partition step, the size of the left partition is k-1. Then we are done: the pivot is the kth element and can be returned.
  • If the size of the left partition is greater than k-1, then we need to search for kth element in the left partition.
  • If the size of the left partition is less than k-1, say m, then we need to search for the (k-m-1)th element in the right partition.

Determining the time complexity is not straightforward. The best case is O(n), if the very first pivot lands at rank k. The worst case, of course, can be O(n²), if we pick a bad pivot k or n-k times, whichever is bigger. As noted in quick sort, the worst case happens very rarely. Hence, we are interested in the average case, which works out to O(n). For the actual derivation of the average case, refer to standard books on algorithms.
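
The three cases above make up the quickselect algorithm. Here is a sketch using a Lomuto partition with a random pivot (the partition scheme and names are my choices, not prescribed above); tracking absolute indices lets us compare the pivot's rank to k directly instead of adjusting k:

```python
import random

def quickselect(a, k):
    """kth smallest (1-indexed) by repeated partitioning; average O(n)."""
    a = list(a)                               # work on a copy
    lo, hi = 0, len(a) - 1
    while True:
        p = random.randint(lo, hi)            # random pivot guards against bad inputs
        a[p], a[hi] = a[hi], a[p]
        store = lo
        for j in range(lo, hi):               # Lomuto partition around a[hi]
            if a[j] < a[hi]:
                a[store], a[j] = a[j], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]     # pivot lands at its final position
        rank = store + 1                      # 1-indexed rank of the pivot
        if rank == k:
            return a[store]                   # done: pivot is the kth element
        elif rank > k:
            hi = store - 1                    # search the left partition
        else:
            lo = store + 1                    # search the right partition
```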