Prev | Next |
Why Algorithm Design Matters?
This course focuses on designing "good" algorithms and data structures. Basic knowledge of one programming language such as C, C++ or Java is assumed.
A warm up
1) How much time does an HelloWorld program take to run?
Best case time = Worst case time = Average case time = c
for some constant c. More specifically 1 time unit.
2) Given an array A[] of size n finding out if a number x exists in A[]?
Linear search.
- Best case time = 1 (The first element equals x)
- Worst case time = n (Last element or no element equals x)
- Average case time = (1 + 2 + ... + n)/n = n(n+1)/2n = (n+1)/2
(i.e. assuming the x can be present in each of the n positions with equal probability)
3) Will it help if the array is sorted?
It won't if we start searching from the beginning. We need to use a different strategy.
What is the strategy?
Binary search!
- Step 1: Check x against the middle element of the array A[]
- Step 2: If successful, return true
- Step 3: If not, check
- if x < middle element, search for x in 1st half of A[]
- if x > middle element, search for x in 2nd half of A[]
- Repeat the steps until x is found or no A[] cannot be divided further.
Advantage: At every state/iteration, the search space is reduced by half. So the worst case time reduces to log n.
To summarize,
- Best case time = 1
- Worst case time = log n (base 2)
- Average case time = (log n + 1)/2 (approximately)
What difference does it make?
Consider the case when n = 10^9 (which is common in today's real-life applications.
Worst case time = 10^9 with first strategy (linear search)
Worst case time = log 10^9 with second one (binary search)
= log (10^3)^3
= log (2^10)^3 (approximately 10^3 = 2^10)
= log (2^30)
= 30 log 2
= 30
10^9 vs 30. Algorithm design matters!!!
4) Given an array of size n, how do you find the max element?
We have to make n-1 comparisons to find the max element.
- Step 1: Assume max = A[1]
- Step 2: Iteratively check if A[2], ..., A[n] is > max
If so, reassign max = A[i]
Best case time = Worst case time = Average case time = n
5) Does it help if the array is sorted?
Yes. We can pick the last element right away.
Best case time = Worst case time = Average case time = 1
6) How long does it take to pick the max 2 elements?
If the list is sorted, pick the last and second last.
Best case time = Worst cast time = Average case time = 2
7) What if the list is not sorted?
- To pick the max element : n-1 comparisons
- Remove the max element from the list.
- To pick the second max : n-2 comparisons
Total time taken = (n-1) + (n-2) = 2n-3 comparisons
8) Can we do better?
Yes. With a different strategy. See the example below.
Consider the list {8, 4, 6, 2, 1, 7, 5, 3}. Find the max element as follows.
8 4 6 2 1 7 5 3 | | | | | | | | |_______| |_______| |_______| |_______| | | | | | | | | 8 6 7 5 | | | | |_______________| |_______________| | | | | 8 7 | | |________________________________| | | 8
Now, to find the second max, we don't have to do n-2 comparisons. It is enough we compare the numbers 4, 6 and 7. Why?
Because, second max would have only lost to max at some stage. Max was compared to other elements only in 3 stages. So, pick only those 3 elements and find the largest among them. In other words, we make only log n (i.e. log 8 = 3) comparisons to find the second max instead of the n-2 comparisons.
Total comparisons = (n - 1) + log n = n + log n - 1
Certainly better than 2n - 3 !!
9) How long does it take to compute GCD? Lets take an example: GCD(96,60)
School book approach:
2|96 2|60 2|48 2|30 2|24 3|15 2|12 5 2|6 3
Common prime divisors = 2, 2, 3
GCD = 2 x 2 x 3 = 12
Well, what exactly was our algorithm?
- Break the first number into its prime factors
- Break the second number into its prime factors
- Find the common factors
- Multiply them
10) Is there a better approach?
Yes. The 2000 year old Euclid algorithm.
---------------------------------- Step m n m%n (reminder) ---------------------------------- 1 96 60 36 Note: Stop when reminder is 0 2 60 36 24 3 36 24 [12] 4 24 12 0 [STOP] ----------------------------------
The reminder of the last-but-one step is the GCD.
Note, that the number of steps required varies and is not a constant. But certainly it is faster than school book approach by several fold.
So what is the conclusion?
It is important to design "good" algorithms. And we also need a framework to analyze and
judge objectively what is "good". In this course, we foucs on common design strategies that
are proved to be applicable to many scenarios (but not all). Primary basis for analytical
framework would be
Important Problem Types: sorting, selection, searching, string processing, task scheduling, optimization problems, graph problems, combinatorial, numerical and geometric problems.
Algorithm Design Strategies: brute-force, divide-and-conquer, greedy technique dynamic programming, backtracking, branch-and-bound, randomized approaches, specialized strategies.
Data structures: Stack, Queue, List, Trees, Priority queues, Set, Map, Binary Search Trees, Graphs.
Analysis framework Asymptotic notations, worst case analysis, average case analysis, solving recurrence equations, Proof techniques, Amortization, NP completeness.