Introduction to Analysis of Algorithms

by William Shoaff with lots of help


You can download a PDF version of this file (which is prettier) at

http://www.cs.fit.edu/%7Ewds/classes/algorithms/Intro/intro.pdf

Course Organization

Course Overview

This course covers the study of algorithms and the resources (time and space) they use, as well as the design of algorithms (data structures, methods, recurring patterns).

Properties of algorithms

From Knuth [2]:

1. Finiteness -- an algorithm must terminate after a finite number of steps
2. Definiteness -- each step must be precisely defined
3. Input -- an algorithm has zero or more inputs
4. Output -- an algorithm has one or more outputs
5. Effectiveness -- all operations can be carried out exactly and in finite time
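Knuth illustrates these properties with Euclid's algorithm for the greatest common divisor. The sketch below is a minimal Python rendering, assuming non-negative integer inputs; the function name `gcd` and the variable names are choices made here for illustration.

```python
def gcd(m, n):
    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
    # Finiteness: n strictly decreases on every pass, so the loop must stop.
    while n != 0:
        # Definiteness and effectiveness: one exact integer remainder per step.
        m, n = n, m % n
    # Output: exactly one value is returned.
    return m

result = gcd(119, 544)
```

Each pass replaces the pair (m, n) by (n, m mod n), so the second component shrinks until it reaches zero.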

Models of computation

There are many ways to compute; they are all equivalent in some sense.

1. Turing machines
2. Random access machines (RAM)
3. $\lambda$ calculus
4. Recursive functions
5. Parallel RAM (PRAM)

For the analysis of algorithms, the RAM model is most often employed. We need a model in order to discuss the resources consumed in the process of executing an algorithm. Note that time does not appear to be reusable, but space often is.

Measuring an algorithm's complexity

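We will use the uniform cost model of time and space: each basic operation takes one unit of time and each register uses one unit of space, regardless of the size of the values involved.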

There are other considerations:

An algorithm solves an instance of a problem. There is, in general, one parameter, the input size, denoted by n, which is used to characterize the problem instance. The input size n is the number of registers needed to hold the input (data segment size).

Given n, we'd like to find:

1. the time complexity, denoted T(n), which is the count of operations the algorithm performs on the given input.
2. the space complexity, denoted S(n), which is the number of memory registers used by the algorithm (stack/heap size, registers)

Note that T(n) and S(n) are relations rather than functions. That is, for different inputs of the same size n, T(n) and S(n) may give different answers.
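A small sketch may make this concrete. It counts the element comparisons made by a left-to-right scan of a list; the function name `linear_search_comparisons` and the sample inputs are assumptions introduced here, not part of the notes.

```python
def linear_search_comparisons(items, target):
    """Return how many element comparisons a left-to-right scan performs."""
    comparisons = 0
    for value in items:
        comparisons += 1
        if value == target:
            break  # stop as soon as the target is found
    return comparisons

# Two inputs of the same size n = 5 produce different operation counts.
front_hit = linear_search_comparisons([7, 1, 2, 3, 4], 7)  # target is first
back_hit = linear_search_comparisons([1, 2, 3, 4, 7], 7)   # target is last
```

Both lists have n = 5, yet the scan stops after one comparison on the first and needs all five on the second, so the operation count is not determined by n alone.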

Worst, average, best, and amortized complexity

Complexities are usually not measured exactly: big-$O$, $\Omega$, and $\Theta$ notation is used.

Worst case

This is the longest time (or most space) that the algorithm will use over all instances of size n. Often this can be represented by a function f(n) such as $f(n) = n^2$ or $f(n) = n\lg n$. We write

T(n) = O(f(n))

for the worst case time complexity. Roughly, this means the algorithm will take no more than a constant multiple of f(n) operations. Most of an initial course on algorithms is devoted to worst case analysis.

Best case

This is the shortest time (or least space) that the algorithm will use over all instances of size n. Often this can be represented by a function f(n) such as $f(n) = n^2$ or $f(n) = n\lg n$. We write

\begin{displaymath}T(n) = \Omega(f(n))\end{displaymath}

for the best case. Roughly, this means the algorithm will take no less than a constant multiple of f(n) operations. The best case is seldom interesting.

When the worst and best case performance of an algorithm are the same we can write $T(n) = \Theta(f(n))$. Roughly, this says the algorithm always uses f(n) operations on all instances of size n.
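For reference, the standard definitions behind this notation can be written as follows. The constants $c$, $c_1$, $c_2$ and the threshold $n_0$ are symbols introduced here; they do not appear in the notes above.

\begin{displaymath}T(n) = O(f(n)) \iff \exists\, c > 0,\ n_0 \ \mbox{such that}\ T(n) \le c\,f(n) \ \mbox{for all}\ n \ge n_0\end{displaymath}

\begin{displaymath}T(n) = \Omega(f(n)) \iff \exists\, c > 0,\ n_0 \ \mbox{such that}\ T(n) \ge c\,f(n) \ \mbox{for all}\ n \ge n_0\end{displaymath}

\begin{displaymath}T(n) = \Theta(f(n)) \iff \exists\, c_1, c_2 > 0,\ n_0 \ \mbox{such that}\ c_1 f(n) \le T(n) \le c_2 f(n) \ \mbox{for all}\ n \ge n_0\end{displaymath}

As a worked example with a hypothetical count $T(n) = 3n^2 + 5n$: since $5n \le n^2$ whenever $n \ge 5$, we have

\begin{displaymath}3n^2 \le 3n^2 + 5n \le 4n^2 \quad \mbox{for all}\ n \ge 5,\end{displaymath}

so $T(n) = \Theta(n^2)$ with $c_1 = 3$, $c_2 = 4$, and $n_0 = 5$.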

Average case

This is the average time (or space) that the algorithm will use over all instances of size n. It depends on the probability distribution of instances of the problem. The average case is very interesting, but we'll delay a full discussion.
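As a small illustration of how the distribution enters, the sketch below averages the cost of a left-to-right scan of n items under the assumption that the target is present and equally likely to sit at any of the n positions. That uniform distribution and the function names are assumptions made here; a different distribution would give a different average.

```python
from fractions import Fraction

def comparisons_to_find(position):
    """A left-to-right scan finds the item at 0-based `position` after position + 1 comparisons."""
    return position + 1

def average_comparisons(n):
    """Average comparison count over n equally likely target positions."""
    total = sum(comparisons_to_find(p) for p in range(n))
    return Fraction(total, n)

average_for_nine = average_comparisons(9)
```

Under this assumption the average works out to (n + 1)/2, which lies between the best case of 1 comparison and the worst case of n.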

Amortized cost

This is used when a sequence of operations occurs, e.g., inserts and deletes in a tree, where the costs vary depending on the operations and their order. For example, some may take a few steps, some many. The amortized cost is very interesting, but we'll most likely not cover it.
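A common textbook illustration, swapped in here in place of the tree example, is appending to an array that doubles its capacity when full. The sketch below counts only the element copies caused by resizing; the function name and the starting capacity of 1 are assumptions made for this example.

```python
def total_copy_cost(n):
    """Count element copies made while appending n items to a doubling array."""
    capacity = 1
    size = 0
    copies = 0
    for _ in range(n):
        if size == capacity:
            copies += size   # a full array moves every element to a new block
            capacity *= 2
        size += 1
    return copies

copies_for_nine = total_copy_cost(9)
```

Most appends copy nothing and a few copy everything stored so far, but the total number of copies for n appends stays below 2n, so the cost averaged over the sequence is a constant per append.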

Types of algorithms

1. Off-line algorithms: all input is in memory before time starts, and only the final result is wanted
2. On-line: input arrives at discrete time steps, and an intermediate result is furnished before the next input
3. Real-time: elapsed time between two inputs (outputs) is a constant O(1)

We will mostly be concerned with off-line algorithms where all of the input is known before the algorithm starts.
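The difference between the first two types can be sketched with the mean of a sequence of numbers. The function names below are hypothetical and the example is only an illustration of the distinction, not an algorithm from the course.

```python
def offline_mean(values):
    """Off-line: the whole (non-empty) input is available, only the final answer is returned."""
    return sum(values) / len(values)

def online_means(stream):
    """On-line: yield an intermediate mean after each input arrives."""
    total = 0.0
    for count, value in enumerate(stream, start=1):
        total += value
        yield total / count

final_answer = offline_mean([2, 4, 6])
running_answers = list(online_means([2, 4, 6]))
```

The on-line version does a constant amount of work per arriving input, and its last intermediate result agrees with the off-line answer.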

Complexity classes

Collections of problems that require roughly the same amount of resources form complexity classes. Here's a list of the most important.

1. The class P of problems that can be solved in a polynomial number of operations of the input size on a deterministic Turing machine.
2. The class NP of problems that can be solved in a polynomial number of operations of the input size on a nondeterministic Turing machine.
3. The class of problems that can be solved in a constant amount of space (there does not seem to be a recognized standard denotation of this class, I'll call it C)
4. The class L of problems that can be solved in a logarithmic amount of space based on the input size.
5. The class PSPACE of problems that can be solved in a polynomial amount of space based on the input size.
6. The class NC of problems that can be solved in poly-logarithmic time on a polynomial number of processors.

There are lots of other complexity classes. All the problems we will study will have algorithms that belong to the time-based class P. For space-based problems we'd like the algorithms to belong to C or L. Using linear space or more space is generally considered inefficient.

Algorithmic paradigms

Often there are large collections of problems that can be solved using the same general techniques or paradigms. A few of the most common are described below.

Brute force

A straightforward approach to solving a problem based on the problem statement and concepts involved. Brute force algorithms are rarely efficient. Example algorithms include:
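As one illustration (chosen here, not taken from the original list), the maximum-sum contiguous subarray problem can be attacked by simply trying every start and end index. The sketch assumes a non-empty list of numbers; the function name is hypothetical.

```python
def max_subarray_brute_force(values):
    """Try every contiguous slice of a non-empty list and keep the largest sum."""
    best = values[0]
    for start in range(len(values)):
        running = 0
        for end in range(start, len(values)):
            running += values[end]      # sum of values[start..end]
            best = max(best, running)
    return best

best_sum = max_subarray_brute_force([-2, 1, -3, 4, -1, 2, 1, -5, 4])
```

The two nested loops examine all n(n + 1)/2 slices, so the sketch performs on the order of n^2 additions, which follows directly from the problem statement but is slower than known linear-time methods.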

Divide-and-conquer

Perhaps the most famous algorithmic paradigm, divide-and-conquer is based on partitioning the problem into two or more smaller sub-problems, solving them (using recursion, or if they are simple enough, directly), and combining the sub-problem solutions into a solution for the original problem. Example algorithms include:
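Merge sort is a standard instance of this pattern. The following is a minimal sketch rather than a tuned implementation: it splits the list in half, sorts each half recursively, and merges the two sorted halves.

```python
def merge_sort(values):
    """Return a new sorted list using divide-and-conquer."""
    if len(values) <= 1:                 # simple enough to solve directly
        return list(values)
    middle = len(values) // 2
    left = merge_sort(values[:middle])   # solve the two sub-problems
    right = merge_sort(values[middle:])
    merged = []                          # combine the sub-solutions
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

sorted_values = merge_sort([5, 2, 4, 7, 1, 3, 2, 6])
```

The three phases in the comments correspond to the partition, solve, and combine steps described above.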

Greedy algorithms

Greedy algorithms always make the choice that seems best at the moment. This locally optimal choice is made with the hope that it leads to a globally optimal solution. Some greedy algorithms may not be guaranteed to always produce an optimal solution.

Greedy algorithms are often applied to combinatorial optimization problems.

Example algorithms include:
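A small coin-changing sketch shows both the idea and its limits. The function name and the coin systems below are hypothetical choices for illustration; the strategy is simply to take the largest coin that still fits.

```python
def greedy_change(amount, denominations):
    """Make change by repeatedly taking the largest coin that fits.

    Returns the list of coins used, or None if the greedy choice gets stuck.
    """
    coins = []
    for coin in sorted(denominations, reverse=True):
        while amount >= coin:
            coins.append(coin)
            amount -= coin
    return coins if amount == 0 else None

usual_coins = greedy_change(63, [1, 5, 10, 25])
odd_coins = greedy_change(6, [1, 3, 4])
```

With denominations 1, 5, 10, 25 the greedy answer for 63 uses six coins, which is optimal. With denominations 1, 3, 4 the greedy answer for 6 is 4 + 1 + 1, three coins, although 3 + 3 needs only two, so the locally best choice does not always lead to a globally optimal solution.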

Dynamic programming

Richard Bellman [1] is credited with developing dynamic programming. A nutshell definition of dynamic programming is difficult, but to summarize, problems which lend themselves to a dynamic programming attack have the following characteristics:

Notice that we're again solving a combinatorial optimization problem.

Dynamic programming algorithms have the following features:

Example algorithms include:
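As one illustration (chosen here, not taken from the original list), the minimum number of coins needed to make a given amount can be computed by solving every smaller amount first and storing the answers in a table. The function name and the coin system are hypothetical.

```python
def min_coins(amount, denominations):
    """Fewest coins summing to `amount`, or None if it cannot be made."""
    unreachable = float("inf")
    # best[a] holds the fewest coins needed for sub-amount a.
    best = [0] + [unreachable] * amount
    for sub_amount in range(1, amount + 1):
        for coin in denominations:
            if coin <= sub_amount:
                candidate = best[sub_amount - coin] + 1
                if candidate < best[sub_amount]:
                    best[sub_amount] = candidate
    return best[amount] if best[amount] != unreachable else None

fewest = min_coins(6, [1, 3, 4])
```

Each table entry is computed once from smaller entries and then reused, which is what separates this from a recursive search that would recompute the same sub-amounts many times. For amount 6 with coins 1, 3, 4 the table gives two coins (3 + 3), whereas always taking the largest coin first would use three.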

Local search

Local search is also applied to combinatorial optimization problems. A local search algorithm starts with some initial solution and iteratively searches a neighborhood for a better solution. Some general classes of local search methods are:

The classic local search algorithm is Newton's root-finding method.
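A minimal sketch of Newton's method follows, assuming the caller supplies the function, its derivative, and a starting guess near the root. The function name, the tolerance, and the iteration cap are choices made here, and the sketch does not guard against a zero derivative.

```python
def newton_root(f, df, x0, tolerance=1e-12, max_iterations=50):
    """Refine the guess x0 toward a root of f using Newton's update."""
    x = x0
    for _ in range(max_iterations):
        step = f(x) / df(x)   # distance to the root of the tangent line at x
        x -= step             # move to the improved nearby solution
        if abs(step) < tolerance:
            return x
    raise RuntimeError("Newton's method did not converge")

# Approximate the square root of 2 as the positive root of x^2 - 2.
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

Each iteration starts from the current solution and replaces it with a better one found from purely local information (the value and slope at the current point), which is the pattern described above.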

Tools of the trade

To analyze algorithms we will need to learn how to count using summations and recurrence relations. Basic sums and series will be studied next.

Problems

Problem 1:

How would you express the fact that the time complexity T(n) of any comparison based sort on a sequential machine requires at least $cn\lg n$ operations for some constant c?

Problem 2:

What is the time complexity of each of the algorithms below? What is the space complexity of each of the algorithms below?

Problem 3:

Classify the algorithms above as belonging to one of the paradigms discussed.

Bibliography

[1] Richard Bellman. Dynamic Programming. Princeton University Press, 1957.

[2] Donald E. Knuth. The Art of Computer Programming: Fundamental Algorithms, volume 1. Addison-Wesley, third edition, 1997.



William Shoaff
2000-08-30