William D. Shoaff
Florida Institute of Technology
Many (most, perhaps all) interesting problems in computer science can
be formulated (and sometimes solved) in terms of graphs.
A graph G is a pair (V, E) where
V is a finite set of vertices (or nodes) and
E is a collection of edges between vertices.
Edges may be directed or not
(distinguished by the use of ordered-pair notation (a,b) or set
notation {a,b}).
Often edges are weighted or otherwise labelled;
nodes can also store state information.
Special types of graphs, such as trees, directed acyclic graphs,
and bipartite graphs are important in some applications.
The problems we want to solve most often involve construction of a graph
with some property or determination of whether or not a graph has
some property.
The graph questions we explore have efficient time and space solutions;
however, many interesting graph problems do not seem to.
Reachability is a classic graph property that asks if node b can be reached from node a; there are many real-world applications of reachability.
A simple search algorithm that solves an instance of reachability
works as follows. Throughout the algorithm a set of vertices, denoted by
S, is maintained. Initially, S = {a}.
Each node can be either
marked or unmarked. That node i is marked means that i has
been in S at some point in the past or is currently in S.
Initially, only a is marked. At each iteration of the algorithm, choose some
node i in S and remove it from S.
Each edge (i, j)
out of i is processed:
if j is unmarked, mark it and add it to S.
Continue this until S becomes empty.
At this point, answer ``yes'' if b is marked and ``no'' otherwise.
The informal statements above can be written in pseudo-code.
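As one possible rendering of the search described above, here is a minimal C sketch. The adjacency-matrix representation, the bound MAX_NODES, and the names reachable and adj are assumptions made for illustration. S is kept as a stack, although the description allows nodes to be removed in any order.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 64   /* assumed upper bound on the number of vertices */

/* adj[i][j] != 0 means there is an edge (i, j).  Returns 1 if b can be
   reached from a in the graph on vertices 0..n-1, and 0 otherwise. */
int reachable(int n, int adj[MAX_NODES][MAX_NODES], int a, int b)
{
    int marked[MAX_NODES];
    int S[MAX_NODES];      /* the set S, stored as a stack */
    int size = 0;
    int i, j;

    assert(n > 0 && n <= MAX_NODES);
    assert(a >= 0 && a < n && b >= 0 && b < n);

    memset(marked, 0, sizeof marked);
    marked[a] = 1;         /* initially only a is marked and S = {a} */
    S[size++] = a;

    while (size > 0) {
        i = S[--size];     /* choose some node i in S and remove it */
        for (j = 0; j < n; j++) {
            if (adj[i][j] && !marked[j]) {
                marked[j] = 1;
                S[size++] = j;
            }
        }
    }
    return marked[b];
}
```

Each node enters S at most once, so the stack never overflows, and with an adjacency matrix the whole search takes O(n^2) time.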
We will present Kruskal's algorithm for minimum spanning trees.
The pseudocode for Kruskal's algorithm is at ../kruskal.html
Analysis of Kruskal's Algorithm
Find-Set and Union Operations on Sets
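/* Find-Set(A) when S[i] stores the name of the set containing i */
return S[A];
/* Union(A, B) for the same representation: relabel every member of B */
for i := 0 to n do
  if (S[i] = B) /* If S[i] in B */
  then S[i] := A; /* put it in A */
/* Find-Set(A) when S[i] stores the parent of i and a root is its own parent */
i := A;
while (S[i] != i) do /* climb until the root is reached */
  i := S[i];
return i;
/* Union(A, B) of two roots in the parent representation */
if (A < B) /* put B's subset in A */
then S[B] := A;
else S[A] := B; /* put A's subset in B */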
Union By Rank
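/* Union(A, B) of two roots; rank[X] bounds the height of the tree rooted at X */
if (rank[A] > rank[B])
then
  S[B] := A;
else
  S[A] := B;
  if (rank[A] = rank[B])
  then
    rank[B] := rank[B] + 1;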
Find-Set with Path Compression
if A != S[A] then /* A is not the root */
  S[A] := Find-Set(S[A]);
  /* move to the parent and point A directly at the root found there */
return S[A];
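The two refinements above can be combined in a few lines of C. This is a minimal sketch, not a reference implementation: the array name rank, the bound MAX_ELEMS, and the function names make_sets, find_set and union_sets are assumptions made for illustration.

```c
#include <assert.h>

#define MAX_ELEMS 1000        /* assumed bound on the number of elements */

int S[MAX_ELEMS];             /* S[i] is the parent of i; a root has S[i] == i */
int rank[MAX_ELEMS];          /* upper bound on the height of each root's tree */

/* Puts each of the elements 0..n-1 in its own one-element set. */
void make_sets(int n)
{
    int i;
    assert(n >= 0 && n <= MAX_ELEMS);
    for (i = 0; i < n; i++) {
        S[i] = i;
        rank[i] = 0;
    }
}

/* Find-Set with path compression: every node on the path from a to the
   root is re-pointed directly at the root. */
int find_set(int a)
{
    if (a != S[a])
        S[a] = find_set(S[a]);
    return S[a];
}

/* Union by rank: the root of smaller rank becomes a child of the other. */
void union_sets(int a, int b)
{
    a = find_set(a);
    b = find_set(b);
    if (a == b)
        return;               /* already in the same set */
    if (rank[a] > rank[b]) {
        S[b] = a;
    } else {
        S[a] = b;
        if (rank[a] == rank[b])
            rank[b] = rank[b] + 1;
    }
}
```

Unlike the pseudocode, union_sets first calls find_set on both arguments, so it accepts arbitrary elements rather than only roots.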
Now consider the function
,
,
if i>0 and
.
And define
.
Note that
Ackermann's function is defined by
Define the Ackermann function by
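For reference, one widely used two-argument form is the Ackermann-Peter function, written here for nonnegative integers m and n. Treat this as an assumption about which variant is meant: analyses of Find-Set and Union often use a slightly different definition with the same extremely fast growth.

```latex
\[
A(m, n) =
\begin{cases}
n + 1 & \text{if } m = 0, \\
A(m - 1,\, 1) & \text{if } m > 0 \text{ and } n = 0, \\
A(m - 1,\, A(m,\, n - 1)) & \text{if } m > 0 \text{ and } n > 0.
\end{cases}
\]
```

With this definition, A(1, n) = n + 2 and A(2, n) = 2n + 3.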
A dictionary or lexicon is a collection of all words in a language organized so each word can be accessed quickly. It is also useful for lexicographers to be able to insert or delete dictionary words. An index is used to find all occurrences of a word in text.
Let a [[text]] be fixed. Our interest is to devise a data structure that efficiently represents each factor (substring) of [[text]]. Let [[text.factors()]] be a function returning this data structure, and call its type [[Factors]]. We would like to construct an object of type [[Factors]] in linear time and space, and we'd like [[Factors]] to provide an O(|w|) answer to the question ``is w a factor of [[text]]?'' Other operations should also be efficient.
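One data structure that meets these bounds for a fixed alphabet is the suffix automaton. The C sketch below is only one possible realization of [[Factors]] and is not taken from these notes: the names State, factors_build, factors_contains and factors_free are hypothetical, and the 256-entry transition table assumes single-byte characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ALPHA 256                 /* assumed alphabet: all byte values */

typedef struct {
    int len;                      /* length of the longest factor ending in this state */
    int link;                     /* suffix link, -1 for the initial state */
    int next[ALPHA];              /* transitions, -1 where absent */
} State;

typedef struct {
    State *st;
    int size;                     /* number of states in use */
    int last;                     /* state that represents the whole text read so far */
} Factors;

static int new_state(Factors *f, int len)
{
    int s = f->size++;
    f->st[s].len = len;
    f->st[s].link = -1;
    memset(f->st[s].next, -1, sizeof f->st[s].next);
    return s;
}

/* Builds the suffix automaton of text, one character at a time. */
Factors *factors_build(const char *text)
{
    size_t n = strlen(text), i;
    Factors *f = malloc(sizeof *f);
    assert(f != NULL);
    f->st = malloc((2 * n + 2) * sizeof(State));   /* at most 2n states are needed */
    assert(f->st != NULL);
    f->size = 0;
    f->last = new_state(f, 0);
    for (i = 0; i < n; i++) {
        int c = (unsigned char)text[i];
        int cur = new_state(f, f->st[f->last].len + 1);
        int p = f->last;
        while (p != -1 && f->st[p].next[c] == -1) {
            f->st[p].next[c] = cur;
            p = f->st[p].link;
        }
        if (p == -1) {
            f->st[cur].link = 0;
        } else {
            int q = f->st[p].next[c];
            if (f->st[p].len + 1 == f->st[q].len) {
                f->st[cur].link = q;
            } else {              /* split q by cloning it */
                int clone = new_state(f, f->st[p].len + 1);
                memcpy(f->st[clone].next, f->st[q].next, sizeof f->st[q].next);
                f->st[clone].link = f->st[q].link;
                while (p != -1 && f->st[p].next[c] == q) {
                    f->st[p].next[c] = clone;
                    p = f->st[p].link;
                }
                f->st[q].link = clone;
                f->st[cur].link = clone;
            }
        }
        f->last = cur;
    }
    return f;
}

/* Answers "is w a factor of the text?" in O(|w|) steps. */
int factors_contains(const Factors *f, const char *w)
{
    int s = 0;
    for (; *w != '\0'; w++) {
        s = f->st[s].next[(unsigned char)*w];
        if (s == -1)
            return 0;
    }
    return 1;
}

void factors_free(Factors *f)
{
    free(f->st);
    free(f);
}
```

For example, after building the structure for the text banana, factors_contains returns 1 for nan and 0 for nab. The fixed-size transition table trades memory for simplicity; a sparser representation would suit large alphabets better.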
Below a heuristic algorithm is presented in C for the traveling salesman problem; you are to alter the algorithm so that (1) it is still correct and (2) it is at least twice as fast as the given code. To say the algorithm is heuristic means that it uses a ``good idea'' in attempting to solve the problem, but the algorithm may not always produce the ``correct'' answer.
The traveling salesman problem is a classic example of an NP-complete problem -- NP problems are ones where, if you are given the answer, you can verify quickly that it is correct, but computing the answer is often very time consuming. A problem is NP-complete if it is in NP and it is as ``hard'' as any other problem in NP (we won't go into the technical definition here).
The traveling salesman decision problem asks: given a set of cities
(represented as nodes in a graph), distances
between each pair of cities
ci and cj (represented as weighted edges between pairs of nodes in the
graph), and a maximum cost M, is there a tour of all the cities
which costs M or less?
That is, is there a simple cycle from c0 to c0 that passes
through every other node in the graph exactly once such that the
sum of the weights of the edges of the cycle is less than or equal to M?
No polynomial-time algorithm is known for the traveling salesman problem;
the most direct exact method
is to compute the cost of all tours and see if one has cost less than or
equal to M.
If there are n cities, then there are (n-1)! possible tours
that start at the first city.
(By Stirling's formula,
$n! \approx \sqrt{2\pi n}\,(n/e)^n$,
so the
number of tours to check grows very large very quickly.)
Exploring all tours is called a brute force approach.
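Checking a single proposed tour against M is the cheap step that a brute force search would repeat for every tour. The following C sketch shows that check under stated assumptions: the distance matrix d, the bound N_MAX, and the function name tour_within_budget are hypothetical and do not appear in the assignment code.

```c
#include <assert.h>

#define N_MAX 16   /* assumed bound on the number of cities in this sketch */

/* Returns 1 if tour[0..n-1] visits each of the n cities exactly once and
   the closed tour, which returns from tour[n-1] to tour[0], costs at
   most M.  Returns 0 otherwise. */
int tour_within_budget(int n, double d[N_MAX][N_MAX], const int tour[], double M)
{
    int seen[N_MAX] = {0};
    double cost = 0.0;
    int k;

    assert(n > 0 && n <= N_MAX);

    /* first check that the tour is a permutation of 0..n-1 */
    for (k = 0; k < n; k++) {
        int c = tour[k];
        if (c < 0 || c >= n || seen[c])
            return 0;
        seen[c] = 1;
    }

    /* then add up the weights of the n edges of the cycle */
    for (k = 0; k < n; k++)
        cost += d[tour[k]][tour[(k + 1) % n]];

    return cost <= M;
}
```

The check takes O(n) steps, which is why a proposed answer can be verified quickly even though finding one may require examining a very large number of tours.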
The heuristic we will use is called the ``nearest neighbor'' rule: starting from c0, find the city closest to c0 and travel to it, incrementing the ``cost'' of travel (which was initialized to 0). Then the process is repeated from the city just reached, considering only the unvisited cities. Once all the cities have been visited, return to city c0. The nearest neighbor heuristic executes in O(n^2) steps instead of the O(n^n) steps needed for the brute force approach; thus, it is a polynomial time algorithm (n^2) and is tractable, while the brute force approach is exponential (n^n) and is intractable.
Your job is to make the code at least twice as fast; that is, if the code takes t seconds when you run it in your computer environment, then after your improvements it should run in t/2 seconds or less. Of course, your improved code must still execute the same algorithm, so it will still be of order n^2, but it must execute the algorithm faster.
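One way to measure t from inside a program is the standard clock function from time.h. The sketch below is only a suggestion: the helper time_call and the stand-in workload busy_work are hypothetical names, and clock reports processor time rather than wall-clock time.

```c
#include <assert.h>
#include <time.h>

/* A stand-in workload so the sketch is self-contained. */
void busy_work(void)
{
    volatile double x = 0.0;
    long k;
    for (k = 0; k < 1000000L; k++)
        x += (double)k;
}

/* Runs fn once and returns the processor time it used, in seconds. */
double time_call(void (*fn)(void))
{
    clock_t start, end;

    assert(fn != NULL);
    start = clock();
    fn();
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Timing the same routine before and after each change, on the same machine and with the same compiler settings, gives the two values of t to compare.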
A few remarks that should be obvious. First, it is not fair to change computer environments -- if the code runs in t seconds on a PC and t/2 seconds on a Cray, you can't say you've optimized the code and fulfilled the requirement. As well, you can't change the compiler, or switch from un-optimized to optimized code to fulfill the requirement.
And a few remarks about the code.
You will need to create a data file that contains the number of cities
and the x and y
coordinates of each city. I've set the maximum number
of cities to 1000, but you need to use at least enough cities to be able
to measure the time of the algorithm; that is,
the initial running time t should be at least, say, 40 seconds.
Also, the code starts the tour at the last city c_{n-1} rather than
the first city c0 (this is not a significant change in the problem).
The code simply prints out the tour; it does not determine if it is
less than a maximal length M, but this could easily be added to the code.
Finally, you may translate the code into any other language; just be
certain you implement the same algorithm correctly.
#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <math.h>     /* sqrt */
#define TRUE (1)
#define FALSE (0)
#define MAX_CITIES (1000)
#define MAX_DIST (1000)   /* assumes every inter-city distance is below this */
typedef int bool;
typedef struct location {double x; double y;} loc;
int number_of_cities;
loc PtArr[MAX_CITIES];
void NearNeighborTour(void);
double distance(int m, int n);
int main(void)
{
    int i;
    scanf("%d\n", &number_of_cities);
    if (number_of_cities > MAX_CITIES) {
        fprintf(stderr, "error: too many cities\n");
        exit(1);
    }
    for (i = 0; i < number_of_cities; i++) {
        scanf("%lf %lf\n", &PtArr[i].x, &PtArr[i].y);
    }
    NearNeighborTour();
    return 0;
}
void NearNeighborTour(void)
{
    int i, j;
    bool visited[MAX_CITIES];
    int this_city;
    int closest_city;
    double closest_distance;
    /* initialize unvisited cities */
    for (i = 0; i < number_of_cities; i++) {
        visited[i] = FALSE;
    }
    /* choose the last city (number_of_cities - 1) as starting point */
    this_city = number_of_cities - 1;
    visited[this_city] = TRUE;
    printf("First city is %d\n", this_city);
    /* main loop of nearest neighbor heuristic */
    for (i = 1; i < number_of_cities; i++) {
        /* find nearest unvisited city to this city */
        closest_distance = MAX_DIST;
        for (j = 0; j < number_of_cities; j++) {
            if (!visited[j]) {
                if (distance(this_city, j) < closest_distance) {
                    closest_distance = distance(this_city, j);
                    closest_city = j;
                }
            }
        }
        /* report closest city */
        printf("Move from %d to %d\n", this_city, closest_city);
        visited[closest_city] = TRUE;
        this_city = closest_city;
    }
    /* finish tour by returning to start */
    printf("Move from %d to %d\n", this_city, number_of_cities - 1);
}
/* Euclidean distance between cities m and n */
double distance(int m, int n)
{
    double x_squared = (PtArr[m].x-PtArr[n].x)*(PtArr[m].x-PtArr[n].x);
    double y_squared = (PtArr[m].y-PtArr[n].y)*(PtArr[m].y-PtArr[n].y);
    return sqrt(x_squared + y_squared);
}
To determine where to put forth an effort to optimize the code, you should profile it. The gcc compiler supports profiling with prof and gprof; see the manual pages for gcc.
I'd like to delay giving hints on how to increase the code's efficiency. We can discuss ideas by e-mail (cse5081@cs.fit.edu). There are several ideas that should become obvious to you and a few others that are more obscure.
You must turn in:
You should make one improvement at a time, using conditional compilation to exclude old portions of the program and include new portions, for example,
#if IMPROVEMENT == NONE
original code goes here
#elif IMPROVEMENT == FIRST
first improvement goes here
#endif
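For this pattern to work, NONE and FIRST need distinct numeric definitions, because an identifier that is not defined as a macro evaluates to 0 inside #if. Here is a minimal compilable sketch of the idea on an unrelated toy routine. The function sum_to and the particular values chosen for NONE and FIRST are assumptions made for illustration.

```c
#include <assert.h>

#define NONE  0
#define FIRST 1

/* Default to the original code unless the compiler command line says
   otherwise, for example with -DIMPROVEMENT=FIRST. */
#ifndef IMPROVEMENT
#define IMPROVEMENT NONE
#endif

/* Returns 0 + 1 + ... + n for n >= 0. */
long sum_to(int n)
{
    assert(n >= 0);
#if IMPROVEMENT == NONE
    {
        /* original code: add the terms one at a time */
        long total = 0;
        int k;
        for (k = 1; k <= n; k++)
            total += k;
        return total;
    }
#elif IMPROVEMENT == FIRST
    /* first improvement: closed form for the same sum */
    return (long)n * (n + 1) / 2;
#else
#error "unknown IMPROVEMENT value"
#endif
}
```

Both versions stay in one source file, so each can be rebuilt and timed without editing the code.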
Your report can be at most 5 pages of double spaced 10 point type and will be presented at the Annual Computer Conference on Code Optimization which will be held on June 3, 1997.