CSE 4510: Lab Assignment #4

Due: Friday, January 29, 1999

Background

The Soundex system of Margaret K. Odell and Robert C. Russell is covered by U.S. patents taken out in 1918 and 1922. It reduces every string to a "Soundex code" consisting of one letter followed by three digits. The letter is the first letter of the original string. The goal is to map names that sound alike to the same code.

The Soundex code works by assigning each of the 26 letters of the alphabet to one of seven categories, numbered 0 through 6.

0 A E I O U H W Y
1 B F P V
2 C G J K Q S X Z
3 D T
4 L
5 M N
6 R
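
As a rough illustration, the table above could be turned into a lookup like the following. This is only a sketch: the class name CategorySketch and the method name category are made up for this example, and the choice to return -1 for non-letters is an assumption, not part of the assignment.

```java
// Illustrative sketch only; the names here are hypothetical.
final class CategorySketch {
    // GROUPS[i] lists the letters of category i, copied from the table above.
    private static final String[] GROUPS = {
        "AEIOUHWY", "BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"
    };

    // CATEGORY_OF[k] is the category of the k-th letter of the alphabet.
    private static final int[] CATEGORY_OF = new int[26];

    static {
        for (int c = 0; c < GROUPS.length; c++) {
            for (char letter : GROUPS[c].toCharArray()) {
                CATEGORY_OF[letter - 'A'] = c;
            }
        }
    }

    /** Returns 0..6 for an ASCII letter of either case, or -1 otherwise (an assumed convention). */
    static int category(char ch) {
        if (ch >= 'a' && ch <= 'z') {
            ch = (char) (ch - 'a' + 'A');   // fold ASCII lower case to upper case
        }
        if (ch < 'A' || ch > 'Z') {
            return -1;
        }
        return CATEGORY_OF[ch - 'A'];
    }
}
```

With this sketch, category('K') and category('x') both give 2, and category('4') gives -1.
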

The code for a string is computed as follows:
  1. Each run of two or more adjacent letters in the same category is reduced to a single letter. The first letter of the string may begin such a run.
  2. Every letter except the first is replaced by the digit of its category.
  3. Digits for letters in category 0 are dropped (the first letter is always kept as a letter).
  4. Any digits after the first three are discarded; if fewer than three remain, zeros are appended so that there are exactly three digits.
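
One possible way to carry out the four steps in a single pass is sketched below. It is not the required solution, only an illustration under stated assumptions: the class name SoundexSketch, the method name encode, and the 26-character lookup string are all invented for this example, and the method assumes its argument is a non-empty string of ASCII letters (checking the input is a separate matter).

```java
import java.util.Locale;

// Illustrative sketch only; the names here are hypothetical.
final class SoundexSketch {
    // Category digit for each letter 'A'..'Z', transcribed from the category table.
    private static final String CATEGORY = "01230120022455012623010202";

    /** Assumes name is a non-empty string of ASCII letters. */
    static String encode(String name) {
        String s = name.toUpperCase(Locale.ROOT);
        StringBuilder code = new StringBuilder();
        code.append(s.charAt(0));                       // the first letter is kept as is
        char previous = CATEGORY.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && code.length() < 4; i++) {
            char digit = CATEGORY.charAt(s.charAt(i) - 'A');
            // Step 1: a letter in the same category as its predecessor is skipped.
            // Steps 2 and 3: other letters become digits, and zeros are dropped.
            if (digit != previous && digit != '0') {
                code.append(digit);
            }
            previous = digit;
        }
        // Step 4: the loop stops after three digits; pad with zeros if there are fewer.
        while (code.length() < 4) {
            code.append('0');
        }
        return code.toString();
    }
}
```

For instance, encode("Lloyd") gives L300: the second L joins the run begun by the first letter, O and Y fall in category 0, and D contributes the 3.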

For example, the names Dickson and Dixon are assigned the same code, D250. Stansifer and Steinseifer are both assigned S352. Rodgers is assigned R326 and Rogers is assigned R262. The names Euler, Gauss, Hilbert, and Lee have the codes E460, G200, H416, and L000, respectively.

Dickson       D250
Dixon         D250
Knuth         K530
Lissajous     L222
Lloyd         L300
Lukaschowsky  L222
Lukaschowa    L220
Lukasiewicz   L222
Pfister       P236
Rodgers       R326
Rogers        R262
Euler         E460
Gauss         G200
Hilbert       H416
Lee           L000
Stansifer     S352
Steinseifer   S352

The Task

Read lines from the standard input until end-of-file is reached. Treat each line as a string and find the corresponding Soundex code. For every line of input, there is one line of output.

For example,

DICKSON
DICKSON D250

Treat upper and lower case letters the same.

Dickson
DICKSON D250
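
The read-until-end-of-file loop might be organized along the following lines. This is a sketch, not a prescribed design: the class name LineLoopSketch, the method name run, and the respond parameter (a hook that turns one input line into one output line) are all hypothetical. The placeholder hook used in main merely echoes the trimmed line; a real program would put its Soundex logic there.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.function.UnaryOperator;

// Illustrative sketch only; the names here are hypothetical.
final class LineLoopSketch {
    /** Writes exactly one output line per input line until end-of-file. */
    static void run(BufferedReader in, PrintStream out, UnaryOperator<String> respond) {
        // BufferedReader.lines() yields each line in turn and ends at end-of-file.
        in.lines().map(respond).forEach(out::println);
    }

    public static void main(String[] args) {
        BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
        // Placeholder hook: echoes the trimmed line. No prompt is printed.
        run(stdin, System.out, String::trim);
    }
}
```

Keeping the per-line work behind a single hook makes it easy to see that every input line produces exactly one output line.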

White space at the beginning and end of the line must be ignored (cf. the method trim from the class java.lang.String). The remaining characters must be letters, or the input is bad. Any line with no letters in it is also bad. If the input is bad, the output must be the line "Bad input line."

Dickson is a ____!   ;; bad input
Bad input line.

D1cls0n              ;; more bad input
Bad input line.

D I C K S O N        ;; more bad input
Bad input line.
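
The check for bad input could look something like the sketch below. The class name InputCheckSketch and the method name normalize are invented for this example, and returning null for a bad line is just one possible convention. The sketch also assumes that "letters" means the 26 ASCII letters of the category table, so accented letters are treated as bad input.

```java
import java.util.Locale;

// Illustrative sketch only; the names here are hypothetical.
final class InputCheckSketch {
    /**
     * Returns the trimmed line in upper case if it consists of one or more
     * ASCII letters, or null if the line is bad (an assumed convention).
     */
    static String normalize(String line) {
        String trimmed = line.trim();          // ignore leading and trailing white space
        if (trimmed.isEmpty()) {
            return null;                       // no letters at all
        }
        for (int i = 0; i < trimmed.length(); i++) {
            char ch = trimmed.charAt(i);
            boolean isAsciiLetter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
            if (!isAsciiLetter) {
                return null;                   // embedded space, digit, punctuation, ...
            }
        }
        return trimmed.toUpperCase(Locale.ROOT);
    }
}
```

A caller would print "Bad input line." whenever normalize returns null.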

There should be the same number of output lines as input lines. Do not prompt the user for input; do not print blank lines.

Helpful Stuff

Turning it in

Use the following command on malestrom.cs.fit.edu or zach.cs.fit.edu:

~ryan/bin/mfiles ryan@cs.fit.edu "cse4510/lab04" Soundex.java

Be sure your name appears in a comment at the beginning of your program. With luck, a summary of the submissions for this lab exercise will be found in the file lab04-sub.txt.


Ryan Stansifer <ryan@cs.fit.edu>
Last modified: Sat May 22 14:20:48 EDT 1999