In-class and homework exercises for Tuesday, Sept. 4

Assignment due on Tuesday, Sept. 11.

The goal of this assignment is to write some more programs with Python. You may want to consult the Python scripts that were discussed in lecture last week.

Exercise 1. Write a Python script, median, that accepts free-format numerical text input and displays (from left to right) the minimum value, first quartile value, the median, third quartile value, and the maximum value on the standard output stream; the default formatting will do. Assume any number of input values per line and that all input represents a valid floating-point number (i.e., don't worry about handling erroneous input).

Your program should read from the file named on the command line (if any) or from the standard input otherwise. In this way, your program can be used conveniently as either

median file.dat
or in a pipeline as
some_program file.dat | median
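One way to get this behavior is to choose the input stream once and pass it to the rest of the program. The following is only a minimal sketch; the helper name open_input is hypothetical and was not part of the lecture scripts.

```python
import sys

def open_input(argv):
    """Return the file named by argv[1] if there is one, else standard input.

    argv is expected to look like sys.argv. The name open_input is a
    hypothetical helper chosen for this illustration.
    """
    if len(argv) > 1:
        return open(argv[1], 'r')
    return sys.stdin
```

A script would then call open_input(sys.argv) and loop over the lines of whatever stream comes back, without caring which case applies.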

The median of a list of n elements is the middle value of the sorted list if n is odd and the average of the middle two values if n is even. The first quartile value is simply the median value of the first (lower) half of the sorted list and the third quartile value is the median of the last (upper) half.
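The definitions above can be sketched as follows. This is one possible reading, not the required solution: it assumes that when n is odd the middle element belongs to neither half, and that at least two values are supplied. The function names are illustrative.

```python
def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]                 # odd n: the middle value
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2.0   # even n: average

def five_numbers(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Assumption: when n is odd, the middle element is excluded from both
    the lower and the upper half.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]               # first (lower) half
    upper = s[len(s) - half:]      # last (upper) half
    return (s[0], median(lower), median(s), median(upper), s[-1])
```

For example, five_numbers([7, 1, 3, 5, 2, 6, 4]) gives (1, 2, 4, 6, 7), and five_numbers([1, 2, 3, 4]) gives (1, 1.5, 2.5, 3.5, 4).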

Hint. list = [] creates an empty list and list.append(val) adds the value val to the end of list. See Section 5.1 of the Python tutorial for more details on lists and for the arithmetic operations that you will need. (See the online course supplements for an online tutorial and a PDF file that you can download.) See the sum program to see how to create a standalone executable script with the #! "magic number".
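As a small illustration of the hint, the sketch below collects free-format numbers from a stream of text lines into a list. The function name read_values is hypothetical, and the list is called values here so that it does not hide Python's built-in name list.

```python
def read_values(stream):
    """Collect every whitespace-separated number in stream into a list.

    Assumes every token is a valid floating-point number, as the
    exercise permits.
    """
    values = []
    for line in stream:
        for token in line.split():      # any number of values per line
            values.append(float(token))
    return values
```

Because split() with no arguments breaks on any run of whitespace, blank lines and uneven spacing need no special handling.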

Exercise 2. The grep.py script described in lecture has a flaw: if the file listed as the last command-line argument cannot be opened, then open throws an exception, which, since it is unhandled, halts the script. While this is arguably the correct action, the check

if f:
   grep(sys.argv[1], f)
is not useful. Fix this defect by handling the exception (print an appropriate error message to the standard error stream that names the file that cannot be opened). Your code will look like
try:
   f = open(sys.argv[1], 'r')
except exception:
   print an error message
else:
   grep(sys.argv[1], f)
You will need to determine the name of the exception that is thrown. (This is primarily a documentation-reading exercise. Section 8 of the Python tutorial is a useful resource.)
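To see the try/except/else pattern in action without giving away the answer, here is a sketch built on a different operation: converting a string to an integer, which raises ValueError on bad input. The exception raised by open is a different one, and finding it is still your job. The helper name parse_count is hypothetical.

```python
import sys

def parse_count(text):
    """Convert text to an int, or report the problem on standard error.

    parse_count is a hypothetical helper used only for illustration.
    """
    try:
        n = int(text)
    except ValueError:
        # Runs only if int() raised ValueError.
        sys.stderr.write("not an integer: %r\n" % text)
        return None
    else:
        # Runs only if the try block raised no exception.
        return n
```

Note that the message goes to sys.stderr rather than to the standard output, so it does not get mixed into output that is being piped to another program.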

Exercise 3. The sort.py script described in class to sort the Cardinals football roster by player weight has a bug: if two players weigh the same, then only one of them is reported in the output list. This is because the key is the weight, and a hash table (dictionary) can have at most one value associated with a given key.
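The bug is easy to reproduce in a few lines. The names and weights below are made up for illustration and are not taken from the actual roster file.

```python
roster = {}

# Hypothetical (weight, name) pairs; two players weigh 250.
for wt, name in [(250, "Player A"), (310, "Player B"), (250, "Player C")]:
    roster[wt] = name       # the second 250 overwrites the first

# Three players went in, but only two entries remain:
# the key 250 now refers to Player C, and Player A is gone.
```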

There are two simple workarounds that I can think of. First is to index the hash table (dictionary) by a tuple of the form

(wt, x)
where x is some quantity that is guaranteed to be unique for each player. Then index and sort the hash table by the tuple. Choose something simple for x, such as an integer.
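A minimal sketch of this first workaround follows. It uses a running line count for x and assumes, purely for illustration, that the weight is the last field on each line; the real roster format may differ, and the sample lines are made up.

```python
# Hypothetical roster lines; assumes the weight is the last field.
lines = ["Player A 250\n", "Player B 310\n", "Player C 250\n"]

table = {}
for count, line in enumerate(lines):
    wt = int(line.split()[-1])
    table[(wt, count)] = line[:-1]     # count makes every key unique

# Tuples sort by their first element, then by their second.
ordered = [table[key] for key in sorted(table)]
```

Here ordered lists Player A, then Player C, then Player B, so both 250-pound players survive.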

The second is to use a list instead of a hash table. As above, however, each list entry should be a tuple of the form

(wt, line[:-1])
When the list is sorted, you'll simply output only the line portion of each tuple. See Section 5.3 of the Python tutorial for more information about tuples.
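The list-based workaround might look like the sketch below. As before, the sample lines are made up, and the assumption that the weight is the last field on each line is only for illustration.

```python
# Hypothetical roster lines; assumes the weight is the last field.
lines = ["Player A 250\n", "Player B 310\n", "Player C 250\n"]

entries = []
for line in lines:
    wt = int(line.split()[-1])
    entries.append((wt, line[:-1]))    # line[:-1] drops the trailing newline

entries.sort()                         # sorts by weight, then by line text

# Keep only the line portion of each tuple for output.
output = [text for wt, text in entries]
```

Since a list may hold any number of tuples with the same weight, no entry is lost; ties are ordered by the text of the line.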

Exercise 4. Write a simple text preprocessor in Python. Your preprocessor should copy its input to the standard output, except for lines that begin with one of the following sequences of characters:

A line of the form
#include file
is replaced by the contents of file. (Your program simply copies the contents of file to the standard output at the point where the #include line is encountered. Your program does not preprocess the contents of the included file, and file is not enclosed by quotation marks.)

A line of the form

#define THING
defines THING as a preprocessor variable. (Preprocessor variables are undefined unless they appear in a #define statement.)

A line of the form

#undef THING
undefines THING as a preprocessor variable if THING has been defined by a previous #define statement. It is not an error if THING has not been previously defined.
Input text of the form
#if THING
block1
#else
block2
#endif
is processed as follows. If THING has been defined, then block1 is copied to the standard output, and block2 is skipped. Otherwise, if THING has not been defined, then block1 is skipped and block2 is copied. The lines containing the #if, #else and #endif are never copied. The #else clause is optional.

Nested #if's are not allowed. However, #include, #define and #undef statements that appear within an if/else block must be processed according to the rules above. For instance, the following code segment defines AA and includes the contents of fileA if A is defined; otherwise, it defines BB and includes fileB:

#if A
#define AA
#include fileA
#else
#define BB
#include fileB
#endif

At least one whitespace character must separate the #include, #define, #undef and #if keywords from their arguments. Exactly one filename must appear in an #include statement, and it is not enclosed in quotation marks. (For simplicity, assume that filenames may not contain whitespace.) Exactly one preprocessor variable must appear in a #define or #undef statement. A preprocessor variable is any sequence of non-whitespace characters.
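These rules map naturally onto split(). The sketch below handles only #define and #undef lines and assumes the variables are kept as the keys of a dictionary; the function name apply_definition is hypothetical, and it raises ValueError where your program would instead print a message with a line number.

```python
def apply_definition(line, defined):
    """Apply a '#define NAME' or '#undef NAME' line to the dictionary defined.

    Raises ValueError unless exactly one variable name follows the keyword.
    """
    words = line.split()               # splits on any run of whitespace
    if len(words) != 2:
        raise ValueError("expected exactly one variable name: %r" % line)
    keyword, name = words
    if keyword == "#define":
        defined[name] = True
    elif keyword == "#undef":
        defined.pop(name, None)        # not an error if name was never defined
    else:
        raise ValueError("not a #define or #undef line: %r" % line)
```

After apply_definition("#define AA\n", d), the test "AA" in d is true; a following "#undef AA" line removes the key again.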

Your program must print an appropriate error message to the standard error and take appropriate action under any of the following circumstances:

Your error message must include the line number of the input at which the error occurred. The first line of the input is numbered 1.

Your preprocessor must be runnable in either the form

python preproc.py file
or
python preproc.py < file

Suggestions: Use a hash table (dictionary) to keep track of preprocessor variables. Since a preprocessor variable is either defined or undefined, it suffices to check whether the variable is a key in the table. Since nested if's are not allowed, your preprocessor is always in one of three states: either it's reading text that's not in an if/else block (state 0, say); or it's reading text in an if block (state 1); or it's reading text in an else block (state 2). It's an error if an #else line is encountered in state 0 or 2, or if an #endif line is encountered in state 0, etc.
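The three-state idea can be sketched as follows. This is deliberately incomplete: it handles only #if, #else and #endif, ignores #include, #define and #undef, and raises ValueError instead of printing to the standard error. The function name filter_conditionals and the state names are hypothetical.

```python
OUTSIDE, IN_IF, IN_ELSE = 0, 1, 2      # the three states suggested above

def filter_conditionals(lines, defined):
    """Return the lines that survive #if/#else/#endif processing.

    Sketch only: other directives are treated as ordinary text here.
    defined is a dictionary whose keys are the defined variables.
    """
    state = OUTSIDE
    copying = True                     # are ordinary lines being copied?
    out = []
    for lineno, line in enumerate(lines, 1):
        # Only lines that begin with '#' can be directives.
        keyword = line.split()[0] if line.startswith("#") else ""
        if keyword == "#if":
            if state != OUTSIDE:
                raise ValueError("line %d: nested #if" % lineno)
            words = line.split()
            if len(words) != 2:
                raise ValueError("line %d: #if needs one variable" % lineno)
            state = IN_IF
            copying = words[1] in defined
        elif keyword == "#else":
            if state != IN_IF:
                raise ValueError("line %d: unexpected #else" % lineno)
            state = IN_ELSE
            copying = not copying      # copy block2 only if block1 was skipped
        elif keyword == "#endif":
            if state == OUTSIDE:
                raise ValueError("line %d: unexpected #endif" % lineno)
            state = OUTSIDE
            copying = True
        elif copying:
            out.append(line)
    if state != OUTSIDE:
        raise ValueError("line %d: missing #endif" % lineno)
    return out
```

Keeping the state separate from the copying flag makes the error checks simple: each directive is legal in only some states, and everything else is copied or skipped according to the flag.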

Submission instructions

You will create four files for this assignment: median, grep.py, sort.py, and preproc.py. You must package these files into a tar archive, hw3.tar, using the tar command as follows:
tar -cf hw3.tar median grep.py sort.py preproc.py
Then mail the tar archive to mat420hw at gmail.com with a subject line of the form
Your_name MAT 420 HW3
and include hw3.tar as an attachment. Assignments are due by 5 p.m., Tuesday, Sept. 11.

Copyright(c) 2007 by Eric J. Kostelich. All rights reserved.