This is a take-home exam. There are two problems, each worth 50 points. You may use any books, notes, or Web resources that you like, but you may not discuss the questions or answers with anyone.
Submission instructions:
    return (p,q)
suffices to return the requested tuple. (Note: if x = (p,q), then x[0] yields p and x[1] yields q.)
You may use any online or textbook reference for a greatest common divisor algorithm, provided that you cite the reference with an appropriate URL or bibliographic reference. Such references must be included in your source code as comments. It's possible to do this problem in about 50 lines of nicely formatted Python code.
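To illustrate the tuple convention together with a greatest common divisor routine, here is a minimal sketch that uses Euclid's algorithm to reduce a fraction to lowest terms. The function names gcd and reduce_fraction, and the choice to keep the denominator positive, are assumptions made for this illustration and are not part of the problem statement.

```python
def gcd(a, b):
    """Greatest common divisor by Euclid's algorithm (result is non-negative)."""
    a, b = abs(a), abs(b)
    while b:
        # Replace (a, b) by (b, a mod b) until the remainder is zero.
        a, b = b, a % b
    return a


def reduce_fraction(p, q):
    """Return the tuple (p, q) in lowest terms with a positive denominator.

    Hypothetical helper: the name and the sign convention are assumptions.
    """
    if q == 0:
        raise ZeroDivisionError("denominator must be nonzero")
    g = gcd(p, q)
    if q < 0:
        g = -g  # dividing by a negative g moves the sign to the numerator
    return (p // g, q // g)


x = reduce_fraction(6, -8)
print(x[0], x[1])  # prints: -3 4
```

Python's standard library also provides math.gcd, which could replace the hand-written loop.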
The National Program of Cancer Registries (NPCR) is run by the U.S. Centers for Disease Control and Prevention. These data on brain cancer incidence were downloaded from the NPCR website and give statistics on cases diagnosed over the 5-year period from 1999 to 2003.
Each column of data is delimited by a vertical bar. The first column gives the year (or group of years) to which the remaining data in each line correspond. (You are interested only in those lines that describe the individual years from 1999 to 2003.) The next three columns indicate site, age range, and sex. The eleventh column gives the raw number of reported cases. Entries marked by a tilde (~) represent fewer than 16 cases and may be regarded as 0.
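The column layout just described could be read along the following lines. This is only a sketch: the function name parse_line is hypothetical, the handling of thousands separators is an assumption about the file, and the sample record is made up for illustration rather than taken from the NPCR data.

```python
YEARS = {"1999", "2000", "2001", "2002", "2003"}


def parse_line(line):
    """Split one bar-delimited record into (year, site, age, sex, count).

    Returns None for lines that do not describe an individual year
    from 1999 to 2003 (headers, grouped years, short lines).
    """
    fields = [f.strip() for f in line.rstrip("\n").split("|")]
    if len(fields) < 11 or fields[0] not in YEARS:
        return None
    raw = fields[10]  # eleventh column: raw number of reported cases
    # A tilde means fewer than 16 cases and is treated as 0.
    # Assumption: counts may contain thousands separators such as "1,234".
    count = 0 if raw == "~" else int(raw.replace(",", ""))
    return (fields[0], fields[1], fields[2], fields[3], count)


# Made-up record for illustration only (not real NPCR data).
sample = "2001|Glioblastoma|20+|Male|a|b|c|d|e|f|1,234|g"
print(parse_line(sample))
```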
Using any combination of Python, sort, grep, tail, etc., generate a list of the form
    Incidence Site
sorted in descending order by incidence (i.e., by the sum of the counts for the 5-year period). You will generate four lists: one each for Male 0-19; Male 20+; Female 0-19; and Female 20+. For each site, the incidence will be the total number of cases over the 5-year period.
For instance, to generate the entry for glioblastoma in males 20+, you will sum the number of glioblastoma cases in 1999, 2000, 2001, 2002, and 2003 for this particular cohort. Treat each distinct site as a distinct cancer type. (For example, regard "Neoplasm, unspecified" and "Unclassified Tumors" as distinct.) Also include "Total", which of course is the sum of all tumor types.
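The summing and sorting step can be sketched as follows. The sketch assumes the records have already been reduced to (year, site, age, sex, count) tuples covering only the individual years 1999 to 2003; that tuple layout, the function names, and the alphabetical tie-break are assumptions for illustration, and the sample records are invented.

```python
from collections import defaultdict


def rank_sites(records, sex, age):
    """Sum counts per site for one cohort; sort descending by incidence."""
    totals = defaultdict(int)
    for _year, site, rec_age, rec_sex, count in records:
        if rec_sex == sex and rec_age == age:
            totals[site] += count
    # Highest incidence first; ties are broken alphabetically by site.
    return sorted(((n, s) for s, n in totals.items()),
                  key=lambda pair: (-pair[0], pair[1]))


def print_ranking(ranking):
    """Print one 'Incidence Site' line per site."""
    for incidence, site in ranking:
        print(incidence, site)


# Invented records for illustration only (not real NPCR data).
records = [
    ("1999", "Site A", "20+", "Male", 10),
    ("2000", "Site A", "20+", "Male", 10),
    ("1999", "Site B", "20+", "Male", 10),
    ("1999", "Total", "20+", "Male", 30),
    ("1999", "Site A", "0-19", "Female", 5),
]
print_ranking(rank_sites(records, "Male", "20+"))
```

Calling rank_sites once per cohort (Male 0-19, Male 20+, Female 0-19, Female 20+) would yield the four requested lists.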
In the tables, NOS stands for "not otherwise specified" and CI means "confidence interval." Because the raw numbers are gathered from various sources, they are subject to reporting errors, and these values are statistically adjusted. However, for the purpose of this exercise, you may ignore all data except for raw counts by sex and age category.
Any Python script should be called brain.py. If you use a shell script, call it brain.sh. Any other required files should have an obvious name and be included in the tar file (below).
Programs will be evaluated on correctness and simplicity. You should aim to use a combination of tools that minimizes programming effort. (It's possible to do this problem in about 30 lines of nicely formatted code.) All programs must be scripts! No Java or C programs are permitted.
Some background information. Only about 10 percent of the cells in your brain are neurons. The remainder are mostly support cells, called glia, of which there are two main types: astrocytes and oligodendrocytes. The glial cells manufacture myelin (which provides electrical insulation for nerve fibers) and also regulate the interactions between neurons. Glial cells can become cancerous and form tumors called gliomas. Tumors arising from astrocytes are called astrocytomas (and similarly for the other cell types). Mixed gliomas, as the name suggests, are cancerous mixtures of various glial cells. Glioblastoma is a particularly fast-growing and deadly type of glioma.
The membrane between the brain and the skull is called the meninges. Meningitis is an infection of the meninges, and meningioma is a tumor arising from the meninges.
The Barrow Neurological Institute at St. Joseph's Hospital in downtown Phoenix is one of the largest centers in the world for the treatment of and research on brain tumors.
Run
    tar cf midterm.tar rational.py brain.py
(and any other files that you generate for Question 2) to generate a tar archive of your work.
    Your_name Midterm
Include the tar file as an attachment. Please also include your name as a comment in each file that you include in the tar file.