MAT 420 Midterm 2, Kostelich (Fall 2007)

This is a take-home exam. There are two problems worth a total of 100 points as indicated. You may use any books, notes, or Web resources that you like, but you may not discuss the questions or answers with anyone.

Submission instructions:

Do not put your papers under my office door. Do not email your answers to me in lieu of a paper printout. I cannot be responsible for failure of delivery due to spam filters or server crashes.

Problem 1

(60 points) Write a Fortran module (call it regression), containing three subroutines as follows:
  1. subroutine linear(x,y,n,b,ra,info), which takes two input n-vectors x and y containing (x,y) pairs of data, and fits the model
    y = b1 + b2 x
    to it. On return, the 2-vector b contains the parameters b1 and b2, and the scalar ra contains the adjusted coefficient of determination (explained below). info is an integer output flag that is 0 if the computation is successful and a nonzero value of your choice if an error occurs.

    Use double-precision arithmetic (define the appropriate kind parameter in your module). The contents of x and y must not be altered. Your code must call the LAPACK routine dgels with appropriate arguments to determine the coefficients and the residual sum of squares.

  2. subroutine exponential(x,y,n,b,ra,info), which is otherwise like linear but fits the model
    y = b1 exp(b2 x).
    You should check that all the y's are nonzero and have the same sign. (If all are negative, then b1 is negative.)

    Note that you can make a suitable change of variables to convert the exponential model into a linear one (and that you can call subroutine linear to fit it). On return, the parameters are as for linear, but ra holds the adjusted coefficient of determination as computed by linear for the transformed problem.

  3. subroutine quadratic(x,y,n,b,ra,info), which is otherwise like linear but fits the model
    y = b1 + b2 x + b3 x^2.
    On return, the 3-vector b contains the fitted parameters, and ra contains the adjusted coefficient of determination as usual.
Test your subroutines using collections of data of your choice.
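
One way to read the change of variables suggested for the exponential model is written out below. It assumes that all the y's share a single sign. The symbols s, z, c_1, and c_2 are introduced here for illustration and do not appear elsewhere in this handout.

```latex
\begin{align*}
y &= b_1 e^{b_2 x}, \qquad s = \operatorname{sign}(y_i) \in \{+1, -1\} \\
\ln(s\,y) &= \ln(s\,b_1) + b_2 x \\
z &= c_1 + c_2 x, \qquad z = \ln(s\,y),\quad c_1 = \ln(s\,b_1),\quad c_2 = b_2 \\
b_1 &= s\,e^{c_1}, \qquad b_2 = c_2
\end{align*}
```

The last line recovers the original parameters from the linear fit to the (x, z) pairs.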

The adjusted coefficient of determination

If you fit a linear model with p parameters to a collection of n data points, the adjusted coefficient of determination is defined as
1 - [(n - 1) / (n - p)] (SSE/SSTO)
where SSE is the residual sum of squares and SSTO is the total sum of squares. Given the n-vector y (consisting of the measurements y_i), the total sum of squares is
SSTO = y^T y - n [E(y)]^2
where E(y) is the mean of the y_i's.

The residual sum of squares, SSE, is the sum from i=1 to n of

[y_i - b1 - b2 x_i]^2
that is, the sum of squares of the differences after the model is fit. (For the quadratic model, SSE is the sum of [y_i - b1 - b2 x_i - b3 x_i^2]^2.)

Note that dgels computes SSE for you (read carefully the description of the return value of the argument B on the manual page).

For the linear and exponential models, p=2 since you're fitting only two parameters, b1 and b2. The quadratic model has p=3.

The rationale for the formula is as follows: You can exactly fit any collection of (x,y) data (as long as all the x's are distinct) by using a polynomial of sufficiently high degree, which drives SSE to zero. The factor (n - 1)/(n - p) in the adjusted coefficient of determination grows with p, so it imposes a penalty for an increasing number of terms.
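
For testing, it can help to have an independent reference value to compare against. The following is a minimal sketch in Python (not the Fortran the problem requires) that fits the straight-line model by the closed-form least-squares formulas instead of dgels and then applies the definitions above. The function names fit_line and adjusted_r2 are made up for this illustration.

```python
def fit_line(x, y):
    """Closed-form least squares for y = b1 + b2 x (an assumption: not dgels)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = mean_y - b2 * mean_x
    # SSE: sum of squared differences after the model is fit
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, b2, sse


def adjusted_r2(y, sse, p):
    """1 - [(n - 1) / (n - p)] (SSE/SSTO), with SSTO = y^T y - n [E(y)]^2."""
    n = len(y)
    mean_y = sum(y) / n
    ssto = sum(v * v for v in y) - n * mean_y ** 2
    return 1.0 - (n - 1) / (n - p) * (sse / ssto)
```

A value of ra from the Fortran subroutine linear should agree with adjusted_r2(y, sse, 2) on the same data up to rounding error. The sketch does no error checking (for example, it divides by zero if all the x's are equal or if n = p).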

Problem 2

(40 points) The concentration of carbon dioxide in the atmosphere has been measured for many years at various locations on the earth. The longest continuous record is at the Mauna Loa observatory in Hawaii. Its remoteness keeps the measurements from being contaminated by nearby factories, and Hawaii is also in a region where the trade winds provide good mixing. This data set contains the average yearly CO2 concentration, in parts per million (ppm), as measured at Mauna Loa. (The year 1964 is excluded due to technical difficulties.) Write a program that calls your routines from Problem 1 to fit a linear, exponential, and quadratic model to these data. (At your option, you may rescale time so that x is time in years since 1959, that is, x=0 corresponds to 1959.)

The Intergovernmental Panel on Climate Change recently completed a report in which it estimates that an atmospheric CO2 concentration of 450 ppm would be dangerous, insofar as it might breach a climate "tipping point".

Although the carbon cycle is very complex, and extrapolations are always fraught with uncertainty, they nevertheless can provide a back-of-the-envelope estimate of how much time might be left to act on rising CO2 levels in the atmosphere. Use each of your models to estimate, to the nearest year, when the average CO2 level would reach 450 ppm, assuming that current trends continue.
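
Once the parameters are in hand, the extrapolation amounts to solving each fitted model for x at y = 450. The sketch below shows one way to do that in Python (again, not the required Fortran). The function names are made up for this illustration, and the parameters b1, b2, b3 are whatever your own fits return; no fitted values are assumed here. It also assumes b2 and b3 are nonzero.

```python
import math


def x_linear(target, b1, b2):
    """Solve target = b1 + b2 x for x."""
    return (target - b1) / b2


def x_exponential(target, b1, b2):
    """Solve target = b1 exp(b2 x) for x; target and b1 must share a sign."""
    return math.log(target / b1) / b2


def x_quadratic(target, b1, b2, b3):
    """Solve target = b1 + b2 x + b3 x^2; return the smallest root with x >= 0.

    Returns None if the model never reaches the target at any x >= 0.
    """
    disc = b2 * b2 - 4.0 * b3 * (b1 - target)
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    candidates = [(-b2 + root) / (2.0 * b3), (-b2 - root) / (2.0 * b3)]
    future = [r for r in candidates if r >= 0.0]
    return min(future) if future else None
```

If time is rescaled so that x = 0 corresponds to 1959, the calendar year to the nearest year is round(1959 + x).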

Submission instructions

  1. Use the command
    tar cf midterm2.tar regression.f90 prob2.f90
    (and any other files that you generate) to generate a tar archive of your work.
  2. Email your tar file to mat420hw at gmail.com with a subject line of the form
    Your_name Midterm 2
    Include the tar file as an attachment. Please also include your name as a comment in each file that you include in the tar file.
  3. Make a printout of your programs. Don't forget the statement about not receiving help from any person and don't forget to sign it! Take your printout to the main math office, PSA 216, and have the desk staff place it in my mailbox.
  4. The deadline for receipt of all materials is 5 p.m., Wednesday, Oct. 31.