CS190N/CS290N -- Assignment 1: Finding the Hot Zone

John Brevik and Rich Wolski --- Winter, 2006


Your assignment is to determine the "hot zone" for the killer's place of residence under the assumption that he or she lives at the center of a region in the city of Los Angeles, California, in which the killings have been observed to take place. You should assume that the "center" of the hot zone is where the killer is most likely to live, and that the killings take place at locations having coordinates that are best represented as a bivariate Normal random variable.
For the assignment, you should assume that the killings take place at the following cross streets in the Los Angeles area: You should use these cross streets and a map to determine a coordinate scheme. Write a computer program that takes a list of x-y pairs (separated by spaces) and prints out an equation for the ellipse that you can use to determine the 87% confidence region for the mean. Finally, you should plot the region on the map and give the cross street nearest the center.

Helpful Hints from your Uncle Norman and your Aunt Heloise

The confidence region is going to correspond to an equation that you can graph implicitly. You are probably going to want to write some code to compute the coefficients of the equation from the data points and then use some type of implicit graphing tool to generate your region. When we did it, we used a graphing calculator such as gcalc, which can be found at http://gcalc.net. You can also use the nifty calculator from Cornell that we used in the lecture notes. Be careful about your units when devising your coordinate system, so that the region you compute is on the right scale for the map you use.

Start by writing a program that computes the variance-covariance matrix for pairs of numbers. This part should be relatively easy. As a test, consider the set

0 0
1 0.75
4.5 0.54
-2.25 11.0
-2.0 -1.5
-1.7 -3.5
2.0 15.0
7.5 -1.5
3.7 4.3
-3.7 2.5
-1.0 1.5
-0.7 5.2
1.0 -1.0
3.0 2.0
Note that you can think of these as (x,y) coordinates on a grid superimposed over a map, where (0,0) sits on one of the cross-streets.

The sample variance-covariance matrix we calculate from this sample is

9.470838 -1.227701 
-1.227701 25.667546
In this case, we've divided by N - 1 when computing the sample variances and covariance.
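
As a rough illustration of this first step, here is a minimal Python sketch. The function names (parse_points, sample_mean, sample_covariance) and the TEST_DATA string are our own choices for this example, not anything the assignment requires. It assumes one "x y" pair per line and divides by N - 1, as described above.

```python
def parse_points(text):
    """Parse whitespace-separated 'x y' pairs, one pair per line."""
    points = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        x_str, y_str = line.split()
        points.append((float(x_str), float(y_str)))
    return points


def sample_mean(points):
    """Return the sample mean (x-bar, y-bar) of a list of (x, y) pairs."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def sample_covariance(points):
    """Return the 2 x 2 sample variance-covariance matrix (divisor N - 1)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx, my = sample_mean(points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


# The test set from this page.
TEST_DATA = """\
0 0
1 0.75
4.5 0.54
-2.25 11.0
-2.0 -1.5
-1.7 -3.5
2.0 15.0
7.5 -1.5
3.7 4.3
-3.7 2.5
-1.0 1.5
-0.7 5.2
1.0 -1.0
3.0 2.0
"""

cov = sample_covariance(parse_points(TEST_DATA))
for row in cov:
    print("%f %f" % tuple(row))
```

Run on the test set, this should agree with the matrix shown above to the printed precision.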

Next, you should write a routine to invert the sample variance-covariance matrix. You don't need to write a general matrix-inversion routine -- only one to invert a 2 x 2 matrix.

At this point your Aunt and Uncle need to talk to you about an unpleasant subject: unsightly plagiarism. For many of these calculations you will find algorithms and code on the web or in books. Using these as references is perfectly fine, but you must cite the references you use. Let us reiterate that point. If you use any reference (code, web page, book, paper, etc.) other than the course web pages you must write down who the original author of the work is, and where/how it was published as part of what you turn in. For code, it is fine simply to give the web page as a comment in the source code you turn in.

Okay, so the next step is to write a piece of code that inverts a 2 x 2 matrix so you can invert your sample variance-covariance matrix. Here is what we get when we invert the example matrix above:

0.106246 0.005082 
0.005082 0.039203 
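
A 2 x 2 inverse can be written directly from the determinant, so no general routine is needed. The sketch below is one way to do it in Python; the function name invert_2x2 is our own, and the input is the rounded example matrix from above, so the last digit of the result may differ slightly from a computation that carries full precision.

```python
def invert_2x2(m):
    """Invert a 2 x 2 matrix [[a, b], [c, d]] using the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    # Swap the diagonal, negate the off-diagonal, divide by the determinant.
    return [[d / det, -b / det], [-c / det, a / det]]


# Example sample variance-covariance matrix from this page (rounded values).
S = [[9.470838, -1.227701], [-1.227701, 25.667546]]
S_inv = invert_2x2(S)
for row in S_inv:
    print("%f %f" % tuple(row))
```

A quick sanity check is to multiply S by S_inv and confirm that the product is close to the identity matrix.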

Finally, write a program that prints out the equation for your ellipse symbolically. That is, your program prints out the x^2, y^2, xy, x, y and constant terms for your ellipse with the appropriate coefficients. You might want to multiply n times the quadratic form by (n-2) / (2 * (n-1)) so that your "constant" on the right-hand side becomes an F critical value (for the test data, n = 14, so the overall factor is 84/13). Here is the equation our code gets for this test data:

0.686513 * (x - 0.810714)^2 + 0.065673 * (x - 0.810714)*(y - 2.520714) +
0.253310 * (y - 2.520714)^2
If you set this equal to different F critical values that you can find with the calculator mentioned in the lecture notes, you will then be able to plot confidence ellipses corresponding to these critical values.
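
To make the scaling concrete, here is a hedged Python sketch that computes the three coefficients from raw data points and prints the left-hand side in the same centered form as above. The names ellipse_terms, format_ellipse, f_critical_2, and TEST_POINTS are ours. The helper f_critical_2 is an optional shortcut that is not part of the assignment: it relies on the fact that an F distribution with 2 numerator degrees of freedom and m denominator degrees of freedom has upper-tail probability (1 + 2f/m)^(-m/2), which can be solved for f directly. It only works when the numerator degrees of freedom equal 2; the calculator from the lecture notes remains the reference.

```python
def ellipse_terms(points):
    """Return (mean_x, mean_y, a, b, c) for a*(x-mx)^2 + b*(x-mx)*(y-my) + c*(y-my)^2."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    # Entries of the inverse of the 2 x 2 sample variance-covariance matrix.
    inv_xx, inv_xy, inv_yy = syy / det, -sxy / det, sxx / det
    # n times the quadratic form, multiplied by (n - 2) / (2 * (n - 1)).
    scale = n * (n - 2) / (2.0 * (n - 1))
    return mx, my, scale * inv_xx, 2.0 * scale * inv_xy, scale * inv_yy


def format_ellipse(mx, my, a, b, c):
    """Format the left-hand side; a negative mean prints as '(x - -1.5)'."""
    return ("%f * (x - %f)^2 + %f * (x - %f)*(y - %f) + %f * (y - %f)^2"
            % (a, mx, b, mx, my, c, my))


def f_critical_2(n, confidence):
    """F critical value for (2, n - 2) degrees of freedom (closed form, numerator df = 2 only)."""
    m = n - 2
    return (m / 2.0) * ((1.0 - confidence) ** (-2.0 / m) - 1.0)


# The test set from this page.
TEST_POINTS = [
    (0, 0), (1, 0.75), (4.5, 0.54), (-2.25, 11.0), (-2.0, -1.5),
    (-1.7, -3.5), (2.0, 15.0), (7.5, -1.5), (3.7, 4.3), (-3.7, 2.5),
    (-1.0, 1.5), (-0.7, 5.2), (1.0, -1.0), (3.0, 2.0),
]

terms = ellipse_terms(TEST_POINTS)
print(format_ellipse(*terms) + " = %f"
      % f_critical_2(len(TEST_POINTS), 0.87))
```

The printed line can be pasted into an implicit graphing tool; changing the confidence argument gives ellipses for other critical values.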

What to Turn In

You should turn in a map (a printout from one of the web-based mapping systems is fine) at a sufficient level of detail to be able to compare your answer to the one shown in the television clip with the confidence region traced out appropriately. You should also give the cross street nearest where you believe the center to be. It is fine to draw or trace the region onto the map if you cannot find a way to superimpose it electronically.

Finally, you should also turn in the code you have written to compute the sample variance-covariance matrix, to invert it, and to print out the terms of the equation describing your ellipse.


For the Graduate Student

In the Pilot, Charlie actually uses an algorithm called the Criminal Geographic Targeting (CGT) algorithm. Or at least, we think he does, based on the appearance of the equation in the show. This equation was developed by Dr. Kim Rossmo, a police detective simultaneously working on crime solving and his doctoral dissertation in Vancouver.

As a graduate student assignment, you can compare your answer to that generated by the "CGT" algorithm shown in the Pilot.

While implementing the algorithm is fairly straightforward (your Uncle Norman and Aunt Heloise worked out the details in an afternoon), determining the parameters necessary to make the algorithm work properly on the data will be a bit of a challenge. We spent a few days searching the web for the data necessary to instantiate the CGT equation, and didn't find much.

However.

We did find a thesis or two that at least described enough about the methodology to understand what the parameters represent.

To complete this assignment for Graduate Student credit, you will need to continue this research to the point where you can credibly instantiate the CGT equation and then use it to generate a confidence contour from the data given at the top of this page. If you have questions about what constitutes a "credible instantiation" please feel free to contact us immediately.