CS190N/CS290N -- Assignment 4: Show Me the Monet

John Brevik and Rich Wolski --- Winter, 2006

In the episode "In Plain Sight," our hero finds himself caught in a plot line rather fraught with entropy (or perhaps self-similarity, as would be more appropriate for this class) in which the evil operator of a drug lab is also an incestuous pedophile. Clearly "Numb3rs" belongs in the 10:00 time slot on Friday nights. Be that as it may, Charlie helps the FBI to capture the villain by running some of his handy "algorithms," this time for the purpose of clarifying an image found on a damaged hard disk belonging to the culprit. By "adjusting the pixels by one more negative power" (please don't try this at home) he and Larry ultimately extract an image that has been hidden in another image using steganography, which, they emphasize, is "very hot right now." This hidden image ultimately leads to the capture of the villain and yet another triumph for today's mathematically endowed super-hero.

Your super-heroism-skills training continues in this assignment as we, once again, ask you to duplicate Charlie's handiwork to extract images that have been hidden in other images using steganographic techniques.

So You Want to be in Pictures...

This image

and this image

each contain a single hidden image that has been encrypted using the techniques discussed in the lecture notes. In case your browser engages in any unintended steganographic activity of its own in attempting to display these images, you can download them directly from IMAGE 1 and IMAGE 2 respectively.

The image hidden in IMAGE 1 is 70 pixels by 70 pixels and its image data is encoded as 24-bit truecolor without an alpha channel.

The image hidden in IMAGE 2 is 96 pixels high by 66 pixels wide, again encoded as 24-bit truecolor, but with an alpha channel this time.

The password for each hidden image has been generated by the methodology also described in the lecture notes using an LCG with a modulus that is no more than 20 bits in length. A sequence of 100 consecutive integers from the LCG is contained in this file. The algorithm we used to generate a pseudorandom number on (0,1) from this LCG is

double RandLCG()
{
        double r;
        Seed=(Mult*Seed+Add)%Mod;
        r = ((double)Seed + 0.5) / ((double)Mod);

        return(r);
}

Moreover, the seed for each password has been generated using the sum of the ASCII character representations for a single word.
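As a minimal sketch of what "the sum of the ASCII character representations" might look like in C, consider the following. The function name seed_from_word is ours, not something from the lecture notes, and we are assuming the word is plain ASCII and that the sum is used as the seed without further transformation.

```c
#include <stddef.h>

/* Hypothetical helper (not from the lecture notes): add up the character
 * codes of a NUL-terminated ASCII word to form a candidate LCG seed. */
unsigned long seed_from_word(const char *word)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; word[i] != '\0'; i++) {
        /* Cast through unsigned char so a signed char type cannot go negative. */
        sum += (unsigned char)word[i];
    }
    return sum;
}
```

For example, seed_from_word("abc") yields 97 + 98 + 99 = 294. Under the assumptions above, a ten-letter lowercase word sums to at most 1220, so the seeds reachable from ordinary words occupy a fairly small range.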

Your Assignment

You are to write the code necessary to extract each hidden image and store it as a standard PNG file that the TA can render using a standard web browser. You must turn in both extracted images, the seed you used to extract them in each case (optionally with the password corresponding to that seed), and the code you used to make these extractions. You should also turn in a README file that explains any and all idiosyncrasies associated with your implementation so that we can reproduce your results and so that we can test your code with a different set of image files. Again, it is essential that you cite all reference material you use to produce your implementation.

For the Graduate Student

As a graduate student assignment, devise another way to encrypt images that is less intrusive on the hiding image than the methodology we have outlined. Rehide the image you extracted from IMAGE 1 with your alternative method (using the same seed and/or password) and turn in your resulting image, your code, and a README describing your method and the way in which your implementation works so that we can perform the extraction. We will post your image solutions to this part of the assignment on the class web page so that all can enjoy your stellar efforts.

Helpful Hints from your Aunt Heloise and Uncle Norman

If it hasn't dawned on you by now, this final assignment can be decomposed into three smaller assignments that can then be completed more or less in parallel and integrated to produce your solution. First, you will want to write a PNG image manipulation facility that allows you to read, alter, and write "legal" PNG files. Secondly, you will need to be able to perform the steganographic image extraction, and thirdly, you will most probably need to write a search utility that you will use to crack the password for each image. Thankfully, you have decomposed yourself into rather large teams, permitting a division of labor that is likely to yield a successful result by the assignment's due date.

From the lecture notes and your feedback during lecture, it seems to us that you will have little trouble in developing the code necessary to manipulate PNGs -- most likely a lot less trouble than your Aunt Heloise, who, we must admit, is getting a little old. You are certainly free to use any and all of the code presented in lecture as part of your solution, but please do not consider yourself limited to this approach. However, DO NOT rely on an image manipulation tool that supports only an interactive mode. The TA will be using grading scripts to test your code, and an interactive component in your solution is certain to cause disaster. Also, you might be tempted to engage in some form of format conversion using standard conversion tools. This is a fine approach, but you will be graded on the relative quality of the image you extract. PNGs are not lossy. You will need to extract the hidden image without loss for full credit. Be careful with image conversions, as they often use some form of lossy compression and/or transformation by default.

You will almost certainly need to write the steganographic code yourself. It is fairly straightforward to do (your Aunt Heloise managed both an "image hider" and an "image extractor" during a rather ponderous meeting) but you will need to take care of sizes and indexing rather diligently. "Off by one" in some of your indexing operations, for example, can produce a dramatically incorrect result.

For the third part, you have a couple of options that you can, again, pursue in parallel. First, you will need to crack the LCG. That should be straightforward enough based on the lecture. With the LCG parameters in hand, you can either try all possible seeds exhaustively, or you can try a dictionary search, or both. Unless you want to look at each and every image you extract, though, you will probably want to write an image filter that can look at a PNG file (or perhaps just at the image data from a PNG file) and give you a "score" that indicates how likely it is that the data encodes an image. You might be tempted to develop a sophisticated likelihood scorer for this part, but it is more important for this assignment that you extract the image. If I were me, I would start with something simple and liberal. That is, something that is more likely to say a non-image is an image than to make the alternative error. Visually, you will most probably be able to pick out your true image from, say, 100 thumbnails of extracted potential candidates. You wouldn't want to have the image "filtered" out, however, so liberal will be better than conservative in this case.
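To make "simple and liberal" a bit more concrete, here is one possible scorer, offered only as a sketch. It assumes that data extracted with a wrong seed looks like noise, while a real picture tends to have neighboring pixels with similar values. The function name roughness_score and the raw, row-major, interleaved pixel layout are our assumptions; nothing here comes from the lecture notes.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical scorer: mean absolute difference between horizontally
 * adjacent pixels, computed channel by channel. `pix` holds raw image data,
 * row-major, with `channels` interleaved bytes per pixel (3 for RGB,
 * 4 for RGBA). Lower scores suggest smoother, more picture-like data. */
double roughness_score(const unsigned char *pix, size_t width,
                       size_t height, size_t channels)
{
    unsigned long long total = 0;
    size_t count = 0;
    size_t x, y, c;

    /* Nothing to compare: stay liberal and report the smoothest score. */
    if (width < 2 || height == 0 || channels == 0) {
        return 0.0;
    }

    for (y = 0; y < height; y++) {
        const unsigned char *row = pix + y * width * channels;
        for (x = 1; x < width; x++) {
            for (c = 0; c < channels; c++) {
                int cur = row[x * channels + c];
                int prev = row[(x - 1) * channels + c];
                total += (unsigned long long)abs(cur - prev);
                count++;
            }
        }
    }
    return (double)total / (double)count;
}
```

Independent, uniformly random bytes score about 85 on this measure, so a generous cutoff somewhat below that keeps anything even vaguely picture-like. The right cutoff is something to determine by experiment on your own test images.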

As with the previous assignments, you will want to do a lot of experimenting with your own data so that you can compare your code to known solutions. We'll leave it to you to find an appropriate set of PNGs for your own testing enjoyment.
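As one illustration of testing against a known solution, you might generate a sequence from an LCG whose parameters you chose yourself and check that your cracking code recovers them. The sketch below uses a well-known difference-and-gcd trick to guess the modulus; it is not necessarily the method presented in lecture, and the name guess_modulus is ours. It assumes the inputs are consecutive raw values of the generator state.

```c
#include <stddef.h>
#include <stdlib.h>

/* Greatest common divisor of |a| and |b|; gcd(0, b) is |b|. */
static long long gcd_ll(long long a, long long b)
{
    a = llabs(a);
    b = llabs(b);
    while (b != 0) {
        long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: guess the modulus of an LCG from n consecutive
 * states x[0..n-1]. With t[i] = x[i+1] - x[i], each successive difference
 * is the multiplier times the previous one (mod m), so
 * t[i+2]*t[i] - t[i+1]*t[i+1] is a multiple of m. The gcd of many such
 * values is m or, with bad luck, a small multiple of it.
 * Returns 0 if fewer than 4 values are supplied. */
long long guess_modulus(const long long *x, size_t n)
{
    long long g = 0;
    size_t i;

    for (i = 0; i + 3 < n; i++) {
        long long t0 = x[i + 1] - x[i];
        long long t1 = x[i + 2] - x[i + 1];
        long long t2 = x[i + 3] - x[i + 2];
        g = gcd_ll(g, t2 * t0 - t1 * t1);
    }
    return g;
}
```

For the toy generator with multiplier 3, increment 5, modulus 31, and starting state 1, the states are 1, 8, 29, 30, 2, 11, 7, 26, and guess_modulus returns 31 for that sequence. With values of at most 20 bits the intermediate products stay well within a long long. A sensible sanity check is that the guess exceeds every value in the sequence.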

Start Now -- the Clock is Ticking

Perhaps more than for any other assignment, you are going to want to begin this one immediately. Yes, your loving Aunt and Uncle encourage you always to begin these assignments as early as possible, and we realize that doing so might impose a cost on your respective social lives. However, in this case, our encouragement can be translated into a quantitative warning.

Your Aunt's solution for the first image requires approximately 1.2 seconds on a typical example of the machines to which you have access when the seed is known. That is, with the correct seed, it takes her code 1.2 seconds to do the steganographic extraction and to write out the PNG image. As mentioned previously, she generated her solution while she was participating in a meeting, so it is written in a style that optimizes for ease of debugging and correctness rather than performance. On the other hand, as you well know by now, she prefers C, which, compared to potential alternatives, is usually pretty competitive even in a get-it-done-right-the-first-time form.

Let's say, for the sake of argument, that your code is similar in performance. With a 20-bit modulus, at 1.2 seconds per potential solution, the expected time to find the image (assuming your image filter takes no time) is about 7.2 days running around the clock on one of the departmental machines. For the second solution, under the same set of assumptions, the expected time to solution is 4.5 days. Thus, with code that has been developed and debugged, and a filter that takes zero time, you will need almost 12 days of continuous CPU time to find the images, on the average.
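The arithmetic behind an estimate of this kind can be sketched as follows. We are assuming an exhaustive search in which the correct seed turns up, on average, halfway through the candidates. The function name expected_search_days is ours.

```c
/* Hypothetical helper: expected wall-clock days for an exhaustive seed
 * search, assuming the right seed is found halfway through on average
 * and the image filter costs nothing. */
double expected_search_days(unsigned long num_seeds, double secs_per_try)
{
    const double secs_per_day = 86400.0;

    return ((double)num_seeds / 2.0) * secs_per_try / secs_per_day;
}
```

Taking the upper bound of 2^20 = 1,048,576 candidate seeds at 1.2 seconds each gives roughly 7.28 days, in line with the figure quoted above for a modulus of at most 20 bits.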

Here is some other news. The departmental machines do not have physically protected power sources or power switches. As the quarter draws to a close, it is your Aunt's experience that long-running, CPU-intensive programs are frequently victimized by spontaneous machine reboots at the hands of students who simply cannot tolerate a background process (even a niced one) while they are reading email.

Thus you should consider beginning early, staying late, writing fault-resilient code, and using parallelism wherever you can.
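One simple form of fault resilience is a checkpoint file, so that a rebooted search resumes where it left off rather than starting over. The following is only a sketch under our own assumptions: the names load_checkpoint, save_checkpoint, search_seeds, and try_seed_fn are hypothetical, and the callback is expected to do the extraction, scoring, and saving of any promising candidate itself.

```c
#include <stdio.h>

/* Hypothetical callback: try one seed, return nonzero if it looks like a hit. */
typedef int (*try_seed_fn)(unsigned long seed);

/* Return the next seed to try as recorded in `path`, or `first` if the
 * file is missing or unreadable. */
unsigned long load_checkpoint(const char *path, unsigned long first)
{
    unsigned long next = first;
    FILE *fp = fopen(path, "r");

    if (fp != NULL) {
        if (fscanf(fp, "%lu", &next) != 1) {
            next = first;
        }
        fclose(fp);
    }
    return next;
}

/* Record `next` by writing a temporary file and renaming it over `path`,
 * so a reboot in mid-write cannot leave a half-written checkpoint
 * (rename replaces the target atomically on POSIX systems).
 * Returns 0 on success, nonzero on failure. */
int save_checkpoint(const char *path, unsigned long next)
{
    char tmp[512];
    FILE *fp;
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);

    if (n < 0 || (size_t)n >= sizeof tmp) {
        return -1;
    }
    fp = fopen(tmp, "w");
    if (fp == NULL) {
        return -1;
    }
    if (fprintf(fp, "%lu\n", next) < 0) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0) {
        return -1;
    }
    return rename(tmp, path);
}

/* Try every seed in [first, last], resuming from the checkpoint if one
 * exists and saving progress after every `every` seeds. Returns the number
 * of hits reported by `try_seed` during this run. Assumes `last` is well
 * below ULONG_MAX. */
unsigned long search_seeds(unsigned long first, unsigned long last,
                           unsigned long every, const char *ckpt,
                           try_seed_fn try_seed)
{
    unsigned long hits = 0;
    unsigned long seed = load_checkpoint(ckpt, first);

    if (every == 0) {
        every = 1;
    }
    for (; seed <= last; seed++) {
        if (try_seed(seed)) {
            hits++;
        }
        if ((seed - first + 1) % every == 0) {
            save_checkpoint(ckpt, seed + 1);
        }
    }
    save_checkpoint(ckpt, last + 1);
    return hits;
}
```

Because each call covers an explicit range with its own checkpoint file, the same routine lends itself to parallelism: give each machine a disjoint slice of the seed space and a separate checkpoint path.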