Your super-heroism-skills training continues in this assignment as we, once again, ask you to duplicate Charlie's handiwork to extract images that have been hidden in other images using steganographic techniques.
The two images provided with this assignment each contain a single hidden image that has been encrypted using the techniques discussed in the lecture notes. In case your browser engages in any unintended steganographic activity of its own in attempting to display these images, you can download them directly from IMAGE 1 and IMAGE 2 respectively.
The image hidden in IMAGE 1 is 70 pixels by 70 pixels and its image data is encoded as 24-bit truecolor without an alpha channel.
The image hidden in IMAGE 2 is 96 pixels high by 66 pixels wide, again encoded as 24-bit truecolor, but with an alpha channel this time.
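Since sizes matter a great deal here, it may help to pin down how many bytes of pixel data each hidden image implies. The sketch below assumes 8 bits per channel, so that the alpha channel adds a fourth byte per pixel, and it assumes you are counting raw, uncompressed pixel data; the function name is ours, not something from the lecture notes.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of raw pixel data for a width x height image.
   channels = 3 for RGB, 4 for RGB plus alpha.
   ASSUMPTION: 8 bits per channel, no padding, no compression. */
size_t hidden_image_bytes(size_t width, size_t height, size_t channels)
{
    assert(channels == 3 || channels == 4);
    return width * height * channels;
}
```

Under those assumptions the first hidden image is 70 x 70 x 3 = 14700 bytes and the second is 66 x 96 x 4 = 25344 bytes.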
The password for each hidden image has been generated by the methodology also described in the lecture notes using an LCG with a modulus that is no more than 20 bits in length. A sequence of 100 consecutive integers from the LCG is contained in this file. The algorithm we used to generate a pseudorandom number on (0,1) from this LCG is

    /* Seed, Mult, Add and Mod are the global LCG state and parameters. */
    double RandLCG()
    {
        double r;
        Seed = (Mult * Seed + Add) % Mod;
        r = ((double)Seed + 0.5) / ((double)Mod);
        return(r);
    }

Moreover, the seed for each password has been generated using the sum of the ASCII character representations for a single word.
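To make the seed rule concrete, here is a minimal sketch of turning a word into a seed and then stepping the generator. The `Lcg` struct and both function names are hypothetical conveniences (the original keeps its state in globals), and the sketch assumes the character sum is used directly as the seed with no further reduction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the LCG state and parameters. */
typedef struct { uint64_t seed, mult, add, mod; } Lcg;

/* Sum of the character codes of a NUL-terminated word.
   ASSUMPTION: the sum is used as the seed as-is. */
uint64_t seed_from_word(const char *word)
{
    uint64_t sum = 0;
    for (const unsigned char *p = (const unsigned char *)word; *p; p++)
        sum += *p;
    return sum;
}

/* Same arithmetic as RandLCG(), but on an explicit state struct. */
double rand_lcg(Lcg *g)
{
    assert(g->mod != 0);
    g->seed = (g->mult * g->seed + g->add) % g->mod;
    return ((double)g->seed + 0.5) / (double)g->mod;
}
```

For example, `seed_from_word("cat")` is 99 + 97 + 116 = 312. Note that single-word sums are small numbers, which is why a dictionary search over seeds is attractive.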
From the lecture notes and your feedback during lecture, it seems to us that you will have little trouble in developing the code necessary to manipulate PNGs -- most likely a lot less trouble than your Aunt Heloise who, we must admit, is getting a little old. You are certainly free to use any and all of the code presented in lecture as part of your solution but please do not consider yourself limited to this approach. DO NOT, however, rely on an image manipulation tool that supports only an interactive mode. The TA will be using grading scripts to test your code and an interactive component to your solution is certain to cause disaster. Also, you might be tempted to engage in some form of format conversion using standard converting tools. This is a fine approach but you will be graded on the relative quality of the image you extract. PNGs are not lossy. You will need to extract the hidden image without loss for full credit. Be careful with image conversions as they often use some form of lossy compression and/or transformation by default.
You will almost certainly need to write the steganographic code yourself. It is fairly straightforward to do (your Aunt Heloise managed both an "image hider" and an "image extractor" during a rather ponderous meeting) but you will need to take care of sizes and indexing rather diligently. "Off by one" in some of your indexing operations, for example, can produce a dramatically incorrect result.
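To show where the indexing traps live, here is a sketch of one very simple extraction scheme. It is an illustration only and NOT necessarily the scheme from the lecture notes: it assumes one hidden bit sits in the least significant bit of each consecutive carrier byte, most significant bit first, and it ignores the password entirely. The function name is ours.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rebuild out_len bytes from the low bits of carrier[0..8*out_len-1].
   ASSUMPTION: sequential LSB embedding, MSB first, no password.
   Returns out_len on success, 0 if the carrier is too short. */
size_t extract_lsb(const uint8_t *carrier, size_t carrier_len,
                   uint8_t *out, size_t out_len)
{
    assert(carrier != NULL && out != NULL);
    if (out_len > carrier_len / 8)
        return 0;                       /* not enough carrier bytes */
    for (size_t i = 0; i < out_len; i++) {
        uint8_t byte = 0;
        for (size_t bit = 0; bit < 8; bit++)
            byte = (uint8_t)((byte << 1) | (carrier[i * 8 + bit] & 1u));
        out[i] = byte;
    }
    return out_len;
}
```

The bounds check up front and the `i * 8 + bit` index are exactly the places where being off by one (or by a factor of eight) turns a picture into noise.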
For the third part, you have a couple of options that you can, again, pursue in parallel. First, you will need to crack the LCG. That should be straightforward enough based on the lecture. With the LCG parameters in hand, you can either try all possible seeds exhaustively, or you can try a dictionary search, or both. Unless you want to look at each and every image you extract, though, you will probably want to write an image filter that can look at a PNG file (or perhaps just at the image data from a PNG file) and give you a "score" that indicates how likely it is that the data encodes an image. You might be tempted to develop a sophisticated likelihood scorer for this part, but it is more important for this assignment that you extract the image. If I were me, I would start with something simple and liberal. That is, something that is more likely to say a non-image is an image than to make the alternative error. Visually, you will most probably be able to pick out your true image from, say, 100 thumbnails of extracted potential candidates. You wouldn't want to have the image "filtered" out, however, so liberal will be better than conservative in this case.
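As one example of a "simple and liberal" filter, the sketch below scores raw pixel data by how much each byte differs from the same channel of the pixel to its left. This is our own suggestion rather than anything from lecture: it assumes row-major, interleaved 8-bit channels, and it relies on the observation that real pictures tend to be smoother than noise. For uniformly random bytes the expected score is about 85, so a threshold somewhat below that (the exact value is yours to tune) errs on the liberal side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Mean absolute difference between horizontally adjacent pixels,
   taken per channel. Lower means smoother, i.e. more image-like.
   ASSUMPTION: px holds height rows of width*channels bytes each. */
double smoothness_score(const uint8_t *px, size_t width,
                        size_t height, size_t channels)
{
    assert(width >= 2 && height >= 1 && channels >= 1);
    double total = 0.0;
    size_t count = 0;
    for (size_t y = 0; y < height; y++) {
        const uint8_t *row = px + y * width * channels;
        for (size_t i = channels; i < width * channels; i++) {
            total += abs((int)row[i] - (int)row[i - channels]);
            count++;
        }
    }
    return total / (double)count;
}

/* Liberal test: accept anything at or below the chosen threshold. */
int looks_like_image(const uint8_t *px, size_t width, size_t height,
                     size_t channels, double threshold)
{
    return smoothness_score(px, width, height, channels) <= threshold;
}
```

Sorting candidates by score and eyeballing the best hundred or so is likely safer than trusting any single cutoff.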
As with the previous assignments, you will want to do a lot of experimenting with your own data so that you can compare your code to known solutions. We'll leave it to you to find an appropriate set of PNGs for your own testing enjoyment.
Your Aunt's solution for the first image requires approximately 1.2 seconds on a typical example of the machines to which you have access when the seed is known. That is, with the correct seed, it takes her code 1.2 seconds to do the steganographic extraction and to write out the PNG image. As mentioned previously, she generated her solution while she was participating in a meeting, so it is written in a style that optimizes for ease of debugging and correctness rather than performance. On the other hand, as you well know by now, she prefers C which, compared to potential alternatives, is usually pretty competitive even in a get-it-done-right-the-first-time form.
Let's say, for the sake of argument, that your code is similar in performance. With a 20 bit modulus, at 1.2 seconds per potential solution, the expected time to find the image (assuming your image filter takes no time) is about 7.2 days running around the clock on one of the departmental machines. For the second solution, under the same set of assumptions, the expected time to solution is 4.5 days. Thus, with code that has been developed and debugged, and a filter that takes zero time, you will need almost 12 days of continuous CPU time to find the images, on average.
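If you want to redo this estimate for your own timings, the arithmetic is short enough to sketch. It assumes every seed below 2^bits is equally likely, so that on average you try half of them before hitting the right one; the function name is ours.

```c
#include <assert.h>

/* Expected days of exhaustive search over a seed space of 2^bits,
   ASSUMING a uniformly random seed (half the space tried on average). */
double expected_search_days(unsigned modulus_bits, double seconds_per_try)
{
    assert(modulus_bits < 63);
    double seeds = (double)(1ULL << modulus_bits);
    return (seeds / 2.0) * seconds_per_try / 86400.0;
}
```

With 20 bits and 1.2 seconds per try this comes out at roughly 7.3 days for a full 2^20 seed space, close to the figure quoted above.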
Here is some other news. The departmental machines do not have physically protected power sources or power switches. As the quarter draws to a close, it is your Aunt's experience that long-running, CPU-intensive programs are frequently victimized by spontaneous machine reboots at the hands of students who simply cannot tolerate a background process (even a niced one) while they are reading email.
Thus you should consider beginning early, staying late, writing fault-resilient code, and using parallelism wherever you can.
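One modest way to get both properties is to checkpoint the next seed to try and to split the seed space across workers. The sketch below is a suggestion under stated assumptions, not a required design: the file names are whatever you choose, the write-then-rename trick assumes a POSIX-style `rename` that replaces the target, and seed 0 doubles as "no checkpoint found".

```c
#include <assert.h>
#include <stdio.h>

/* Seed to resume from, or 0 if there is no readable checkpoint. */
unsigned long load_checkpoint(const char *path)
{
    unsigned long seed = 0;
    FILE *f = fopen(path, "r");
    if (f == NULL) return 0;
    if (fscanf(f, "%lu", &seed) != 1) seed = 0;
    fclose(f);
    return seed;
}

/* Write to tmp_path first, then rename over path, so a reboot in
   mid-write cannot leave a half-written checkpoint. Returns 0 on success. */
int save_checkpoint(const char *path, const char *tmp_path,
                    unsigned long next_seed)
{
    FILE *f = fopen(tmp_path, "w");
    if (f == NULL) return -1;
    int ok = fprintf(f, "%lu\n", next_seed) > 0;
    if (fclose(f) != 0) ok = 0;
    if (!ok) { remove(tmp_path); return -1; }
    return rename(tmp_path, path) == 0 ? 0 : -1;
}

/* Worker k of n handles seeds k, k+n, k+2n, ...; this returns the
   first such seed that is >= start. */
unsigned long next_seed_for_worker(unsigned long start,
                                   unsigned long worker,
                                   unsigned long workers)
{
    assert(workers > 0 && worker < workers);
    unsigned long r = start % workers;
    return start + (worker + workers - r) % workers;
}
```

Each worker would call `load_checkpoint` at startup, resume from `next_seed_for_worker`, and call `save_checkpoint` every few hundred seeds, so that a reboot costs minutes rather than days.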