CS190N Numb3rs -- In Plain Sight

The situation:

Charlie finds an image file on the hard drive of a known bad guy. After a few seconds of contemplation, he discovers that there is another image hidden within the image! How did he do it?

Do You Take Steganography?

To be sure, there are lots of ways to hide an image in another image. Assuming that the image file is uncompressed, and that the file format presents the red, green, and blue values of each pixel as bytes, the simplest approach is probably something like this: Take the image you want to hide and break each of its pixel bytes into four chunks of two bits each. Then take a cover image four times the size of the one you're hiding, steal the two lowest-order bits from each of its pixel bytes, and use them to encode your chunks. To the eye, the big image won't have changed very much, since the bottom two bits fine-tune a color value by at most 3 parts in 255 (a hair over 1%), and yet your image lives, undistorted, within the bigger image. Similarly, you could hide your picture within a picture of the same size, although you are then limited to one of four values for each of R, G, and B inside the embedded image. Still, the results aren't too bad; this method was used to embed one sample image within another.
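To make the bit-twiddling concrete, here's a minimal sketch of the two-low-bits scheme for a single byte; embed_byte and extract_byte are names I made up for illustration, not part of any library.

#include <stdio.h>

/*
 * Split a hidden byte into four 2-bit chunks and stash them in the
 * low two bits of four cover bytes; extract_byte reverses the process.
 */
void embed_byte(unsigned char hidden, unsigned char cover[4])
{
        int i;
        for(i=0; i < 4; i++)
        {
                /* chunk i is bits 2i and 2i+1 of the hidden byte */
                unsigned char chunk = (hidden >> (2*i)) & 0x3;
                cover[i] = (cover[i] & 0xFC) | chunk;
        }
}

unsigned char extract_byte(unsigned char cover[4])
{
        int i;
        unsigned char hidden = 0;
        for(i=0; i < 4; i++)
        {
                hidden |= (unsigned char)((cover[i] & 0x3) << (2*i));
        }
        return(hidden);
}

int main()
{
        unsigned char cover[4] = {200, 13, 77, 142};
        embed_byte(0xA7, cover);
        /* each cover byte changes by at most 3; this prints 0xa7 */
        printf("extracted: 0x%x\n", extract_byte(cover));
        return(0);
}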

Note that there are two distinct issues here. The first is simply to hide the one picture within the other picture in a way that makes it difficult to detect; the second is to hide it in such a way that it is difficult to extract, even if you know something is there. We will take on both of these challenges, although ultimately, paralleling the development in the current Numb3rs episode, we are less concerned with unobtrusiveness (remember Charlie's "persistent artifacts"!) than we are with security. In fact, we'll present a general method for hiding files using other files, wherein it's not only essentially impossible for an intruder to read them, it's essentially impossible for an intruder to know they even exist. This method was developed originally to accommodate "plausible deniability"; to lift text directly from the original paper,

The concept of operations is that the user of such a file system could, if placed under compulsion, reveal (say) the three passwords used to protect the directories with his email archive, his tax records and his love letters, but keep quiet about the directory containing his trade secrets. The opponent would have no means of proving that such a directory exists.

What Have You Got to Hide?

Let's set out the goal of our steganographic system first, and then we'll come back and figure out how to construct it. The general framework here is that we'd like to construct a filesystem with a linear hierarchical structure — in other words, we'll assume that the files can be arranged from "highest" to "lowest" in such a way that if any user has access to one file, that user has access to all "lower" files.

First of all, suppose for simplicity that all the files in this discussion are exactly the same size, say n bits, and suppose further that there is one file in each hierarchy level. Call the files we wish to hide F1, ..., Fm. We are going to create a larger number of cover files, say C1, ..., Ck, all of which look random, but for which each of the hidden files is the exclusive OR of some subset of the Ci. Exactly which of the Ci are involved for a given file Fj is determined by that file's password Pj, which needs to be k bits in length. Specifically, the file Ci is to be used for Fj if the ith bit of Pj is equal to 1.

For my entertainment dollar, the cleanest way to express the construction of the filesystem is using the language of linear algebra rather than the relatively clumsy language of Boolean functions. In order to do this, we need to take a step backward and see how far we can abstract the usual theory of linear algebra. In other words, we need to do us a little abstract algebra.

XOR: F2 Don't Need No Stinkin' Special Notation

Recall that a vector space is a set V of elements, called vectors, together with the operations of addition and scalar multiplication, such that it is closed under these operations and these operations, together with the scalar operations of addition and multiplication, obey all the associative and distributive laws that make sense. Just to spell it all out in math-ese, if u, v, and w are vectors in V and a and b are scalars,

u + v and au are in V (closure)
u + v = v + u (commutative law)
u + (v + w) = (u + v) + w, a(bu) = (ab)u (associative laws)
a(u + v) = au + av, (a + b)u = au + bu (distributive laws)

The set of scalars is typically chosen to be the set ℝ of real numbers. Why? What specific properties do the real numbers enjoy that make them a good set of scalars for linear algebra compared to, say, the integers ℤ?

One of the primary uses of linear algebra is to provide a framework for the theory of solving systems of linear equations in several unknowns. If you think about it for a sec, then you'll see that even to solve an arbitrary linear equation in a single unknown, you get into trouble if you're not able to perform division with your scalars. For example, 2x + 3 = 8 is a very nice linear equation over ℚ, but it has no solution in ℤ, because we can't divide 5 by 2. In other words, for us to be able to solve linear equations, each nonzero scalar needs to have a multiplicative inverse. So to answer the question posed in the above paragraph, the real numbers are superior to the integers as a set of scalars in linear algebra because of this multiplicative-inverse business. Aside from that, there's nothing particularly special about ℝ. For example, the set ℚ of rational numbers also has the multiplicative-inverse property and therefore makes a perfectly decent set of scalars; in a very concrete way, the complex numbers ℂ are even better than ℝ.

In general, a set for which there are defined sum and product operations with respective identities, usually called 0 and 1, which obey the usual commutative, associative, and distributive laws and for which every element has an additive inverse is called a ring; the canonical example of a ring is the set ℤ of integers. A ring for which every nonzero element has a multiplicative inverse is called a field. Fields are the appropriate structures for doing linear algebra, as observed above. I also note here that Gaussian elimination, the most straightforward process by which to solve systems of linear equations, is possible over any field at all.

So, just how many different fields are there in the world, anyway? It probably will not surprise you that the answer is "infinitely many," although after you have named the reals, the rationals, and the complexes, you might have a hard time thinking of any others. No prob — we can expand your field repertoire in a jif!

Recall the mod operator from somewhere in your education; let's work mod 6 for the time being. A computer scientist would view "mod 6" as a procedure, along the lines of "subtract 6 from your doohickey until you end up with something less than 6, and spit out the result." One says things like "23 mod 6 = 5," and the statement "23 mod 6 = 11" is fairly universally dismissed as false.

To a mathematician, on the other hand, "mod 6" is a whole way of life. To use the phrase "mod 6" in mathematics means that you are doing arithmetic within the set ℤ6, which is the set you get from the integers when you set 6 equal to 0. You write 23 ≡ 5 (mod 6), or, less artificially, just 23 = 5 (mod 6), but equally well 23 = 11 (mod 6). There's nothing privileged about the number 5; it's just a convenient representative for the set of all of the (equal) numbers ..., -7, -1, 5, 11, 17, 23, ... mod 6.

I don't want to spend too much time here; note that for any n, ℤn is a ring, because it inherits this structure from the integers. ℤ6 is not a field, however. For example, you can't divide 3 by 2, since any multiple of 2 is either 0, 2, or 4. On the other hand, ℤ2 is a field! This is extremely easy to verify, since there is only one nonzero element, namely 1, and it obviously has a multiplicative inverse, namely itself. When we're thinking of ℤ2 as a field, we often write it as F2, the (unique!) field of two elements.

Since 1 + 1 = 0 in F2, addition in F2 is the same as the "exclusive or" (XOR) operation you're used to from whatever class you study things like "exclusive or." The reason I made the detour into algebra-land, when we already had one piece of notation to express the problem, is that it's really nice to be able to express this operation algebraically. The XOR of two bit strings then becomes their vector sum, for example; thus, we have the language of matrix multiplication available to us in order to solve Boolean equations with XORs in them, and we have the usual theory from linear algebra to guarantee the existence of solutions of such equations.

Once more, with algebra!

OK, back to our story! Now, every file can be expressed as a vector inside F2^n, where, recall, n is the number of bits in each file. We're going to adopt the convention that our vectors are rows — I know, just the opposite of how we did it back in statistics-land, but this seems to be the accepted convention in this space. (I guess the ability to be flexible in one's thinking is valuable, anyway. C'mon, it's good for you!) Our cover files Ci are such vectors; let's form the k × n matrix

C = [ C1 ]
    [ C2 ]
    [  ⋮ ]
    [ Ck ]

Recall that we wanted to express a given file Fi as the XOR of those Cj for which the jth entry of Pi is equal to 1, where Pi is the password for Fi. In symbols, then, if we write Pi = (pi1, ..., pik), this requirement turns into the equation Fi = ∑{j : pij = 1} Cj. It will sometimes be convenient to write this criterion as the matrix equation

Pi ⋅ C = Fi.

It may take you a minute to see why this equation is equivalent to the sum equation that came before it. Suppose, for example, that the third entry of Pi is 1. Then as we do our matrix multiply, moving across Pi and down the first column of C, we see that the first entry of C3 is multiplied by 1 and so contributes to the first entry of the product. Similarly, as we move down the second column of C, we find that the second entry of C3 contributes to the second entry of the product. And so on. On the other hand, any Cj for which the jth entry of Pi is 0 has all of its entries wiped out (multiplied by 0) in the sum.

So far we've done a good job of wishing, but how do we actually go about encoding our files? Well, let's start with any C1, ..., Ck that we like, and suppose we have a file F and a password P for it. Then we calculate the product P ⋅ C, which of course is almost certainly not equal to our desired F. But here's what we're going to do: Choose any one of the cover files, say C*, that actually did contribute to the sum, and replace it with C*′ = C* + P ⋅ C + F. Replacing C* by C*′ in the matrix C yields an updated matrix C′. By the way we chose C* in the first place, the row C*′ does contribute to the product P ⋅ C′, and in fact we can calculate that entire product in terms of the original product P ⋅ C. Namely, it's equal to the sum of all the Cj that contributed before, except with C*′ in place of C*. The first summand in C*′ is the old C*, so putting that together with the other contributing rows gives back P ⋅ C; then you get the rest of C*′, namely P ⋅ C + F. In toto, then, P ⋅ C′ = P ⋅ C + P ⋅ C + F, and since anything plus itself is 0 over F2, this last thingy is equal to F, which is just what we were after!
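If you'd rather watch that dance in code, here's a minimal sketch with files as byte arrays, so that byte-wise XOR plays the role of vector addition over F2^n. The names extract, store, K, and N are mine, not the paper's, and the password is assumed not to be all zeros.

#include <string.h>

#define K 8     /* number of cover files */
#define N 16    /* bytes per file */

/* F = XOR of the rows C[j] for which bit j of the password is 1 */
void extract(unsigned char C[K][N], unsigned char pw[K], unsigned char F[N])
{
        int i, j;
        memset(F, 0, N);
        for(j=0; j < K; j++)
                if(pw[j])
                        for(i=0; i < N; i++)
                                F[i] ^= C[j][i];
}

/*
 * Store F under password pw: pick one contributing row C*, and
 * replace it with C* + (pw . C) + F, exactly as in the text.
 */
void store(unsigned char C[K][N], unsigned char pw[K], unsigned char F[N])
{
        int i, j, star = -1;
        unsigned char sum[N];
        extract(C, pw, sum);    /* sum = pw . C, the old product */
        for(j=0; j < K; j++)
                if(pw[j]) { star = j; break; }
        /* star >= 0 because the password has at least one 1 bit */
        for(i=0; i < N; i++)
                C[star][i] ^= sum[i] ^ F[i];
}

After store(C, pw, F), a call to extract(C, pw, out) hands back exactly F, by the calculation above.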

The next step is to understand how to encode multiple files. In order to do this, we need to do a little more linear algebra. Recall that if u and v are row vectors of the same length, their scalar product is just the product u v^T, where T stands for transpose. The scalar product of two vectors is, of course, a scalar. Two vectors are orthogonal if their scalar product is 0. I should warn you that orthogonality is a little weird working over F2, because a vector with an even number of 1s is orthogonal to itself! Finally, a set of vectors {ui} is called orthonormal if the scalar product of each ui with itself is 1 and each ui is orthogonal to all the other uj.

Now, to pull off the trick with multiple files, we're going to start by making sure that our passwords form an orthonormal set. If we have m files, and therefore m passwords, we can form the passwords into an m × k matrix, which I'll call P for the password matrix. Note that P P^T = Im, the m × m identity matrix.

So suppose the ith file Fi is currently equal to Pi C, and we want to modify Fi to Fi + D. The goal is somehow to modify the cover files Cj so that we can both extract the new Fi + D using Pi and also still extract all the other files with their passwords. Here's how: let's modify our matrix C to C + Pi^T D. (Reality check: Pi^T is k × 1, and D is 1 × n, so the product is k × n, so it makes sense to add it to C.) Now,

Pi ⋅ (C + Pi^T D) = Pi C + Pi (Pi^T D).

The first term in the sum is the old Fi. Since matrix multiplication is associative, the second term is equal to (Pi Pi^T) D. Now, watch this! We chose the passwords to be orthonormal, so Pi Pi^T = 1. Therefore the above calculation for Pi ⋅ (C + Pi^T D) reduces to Fi + 1 ⋅ D = Fi + D, just as desired.

The other thing that we need to check is that if we choose any other file Fr, r ≠ i, the product Pr ⋅ (C + Pi^T D) is equal to the product Pr ⋅ C, so we still get back our unmodified file Fr. The calculation looks pretty much the same:

Pr ⋅ (C + Pi^T D) = Pr C + Pr (Pi^T D) = Fr + (Pr Pi^T) D.

Orthonormality comes to the rescue again, telling us that Pr Pi^T = 0, and therefore Pr ⋅ (C + Pi^T D) = Pr ⋅ C = Fr.
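In code, under the same byte-array conventions as the previous sketch (again, the names are mine), the update is just:

#define K 8     /* number of cover rows, as before */
#define N 16    /* bytes per file */

/*
 * The update C <- C + Pi^T D.  Over F2, the outer product Pi^T D is
 * the k x n matrix whose jth row is D when bit j of Pi is 1 and zero
 * otherwise, so we just XOR D into each cover row that Pi selects.
 */
void update(unsigned char C[K][N], unsigned char Pi[K], unsigned char D[N])
{
        int i, j;
        for(j=0; j < K; j++)
                if(Pi[j])
                        for(i=0; i < N; i++)
                                C[j][i] ^= D[i];
}

If Pi has an odd number of 1s (so Pi Pi^T = 1), extracting with Pi afterward yields Fi + D; if Pr overlaps Pi in an even number of positions (so Pr Pi^T = 0), Fr comes back untouched. These are exactly the two calculations above.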

*

Reference

Anderson, R., Needham, R., and Shamir, A., "The Steganographic File System."

Kandinsky was a Sissy

Unlike your previous assignments in this class, the final assignment -- the one for which this most excellent diatribe is preparing you -- will require you to engage in rather more computer science than perhaps you might expect. In a nutshell, you will be provided with some number of images, each of which will contain an encrypted "hidden" image that you must extract. The sordid details associated with this adventure will be forthcoming, but in preparation for their enjoyment you should begin to understand the image file format known as PNG. PNG, the website tells us, stands for "Portable Network Graphics" and not "Professional Numismatist Guild" which, on balance, I find to be a shame, but so be it.

There are undoubtedly a bunch of good reasons why you should appreciate the full beauty and splendor of PNG by reading its most recent full specification but offhand, I can't think of what those reasons are. We'll try to cover the basics and then get down to brass tacks instead.

First, from a high level, you can think of a PNG as a kind of image communication protocol. A PNG file is a little like a network packet. It has a header that identifies it as containing data that should be interpreted as a PNG image, and a series of sections -- called "chunks" -- that describe how the image is represented. Chunks carry different kinds of data and "metadata" that can then be composed into an image. Each chunk has a length, a chunk type, potentially some data (if the length is not zero), and a CRC for the chunk, occurring in that order. Unlike a network packet, however, chunks are variable length and can come in (almost) any order within the file. Thus a PNG file is meant to be interpreted as an identifying header followed by a "stream" of chunks that can be parsed in the order they arrive to construct an image.
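To make that layout concrete, here's a bare-bones chunk walker of my own devising (not part of any library): it skips the 8-byte signature, then prints each chunk's type and data length until it hits the IEND chunk.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
        unsigned char sig[8], hdr[8];
        FILE *fd;

        if((argc < 2) || ((fd = fopen(argv[1],"rb")) == NULL))
        {
                fprintf(stderr,"usage: chunks file.png\n");
                exit(1);
        }
        fread(sig,1,8,fd);      /* signature: 137 80 78 71 13 10 26 10 */
        while(fread(hdr,1,8,fd) == 8)
        {
                /* 4-byte big-endian length, then the 4-byte chunk type */
                unsigned long len = ((unsigned long)hdr[0] << 24) |
                                ((unsigned long)hdr[1] << 16) |
                                ((unsigned long)hdr[2] << 8) |
                                (unsigned long)hdr[3];
                printf("chunk %c%c%c%c, %lu data bytes\n",
                                hdr[4],hdr[5],hdr[6],hdr[7],len);
                if(memcmp(&hdr[4],"IEND",4) == 0)
                        break;
                /* skip the chunk data and its 4-byte CRC */
                fseek(fd,(long)len+4,SEEK_CUR);
        }
        fclose(fd);
        return(0);
}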

So why have yet another specification? I mean, TIFF, JPEG, GIF? Seems like the last thing we should be spending our time on is another specification for images. I'm not sure what to say about that philosophically, but it is clear that PNG is designed to fill in some "gaps" that its designers perceived these other standards to have. PNG uses lossless compression, which makes it suitable as a format for image editing (as opposed to JPEG). It also supports "alpha channels" (these are not TV channels with specific significance levels) so that pixel transparency can be represented, and "gamma correction" that encodes brightness information so that differences in local graphics hardware can be compensated for.

There are other issues which you can read for yourself, but the bottom line is that it is fairly general and lossless, which makes it an excellent choice for us to use in this class.

Brought to you In Living Color

The feature of PNG that you will need to assimilate is how, ultimately, it represents each pixel in an image. A PNG image is encoded either as grayscale or as color. We'll keep our remarks confined to issues of color, but you should note that if you want to engage in PNG-noir, you can.

There appear to be three size choices for pixel representation: 8 bits per pixel, 24 bits per pixel, and 48 bits per pixel. The 8-bit version encodes color differently than the 24-bit and 48-bit versions. In an 8-bit image, the pixel value is used as an index into a color palette (which comes with the PNG file as its own chunk) to indicate what color the pixel should be. Clearly, pixels are thus limited to 256 different colors.

The more common representation (called "truecolor") uses 3 bytes (24 bits in total) to represent the red, green, and blue intensities, one per byte. Thus a pixel can take on one of 256^3 color values, where the color is an R-G-B composition of three 8-bit intensities. We'll stick with truecolor images in this class.

For completeness, the 24-bit representation has a double-precision big sibling in which each R-G-B triplet is represented by three 16-bit values. According to the FAQ, these images are used primarily by scientists and the entertainment industry (sounds like this class to me), and as a result, viewers and editors of images at this precision are hard to come by.

Both the 24-bit and 48-bit representations can also come with an alpha opacity value (a byte in the 24-bit case, a 16-bit value in the 48-bit case), or not. Thus your options for color are

  • 24 bits per pixel: 8-bit R, G, and B
  • 32 bits per pixel: 8-bit R, G, B, and alpha
  • 48 bits per pixel: 16-bit R, G, and B
  • 64 bits per pixel: 16-bit R, G, B, and alpha

So that's the deal with PNG. Armed with this information and a few other details, such as the understanding that multi-byte values are stored in network byte order, you, a sufficient quantity of computer science machismo, and a couple of pots of coffee should be able to whip up a PNG image parser in no time (well, maybe a little time). However, as promised, this class will not require you to get in touch with your inner hacker quite so intimately. Instead, I urge you to take full advantage of the lovely work your friends in the land of Open Source have deigned to perform.

In particular, there is a PNG manipulation package called libpng that comes pretty much as standard equipment on any Linux system, including those available in the CSIL and GSL.

Allow me to briefly apologize to those of you who are Java programmers. I expected that since the primary use for PNG is to encode images for the web, there would most certainly be a Java package, since we all know that Java is the web programming language of choice. I'll leave it to you to decide whether it is instructive to realize that the gold standard for web image processing is written in C (not even C++) and has a C interface by default. Now back to our regularly scheduled program.

The libpng library is fairly high quality and portable enough to work out of the box on both Linux and OSX. It provides a fairly straight-forward interface to implementations of the various PNG parsing and manipulation functions that are so painstakingly discussed in the specification. If you read the documentation for libpng and you like what you see, feel free to use its interface directly.

However.

The maintainer of the PNG home page, Greg Roelofs, has written a set of simple wrappers for libpng functions that allow you to read and write PNG images without getting too caught up in the specification's semantics. With a small set of modifications that I needed to make my code work, I've placed this code HERE and also in this tarfile. Using Greg's interface, I've written a sample program that reads and writes PNG images as an example that should prove useful as you develop your code. We'll spend a little time understanding this code in preparation for your assignment.

Simple PNG I/O

The program rw_png.c reads in a 24-bit image (either with or without an alpha channel) and then either displays the R-G-B values on stdout (if no output file is specified) or writes out a PNG file (either with or without alpha). From this example, I built my solutions to the assignment. The example code is also quite useful for adding or removing an alpha channel from an image. Let's take a look at selected highlights from the code.

The first thing to understand is that Greg's interface expects the PNG file to be opened "rb" -- read binary -- for reading.

        fd = fopen(Fname,"rb");
        if(fd == NULL)
        {
                fprintf(stderr,"couldn't open file %s\n",Fname);
                exit(1);
        }
No big deal. The next step is to read up the header information, primarily so that you can allocate the space needed to hold the image in memory. His code is clever enough to allow all sorts of options, including a line-at-a-time option, but we'll skip those fancy ideas for now.
        rc = readpng_init(fd,&width,&height);
        if(rc != 0) 
        {
                fprintf(stderr,"readpng_init error %d\n",
                                rc);
                exit(1);
        }
The variable rc is an error code, and width and height are out parameters indicating the dimension of the image in pixels. Greg's code uses an internal static buffer to hold the header information, so you thread heads need to be careful here.

To read up the image bytes, the next step is

        image = readpng_get_image(display_exp, &channels, &rowbytes);
        
        if(image == NULL)
        {       
                fprintf(stderr,"readpng_get_image fails\n");
                fclose(fd);
                readpng_cleanup(FALSE);
                exit(1);
        }
Here, Greg gives you the number of bytes in a row, which you could also calculate as width * channels in our case. These values, however, come through the libpng interface, which makes me think that if there is fancy business going on, the rowbytes computation is not always that simple.

The fact that rowbytes comes back also gives you a clue as to how libpng stores the image. As you might expect from a C programmer, image is a pointer to a one-dimensional array that holds a two-dimensional matrix of values stored in row major order (sorry FORTRAN fans).

And there you go. Your image is in memory in row major order and the fun can now begin.

The first bit of fun is to print out the R-G-B and alpha values just so you can get your eyeballs around what they look like. Your Aunt Heloise found this feature to be particularly useful in developing her recipe for steganography. Perhaps you will too.

First -- a little typing (pun obviously intended with regard to your previous assignment). The pointer variable image points to an array of unsigned characters. We'll just make red, green, blue and alpha unsigned char as well. Next, if we don't have an output file name specified, print the values to stdout and quit.

        if(Oname[0] == 0)
        {
                for(i=0; i < height; i++)
                {
                        curr = &(image[i*rowbytes+0]);
                        for(j=0; j < width; j++)
                        {
                                red = curr[0];
                                green = curr[1];
                                blue = curr[2];
                                if(channels == 4)
                                {
                                        alpha = curr[3];
                                        curr += 4;
                                        printf("r: %d, g: %d, b: %d, alpha: %d\n",
                                                        red,green,blue,alpha);
                                }
                                else
                                {
                                        printf("r: %d, g: %d, b: %d\n",
                                                        red,green,blue);
                                        curr += 3;
                                }
                        }
                }

                fclose(fd);
                readpng_cleanup(FALSE);
                exit(0);
        }
What could be simpler? Writing out your own PNG? Let's see...

Greg's write interface is a little bit more clunky than his read interface for our problem. In Greg's defense, I've lifted the interface calls from two different codes he's written: one to display a PNG in an X-window and one to write a PNG from a PNM file. In the latter, he uses a large structure to pass the values to his interface, and for reasons that are not entirely clear to me he defines it as a global. So be it. Near the top of the file, we find

mainprog_info Wpng_info;        /* global in Greg's code */

int main(int argc, char *argv[])
{       
        int i,j;
	.
	.
	.
and later
        Wpng_info.infile = NULL; 
        Wpng_info.outfile = NULL;
        Wpng_info.image_data = NULL;
        Wpng_info.row_pointers = NULL;
        Wpng_info.filter = FALSE;
        Wpng_info.interlaced = FALSE;
        Wpng_info.have_bg = FALSE;
        Wpng_info.have_time = FALSE;
        Wpng_info.have_text = 0;
        Wpng_info.gamma = 0.0;
Notice all of the goodies one gets to consider with respect to writing one's own PNGs. In this class, we'll keep it pretty simple. These are the initial default values that Greg uses. We'll modify them a bit later.

First, though, we open the file "wb" and tell Greg's code that this is the file we want to use for output.

        od = fopen(Oname,"wb");
        if(od == NULL)
        {
                fprintf(stderr,"couldn't open output file %s\n",
                                Oname);
                fclose(fd);
                readpng_cleanup(FALSE);
                exit(1);
        }
        Wpng_info.outfile = od;

Now here we have an artifact from whence Greg comes. The write code assumes the file you are writing is in PNM format, which uses the same representation as PNG's image data format. The PNM type identifier, though, looks like 2 * channels for some reason. Thus, if you want to write out a 4-channel PNG, you need to set the pnmtype field to 8 in the output structure. That you can set it, though, is kind of cool. It means, for example, that you can add an alpha channel to your favorite non-alpha image. Similarly, you can remove an alpha channel from a 4-channel image if so desired. The PNMtype variable in my code is set via a command-line argument to indicate whether you want a 4-channel or 3-channel PNG as an output file. In the case where you are adding a channel, the opacity is set to 255 (fully opaque) in this code, but you could change that.

        if((PNMtype == 8) && (channels == 3))
        {
                printf("converting 3 channel to 4 channel\n");
                new_image = (unsigned char *)malloc(height*width*4);
                new_rowbytes = width*4;
                if(new_image == NULL)
                {
                        exit(1);
                }
                memset(new_image,0,height*width*4);
                for(i=0; i < height; i++)
                {
                        curr = &(image[i*rowbytes+0]);
                        new_curr = &(new_image[i*new_rowbytes+0]);
                        for(j=0; j < width; j++)
                        {
                                new_curr[0] = curr[0];
                                new_curr[1] = curr[1];
                                new_curr[2] = curr[2];
                                new_curr[3] = 255;
                                curr += 3;
                                new_curr += 4;
                        }
                }
                free(image);
                image = new_image;
                rowbytes = new_rowbytes;
        }
        else if((PNMtype == 6) && (channels == 4))
        {
                printf("converting 4 channel image to 3\n");
        }
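The body of that second branch is elided in the listing above; a sketch of what the conversion might look like, mirroring the 3-to-4 loop (my reconstruction, not Greg's actual code):

                new_image = (unsigned char *)malloc(height*width*3);
                new_rowbytes = width*3;
                if(new_image == NULL)
                {
                        exit(1);
                }
                for(i=0; i < height; i++)
                {
                        curr = &(image[i*rowbytes+0]);
                        new_curr = &(new_image[i*new_rowbytes+0]);
                        for(j=0; j < width; j++)
                        {
                                new_curr[0] = curr[0];
                                new_curr[1] = curr[1];
                                new_curr[2] = curr[2];
                                /* curr[3], the alpha value, is dropped */
                                curr += 4;
                                new_curr += 3;
                        }
                }
                free(image);
                image = new_image;
                rowbytes = new_rowbytes;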
And then we tell Greg's code what kind of image we want
        Wpng_info.pnmtype = PNMtype;
        Wpng_info.height = height;
        Wpng_info.width = width;
        Wpng_info.sample_depth = 8;

        wc = writepng_init(&Wpng_info);
        if(wc != 0)
        {
                fprintf(stderr,"writepng_init: error %d\n",
                                wc);
                fclose(fd);
                fclose(od);
                readpng_cleanup(FALSE);
                exit(1);
        }
And then we tell Greg's code where to find the data and have it write everything out
        Wpng_info.image_data = image;
        Wpng_info.row_pointers = (unsigned char **)malloc(height*
                                                sizeof(unsigned char *));
        if(Wpng_info.row_pointers == NULL)
        {
                fclose(fd);
                fclose(od);
                writepng_cleanup(&Wpng_info);
                exit(1);
        }

        for(i=0; i < height; i++)
        {
                Wpng_info.row_pointers[i] = &(image[i*rowbytes+0]);
        }


        wc = writepng_encode_image(&Wpng_info);
        if(wc != 0)
        {
                fprintf(stderr,"writepng_encode_image error %d\n",
                                wc);
                fclose(fd);
                fclose(od);
                writepng_cleanup(&Wpng_info);
                exit(1);
        }
There are a couple of subtleties here, but not many. First, his interface expects a 1-D array of row pointers. I believe the reason for this interface decision has to do with interlaced files, which can be generated a row at a time. No matter.

And finally, we clean up.

        free(Wpng_info.row_pointers);
        free(image);
        writepng_cleanup(&Wpng_info);

	exit(0);

Piece of cake, no? This interface, in all of its simplicity, is quite powerful. Testing it a bit, I've found that it understands all of the 24-bit images I've tried with it, including those that are heavily compressed. Certainly for the class assignment you won't need anything more sophisticated.

Hiding in Plain Sight

So now what? You have what you need to see the insides of a PNG, but how are we going to use this information to hide an image? The answer is twofold. First, we are going to need a way to make a set of cover files from the R-G-B and alpha values of the image and vice versa. Second, we are going to need a way to generate a password. Let's talk about the first thing first.

Recall that the steganography algorithm we have discussed assumes that the original set of cover files contains random data. That "random" data, however, could be somewhat less than random, and the algorithm will still work. Indeed, if, say, each cover file contained an equal portion of the bits in a PNG image, that would be fine. Moreover, if we had a way to disseminate the bits from a PNG into a set of covers, perform the magic dance of the F2, and then reform the image, we'd just about have it. As long as we could choose a password carefully, we'd be able to:

  1. read in a PNG image for the hiding file;
  2. read in the PNG image of the file we wish to hide;
  3. form a set of cover files by distributing the bits from the hiding PNG;
  4. perform the steganographic encryption using the covers and the hidden file;
  5. reform the image from the resulting set of covers.
There are a few details here that might bear discussion. The first concerns how the bits are to be disseminated from the hiding image to the cover files and back again. You could tessellate the hiding image with cover files, but then one "tile" would be scrambled while the others would remain unaltered.

A better way -- and the one we plan to pursue here -- is (assuming we have k cover files in all) to put every kth bit into cover file i, starting with bit i of the hiding file.

For example, let's say that there are four cover files and they completely cover the image. We can put bits 0, 4, 8, 12, ... in cover file 0 (this is a computer science class, after all, so we count from zero), bits 1, 5, 9, 13, ... in cover file 1, etc.

Recall that to pull the steganography trick off, your cover files all have to be the same size and the same size as the file you are trying to hide. Since the division is not likely to come out even, you should assume that your cover files will be padded with zeros whenever necessary.

When would that be necessary? Clearly, for this method to work, the size of each cover file has to be the smallest integer larger than the size of the hidden file that evenly divides the size of the hiding file. In short, for this assignment, the sizes matter.
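Here's a sketch of the scatter step under those conventions; get_bit, put_bit, and scatter are my names, and the bit-ordering within a byte is an arbitrary choice that just has to be reversed consistently when you gather.

/* read or write bit b of a byte buffer (bit 0 = MSB of byte 0 here) */
static int get_bit(unsigned char *buf, long b)
{
        return((buf[b/8] >> (7 - (b % 8))) & 1);
}

static void put_bit(unsigned char *buf, long b, int v)
{
        if(v)
                buf[b/8] |= (unsigned char)(1 << (7 - (b % 8)));
        else
                buf[b/8] &= (unsigned char)~(1 << (7 - (b % 8)));
}

/*
 * Scatter the nbits bits of the hiding image round-robin into k
 * zero-filled covers: bit i lands in cover i % k at position i / k,
 * and any leftover cover positions stay zero (the padding).
 */
void scatter(unsigned char *img, long nbits,
                unsigned char **covers, int k)
{
        long i;
        for(i=0; i < nbits; i++)
                put_bit(covers[i % k], i / k, get_bit(img, i));
}

Gathering is the same loop with the two sides of the assignment swapped.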

Speaking of passwords, the next issue that we need to get clear is how it is that we'll be determining a password. Your favorite aunt and uncle have given this some thought, and the answer is that we will use a linear congruential generator (see below), seeded with the product of the bytes that make up the ASCII codes of a password string, reduced mod some maximum value. Thus the algorithm for generating a k-bit password might be represented as

acc = 1.0;
for(i=0; i < strlen(s); i++)
{
        /* s is the password string; accumulate the product of its bytes */
        v = (double)s[i];
        acc = acc * v;
}

/* reduce the product mod the largest legal seed value + 1, and seed */
v = fmod(acc, (double)mod);
seed = (unsigned int)v;
SEED(seed);

/* draw one pseudo-random bit per cover file */
for(k=0; k < number of cover files; k++)
        if(RAND() > 0.5)
                cover_file_password[k] = 1;
        else
                cover_file_password[k] = 0;
where mod defines the maximum possible value + 1 for the seed, and SEED() and RAND() stand in for the seeding and output operations of the pseudo-random number generator discussed below.

The only wrinkle left is to run the steganographic algorithm using this password, then back the bits out of the covers into their proper places, and you have a password-protected hidden image (albeit not one that is too protected, as we shall see).


Talkin' Bout Pseudo-Random Number Generation

Let's examine the generation of "random" numbers using a computer. It is by now a hoary old chestnut that a computer is inherently deterministic — although I honestly feel that some of my experiences with Microsoft products gainsay that assertion — and therefore it is unsuitable for the generation of a random sequence. On top of that, if you are after random numbers from a continuous distribution such as our favorite uniform distribution on (0, 1), you run up against the limitations of the precision of your machine.

At the same time, it's pretty clear that random numbers are useful. In this very class, we do the Monte Carlo thing enough that rolling a fair die or pulling black and white balls out of an urn would be rather onerous. (Sorry, I've always wanted to use the words "urn" and "onerous" in the same sentence.) Heck, your favorite programming language probably has a "random number generator" built into it, in spite of all those naysayers and their chestnuts.

So where does that leave us? What the random number generators really do is generate pseudo-random numbers, which pretty much means "stuff computers spit out because they can't do really random numbers." The pseudo-random numbers in question are almost invariably designed to imitate random numbers drawn from U(0, 1). So how "random" is pseudo-random? It's often expressed that you want your numbers to "be random enough" for your purposes, but to me a better way to express it is that you want them to look random enough — but the whole idea of randomness is very slippery. It's not as though you can look at sequences of numbers and definitively identify them as "random" or "not random"; randomness is more about the process that generates the values, and I don't want to get into a philosophical discussion about whether truly random processes even exist! (We also typically want them to look independent enough, which is a totally different issue, but nobody seems to mention that little fact. The world is a perplexing place.) And the question of how random the numbers need to look is also a point of consideration: For example, if you're doing Monte Carlo simulations for some dingbat class based on some TV show, you probably don't have to be too fussy; on the other hand, if you're basing a cryptographic scheme for sending sensitive government secret messages on your pseudo-random number generator, your standards should be pretty high. Fortunately, there are various tests that one can use on a sequence of pseudo-random numbers that can at least identify glaring differences between its statistical properties and those of a (hypothetical) sequence coming from a "true" random process.

Ya down with LCG?

Pseudo-random number generators are typically designed to start with a seed and then generate each successive number as a relatively easy-to-compute function of a relatively small number of previously generated numbers. As a rule, the generator will actually produce an integer, which it then reduces modulo m for some large integer m and finally divides by m to produce a number between 0 and 1.

The most straightforward approach, and one that happens to work decently if you do things right, is the linear congruential generator, or LCG. (Not to be mistaken for the Lira Collectors Guild, which is the Italian version of the PNG.) In this case, the acronym is beautifully descriptive: We generate integers, mod some modulus m, by starting with a seed X0 and using the (linear) iterative rule

Xi+1 = a Xi + c   (mod m)

In this equation, a is called the multiplier (catchy, no?), and c is the increment. There are a few things to note about sequences of numbers generated in this way. First and most glaringly, there are only m different possible values for Xi, and each Xi determines Xi+1, so the whole thing will go into a repeating cycle of length at most m. This issue is endemic to all congruential number-generating schemes; even fancier mod-m methods using more than one predecessor to generate each new number suffer from a similar problem — there are still only m different values, so there are only a finite number of different combinations of inputs before you start re-using them and get into a loop. So first off, your m should be large enough so that your LCG doesn't repeat itself too quickly for your purposes. I don't see a huge amount of theory as far as exactly what kind of number m should be for a good LCG. Often, m is just a large power of 2, maybe the biggest integer your machine can handle, but that's not too hard a value to guess. On the other hand, as we'll see in the next section, a prime is not a super choice, nor is a product of distinct primes.
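An LCG is about three lines of code; here's one way the SEED()/RAND() placeholders from the password generator might be realized (the parameter values here are toys for illustration, not recommendations):

static unsigned long X;                         /* the LCG state */
static unsigned long a = 16, c = 7, m = 45;     /* toy parameters */

void SEED(unsigned long s)
{
        X = s % m;
}

double RAND()   /* imitates a draw from U(0,1) */
{
        X = (a * X + c) % m;
        return((double)X / (double)m);
}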

The question of "how good" a random number generator is can mean at least two different things, depending on which branch of the government you're working for. The first interpretation of this question is "Does the sequence of numbers that my generator produces have enough properties in common with a 'truly' random sequence that it's possible to use the random numbers for Monte Carlo simulations?" In this space, some LCGs are pretty darn good, while others are not; by using a few more or less straightforward tests, which I'll go into a little bit later, we can fairly reliably separate out the bad from the good.

The second way to interpret the question is "Does the sequence of numbers that my generator produces provide enough unpredictability so that I can be sure that an interloper can't figure out the next number it will generate?" For these purposes, LCGs are downright lousy, as we will show.

Cracked!

By nature, the numbers generated by an LCG are closely related geometrically. For example, consider the sequence of ordered pairs (Xi, Xi+1) produced by the LCG described above. Then, forgetting the (mod m) part for a minute, all the points lie on the line y = ax + c. Even after we reduce mod m, you still end up with each of the points satisfying y = ax + c − km for some value of k, so all of the points live on a relatively small number of parallel lines; this phenomenon persists after you divide by m to get a number between 0 and 1. The same is true in higher dimensions as well: Ordered triples of points, graphed in 3-space, tend to fall on a relatively small number of parallel planes.

This flaw in the numbers generated by LCGs can be exploited fairly directly using our good friend Linear Algebra. The fact is that if you have four consecutive integer outputs from an LCG, say x, y, z, w, then the matrix

| x y 1 |
| y z 1 |
| z w 1 |

has determinant 0 (mod m). By now you know me well enough to know that I can't just present you with a fact like this and not prove it! We can actually do it by expanding the determinant along the last column, scratching our heads, finding all the silly multiplication errors, and showing that everything cancels, but with a little thought, we can be lazier — always a good tradeoff in my book. Notice that here we're doing linear algebra over a ring and not a field, but two important facts persist:

  1. Changing a row by adding to it a multiple of another row does not change the value of the determinant.
  2. A matrix with two identical rows has determinant 0.
If we recall that z = ay + c and w = az + c, then we see that subtracting a times the second row from the third gives the row
(c, c, 1 − a).
Similarly, y = ax + c, so subtracting a times the first row from the second gives the row
(c, c, 1 − a).
Now, by the first of the above properties of determinants, these two operations didn't change the determinant of the matrix, and by the second property, the determinant ended up 0, so it was 0 to begin with.

So what does that mean to you, the cracker? Well, it means that if you just calculate the determinant of the above matrix over the integers, what you end up with is a multiple of m, and not too big a multiple at that. Chances are that if you have, oh, six or seven outputs in a row, so that you can perform this procedure three or four times with different matrices, and you take the greatest common divisor of the determinants you get, you'll find m, or else a really small multiple of m. Given m, you've essentially reduced the problem to the one-parameter problem of finding a, since a and m determine c from just two outputs. If you have, say, a 20-bit modulus, checking all (m, a) pairs might be computationally challenging, but finding m and then checking somewhere around 2^20 values of a by brute force is eminently possible.
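Here's a sketch of the modulus-guessing step (guess_modulus is my name for it); a real cracker would follow it with the brute-force search over a:

/* greatest common divisor of two nonnegative integers */
long long gcd(long long a, long long b)
{
        while(b != 0)
        {
                long long t = a % b;
                a = b;
                b = t;
        }
        return(a);
}

/*
 * Each determinant built from four consecutive outputs x, y, z, w is
 * x(z - w) - y(y - z) + (yw - z^2), which is 0 (mod m); the gcd of a
 * few of them is m or, with bad luck, a small multiple of it.
 */
long long guess_modulus(long long *out, int n)
{
        long long g = 0;
        int i;
        for(i=0; i + 3 < n; i++)
        {
                long long x = out[i], y = out[i+1];
                long long z = out[i+2], w = out[i+3];
                long long d = x*(z - w) - y*(y - z) + (y*w - z*z);
                g = gcd(g, d < 0 ? -d : d);
        }
        return(g);
}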

Having more or less discussed the security problems with LCGs, I'd like to spend a bit of time on how to make sure that yours at least delivers reasonably high-quality quasi-random product.

Building a Better LCG

After you've chosen m, you'll want to choose the other parameters a and c of your LCG in such a way as to make a nice long cycle length. In fact, it's always possible to choose a and c so that the cycle length is equal to m, the maximum possible. The first easy observation is that a should be relatively prime to m; otherwise, there aren't enough different values of aXi. Also, c should be relatively prime to m as well; otherwise, once you get one aXi that has a factor in common with c and m, so will aXi+1 and therefore so will every subsequent term in the sequence.

But even that doesn't get you out of the woods. For example, let's take m = 7, a = 2, c = 3, X0 = 5. Then the sequence looks like

5, 2 ⋅ 5 + 3 = 6, 2 ⋅ 6 + 3 = 1, 2 ⋅ 1 + 3 = 5, 6, 1, ...

(It's even worse if you use X0 = 4.) This little example illustrates the general principle that if m is prime, the only way to construct an LCG with cycle length m is to choose a = 1 — which doesn't make for good pseudo-random numbers.

Using a modulus that is the product of distinct primes suffers from the same problem. Unless you use a = 1, a is going to fail to be 1 mod at least one of those primes, and this shortens up the cycle. In his weighty tome Seminumerical Algorithms: The Art of Computer Programming, Vol. 2, Donald Knuth is kind enough to tell us the following:

The LCG
Xi+1 = a Xi + c   (mod m)
has cycle length m if and only if
  • c is relatively prime to m;
  • for all prime factors p of m, a − 1 is divisible by p;
  • if m is even, a − 1 is divisible by 4.

Therefore, it's easy enough to construct LCGs with cycle length m: you can choose m to be a power of 2, for example, and suitable values of a and c abound. If you don't want to be so predictable, though, just make sure that one of your primes occurs to a power higher than 1 and do some arithmetic. For example, if you take m to be 3² ⋅ 5 = 45, you can make The Donald happy if you use a = 16.
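A quick brute-force sanity check of that example (c = 7 is just my choice of an increment relatively prime to 45):

#include <stdio.h>

/* walk the LCG from seed 0 and count the steps until it returns */
int cycle_length(int a, int c, int m)
{
        int x = 0, len = 0;
        do
        {
                x = (a * x + c) % m;
                len++;
        } while(x != 0);
        return(len);
}

int main()
{
        /* m = 45 = 3^2 * 5, a = 16, so a - 1 = 15 is divisible by
           both 3 and 5; this prints 45, the full cycle length */
        printf("cycle length = %d\n", cycle_length(16, 7, 45));
        return(0);
}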

But even that isn't enough!

Taking that LCG for a test spin

Suppose your LCG has cycle length m. Knuth has a few suggestions for determining whether it's any good. The first thing is to test its potency, which is defined as the smallest integer n such that (a − 1)^n = 0 (mod m). Such an integer will always exist, since we've chosen a − 1 to be a multiple of all the prime factors of m. LCGs with potency 1 or 2 end up giving outputs that follow very simple formulas; DK suggests that your LCG have potency at least 5. So be it.

The next thing to do is to examine the distribution of outputs that you get. Donny goes bolting for the χ² test — you divide the set of integers mod m into d bins, calculate the expected probability of each bin as (bin size)/m, and then take as your sample a run of at least 5d or so and calculate the χ² statistic, using d − 1 degrees of freedom in the table. The Knuthster notes that if you make your sample size really really big, χ² can hide "locally nonrandom behavior" and goes on to suggest, "Perhaps a chi-square test should be made for a number of different values of n." I'd say n = 100 and d = 10 is a good choice at the "low" end, on up to n = 1000 and maybe d = 50 at the other — but that's just me. Knuth makes no specific suggestions.
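A sketch of the binning and the statistic, assuming equal-width bins so that the expected count in each is n/d (the harness is mine, not Knuth's):

/*
 * Chi-square statistic for n outputs (already reduced mod m) dropped
 * into d equal-width bins; compare the result against a chi-square
 * table at d - 1 degrees of freedom.  Assumes d <= 64 and that
 * out[i] * d doesn't overflow an unsigned long.
 */
double chi_square(unsigned long *out, int n, unsigned long m, int d)
{
        int i;
        int count[64] = {0};
        double expect = (double)n / (double)d;
        double stat = 0.0, diff;

        for(i=0; i < n; i++)
                count[(int)((out[i] * (unsigned long)d) / m)]++;
        for(i=0; i < d; i++)
        {
                diff = (double)count[i] - expect;
                stat += diff * diff / expect;
        }
        return(stat);
}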

The next thing you can do is to test whether there is some dependency between the bins into which adjacent numbers fall. Accordingly, let's take adjacent pairs from a run (Xi, Xi+1), (Xi+2, Xi+3), ... (Notice that we don't re-use any values as we did in the geometric examination earlier!) and place them in a d × d table according to which bin each coordinate fell into. The expected frequencies of the cell entries can be calculated simply from the number of values in each bin, so we can calculate from our run of values a χ² statistic to compare against a critical value with (d − 1)² degrees of freedom, where d is again the number of bins. This time there are lots more cells, so you need a larger sample for a given value of d.

As per my rant back in the days at the Typing Pool, χ² methods aren't all that appropriate for numerical data. On the other hand, in some sense this LCG thingy really does generate categorical data, with categories numbered 0, ..., m − 1. On the other other hand, the values it generates often get divided by m and used to imitate the U(0,1) distribution, which is numerical. If we choose to treat the numbers as numerical, we can use the Kolmogorov-Smirnov test, which, in addition to being the first experiment in cross-marketing between statistics and the distilled-spirits industry, is also a test of goodness of fit between a sample and a proposed distributional model. The idea behind Kolmogorov-Smirnov is very simple: Calculate the cumulative distribution function Fsamp from your sample and the one for the proposed model, Fmodel. The test statistic is just

D = max_x |Fsamp(x) − Fmodel(x)|.

Note that the max here is taken over all x and not just the values that occur in the sample. The CDF is defined everywhere, after all, for both the sample and the model distributions. For large samples, say above about 80, which are the kind we'll be dealing with, reject the null hypothesis that the sample is from the hypothesized distribution at levels α = .1, .05, .01 if √n ⋅ D is greater than 1.22, 1.36, 1.63, respectively.

Of course, in this case the distribution we're hypothesizing is U(0,1), for the outputs of our LCG all divided by m. Again there is the question of how big a sample to take in order to detect both "local nonrandomness" and "global nonrandomness," so a couple of different sample sizes are probably in order, just as for the χ² test given above. The inscrutable Knuth actually suggests doing a bunch of K-S tests for a fixed length, say 1000, recording the values of the test statistic you get, and then doing a K-S test of those values against the asymptotic distribution function F(x) = 1 − e^(−2x²). This is a weird enough idea to appeal to me, but DK doesn't really give much insight as to why he thinks it's a good idea.
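A sketch of the K-S statistic against U(0,1), whose model CDF is just F(x) = x (again, the harness is mine): since the empirical CDF is a staircase, the biggest gap occurs at a sample point, just before or just after a step, so we check both sides.

#include <stdlib.h>
#include <math.h>

static int cmp(const void *a, const void *b)
{
        double x = *(const double *)a, y = *(const double *)b;
        return((x > y) - (x < y));
}

/* D for a sample against U(0,1); reject at alpha = .05 if
   sqrt(n) * D exceeds 1.36 */
double ks_stat(double *x, int n)
{
        int i;
        double D = 0.0, below, above;

        qsort(x, n, sizeof(double), cmp);
        for(i=0; i < n; i++)
        {
                below = fabs(x[i] - (double)i / n);
                above = fabs((double)(i+1) / n - x[i]);
                if(below > D) D = below;
                if(above > D) D = above;
        }
        return(D);
}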

Somewhere Over the Rainbow...

Of course, the above tests are not foolproof. It's tough to be foolproof in this space, but I did want to mention a test which, according to Don, not only do all good random number generators pass, but every known lousy one fails. This is called the spectral test, and my aim here is just to make you aware of it and its basic idea. Code for this thing is fairly readily available.

Recall the above observation that ordered t-tuples of consecutive outputs of an LCG fall on a relatively small number of hyperplanes. The idea of the spectral test is that the distance between these hyperplanes tells you something about how good — the term he uses is "accurate" — your random number generator is. He calls the accuracy νt, which is defined by the equation

1/νt = the maximum distance between hyperplanes, in the t-dimensional unit cube, taken over all families of equidistant parallel hyperplanes covering all your points.

You get various accuracies for different values of t. The point is that if the planes are very close together, your random number generator has a pretty fine granularity, and the νt-value is correspondingly large. Big Don says we can take the leftmost log2(νt) bits as generating essentially random t-tuples of U(0,1) numbers. He gives the practical heuristic that if your random number generator has νt > 2^(30/t) for each t, it's a good random number generator.

Reference

Knuth, D., Seminumerical Algorithms: The Art of Computer Programming, Vol. 2.

***