CS170 Lecture notes -- The Third Letter of the Alphabet

  • Rich Wolski
  • Source Code Directory: /cs/faculty/rich/public_html/class/cs170/notes/C
  • Lecture notes: http://www.cs.ucsb.edu/~rich/class/cs170/notes/C/index.html

    Compilation

    C is a file-oriented language, which means that things written in the same file are assumed to be related, and things outside the file are essentially "libraries" that can be loaded with the file. For example, consider the program contained in p1.c.
    
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>
    
    #include "int_add.h"
    
    int
    main(int argc, char *argv[])
    {
    	int first_num = 10;
    	int second_num = 20;
    
    	printf("the sum of %d and %d is %d\n",
    			first_num,
    			second_num,
    			IntegerAdd(first_num,second_num));
    
    	return(0);
    }
    
    
    You probably already know where printf() comes from (Unix supplies it, of course), but how it got into your program might be a bit of a mystery. This program also calls another function, IntegerAdd(), which is defined externally. In C, the idea is that your program can call lots of externally defined functions, much like in Java or C++. The difference is that they are linked in statically (for the most part) when you compile the program. In this case, I've written the code for IntegerAdd() in int_add.c:
    
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>
    
    int IntegerAdd(int a, int b)
    {
    	return(a+b);
    }
    
    
    It is pretty simple, but how did it get into p1.c? Let's look at the relevant lines of the makefile.
    
    
    CC = gcc
    
    EXECUTABLES = p1
    
    CFLAGS = -g
    
    int_add.o: int_add.c int_add.h
    	$(CC) $(CFLAGS) -c int_add.c
    
    
    p1: p1.c int_add.o int_add.h
    	$(CC) $(CFLAGS) -o p1 p1.c int_add.o
    
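    The line that begins with int_add.o and the line after it say the following: int_add.o depends on int_add.c and int_add.h, and whenever either of them changes, int_add.o is rebuilt by running gcc -g -c int_add.c. The -c flag tells the compiler to produce an object file and stop, without linking.

    The lines that begin with p1: mean the following: p1 depends on p1.c, int_add.o, and int_add.h, and it is built by compiling p1.c and linking the result with int_add.o to produce the executable p1.

    That is, the call to IntegerAdd() that is in p1.c will be resolved from the object file int_add.o.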
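
    Both rules list int_add.h as a dependency, but these notes do not show that file. Here is a minimal sketch of what it might contain, assuming it does nothing more than declare IntegerAdd() (the include guard name INT_ADD_H is my own invention). It is shown together with the definition from int_add.c so that the two can be compared.

```c
/* int_add.h (assumed contents; the real header is not shown in these notes) */
#ifndef INT_ADD_H
#define INT_ADD_H

/* A declaration only: it tells the compiler the argument and return
 * types so that a call in p1.c can be compiled before the code for
 * the function is available. */
int IntegerAdd(int a, int b);

#endif

/* int_add.c supplies the definition. The linker matches the call in
 * p1.c to this code when int_add.o appears on the compile line. */
int IntegerAdd(int a, int b)
{
	return(a+b);
}
```

    With a header like this, p1.c can be compiled by itself into an object file, but building the executable without int_add.o on the compile line should fail at link time with an undefined reference to IntegerAdd.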

    Now let's look at a slightly more complicated version in p2.c.

    
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>
    
    #include "int_add.h"
    #include "int_sub.h"
    
    int
    main(int argc, char *argv[])
    {
    	int first_num = 10;
    	int second_num = 20;
    
    	printf("the sum of %d and %d is %d\n",
    			first_num,
    			second_num,
    			IntegerAdd(first_num,second_num));
    	printf("the difference between %d and %d is %d\n",
    			first_num,
    			second_num,
    			IntegerSub(first_num,second_num));
    
    	return(0);
    }
    
    
    Not hugely different, but you'll notice that there is a new external function called IntegerSub() that is called in p2.c. The makefile needs to specify, when p2.c is compiled, that some of the external references may need to come from int_add.o and some may need to come from int_sub.o.

    p2: p2.c int_add.o int_add.h int_sub.o int_sub.h
    	$(CC) $(CFLAGS) -o p2 p2.c int_add.o int_sub.o
    
    The C compiler will first compile p2.c and then it will hunt through all of the object files listed after it to try and resolve the external references.
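
    The notes do not list int_sub.h or int_sub.c. Here is a sketch of both, assuming IntegerSub() mirrors IntegerAdd() and returns its first argument minus its second (the guard name INT_SUB_H and the argument order are assumptions):

```c
/* int_sub.h (assumed contents; not shown in these notes) */
#ifndef INT_SUB_H
#define INT_SUB_H

int IntegerSub(int a, int b);

#endif

/* int_sub.c (assumed contents): compiled with -c into int_sub.o,
 * which the p2 rule then names on the compile line. */
int IntegerSub(int a, int b)
{
	return(a-b);
}
```

    Under that assumption, the second printf() in p2.c would report the difference between 10 and 20 as -10.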

    C was developed to write things like operating systems which may have many developers each doing a small piece, but who have to link their routines together to make a larger whole. As a result, the number of .o files you might need to specify on a single compile line could be very large. To help with this problem, Unix contains a way to build libraries consisting of one or more .o files. To create one of your own, you can use the ar command. Let's look, again, at the makefile.

    
    libmylib.a: int_add.o int_sub.o
            ar cr libmylib.a int_add.o int_sub.o
    
    
    This rule says to create a library called libmylib.a and to put in it all of the code that is in int_add.o and int_sub.o. Now look at how the binary for the program p3 is compiled from the source code for p2.c.
    
    p3: p2.c libmylib.a
            $(CC) $(CFLAGS) -o p3 p2.c libmylib.a
    
    
    The programs p2 and p3 are identical. The only difference is that when the C compiler built p3 it pulled the necessary external references from libmylib.a.

    One last thing. Much of the software you use from Unix in your programs (printf() for example) comes from libraries that the C compiler includes automatically. These libraries come in two forms: statically linkable libraries and dynamically linkable libraries. The static versions have the suffix ".a" and the dynamically linkable ones have the suffix ".so".

    For example, on CSIL, the C compiler automatically appends /usr/lib/libc.so.6 to the end of all compilations. If you want to see what is in libc.so.6 try typing the following command

    
    objdump -T /usr/lib/libc.so.6
    
    
    The output is the dynamic symbol table of libc.so.6, which includes the functions and variables (printf among them) that the library exports.

    Alternatively, for a static library, the command is "ar -t". Try

    ar -t libmylib.a
    
    on the library built by the makefile and you should see
    int_add.o
    int_sub.o
    
    as the contents of the library.

    Unix observes a strange convention with respect to libraries and the "-l" flag to the C compiler. Much like for include files, there is a set of standard locations in which system libraries (like libc.a) are located. If you use the "-l" flag, the system makes a library name that starts with "lib" and ends with ".a" or ".so". So "-lc" translates into "libc.a" or "libc.so". The linker, which is a piece of software called by the compiler to link the object files together, takes this translated name and looks in places like "/usr/lib" for it.
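
    The name translation is simple enough to write down in a few lines of C. This is only an illustration of the convention, not code taken from any real linker, and the function name LibFileName() is hypothetical:

```c
#include <stdio.h>

/*
 * Build the file name that a "-l<name>" flag stands for.
 * suffix is ".a" for a static library or ".so" for a dynamic one.
 * Returns 0 on success and -1 if the name does not fit in out.
 */
int LibFileName(const char *name, const char *suffix,
		char *out, size_t out_size)
{
	int n = snprintf(out, out_size, "lib%s%s", name, suffix);

	if (n < 0 || (size_t)n >= out_size) {
		return(-1);
	}
	return(0);
}
```

    Calling LibFileName("c", ".so", buf, sizeof(buf)) leaves "libc.so" in buf, which is the name the linker then goes looking for.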

    Now it turns out that for some esoteric reason the file "/usr/lib/libc.so" is now a "linker script" on Fedora Linux (the Linux installed in CSIL). You can look at it.

    /* GNU ld script
       Use the shared library, but some functions are only in
       the static library, so try that secondarily.  */
    OUTPUT_FORMAT(elf32-i386)
    GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a  AS_NEEDED ( /lib/ld-linux.so.2 ) )
    
    What this says (I think) is that the linker silently finds "/usr/lib/libc.so" when "-lc" is specified. It then sees that the file is a script and replaces the library with the commands that load /lib/libc.so.6 and /usr/lib/libc_nonshared.a. This is a new wrinkle, but it is essentially the same mechanism as we have been discussing.

    You can change the places the C compiler will look with a "-L" option. Look at how the program p4 is compiled from the code for p2.c.

    
    p4: p2.c libmylib.a
            $(CC) $(CFLAGS) -o p4 p2.c -L. -lmylib
    
    
    Here, the linker makes the library name libmylib.a from the "-lmylib" option, and then looks in the places specified by the "-L" option (here the current working directory, ".") in addition to the standard places (where it doesn't find the library). Since it finds the library in the current directory, it uses the object code in the library to resolve the external references found in p2.c.
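
    The search itself can be modeled as a loop over a list of directories. The sketch below is a simplified model and not the real linker's algorithm: it assumes only static ".a" libraries, takes the directory list as an argument, and uses a hypothetical function name, FindLibrary().

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Look for lib<name>.a in each directory of dirs[], in order.
 * Returns the index of the first directory that holds a readable
 * copy and leaves its full path in out. Returns -1 if no directory
 * has it, in which case the contents of out are not meaningful.
 */
int FindLibrary(const char *dirs[], int ndirs, const char *name,
		char *out, size_t out_size)
{
	int i;

	for (i = 0; i < ndirs; i++) {
		int n = snprintf(out, out_size, "%s/lib%s.a", dirs[i], name);

		if (n < 0 || (size_t)n >= out_size) {
			continue;	/* path too long for out; skip this directory */
		}
		if (access(out, R_OK) == 0) {
			return(i);	/* found a readable file with that name */
		}
	}
	return(-1);
}
```

    For the p4 rule, the list would hold "." from the "-L." option along with the standard places, and the search would succeed at the "." entry because that is where make built libmylib.a.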