
Compilers, linkers and makefiles…. Oh, no…

Here are some quick pointers to producing executable code (and lots of other stuff you never thought

you needed to know).

First, very few of you will be programming directly in assembly code/machine language, so your human

readable program needs to get translated into something useable for the machine.

There are currently three main models/methods for doing this.

a) Interpreted languages and interactive shell (command) programs. Here the idea is that you are

having a conversation with the computer, which is constantly “interpreting” what you are saying as

you tell it what to do next. This is exactly what the “shell” program we’ve been talking about so

much recently does, and a “shell script” is nothing but a list of the commands you would normally

type in with your keyboard but that are stored in a file instead so you can send them (repeatedly) to

the “standard input” channel (“stdin” in C language notation) of the shell program. The

advantage/disadvantage of an interpreted language, like IDL and Matlab in interactive mode, is that

everything is done on the fly and all variables are dynamically allocated. This is actually a good and

a bad thing. Commands are interpreted line by line, and not all at once (as in the compiled execution

model below). So if you declare a variable “A,” the language first has to grab the memory from the

operating system, which can be very slow (considering the billions of operations you can now do in 1

sec). That’s bad, but the plus side is that you can change everything “on the fly” as you get new

ideas, and even though setting up the memory for a new variable is painfully slow by the computer’s

clock, it seems almost instantaneous on human timescales. So the feedback you get from the

program feels quick – if you change something in the program, just send the new program back to

the interpreter and see if it works now. The premium here (for an interpreted language) is on the

speed and flexibility of translating your wishes into computer actions. So an interpreted language is

ideal for tasks like interactive data analysis, where you have to rescale and replot things, and what

exactly you decide to do next will depend on what you see in the data.
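
To tie the shell-script point above to something concrete, here is a minimal sketch (the file name do_analysis.sh and the commands inside it are invented for illustration): a script really is nothing more than ordinary commands stored in a file.

# do_analysis.sh -- just a list of the commands you would otherwise type at the prompt
date              # record when the analysis was run
ls -l *.dat       # list the data files we are about to process
wc -l *.dat       # count the lines (data points) in each file

You can feed it to the shell’s standard input with something like “bash < do_analysis.sh”, or mark it executable (“chmod +x do_analysis.sh”) and run it as “./do_analysis.sh”; either way, each line is interpreted exactly as if you had typed it yourself.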

b) Tokenized/semi-interpreted languages. The distinction between the first (interpreted) and the third

model (compilation) has vanished somewhat as compilers have gotten much faster. Back in the bad

“old” days, though, compiling a relatively simple program to produce machine code could take 2-3

minutes, even if you only changed one letter of the code. If you have to wait 2-3 minutes every time

you change your mind, this will clobber your efficiency and eventually drive you crazy. (Trust me.)

So in the old days, compiled languages were only used for serious, complicated and well-defined

tasks where computing efficiency was at a premium instead of responsiveness. If you’re wondering

what “tokenized” at the top means, it is an approach to telling the computer what to do that is half-way between the

interpreted and compiled language models. (The “Basic” language started out as a tokenized

language.) It is designed for writing computer programs that are fairly sophisticated and intended to

be run multiple times, but in a way that avoids most of the pain of that expensive compilation

operation. Instead of looking over the entire program to try to find efficiencies, the tokenizer takes

your input line by line and immediately converts all the operations on that line to a set of “tokens,”

where a token is a numerical value that points to a standard (pre-compiled) subroutine that is to be

executed when that token is encountered. So then a program is turned into a string of pre-compiled

tokens, and if you decide to change something on a given line, you re-interpret the new line and

update the associated token list, which is fast since “tokenization” is essentially a look-up

operation. (Again, all the machine instructions associated with a token are pre-compiled, before the

tokenizer program runs.) The “tokenized” version of your program is an intermediate level

translation of your human program, where each token represents a generalized operation. The cool

thing about this is that once you have a tokenized program, it really doesn’t matter what language

you used to generate those tokens. Also, to the extent you have access to a token “interpreter” for

every machine type of interest, you now have a “machine independent” language. This is how Java

achieves its “machine independence” yet manages to still produce fairly compact code files. [But

see below for how to improve Java performance on any given machine…]
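
To make the token idea a little more concrete, here is a minimal C sketch of my own (not how any real tokenizer is implemented): a token is just an index into a table of pointers to pre-compiled routines, and “running” the tokenized program means walking down the token list and calling whatever routine each token points to.

#include <stdio.h>

/* Each "pre-compiled subroutine" is just an ordinary C function here. */
static void op_hello(void) { printf("hello\n"); }
static void op_world(void) { printf("world\n"); }

/* The token table: a token is simply an index into this array of
   pointers to pre-compiled routines. */
static void (*token_table[])(void) = { op_hello, op_world };

int main()
{
    /* A "tokenized program": the list of tokens the tokenizer produced,
       line by line, from the human-readable source. */
    int program[] = { 0, 1, 0 };
    int n = sizeof(program) / sizeof(program[0]);
    int i;

    for (i = 0; i < n; i++)      /* the token "interpreter" loop */
        token_table[program[i]]();

    return 0;
}

Changing one line of the source only means replacing a few entries in program[] – a quick look-up – while the routines themselves were compiled long before the tokenizer ever ran.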

c) Compilers – the “right” way to do things. In this model, you are forced to spell out everything that you

want the computer to do beforehand. You then feed it to a compiler which translates it into object

code (machine language) “stub.” Since the computer science compiler wizards can see everything

that you want to do up front and you are not allowed to change your mind, they can do a much

better job of figuring out how to “optimally” translate your ASCII text instructions into the

instructions the microprocessor actually understands. For example, if the compiler knows you will

be performing the same set of operations on a bunch of array elements, it can arrange the data and

the operations to be done in such a way that the multiple pipeline units on the CPU can be

continually fed and we don’t “bust” caches. Again, in case you just got confused, the difference

between the output of a tokenizer and a compiler is that the compiler doesn’t produce tokens but

rather real machine (assembly) language that can be interpreted directly by the CPU. Since we’re

explicitly dealing with the lowest-level details of the CPU, we can presumably do the best possible

job of mapping our human instructions to machine instructions, and a very big part of why we can

do things more efficiently is because we have the entire program to look at (so in principle we know

what the programmer is up to) and the programmer can’t change his mind mid-stream. This means we

can look for optimizations and, e.g., think about the best way to store or order the program’s data.

And more importantly, if we know we’re going to need 1 MB of array data, we can allocate it

statically, before the program runs so we don’t have to wait for the operating system to finally give

us 1MB at some random time. And we can also do things like initialize that array data and have it

ready to go before the program is run. One consequence of this is that a simple “hello world”

program that happens to declare a 1 MB array can be very small in size in an interpreted/tokenized

language (after all there are only a few lines of code) yet the compiled program that allocates the

1MB array and initializes it is guaranteed to have a size > 1MB. This is again good and bad. It’s bad

because it takes a lot more space to store the program code (and is why people developed “DLLs –

dynamically linked libraries” of code that are shared between programs and can be loaded into

memory at the last minute.) BUT it’s very fast and efficient. Yes, we still have to wait for the

operating system to free 1MB of memory, but (i) it’s done at the beginning and doesn’t cause

random program execution behavior once the program starts, and (ii) the array loading and

initialization is a simple memory copy, which is a very fast, common operation. And one final big

advantage of the compiled program: once you’ve gone through all that effort, you don’t have to do it

again. You simply copy your code into memory and let the CPU have at it. With an interpreted

program, in principle you have to reinterpret it from the beginning, set up all the variables, etc.

Bottom line: compiled programs can be 100x faster than interpreted programs. If you want speed,

you don’t use IDL…
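
To make the static-allocation point concrete, here is a small sketch of my own (the array name and contents are invented for illustration). Because the 1 MB array is declared and initialized at compile time, a typical compiler stores the whole thing inside the executable image, so the file on disk grows by roughly 1 MB – but at run time the data is simply copied into memory, ready to go, with no call to the operating system’s allocator in the middle of your calculation.

#include <stdio.h>

/* 131072 doubles * 8 bytes = 1 MB, sized and initialized at compile time.
   Because the array has an initializer, most compilers place the entire
   array in the executable file itself. */
static double lookup_table[131072] = { 3.14 };

int main()
{
    printf("first entry: %f\n", lookup_table[0]);
    return 0;
}

An interpreted program with the same few lines would stay tiny on disk, but it would have to ask the operating system for (and then fill) that 1 MB every single time it runs.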

d) Just-in-time compilers … a hybrid scheme. O.K. Let’s say you’ve worked hard on debugging your

tokenized program, and you swear you will never touch it again. Well, if you’re not going to alter

your program on the fly, and I know what operations your tokens stand for, I basically have all the

information I need to figure out what you really intend to do, just as in the case of an ASCII text file

that gets compiled. So in the case of Java, the web browser or operating system provides a “just-in-

time” (JIT) compiler. In the background, while you are online looking at YouTube, the JIT compiler

scans your downloaded Java applets and figures out an optimal way to convert the tokens into

object (directly executable) machine code. In this case, your Java applications will eventually start to

run at close to compiled speeds. The point here is that you are usually just a consumer of the Java

applications and thus don’t need the speed/flexibility in changing an application that is provided by

an interpreter/tokenizer.

e) O.K. I finally understand this ASCII to machine code translation business, but what does a linker do

and why do I need it? After all, didn’t you just tell me a compiler, in particular, produces machine

code that can be directly fed to the CPU? Yup, if your text program is complete, i.e., contains all the

function and subroutine definitions you might need, AND your operating system is very simple so that it

only runs one program at a time, i.e., you know exactly where in memory your program will live,

then indeed you don’t need a linker. An example of this is the early IBM computers that operated in

“batch” mode. The computer operator would literally feed the IBM punch card reader a “batch” of

cards (the complete text version of the program). These would get translated and loaded by the

assembler or primitive compiler into machine code that was directly copied into memory. When the

computer operator was satisfied everything was loaded properly and no punch cards had been

eaten or missed, he would press the “go” button and the CPU instruction pointer would get set to

the address of the first memory location, and that particular “batch job” would start executing.

Usually, it would produce a printout on a big line printer that the operator would then collect and put

together with the punch cards, which would then all be handed back to the computer user who had

submitted that particular “batch job,” so the user could figure out why his program had failed (or as

occasionally happened, actually worked!). The computer operator would then grab the next batch

of cards, feed it into the punch card reader, and the process would start all over. The reason I have

gone into such excruciating detail is that the “batch mode” of execution still exists today (e.g., on

Yale’s big clusters), and I wanted you to understand how a computer operates at the most basic

level. But it didn’t take long for things to change…

First of all, the human operator was replaced by an “autoloader” that continually fed punch cards

into the reader and hit the “go” button, etc. without human intervention. Next people decided that

it would be cool if the autoloader could respond differently to different situations, so that it became

a punch card program that had to be loaded into memory first, and depending on what exactly was

going on, it might take up different amounts of the machine memory. That complicated the story

because a user’s code now had to be “relocatable,” i.e., written in such a way that the autoloader

could quickly figure out how to change it so it would still run if it were copied to a different chunk of

memory (with a different starting address). This led to the notion of “relative addressing” (where

the pointer to a particular memory location was computed by adding an offset to a base value

stored in a register on the CPU) and “entry points” or “code hooks” where the autoloader had to

insert specific values computed at run time (after it had decided where to copy the next program to

be executed). The need to be able to load a program into different memory and still have it run

became particularly necessary once people discovered/decided that a computer could appear to do

multiple things at once (“multi-tasking”) by loading several programs in memory at the same time,

and then “time slicing,” e.g., by executing the next 100 instructions of the first program, then

executing the next 100 instructions of the second program and so on. Programs came in different

sizes and lengths, so it was impossible to predict where any given program would end up in

memory. The output of the compiler therefore turned into an “executable image” that had to be

processed by the autoloader before it would do anything useful, and eventually the autoloader got

so complicated that it began to be called an “operating system.”

Just around the time computers started to multi-task, the first random access, non-punch card

storage appeared! One could store the content of thousands, even millions, of punch cards on a

magnetic drum and load any of them into system memory on demand. In particular, because one

could store the compiled, executable image of a program on the magnetic drum too, it suddenly

made sense to separate the process of compilation (translation to machine code) from that of

execution. You could compile your program, and then wait weeks to execute your image, or you

could ship the compiled image to your friend via a magnetic tape, and he could copy it onto his

drum without ever having to run a compiler on the source (text) code. (And maybe you didn’t want

him or her to, because you didn’t trust them to keep your source code private.) And so the

compilation stage of the old autoloader turned into just another program (the “compiler”) that you

could call up on demand to produce executable images.

From this point, it was only a small step to the notion of a linker program. For you see, that compiler

program was often abominably slow. Back then, a thousand-line program could even take hours to

compile, and as I noted above, this made it extremely unwieldy to debug. So computer scientists

came up with the idea of breaking up the executable image into chunks. After all, we were already

familiar with the notion of breaking the text version of the program up into subroutines and

functions, so why not do the same for the executable image? Let’s say a program consisted of one

thousand lines distributed fairly evenly over ~10 subroutines. Now if I forget a semi-colon

somewhere and the machine code is broken up into chunks corresponding to the subroutines, I

would only have to recompile the subroutine with the mistake, and not the entire program, which

would be ~ten times faster. This breaking up of a program’s machine code is particularly essential

now that our codes are millions of lines long and can contain thousands of subroutines. Breaking up

the executable image into chunks corresponding to separate program functions (which we can then

mix and match as desired) is indeed a brilliant idea – but only as long as we can eventually put all

those pieces back together again!

There are typically two ways this is accomplished. The first is to let the autoloader (the operating

system) do it every time the program is run. This may sound like a lot of extra work, and it is. Why

not assemble the complete program once and for all, and then use this complete image from then

on? This is in fact exactly what a linker does, but…. In our example program with 10 subroutines,

the total program size is 10 times larger than the size of the individual chunks, which might be quite

large. Since storage space on magnetic drums is limited, is there any way to save some space? Not

really, if all the subroutines/chunks are written by you and not in common with those used by other

programs on the machine. But what if your program is a “hello world”-type program:

main()
{ printf("hello world!\n");
}

How much storage space could that possibly take? Unfortunately, if you manage to produce the full

executable image that needs to be loaded into memory, you will be shocked to discover that it could

take several or tens of megabytes! How could this possibly be? Your program only prints out a few

characters!? But there’s a catch. How do you get those characters to appear on a screen? You’re

not doing this yourself; you’re actually calling the “printf” subroutine to do it. And

remember that this printf subroutine has to know how to handle all kinds of special formats and

potentially send characters over the internet. It’s actually quite complicated, and there’s a

fair amount of behind-the-scenes stuff that happens for most programs. So your “trivial” program is

also quite complicated in its totality and takes up a lot of storage space.

Can we find any way to save space this time? Yes, because yours is not the only program that uses the printf

subroutine. In general, almost every program on your Mac or PC calls the same printf or windows

graphics routines, so in fact there is a massive duplication of information. So researchers came up

with the concept of “dynamic” (run time) linking. Instead of having to include the printf subroutine

in every compiled image, what if we stored a single copy of a compiled printf subroutine on the

magnetic drum, and only when we are ready to run our program, do we load this into memory too

and somehow join it to the rest of our program? Then suddenly the compiled version of our “hello

world” program needs only a few bytes of storage space since it just basically loads “hello world”

into memory and then immediately calls the printf routine. If we have 100 programs that use the

printf routine, and the compiled printf routine requires 1MB of storage, then we have just saved

ourselves 99 MB of storage! That doesn’t sound like much saved space by today’s standards, but it

was in the past, and this practice of “dynamic linking” of commonly used system subroutines still

continues today since the size of subroutines tends to bloat/grow with time as more features are

added. And actually, a modern operating system contains thousands of sub-programs that call the

same basic routines, so the savings can be quite significant. And… there is still one area of storage

where a few MBs are important. That is your RAM. Even if we do last-minute linking of common

subroutines, in the end everything has to be copied and joined together into memory, so we won’t

see any space savings. Indeed, try opening a “trivial” terminal window on your computer and

depending on your operating system, you’ll see that 20 MB of space may vanish. A browser window

may take even more. Students these days tend to have windows open all over the place on their

screen, and the memory they consume quickly adds up. If you have 20 windows open, 400 MB of

your 2GB of main memory is then immediately gone. Some of that can be shuffled back to disk if

some windows are in the background and not being used, but in general you have lost a lot of space

to common subroutines, and continually loading in these subroutines also costs execution time. So

modern operating systems can “cache” these special, commonly used subroutine chunks by keeping

a copy of them in memory. This way if one of them needs to be copied and joined (linked) to some

particular program, it can be done quickly, via a RAM to RAM copy instead of having to pull the

information off the slow hard disk. Moreover, some subroutines are so critical and common that a

lot of work goes into making them re-entrant, that is, callable by multiple programs without having

to create multiple copies of the subroutine code. Subroutines like this (e.g., the one that knows

how to store characters on a hard disk) are defined to be special and are often part of the so-called

system “kernel,” a collection of core “system calls” or system functions that enable your programs

to interact with the hardware and the operating system autoloader and program scheduler, i.e.,

make a modern multi-tasking computer “run.” For these subroutines, we never have to keep

multiple copies around, and we do save space.

So why did I go into this long explanation of something that most people never notice? Because it is

an example of the linking process we’ll discuss in more detail soon, AND because if you don’t know

this is going on and you download binary code from friends or the web, you can easily get into trouble. For

example, your friend is very proud he finally got his “hello world” program to work, so you ask him if

you can copy the compiled program image to your machine so you can play with it. The file transfer

goes very quickly because it is a simple program (right?). Then you try to run it. Rats. It fails. The

system complains about an “incompatible executable file type.” What’s the problem? Both of your

machines are running Intel processors, so the machine code should be identical? Yes, but you forgot

that your friend compiled his program on a Scientific Linux machine, and you are trying to run it on a

Mac. Guess what. Macs and Linux machines have slightly different conventions on how a program

image should be formatted and structured so that their autoloaders can figure out how to properly

install and relocate the full program into main system memory. Moreover, if any subroutines are

dynamically linked in at the last minute, there’s no guarantee that those subroutines exist or do the

same thing on both systems. So you are stuck, and in general, the same “binary” (compiled program

image files) cannot be used on different operating systems. Slightly frustrated, you log into a

departmental Linux machine, copy the program binary over to there, and try to run it again. Again

the program fails! This time you get the message “shared library libX11.so.2 cannot be loaded.” But

it’s the same operating system, you complain loudly. libX11 is a collection of X-windows routines,

the ones you need to draw graphics on your screen, so it’s kind of fundamental and surely must

exist on the system you just logged into. Yup, it does – sort of. When you go to /usr/lib (the

directory usually containing all the “shared” routines which will be dynamically linked in at run time)

you indeed find a libX11.so. But it is a symbolic link to a file libX11_v3.so.4 – a different,

incompatible file from the one your friend’s program needs. This is because the departmental

machine is running Scientific Linux 6.1, a new version, while your friend is still running the older

Scientific Linux 5.4. So even if the two versions of the operating system are in principle highly

compatible, they are often not in practice because the versions of the dynamically linked

libraries/subroutines are different. Is there a quick way to fix this? If the operating system versions

are not so different, you can ask your friend for the libX11.so.2 from his computer so you can install

it on the newer Scientific Linux system. (Using the “ldconfig” program to tell the operating system where to

find this new dynamic/shared library – exactly how to do this is starting to enter the world of system

managers, so I won’t go into it more here.) Ta-da. When the autoloader scans the program image

file and discovers it needs to copy in a subroutine that is sitting in the file libX11.so.2, it now knows

where to find that file, and everything works! You need to know a bit about this dynamic linking

business, so you know what help to ask for when you start seeing error messages like this.
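
A practical tip, assuming you are on a Linux system where the standard ldd utility is available: you can ask which shared libraries a binary expects before you even try to run it,

ldd ./myprog.x

It prints one line per dynamically linked library, showing where the autoloader will find each one, and typically flags anything it cannot locate as “not found” – which is exactly the information to bring along when you ask your system manager for help.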

Summary: Modern “executable” compiled files are actually “stubs,” chunks of code that cannot be

run by themselves and need to be attached (linked) to other code and then initialized so that they

will run properly at a specific memory location in RAM. So, executable files in general cannot be

shared between different operating systems, even if they are running on identical hardware. If you

do want to share compiled code (e.g., because you don’t want to give your friend or customers

access to the source code), you have to be very clear about which subroutines need to be linked in

dynamically and then give some hints to your friend/customers on how to get those dynamically

linked routines onto their systems if they happen not to be installed. This dependence on “dynamically

linked” files can be very sneaky since using shared libraries is often the default behavior of compilers

(which in fact try to use as many shared routines as possible) and you will not be automatically told

that a particular set of shared files is implicitly being used.

Because dynamic linking can become a pain when sharing programs and you sometimes absolutely

want to have an executable file run correctly on another otherwise compatible system, you can

luckily turn off the dynamic linking mechanism (at the expense of increased file and memory space

usage) and force the compiler (translator program) to produce a program image that has no

“external dependencies,” i.e., force it to take whatever dynamically linked routines you might need and

explicitly copy them into your final machine code file so that you don’t need to depend on the

operating system to provide them to you. The technical jargon for this is that you “statically link”

your program, which is probably what you thought you would be doing all along when you started

reading this section. Note that in general, if you don’t set any special options the compiler produces

a hybrid machine code file, where all the non-generic subroutines that you wrote are included in the

code file but the system-shared subroutines are not (and are dynamically linked in).
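
With gcc (assumed here; other compilers have analogous options), the choice comes down to a single flag. For a hello-world style program stored in, say, hello_world.c:

gcc -o hello.x hello_world.c
gcc -static -o hello_static.x hello_world.c

The first (default) command leaves printf and friends to be dynamically linked in at run time; the second copies everything into the file, so hello_static.x is much larger but has no external dependencies and can be moved to another machine running the same operating system without worrying about which shared libraries are installed there.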

So without further ado, on to static linking: how do you create code chunks and then put them all

together (before the autoloader gets its hands on your code)? The best way to see is to examine the

compiler output for our “hello world” program. Written in generic assembly language, it looks

something like this:

Set BASE IP XXXX
Set BASE DP YYYY
PUSH "Hello World!\n" onto stack
JSR _printf [ZZZZ]
RET

The XXXX, YYYY, ZZZZ are blank memory addresses (“pointers” that have not been initialized yet).

XXXX and YYYY need to be set by the operating system autoloader at run time so they point to the start

of the code and data memory segments that the operating system copied the program image to.

(Remember these aren’t known until run time and depend, for example, on what other programs

are already loaded into memory, and so where the free memory slots are.) The ZZZZ is the memory

address that we have to “JUMP” to so we can start executing the printf subroutine. Unfortunately,

our hello_world.c program (as typed above) contains no definition of the printf subroutine, so the

poor compiler is stuck. It could simply quit and complain with the error message, “Hey, I can’t

possibly produce a working piece of code since you didn’t tell me what the printf routine is. I quit.”

However, because we are trying to break up our code into smaller, functional chunks that live in

separate files, the compiler will give you the benefit of the doubt, and simply insert a blank “hook”

(the ZZZZ memory value) that you will need to update to point to a piece of real code if you ever

want your program to run.
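
If you would like to see one of these blank hooks for yourself, and assuming your system has the usual Unix binutils installed, the nm utility lists the symbols in an object file:

gcc -c hello_world.c
nm hello_world.o

In the listing, main appears as a symbol actually defined in this code chunk, while printf is marked “U” (undefined) – that is precisely the hook the linker still has to fill in.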

To make the final working program, we then need to run what has come to be called the “linker

program” or the “link phase” of the compiler. Usually the compiler automatically does this for us,

unless we give it the “-c” option, or you manually do this by executing a command that looks

something like “ld hello_world.o printf.o”. (We can typically add an argument like -static to this

command line to force all the dynamic subroutines to be copied and “linked in” too.) What

does the “ld” program do? It decides on the final memory configuration of the image, e.g., where

the hello_world and printf code chunks sit in memory, and then it scans all the code chunks for

“unresolved external references” – the ZZZZ memory address/hook found in the hello_world.o code

file – and since it now knows where all the subroutines will be sitting in memory, it fills in the ZZZZ’s

with the appropriate values. (It “resolves the links.” N.B. in computer science, “resolving” a generic

name or link means to assign it a specific, appropriate value. For example, if you want to talk to the

machine behind www.google.com, you first have to contact a name server, which then “resolves”

the generic name www.google.com by replacing it with an actual IP address that you can use to

direct data packets to the google server.) As a final step, the “ld” program then creates the

“memory image” of the program by creating a disk file where all the subroutines and data arrays are

properly located and copied, with all “external references” resolved. This memory image can then

be directly (quickly) copied into memory, and except for the last minute fixes the autoloader does,

the program is ready to go.

If for some reason, the linker scanned the subroutines you are using and found a hook that it

couldn’t resolve (e.g., because it wasn’t given the name of a file containing the code chunk for a

particular subroutine), then it will indeed complain “Hey bozo, error: unresolved externa l reference

found to _printf . ” and quite. It will not produce the memory image, because it would not be

complete. This is a common error message that novices encounter, and it means you either forget

to include some of your code chunk files on the command line, or you have to talk to your professor

or system manager about where to find the “right library” that contains the routine that appears to

be missing.

One more thing about external references. We’ll be talking later about “common” or “global”

variables that exist outside of subroutines and can act as central directories/memo boards to keep

track of information we want to share between subroutines. In a program, a particular global

variable can obviously be defined only once. So if we then decide to break our program into chunks,

with subroutines in separate files, only one of the files can contain the definition of that global

variable. How do the other subroutines, which now don’t contain that variable definition, then

figure out how to access that global variable? In those other subroutines, the trick is to declare the

variable as “extern” (external). So we do something like this:

extern double global_variable;

double spit_global_variable_back()
{ return( global_variable );
}

Here, the “extern” qualifier in the global_variable declaration tells the compiler to give you the

benefit of the doubt again, and insert a hook/external reference (another ZZZZ-type memory

address/pointer) that needs to be filled in correctly once the location of global_variable in memory

is known. That doesn’t happen until the “link phase,” and actually the job of the linker is not just to

fill in the blanks/hooks for subroutines defined elsewhere but also those for variables too.
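
A minimal two-file sketch (the file names are my own invention): one file owns the actual definition of the global variable, the other only declares it as extern, and the linker matches the two up.

In globals.c:

double global_variable = 42.0;   /* the one and only definition */

In use_global.c:

extern double global_variable;   /* just a hook; the real address is filled in by the linker */

double spit_global_variable_back()
{ return( global_variable );
}

Compile each file separately (gcc -c globals.c and gcc -c use_global.c) and the reference in use_global.o stays unresolved; only when the two object files are handed to the linker together does the hook get filled in.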

II. Enough talk, a practical example!

The assignment today is to generate and print out 1000 random numbers. The subject of how to

generate (pseudo) random numbers is an important one we will talk more about later, but for now

let’s assume your professor hands you the source code for a function, ran3.c, that you can call to

generate a random number.

Here’s the skeleton of ran3.c (the first line is technically known as a “function prototype” – it contains all the

information you need to know on how to use it):

double ran3(long *idum) <- just this line is the prototype, and in ANSI C it is always required if you

plan to use the function in your file (even if its definition is external).

{

Stuff

}

This tells us that we need to feed ran3 a pointer to a long integer (that had better be defined

somewhere) as an input parameter, and that we should expect a double precision number in return.

The main.c file that will call this routine and print out the random numbers then looks like this:

#include <stdio.h> /* necessary to include the function prototypes for the standard input/output
                      routines like printf */

double ran3(long *idum); /* prototype for the external ran3 routine */

int main()
{ double ans;
  long idum;
  int count;

  idum = -23;
  for(count = 0; count < 1000; ++count)
  { ans = ran3(&idum);
    printf("random number %d -- %f \n", count, ans);
  }
  return 0;
}

O.K. So how do we compile and link everything? Well, if you don’t specify any extra options, the

compiler will do everything (translation to machine code + linking) without asking you.

So try this,

gcc -o myprog.x main.c ran3.c . The -o argument tells the compiler to generate an output (memory

image file) with the name that follows the argument. So this will compile the two source code files you

gave it (main.c and ran3.c) and produce an executable file, myprog.x, which you can then run by

typing, “./myprog.x” at the shell prompt. (Question: why do I always type “./” before a program that

I’ve written?)

What does the compiler actually do to create your executable file? Lots of things actually. First it runs

something called the CPP (C preprocessor) on each .c (C source code file) to do things like replace the

#include <stdio.h> by the contents of the file stdio.h. (In general, lines that start with “#” are commands

to the C preprocessor, a program that does something called “macro substitutions.”) The outputs of the

C preprocessor are then stored as temporary source code files, say, temp_file1.c and temp_file2.c .

The compiler is then run on each source code file to produce the code chunks for the subroutines and

variables defined in those source code files. Because the linker hasn’t been run yet, those code chunks

necessarily contain unresolved external references. Such code chunks are called “object files,” and in

Unix, by convention, end with the suffix .o . [So a file ending in .c is a C source code file, and the same

file name, but ending in .o, is the compiled object code file.] Object files are again examples of code

stubs that cannot be loaded directly into memory and executed. The output of the compiler at this

stage will be the two object files, temp_file1.o and temp_file2.o . Finally, the linker gets called! -> ld -o

myprog.x temp_file1.o temp_file2.o, which copies the machine language for all the routines into the

master memory image, resolves all the links/hooks, and writes the image out to the file myprog.x . Along

the way, though, the compiler realized you didn’t bother to define any of the C language subroutines

like printf! It therefore goes fishing for the definition of printf. It’s not in any of your files, but when

ld is called by the C compiler, it is actually called as “ld -o myprog.x temp_file1.o temp_file2.o -lc”. The

last argument tells the compiler to check the C library file, libc.a, for the definition of any missing

subroutines. That is indeed where the definition of printf lives, so the linker grabs it and copies it into the

image too and resolves the link in your main routine to point to it. [Or on some systems, it replaces that

hook with a special dynamic link flag that tells the autoloader to load the appropriate routine into

memory too and then resolve the links to it.] If you check the right temporary storage directories, you can

catch the compiler creating all the temporary files I’ve discussed, and then deleting all evidence of this

after it has finished its job…

Phew. Do you ever actually want to deal individually with all these steps? Yes, especially if we have a

really big program that calls lots of subroutines and we want to recompile as little as possible, so we can

rapidly debug our program.

So in this case, we explicitly tell the compiler to create the object files and keep them around. We do

this by using that -c argument:

gcc -c ran3.c produces a file ran3.o, and gcc -c main.c produces a file, main.o. We then link them using

the shorthand command (which knows which language libraries, like libc.a, to include by default),

gcc -o myprog.x main.o ran3.o .

The compiler is smart enough to realize it is dealing with two already pre-compiled object code files, so

it actually just calls the linker (“ld”) as above to produce the final image.

What’s the advantage here? Well, hopefully all the bugs are in the program you wrote, so you don’t ever

have to compile ran3.c again (which might take a really long time). After I fix the typo in main.c, I

simply redo gcc -c main.c, and then redo the link step gcc -o myprog.x main.o ran3.o . All done.

This is a very simple example, but I’ll show you more complicated examples later, where we can use a

program called “make” and “makefiles” to automate all the steps above. The source code for the Linux

operating system kernel, for example, consists of thousands of files and subroutines. Imagine that a

couple friends came over and fixed a few random files for you but forgot to tell you which ones. How

would you ever figure out which ones they changed and then how to properly re-compile (if necessary) and

link all those thousands of files together? Stay tuned for the next exciting installment….