Examining the Compilation Process

Page 1: Examining the Compilation Process

8/2/2019 Examining the Compilation Process

http://slidepdf.com/reader/full/examining-the-compilation-process 1/6

mining the Compilation Process. Part 1. | Linux Journal http://www.linuxjournal.com/content/examining-compilation-pro

6 10/08/2008 0

Examining the Compilation Process. Part 1.

October 6th, 2008 by Mike Diehl in Software

This article, and the one to follow, are based on a Software Development

class I taught a few years ago. The students in this class were

non-programmers who had been hired to receive bug reports for a compiler

product. As Analysts, they had to understand the software compilation

process in some detail, even though some of them had never written a

single line of code. It was a fun class to teach, so I'm hoping that the

subject translates into interesting reading.

In this article, I'm going to discuss the process that the computer goes

through to compile source code into an executable program. I won't be

clouding the issue with the Make environment, or Revision Control, like I

necessarily did in the class. For this article, we're only going to discuss

what happens after you type gcc test.c.

Broadly speaking, the compilation process is broken down into 4 steps:

preprocessing, compilation, assembly, and linking. We'll discuss each step

in turn.

Before we can discuss compiling a program, we really need to have a

program to compile. Our program needs to be simple enough that we can

discuss it in detail, but broad enough that it exercises all of the concepts

that I want to discuss. Here is a program that I hope fits the bill:

#include <stdio.h>

// This is a comment.

#define STRING "This is a test"
#define COUNT (5)

int main () {
    int i;

    for (i=0; i<COUNT; i++) {
        puts(STRING);
    }

    return 1;
}

If we put this program in a file called test.c, we can compile it with the simple command gcc test.c. What we end up with is an executable file called a.out. The name a.out has some history behind it. Back in the days of the PDP computer, a.out stood for “assembler output.” Today, the name also refers to an older executable file format. Modern versions of Unix and Linux use the ELF executable file format, which is much more sophisticated. So even though the default filename of the output of gcc is “a.out,” it's actually in ELF format. Enough history; let's run our program.
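As an aside, the ELF claim is easy to check by hand: every ELF file begins with the four bytes 0x7F, 'E', 'L', 'F'. The sketch below tests a buffer for that signature. The function name has_elf_magic is made up for illustration, and reading the first few bytes of a.out into the buffer (with fread, say) is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer starts with the ELF signature, 0 otherwise.
   has_elf_magic is a hypothetical helper name, not part of any library. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (buf == NULL || len < sizeof magic) {
        return 0;   /* too short to hold the signature */
    }
    return memcmp(buf, magic, sizeof magic) == 0;
}
```

Only the first four bytes are examined, so this says nothing about whether the rest of the file is well formed.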

When we type ./a.out, we get:

This is a test

This is a test

This is a test

This is a test

This is a test

This, of course, doesn't come as a surprise, so let's discuss the steps that gcc went through to create the a.out file from the

test.c file.

As mentioned earlier, the first thing the compiler does is send our source code through the C Preprocessor. The C Preprocessor is responsible for three tasks: text substitution, stripping comments, and file inclusion. Text substitution and file inclusion are requested in our source code using preprocessor directives. The lines in our code that begin with the “#” character are preprocessor directives. The first one requests that a standard header, stdio.h, be included into our source file. The other two request that a string substitution take place in our code. By using gcc's “-E” flag, we can see the results of running only the C preprocessor on our code. The stdio.h file is fairly large, so I'll clean up the results a little.

gcc -E test.c > test.txt

# 1 "test.c"

# 1 "/usr/include/stdio.h" 1 3 4

# 28 "/usr/include/stdio.h" 3 4

# 1 "/usr/include/features.h" 1 3 4

# 330 "/usr/include/features.h" 3 4

# 1 "/usr/include/sys/cdefs.h" 1 3 4

# 348 "/usr/include/sys/cdefs.h" 3 4

# 1 "/usr/include/bits/wordsize.h" 1 3 4

# 349 "/usr/include/sys/cdefs.h" 2 3 4

# 331 "/usr/include/features.h" 2 3 4

# 354 "/usr/include/features.h" 3 4

# 1 "/usr/include/gnu/stubs.h" 1 3 4

# 653 "/usr/include/stdio.h" 3 4

extern int puts (__const char *__s);

int main () {

int i;

for (i=0; i<(5); i++) {

puts("This is a test");

}

return 1;

}

The first thing that becomes obvious is that the C Preprocessor has added a lot to our simple little program. Before I cleaned it

up, the output was over 750 lines long. So, what was added, and why? Well, our program requested that the stdio.h header be

included into our source. Stdio.h, in turn, requested a whole bunch of other header files. So, the preprocessor made a note of

the file and line number where the request was made and made this information available to the next steps in the compilation

process. Thus, the lines,

# 28 "/usr/include/stdio.h" 3 4

# 1 "/usr/include/features.h" 1 3 4

indicate that the features.h file was requested on line 28 of stdio.h. The preprocessor creates a line number and file name entry before anything that might be “interesting” to subsequent compilation steps, so that if there is an error, the compiler can report exactly where the error occurred.
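The same bookkeeping is available to ordinary C code. The standard #line directive resets the line number and file name that the compiler believes it is reading, and the __LINE__ and __FILE__ macros report them. In the sketch below, the file name hypothetical_header.h and both function names are invented for illustration.

```c
/* Pretend the next line is line 653 of another file, much as the
   preprocessor's "# 653 ..." markers do in its output. */
#line 653 "hypothetical_header.h"
int reported_line(void) { return __LINE__; }
const char *reported_file(void) { return __FILE__; }
```

Here reported_line() returns 653 and reported_file() returns the made-up file name, which is the information the compiler would use when printing an error message for those lines.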

When we get to the lines,

# 653 "/usr/include/stdio.h" 3 4

extern int puts (__const char *__s);

we see that puts() is declared as an external function that returns an integer and accepts a single parameter, a pointer to constant characters (in other words, a string that puts() promises not to modify). If something were to go horribly wrong with this declaration, the compiler could tell us that the function was declared on line 653 of stdio.h. It's interesting to note that puts() isn't defined, only declared. That is, we don't get to see the code that actually makes puts() work. We'll talk later about how puts() and other common functions get defined.
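The distinction between declaring and defining a function can be seen in miniature within a single file. In this sketch, the function names are invented for illustration: the compiler accepts the call as soon as it has seen the declaration, even though the definition appears only later.

```c
/* Declaration: tells the compiler the name, return type, and parameters. */
extern int add_numbers(int a, int b);

/* This call compiles because the declaration above is all the compiler needs. */
int call_before_definition(void)
{
    return add_numbers(2, 3);
}

/* Definition: the code that actually does the work. For puts(), this part
   lives in the C library rather than in our source file. */
int add_numbers(int a, int b)
{
    return a + b;
}
```

If the definition were deleted, the file would still compile to an object file; the missing function would only be noticed at link time.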

Also notice that none of our program comments are left in the preprocessor output, and that all of the string substitutions have

been performed. At this point, the program is ready for the next step of the process, compilation into assembly language.
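One detail of the substitution step deserves a short aside before moving on. Because the substitution is purely textual, the parentheses in #define COUNT (5) are a defensive habit rather than decoration. For a bare number they change nothing, but for an expression they matter. A small sketch, with macro and function names invented for illustration:

```c
#define SUM_BARE   2 + 3      /* pasted into the code exactly as written */
#define SUM_PAREN  (2 + 3)    /* the parentheses travel with the text */

/* Expands to 2 + 3 * 2, so the multiplication binds first. */
int double_bare(void)  { return SUM_BARE * 2; }

/* Expands to (2 + 3) * 2, which is what was intended. */
int double_paren(void) { return SUM_PAREN * 2; }
```

The first function returns 8 and the second returns 10, even though both macros "mean" five to a human reader.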

We can examine the results of the compilation process by using gcc's -S flag.

gcc -S test.c

This command results in a file called test.s that contains the assembly code implementation of our program. Let's take a brief

look.

.file "test.c"

.section .rodata

.LC0:

.string "This is a test"

.text

.globl main

.type main, @function

main:

leal 4(%esp), %ecx

andl $-16, %esp

pushl -4(%ecx)

pushl %ebp

movl %esp, %ebp

pushl %ecx

subl $20, %esp

movl $0, -8(%ebp)

jmp .L2

.L3:

movl $.LC0, (%esp)

call puts

addl $1, -8(%ebp)

.L2:

cmpl $4, -8(%ebp)

jle .L3

movl $1, %eax

addl $20, %esp

popl %ecx

popl %ebp

leal -4(%ecx), %esp

ret

.size main, .-main

.ident "GCC: (GNU) 4.2.4 (Gentoo 4.2.4 p1.0)"

.section .note.GNU-stack,"",@progbits

My assembly language skills are a bit rusty, but there are a few features that we can spot fairly readily. We can see that our message string has been moved to a different part of memory and given the name .LC0. We can also see that there are quite a few steps needed to start and exit our program. You might be able to follow the implementation of the for loop at .L2; it's simply a comparison (cmpl) and a “Jump if Less Than or Equal” (jle) instruction, so the test i<5 has become i<=4. The initialization was done in the movl $0 instruction just above the jmp to .L2. The call to puts() is fairly easy to spot. Somehow the Assembler knows that it can call the puts() function by name and not a funky label like the rest of the memory locations. We'll discuss this mechanism next when we talk about the final stage of compilation, linking. Finally, our program ends with a return (ret).
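For readers who find C easier to follow than assembly, the loop can be written back in C using goto statements laid out in the same order as the listing. This is only a sketch of the control flow: the function name and the body_runs counter are invented, and the counter stands in for the call to puts(). Note that jle jumps when the value is less than or equal to 4.

```c
/* Mirrors the jump structure of the generated assembly.
   Returns how many times the loop body ran. */
int loop_as_compiled(void)
{
    int i;
    int body_runs = 0;

    i = 0;              /* movl $0, -8(%ebp) */
    goto L2;            /* jmp .L2           */
L3:
    body_runs++;        /* stands in for: call puts */
    i++;                /* addl $1, -8(%ebp) */
L2:
    if (i <= 4)         /* cmpl $4, -8(%ebp) */
        goto L3;        /* jle .L3           */
    return body_runs;
}
```

The function returns 5, matching the five lines of output printed by the original program.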

The next step in the compilation process is to assemble the resulting assembly code into an object file. We'll discuss object files in more detail when we discuss linking. Suffice it to say that assembling is the process of converting (relatively) human-readable assembly language into machine-readable machine language.

Linking is the final stage that either produces an executable program file or an object file that can be combined with other object

files to produce an executable file. It's at the link stage that we finally resolve the problem with the call to puts(). Remember that

puts() was declared in stdio.h as an external function. This means that the function will actually be defined, or implemented,

elsewhere. If we had several source files in our program, we might have declared some of our functions as extern and

implemented them in different files; such functions would be available anywhere in our source files by nature of having been

declared extern. Until the compiler knows exactly where all of these functions are implemented, it simply uses a place-holder

for the function call. The linker will resolve all of these dependencies and plug in the actual address of the functions.
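The place-holder idea can be modeled in a few lines of C. What follows is a toy, not a description of how a real linker is implemented: every name in it (toy_symbol, toy_call_site, toy_link, toy_puts) is invented for illustration, and a function pointer stands in for the address that gets plugged in.

```c
#include <stddef.h>
#include <string.h>

typedef int (*func_ptr)(const char *);

/* One entry in a toy symbol table: a name and where its code lives. */
struct toy_symbol {
    const char *name;
    func_ptr address;
};

/* A call site as the compiler leaves it: the name is known, the address is not. */
struct toy_call_site {
    const char *wanted;
    func_ptr target;    /* NULL until "linked" */
};

/* Stand-in for a library routine; returns the string length instead of printing. */
static int toy_puts(const char *s) { return (int)strlen(s); }

static const struct toy_symbol toy_library[] = {
    { "toy_puts", toy_puts },
};

/* Fill in the place-holder. Returns 0 on success, -1 for an undefined symbol. */
int toy_link(struct toy_call_site *site)
{
    size_t i;
    size_t count = sizeof toy_library / sizeof toy_library[0];

    for (i = 0; i < count; i++) {
        if (strcmp(toy_library[i].name, site->wanted) == 0) {
            site->target = toy_library[i].address;
            return 0;
        }
    }
    return -1;
}
```

The failure case corresponds to the familiar "undefined reference" error: the name was declared and used, but no definition could be found anywhere.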

The linker also does a few additional tasks for us. It combines our program with some standard routines that are needed to make our program run. For example, there is standard code required at the beginning of our program that sets up the running

environment, such as passing in command-line parameters and environment variables. Also, there is code that needs to be run

at the end of our program so that it can pass back a return code, among other tasks. It turns out that this is no small amount of

code. Let's take a look.

If we compile our example program, as we did above, we get an executable file that is 6885 bytes in size. However, if we instruct the compiler to not go through the linking stage, by using the -c flag (gcc -c test.c -o test.o), we get an object module that is 888 bytes in size. The difference in file size is the code to start up and terminate our program, along with the code that allows us to call the puts() function in libc.so.

At this point, we've looked at the compilation process in some detail. I hope this has been interesting to you. Next time, we'll

discuss the linking process in a bit more detail and consider some of the optimization features that gcc provides.

 __________________________ 

Mike Diehl is a recently self-employed Computer Nerd and lives in Albuquerque, NM. with his wife and 3 sons. He can be

reached at [email protected]

Copyright © 1994 - 2008 Linux Journal. All rights reserved.

HTML Entities problem...

On October 6th, 2008 Anonymous (not verified) says:

You forgot to convert < characters into &lt; in your example program, it doesn't show up properly in the article.
