Compilation - Tel Aviv Universitymaon/teaching/2016-2017/compilation/... · 2017-01-24 ·...
Compilation 0368-3133 2016/17a
Lecture 12: Assemblers, linkers, loaders
Noam Rinetzky
What is a compiler?
“A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language).
The most common reason for wanting to transform source code is to create an executable program.”
-- Wikipedia
Stages of compilation
[Figure: compiler pipeline. Source code (program) → Lexical Analysis → Syntax Analysis (Parsing) → Context Analysis → Portable/Retargetable code generation → Code Generation → Target code (executable). Intermediate forms passed between stages: Text, Token stream, AST, AST + Sym. Tab., IR, Assembly.]
Compilation → Execution
[Figure: the same compiler pipeline, followed by Assembler → Linker → Loader. Intermediate forms: symbolic addresses, object file, executable file, image, executing program (with runtime system).]
Program runtime state: Code, Static Data, Stack, Heap, Registers
[Figure: memory layout with region addresses 0x11000, 0x22000, 0x33000, 0x88000, 0x99000 and the symbols G, extern_G; foo, extern_foo, printf; x.]
Challenges
§ goto L2 → JMP 0x110FF
§ G := 3 → MOV 0x2200F, 0..011
§ foo() → CALL 0x130FF
§ extern_G := 1 → MOV 0x2400F, 0..01
§ extern_foo() → CALL 0x140FF
§ printf() → CALL 0x150FF
§ x := 2 → MOV FP+32, 0…010
§ goto L2 → JMP [PC+]0x000FF
[Figure: the memory layout from the previous slide, repeated.]
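Assembly → Image
[Figure: Source program → Compiler → Assembly lang. program (.s) → Assembler → Machine lang. module (.o): program (+ library) modules → Linker → Executable (“.exe”) → Loader → Image (in memory). The steps up to the executable belong to “compilation” time; loading belongs to “execution” time, when libraries (.o) may also be brought in (dynamic loading).]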
Outline
§ Assembly
§ Linker / Link editor
§ Loader
§ Static linking
§ Dynamic linking
Assembly → Image
[Figure: three inputs, a source file (e.g., utils), a source file (e.g., main), and a library, each pass through Compiler → Assembly (.s) → Assembler → Object (.o). The Linker combines the three object files into an Executable (“.elf”), which the Loader turns into an Image (in memory).]
Assembler
§ Converts (symbolic) assembler to binary (object) code
  § Object files contain a combination of machine instructions, data, and information needed to place instructions properly in memory
§ Yet another (simple) compiler
  § One-to-one translation
§ Converts constants to machine repr. (3 → 0…011)
§ Resolves internal references
§ Records info for code & data relocation
Object File Format
§ Header: admin info + “file map”
§ Text seg.: machine instructions
§ Data seg.: (initialized) data in machine format
§ Relocation info: instructions and data that depend on absolute addresses
§ Symbol table: “exported” references + unresolved references
[Figure: object file layout: Header | Text Segment | Data Segment | Relocation Information | Symbol Table | Debugging Information]
Handling Internal Addresses
Resolving Internal Addresses
§ Two scans of the code
  § Construct a table label → address
  § Replace labels with values
§ One scan of the code (backpatching)
  § Simultaneously construct the table and resolve symbolic addresses
  § Maintains a list of unresolved labels
  § Useful beyond assemblers
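The two-scan approach can be sketched as follows. This is a minimal illustration, not the lecture's own code: the toy instruction format (one word per instruction, labels written as `name:`) and the function name `assemble_two_pass` are assumptions made for the example.

```python
def assemble_two_pass(lines):
    """Resolve labels in two scans over a toy instruction list."""
    # Scan 1: build the table label -> address (one word per instruction).
    table = {}
    address = 0
    for line in lines:
        if line.endswith(":"):
            table[line[:-1]] = address   # a label names the next instruction
        else:
            address += 1
    # Scan 2: replace label operands with their addresses.
    code = []
    for line in lines:
        if line.endswith(":"):
            continue
        op, *args = line.split()
        code.append((op, *[table.get(a, a) for a in args]))
    return code


program = ["JMP L2", "L1:", "NOP", "L2:", "JMP L1"]
print(assemble_two_pass(program))  # [('JMP', 2), ('NOP',), ('JMP', 1)]
```

The forward reference `JMP L2` is unproblematic here because the whole table exists before any operand is replaced.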
Backpatching
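A one-scan assembler with backpatching might look like the sketch below. It assumes a toy format in which every instruction has at most one operand and that operand is a label; the names `assemble_one_pass` and `unresolved` are hypothetical.

```python
def assemble_one_pass(lines):
    """Resolve labels in a single scan, backpatching forward references."""
    table = {}        # label -> address, filled in as labels are seen
    unresolved = {}   # label -> positions of instructions waiting for it
    code = []
    for line in lines:
        if line.endswith(":"):
            label = line[:-1]
            table[label] = len(code)
            # Backpatch every earlier instruction that referenced this label.
            for pos in unresolved.pop(label, []):
                code[pos][1] = table[label]
            continue
        op, *args = line.split()
        instr = [op] + args
        if args and args[0] in table:
            instr[1] = table[args[0]]      # backward reference: already known
        elif args:
            unresolved.setdefault(args[0], []).append(len(code))
        code.append(instr)
    if unresolved:
        raise ValueError(f"undefined labels: {sorted(unresolved)}")
    return code


program = ["JMP L2", "L1:", "NOP", "L2:", "JMP L1"]
print(assemble_one_pass(program))  # [['JMP', 2], ['NOP'], ['JMP', 1]]
```

Labels still pending at the end of the scan are exactly the references an assembler would have to report as errors or hand to the linker as external.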
Handling External Addresses
§ Record symbol table in “external” table
  § Exported (defined) symbols
    § G, foo()
  § Imported (required) symbols
    § extern_G, extern_bar(), printf()
§ Relocation bits
  § Mark instructions that depend on absolute (fixed) addresses
  § Instructions using globals,
Example
External references resolved by the Linker using the relocation info.
Example of External Symbol Table
Assembler Summary
§ Converts symbolic machine code to binary
  § addl %edx, %ecx ⇒ 0000 0001 1101 0001 = 01D1 (hex)
§ Format conversions
  § 3 → 0x0..011 or 0x000000110…0
§ Resolves internal addresses
§ Some assemblers support overloading
  § Different opcodes based on types
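The `addl %edx, %ecx` example can be reproduced with a few lines of code. The sketch covers only the register-to-register form of the 32-bit x86 `ADD` (opcode `01`, followed by a ModRM byte); the helper name `encode_addl` is made up for this illustration.

```python
# Register numbers used in the x86 ModRM byte.
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
       "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_addl(src, dst):
    """Encode AT&T-syntax `addl %src, %dst` for two 32-bit registers."""
    opcode = 0x01                                      # ADD r/m32, r32
    # mod = 11 (register operand), reg = source, r/m = destination
    modrm = (0b11 << 6) | (REG[src] << 3) | REG[dst]
    return bytes([opcode, modrm])


print(encode_addl("edx", "ecx").hex())  # 01d1
```

The one-to-one character of the translation shows in the code: each symbolic field maps to a fixed group of bits.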
Linker
§ Merges object files to an executable
  § Enables separate compilation
§ Combines memory layouts of object modules
§ Links program calls to library routines
  § printf(), malloc()
§ Relocates instructions by adjusting absolute references
§ Resolves references among files
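Linker
[Figure: two object modules before and after linking.
Before: module 1 has Code Segment 1 at 0-100 (contains foo and a call to ext_bar()) and Data Segment 1 at 100-200 (address 120 marked); module 2 has Code Segment 2 at 0-300 (ext_bar at 150, zoo at 180) and Data Segment 2 at 300-450 (address 380 marked).
After: Code Segment 1 at 0-100 (foo), Code Segment 2 at 100-400 (ext_bar at 250, zoo at 280), Data Segment 1 at 400-500 (address 420), Data Segment 2 at 500-650 (address 580).]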
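The merge shown in the diagram can be imitated with a small layout routine. The segment sizes and symbol offsets below are read off the slide's numbers; the dictionary layout and the function name `link` are assumptions for this sketch, and a real linker also patches the instruction bytes themselves.

```python
def link(modules):
    """Toy linker: lay out code segments, then data segments, and resolve symbols.

    Each module is a dict with 'code_size', 'data_size', 'exports'
    (name -> offset in the module's code segment) and 'imports' (names).
    """
    address, code_base, data_base = 0, [], []
    for m in modules:                      # all code segments first
        code_base.append(address)
        address += m["code_size"]
    for m in modules:                      # then all data segments
        data_base.append(address)
        address += m["data_size"]
    # Relocate exported symbols by the new base of their code segment.
    symbols = {name: base + offset
               for m, base in zip(modules, code_base)
               for name, offset in m["exports"].items()}
    # Any import that no module exports stays unresolved.
    unresolved = [n for m in modules for n in m["imports"] if n not in symbols]
    return {"code_base": code_base, "data_base": data_base,
            "symbols": symbols, "size": address, "unresolved": unresolved}


module1 = {"code_size": 100, "data_size": 100, "exports": {}, "imports": ["ext_bar"]}
module2 = {"code_size": 300, "data_size": 150,
           "exports": {"ext_bar": 150, "zoo": 180}, "imports": []}
image = link([module1, module2])
print(image["symbols"])    # {'ext_bar': 250, 'zoo': 280}
print(image["data_base"])  # [400, 500]
```

With these inputs the merged image ends at 650, matching the diagram.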
Relocation information
• Information needed to change addresses
§ Positions in the code which contain addresses
  § Data
  § Code
§ Two implementations
  § Bitmap
  § Linked lists
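A bitmap implementation can be sketched in a few lines: one bit per word says whether that word holds an address that must move with the segment. The segment contents below are invented for illustration, and `relocate` is a hypothetical helper name.

```python
def relocate(words, bitmap, delta):
    """Add `delta` to every word whose relocation bit is set."""
    return [w + delta if bit else w for w, bit in zip(words, bitmap)]


# Hypothetical 4-word segment: words 1 and 3 hold absolute addresses,
# words 0 and 2 hold plain constants that must not change.
segment = [7, 120, 42, 180]
bitmap = [0, 1, 0, 1]
print(relocate(segment, bitmap, 300))  # [7, 420, 42, 480]
```

A linked-list implementation would store only the positions of the marked words (here 1 and 3), which is more compact when few words need relocation.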
External References
§ The code may include references to external names (identifiers)
  § Library calls
  § External data
§ Stored in external symbol table
Example of External Symbol Table
Example
Linker (Summary)
§ Merges several object files into one executable
§ Resolves external references
§ Relocates addresses
§ User mode
§ Provided by the operating system
  § But can be specific for the compiler
    § More secure code
    § Better error diagnosis
Linker Design Issues
§ Merges
  § Code segments
  § Data segments
  § Relocation bitmaps
  § External symbol tables
§ Retain information about static length
§ Real life complications
  § Aggregate initializations
  § Object file formats
  § Large library
  § Efficient search procedures
Loader
§ Brings an executable file from disk into memory and starts it running
  § Reads the executable file’s header to determine the size of the text and data segments
  § Creates a new address space for the program
  § Copies instructions and data into memory
  § Copies arguments passed to the program onto the stack
§ Initializes the machine registers, including the stack ptr
§ Jumps to a startup routine that copies the program’s arguments from the stack to registers and calls the program’s main routine
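The loader's steps can be mimicked on a toy machine whose memory is a Python list. Everything here is an illustrative assumption: the executable is a plain dict, the header fields `text_size`, `data_size` and `entry` are invented names, and no real executable format is parsed.

```python
def load(executable, args, memory_size=1024):
    """Build a toy in-memory image from an 'executable' dict."""
    header = executable["header"]                 # sizes of text and data
    assert len(executable["text"]) == header["text_size"]
    assert len(executable["data"]) == header["data_size"]
    memory = [0] * memory_size                    # new address space
    text_base = 0
    data_base = text_base + header["text_size"]
    # Copy instructions and data into memory.
    memory[text_base:data_base] = executable["text"]
    memory[data_base:data_base + header["data_size"]] = executable["data"]
    # Copy the program's arguments onto the stack (which grows downwards).
    sp = memory_size
    for arg in reversed(args):
        sp -= 1
        memory[sp] = arg
    # Initialize registers; execution would start at the entry point.
    registers = {"pc": header["entry"], "sp": sp}
    return memory, registers


exe = {"header": {"text_size": 3, "data_size": 2, "entry": 0},
       "text": [10, 11, 12], "data": [1, 2]}
memory, registers = load(exe, ["prog", "-v"], memory_size=16)
print(memory[:5], registers)  # [10, 11, 12, 1, 2] {'pc': 0, 'sp': 14}
```

The final jump to a startup routine is left out; in this model it would amount to starting interpretation at `registers["pc"]`.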
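Program Loading
[Figure: the Loader maps the program executable (Code Segment 1 at 0-100 with foo, Code Segment 2 at 100-400 with ext_bar at 250 and zoo at 280, Data Segment 1 at 400-500 with address 420, Data Segment 2 at 500-650 with address 580) into an in-memory image consisting of Code Segment, Static Data, Stack, Heap, and Registers.]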
Loader (Summary)
§ Initializes the runtime state
§ Part of the operating system
  § Privileged mode
§ Does not depend on the programming language
§ “Invisible activation record”
Static Linking (Recap)
§ Assembler generates binary code
  § Unresolved addresses
  § Relocatable addresses
§ Linker generates executable code
§ Loader generates runtime states (images)
Dynamic Linking
§ Why dynamic linking?
  § Shared libraries
    § Save space
    § Consistency
  § Dynamic loading
    § Load on demand
What’s the challenge?
[Figure: the Assembly → Image pipeline again (Source program → Compiler → Assembly lang. program (.s) → Assembler → Machine lang. module (.o): program (+ library) modules → Linker → Executable (“.exe”) → Loader → Image (in memory)), with libraries (.o) now attached at “execution” time (dynamic linking).]
Position-Independent Code (PIC)
§ Code which does not need to be changed regardless of the address at which it is loaded
  § Enables loading the same object file at different addresses
    § Thus, shared libraries and dynamic loading
§ “Good” instructions for PIC: use relative addresses
  § Relative jumps
  § References to activation records
§ “Bad” instructions for PIC: use fixed addresses
  § Accessing global and static data
  § Procedure calls
    § Where are the library procedures located?
How?
“All problems in computer science can be solved by another level of indirection”
Butler Lampson
PIC: The Main Idea
§ Keep the global data in a table
§ Refer to all data relative to the designated register
Per-Routine Pointer Table
§ Record for every routine in a table
[Figure: pointer-table records for foo, ext_bar and zoo, drawn next to the linked program (Code Segment 1 with foo; Code Segment 2 with ext_bar and zoo; Data Segment 1; Data Segment 2, address 580, with ext_g). Entries shown: &foo, &D.S.1, PT ext_bar; &ext_bar, &D.S.2; &zoo, &D.S.2, PT ext_bar.]
Per-Routine Pointer Table
§ Record for every routine in a table
§ Record used as an address to a procedure
Caller:
1. Load pointer table address into RP
2. Load code address from 0(RP) into RC
3. Call via RC
Callee:
1. RP points to pointer table
2. Table has addresses of pointer tables for sub-procedures
[Figure: record pointed to by RP: func, other data]
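The caller/callee protocol can be modelled with Python lists standing in for pointer-table records. This is only a sketch: slot 0 holding the code address follows the slide's `0(RP)`, while the use of slot 1 for data and slot 2 for a sub-procedure's record is an assumption of the example.

```python
def bar(rp):
    # Callee: rp points to bar's own record; slot 1 holds its data.
    return rp[1]["counter"]


def foo(rp):
    # Callee: slot 2 of foo's record holds the record of sub-procedure bar.
    return call(rp[2]) + 1


def call(record):
    rp = record        # 1. load the pointer-table address into RP
    rc = rp[0]         # 2. load the code address from 0(RP) into RC
    return rc(rp)      # 3. call via RC; the callee finds its table through RP


bar_record = [bar, {"counter": 41}]
foo_record = [foo, {}, bar_record]
print(call(foo_record))  # 42
```

Neither `foo` nor `bar` mentions a fixed address: everything is reached through the record passed in `rp`, which is the extra level of indirection that makes the code position independent.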
PIC: The Main Idea
§ Keep the global data in a table
§ Refer to all data relative to the designated register
§ Efficiency: use a register to point to the beginning of the table
  § Troublesome in CISC machines
ELF - Position Independent Code
§ Executable and Linkable code Format
  § Introduced in Unix System V
§ Observation
  § Executable consists of code followed by data
  § The offset of the data from the beginning of the code is known at compile-time
[Figure: Code Segment starting at XX0000, followed by the Data Segment, which contains the GOT (Global Offset Table)]
    call L2
L2:
    popl %ebx
    addl $_GOT[.-..L2], %ebx
ELF: Accessing global data
ELF: Calling Procedures (before 1st call)
ELF: Calling Procedures (after 1st call)
PIC benefits and costs
§ Enables loading w/o relocation
§ Shares memory locations among processes
§ Data segment may need to be reloaded
§ GOT can be large
§ More runtime overhead
§ More space overhead
Shared Libraries
§ Heavily used libraries
  § Significant code space
    § 5-10 Mega for print
  § Significant disk space
  § Significant memory space
§ Can be saved by sharing the same code
  § Enforce consistency
  § But introduces some overhead
§ Can be implemented either with static or dynamic loading
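Content of ELF file
[Figure: a Program and its Libraries, each an ELF file containing PLT, GOT, Text and Data. A Call in the program is connected through the PLT to a Routine in a library.]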
Consistency
§ How to guarantee that the code/library uses the “right” library version
Loading Dynamically Linked Programs
§ Start the dynamic linker
§ Find the libraries
§ Initialization
  § Resolve symbols
  § GOT
    § Typically small
  § Library specific initialization
§ Lazy procedure linkage
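Lazy procedure linkage means a library routine is looked up only when it is first called, and the result is cached so later calls skip the lookup. The class below is a simplified model of that idea, not the actual ELF mechanism: a dict stands in for the GOT, a method stands in for the PLT stub, and all names (`LazyLinker`, `lookups`) are hypothetical.

```python
class LazyLinker:
    """Toy model of lazy procedure linkage through a GOT-like table."""

    def __init__(self, library):
        self.library = library   # name -> function, stands in for a shared library
        self.got = {}            # name -> resolved function
        self.lookups = 0         # how many times the resolver ran

    def _resolve(self, name):
        self.lookups += 1
        self.got[name] = self.library[name]   # patch the table entry
        return self.got[name]

    def call(self, name, *args):
        # Stub: go through the table, resolving on first use only.
        target = self.got.get(name) or self._resolve(name)
        return target(*args)


linker = LazyLinker({"square": lambda x: x * x})
print(linker.call("square", 3), linker.lookups)  # 9 1
print(linker.call("square", 4), linker.lookups)  # 16 1
```

Routines that are never called are never resolved, which keeps program start-up cheap at the price of an indirect jump on every call.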
Microsoft Dynamic Libraries (DLL)
§ Similar to ELF
§ Somewhat simpler
§ Require compiler support to address dynamic libraries
§ Programs and DLLs are Portable Executable (PE) files
§ Each application has its own address
§ Supports lazy bindings
Dynamic Linking Approaches
§ Unix/ELF uses a single name space and MS/PE uses several name spaces
§ ELF executable lists the names of symbols and libraries it needs
§ PE file lists the libraries to import from other libraries
§ ELF is more flexible
§ PE is more efficient
Costs of dynamic loading
§ Load time relocation of libraries
§ Load time resolution of libraries and executable
§ Overhead from PIC prolog
§ Overhead from indirect addressing
§ Reserved registers
Summary
§ Code generation yields code which is still far from executable
  § Delegate to existing assembler
§ Assembler translates symbolic instructions into binary and creates relocation bits
§ Linker creates an executable from several files produced by the assembler
§ Loader creates an image from the executable