Example:Example:WindowsWindows
1
Windows HistoryWindows History
2
2009 Windows 7
Features addedFeatures added
Windows2000 additionsWindows2000 additionsPlugPlug--andand--playplayNetwork directory serviceNetwork directory serviceNew GUINew GUI
Vista additionsVista additionsNew GUINew GUIMore focus on securityMore focus on securitycleanclean--up theup the codecode
Windows 7Windows 7Focus on performance and compatibilityFocus on performance and compatibility
3
Different OS sizesDifferent OS sizes
Windows XPWindows XP –– 50 M50 MWindows VistaWindows Vista –– 70 M70 M
4
Tan01 Fig 11-3
Programming interfaceProgramming interface
5
Downward CompatibilityDownward Compatibility
Common API across both MSCommon API across both MS--DOS and NT basedDOS and NT basedWindowsWindows
Important for downward compatibilityImportant for downward compatibilityIncreases the complexity of the API and its implementationIncreases the complexity of the API and its implementation
6
NT API: Programming layersNT API: Programming layers
Libraries (DLL)Libraries (DLL) –– as in most operating systemsas in most operating systemsUser mode service processesUser mode service processes –– accessed using RPCaccessed using RPCUpper layers interfaces public, but Native NT API not !Upper layers interfaces public, but Native NT API not !
7
NT featuresNT features
Thread is the unit of concurrencyThread is the unit of concurrencyDynamic library is the unit of compositionDynamic library is the unit of compositionOne command CreateProcess does both fork and execOne command CreateProcess does both fork and execObjects just for data hiding and abstractionObjects just for data hiding and abstraction
Object handles are specific to the process that created theObject handles are specific to the process that created theobjectobjectObject is just a data structureObject is just a data structure
Windows object namingWindows object namingUnicodeUnicodeCaseCase--preserving, but casepreserving, but case--insensitive!insensitive!
8
Win32 APIWin32 API
Public and fully documentedPublic and fully documentedLibrary routines to wrap native NT calls or to do the actual workLibrary routines to wrap native NT calls or to do the actual workin user modein user modeRich interface:Rich interface:
lot of parameters,lot of parameters,several functions for the same taskseveral functions for the same taskMix of lowMix of low--level and highlevel and high--level functionslevel functions
Two special execution environments (WindowsTwo special execution environments (Windows--onon--Windows)Windows)WOW32WOW32 –– for 32for 32--bit x86 systems to run 16bit x86 systems to run 16--bit old windows 3.xbit old windows 3.xapplicationsapplicationsWOW64WOW64 –– to allow 32to allow 32--bit applications to run on x64 systembit applications to run on x64 system
9
Example Win32 API FunctionsExample Win32 API Functions
10
RegistryRegistry
Cross between file system and databaseCross between file system and databaseUsed to store NT namespaceUsed to store NT namespaceOrganised into separate volumes called hivesOrganised into separate volumes called hives
Each hive is kep in a separate file in directoryEach hive is kep in a separate file in directoryC:C:\\WindowsWindows\\system32system32\\configconfig
Loss of the registry requires reinstalling all softwareLoss of the registry requires reinstalling all softwareRegedit, procmon, PowerShellRegedit, procmon, PowerShellCurrent situationCurrent situation
Very disorganised, poorly defined conventionsVery disorganised, poorly defined conventionsPlans to use kernelPlans to use kernel--based transaction manager in the futurebased transaction manager in the future
11
System structureSystem structure
12
Windows kernelWindows kernel--mode layersmode layers
13
NTOS kernel layerNTOS kernel layer
Implements exception handling, trap and interruptImplements exception handling, trap and interruptmechanismsmechanisms –– transition from user mode to kernel modetransition from user mode to kernel modeScheduling and synchronizing threadsScheduling and synchronizing threads
Switched at timer interrupts, starts waiting, higherSwitched at timer interrupts, starts waiting, higher--priorityprioritythread becomes readythread becomes ready--toto--runrun
LowLow--level support for synchronizationlevel support for synchronizationControl objectControl object
Including Deferred Procedure Call (DPC) object, AsynchronousIncluding Deferred Procedure Call (DPC) object, AsynchronousProcedure Call (APC) objectProcedure Call (APC) object
Dispatcher objectDispatcher object
14
DPC and APCDPC and APC
DPCDPC –– deferred proc. calldeferred proc. callReduce the time to executeReduce the time to executeInterrupt Service RoutinesInterrupt Service RoutinesIn any context (as interrupts)In any context (as interrupts)Only the critical operations areOnly the critical operations areperformed on high priorityperformed on high priorityRest is deferred in DPC object andRest is deferred in DPC object andexecuted later with lower priorityexecuted later with lower priorityRun level (2) between interrupts (3)Run level (2) between interrupts (3)and normal processes (0)and normal processes (0)Executed immediately on aExecuted immediately on aparticular processor when levelparticular processor when levellowers to 2 (from 3)lowers to 2 (from 3)
APCAPC –– asynchr. proc. callasynchr. proc. callSomewhat similar to signals orSomewhat similar to signals ornotifications, defer processing ofnotifications, defer processing ofsystem routine, looks likesystem routine, looks likeunexpected procedure callunexpected procedure callIn the thread contextIn the thread contextAPCs queue to thread, executed inAPCs queue to thread, executed infifo order, when thread allowsfifo order, when thread allowsThe kernelThe kernel--mode APC continuesmode APC continuesDPCs work in the thread contextDPCs work in the thread contextUserUser--mode APCs wakemode APCs wake--up waitingup waitingthreadthread
15
Dispatcher objectDispatcher object
A synchronization object, users can refer with handleA synchronization object, users can refer with handleA flag representing the signal state of the objectA flag representing the signal state of the objectA queue for threads waiting to be a signaledA queue for threads waiting to be a signaledAPI callAPI call WaitForMultipleObjectsWaitForMultipleObjects allows thread to wait on any ofallows thread to wait on any ofthe Dispatcher objects it has a handle tothe Dispatcher objects it has a handle toNotificationNotification –– allall waiting threadswaiting threads become runnablebecome runnableSyncronizationSyncronization –– only the first waiting one become runnableonly the first waiting one become runnable
16
NTOS Executive LayerNTOS Executive Layer
Contains most kernel service managers except driversContains most kernel service managers except driversObject managerObject managerI/O managerI/O managerProcess managerProcess managerMemory managerMemory managerCache managerCache managerSecurity reference monitorSecurity reference monitorConfiguration managerConfiguration managerLPCLPC –– local procedure calllocal procedure call
17
Device driverDevice driver
Dynamic link libraries loaded by NTOS executiveDynamic link libraries loaded by NTOS executiveLoad mechanism general, used also to extend the kernelLoad mechanism general, used also to extend the kernel
A lot of objects:A lot of objects:Device objectDevice object –– for each element in the device stack (datafor each element in the device stack (dataflow path)flow path)Driver objectDriver object –– contains the table of routines to usecontains the table of routines to useEach device object isEach device object is linked withlinked with a particular driver objecta particular driver objectSome of these pairs are used just as filtersSome of these pairs are used just as filters
PrePre-- or postprocessing, provide new funtionality, work around accessor postprocessing, provide new funtionality, work around accessrightsrights
18
Device stackDevice stack
19
HALHAL –– Hardware abstraction layerHardware abstraction layer
Abstracts lowAbstracts low--level hardware detailslevel hardware detailsEnhances portability across hardware platformsEnhances portability across hardware platformsCannot hideCannot hide
Format of pageFormat of page--table entriestable entriesPhysical memory page size, word lengthPhysical memory page size, word lengthLittleLittle--endian modeendian modeCompare&swap synchronization primitiveCompare&swap synchronization primitive
Can hideCan hideMost other processor and chipset variationMost other processor and chipset variation
Main idea: less changes when portingMain idea: less changes when porting20
HALHAL
Maps busMaps bus--related addresses onto systemrelated addresses onto system--wide logicalwide logicaladdressesaddressesSystemSystem--wide naming of interruptswide naming of interruptsDMADMA--transfers in devicetransfers in device--independent wayindependent way
21
Object manager &Object manager &Object namespaceObject namespace
Everything is object within WindowsEverything is object within WindowsAn (executive) object is just a data structure in the virtualAn (executive) object is just a data structure in the virtualmemory accessible in kernel modememory accessible in kernel modeObject represents an abstract concept (file, process, ...)Object represents an abstract concept (file, process, ...)
Object manager provides a uniform and consistentObject manager provides a uniform and consistentinterface to managing all kinds of objects!interface to managing all kinds of objects!
Creating, deleting and accountingCreating, deleting and accountingObjects can have names within NT namespaceObjects can have names within NT namespaceObjects do not exist over reboots, they need to be createdObjects do not exist over reboots, they need to be createdevery time system is bootedevery time system is booted
22
ExecutiveExecutiveobjectobject
Memory for kernel objects is allocated either from the pageableMemory for kernel objects is allocated either from the pageableor nonpageable kernel memory heap by executive layeror nonpageable kernel memory heap by executive layerObjects have a quota charge informationObjects have a quota charge informationPocess has limited quotaPocess has limited quotaObjects are ’named’ using handles, which user process can use toObjects are ’named’ using handles, which user process can use toaccess a particular objectaccess a particular object
23
Handle tableHandle table
Each process has its own handle table!Each process has its own handle table!24
Object name spaceObject name space
Handles are processHandles are process--specific, cannot be shared directlyspecific, cannot be shared directlyFor sharing process can duplicate its handle only toFor sharing process can duplicate its handle only toprocesses it has a handle toprocesses it has a handle toBetter way of sharing: give object a nameBetter way of sharing: give object a name
Arbitrary names can be given to any objectArbitrary names can be given to any object
Object exists only as long as there is at least oneObject exists only as long as there is at least oneprocess having handle to it or it has been given a nameprocess having handle to it or it has been given a nameOnly some object types have interfaces supportingOnly some object types have interfaces supportingnamesnamesObject manager has dirs and symbolic links for namesObject manager has dirs and symbolic links for names
25
Typical directories inTypical directories inobject name spaceobject name space
26
Example: creating openExample: creating open--file objectfile object
27
BootingBooting
28
Booting Windows VistaBooting Windows VistaInitial startup phase: BIOS loads the small boot strap loaderInitial startup phase: BIOS loads the small boot strap loaderfrom diskfrom diskBootstrap loader load theBootstrap loader load the BootMgrBootMgrBootMgrBootMgr can resume a hibernated systemcan resume a hibernated system WinResumeWinResume or load aor load anew onenew one WinLoadWinLoadWinLoadWinLoad loads:loads:
Ntoskrnl.eceNtoskrnl.ece, hal.dll (hardware abstraction), hal.dll (hardware abstraction)bootboot drivers (hard drive, file system, etcdrivers (hard drive, file system, etc))
RunRun smsssmss –– first program in user space (like init in UNIX)first program in user space (like init in UNIX)Windows boot procedure contains several recovery mechanismWindows boot procedure contains several recovery mechanismfor possible problem situations encountered during bootfor possible problem situations encountered during boot
29
UserUser--mode componentsmode components
SubsystemsSubsystemsSeen as a way to support multiple OS personalitiesSeen as a way to support multiple OS personalities
Dynamic link libraries (DLL)Dynamic link libraries (DLL)Linked to executable program at runLinked to executable program at run--time not at compilationtime not at compilationCommon code can be shared, versioning becomes an issueCommon code can be shared, versioning becomes an issueDLLs form a graph, one may need others to be loadedDLLs form a graph, one may need others to be loadedAllowed to run when a new process or thread is startedAllowed to run when a new process or thread is started(this(this attachattach is meant just for initialization purpose, but ...)is meant just for initialization purpose, but ...)
UserUser--mode servicesmode servicesExtends system’s functionality, strongly used in VistaExtends system’s functionality, strongly used in Vista
30
Process managementProcess management
31
ProcessProcess
Just a containerJust a containerVirtual address spaceVirtual address spaceHandles to kernelHandles to kernel--mode objects and threadsmode objects and threadsHold common resources (quota structure, shared tokens)Hold common resources (quota structure, shared tokens)
Process Environment Block (PEB)Process Environment Block (PEB)List of loaded modules (dll & exe)List of loaded modules (dll & exe)Memory containing environment stringsMemory containing environment stringsCurrent working directoryCurrent working directoryData for managing the process’ heapData for managing the process’ heap
User shared data (kernel writes, threads only read)User shared data (kernel writes, threads only read)Performance optimization: time formats, version info, flagsPerformance optimization: time formats, version info, flags
32
ThreadThread
Abstraction for scheduling the CPUAbstraction for scheduling the CPUStateState
Priorities assigned to thread based on the processPriorities assigned to thread based on the processpriority valuepriority valueAffinitized to certain processorsAffinitized to certain processorsThread Environment Block (TEB)Thread Environment Block (TEB)
Thread Local StorageThread Local StorageFields for localization (language, cultural)Fields for localization (language, cultural)
All system threads belong to special system processAll system threads belong to special system process
33
Jobs & fibersJobs & fibers
Job is a group of processes sharing the same resource limits andJob is a group of processes sharing the same resource limits andrestrictions (quota etc)restrictions (quota etc)A fiber has a stack and data structure for registers and dataA fiber has a stack and data structure for registers and data
Fibers do not belong to threads,Fibers do not belong to threads,Threads are converted to fibers or fibers created indpendently of threadsThreads are converted to fibers or fibers created indpendently of threadsIndependent fibers can run only by thread’s explicit callIndependent fibers can run only by thread’s explicit call SwitchToFiberSwitchToFiberOverhead of switching between fibers is low,Overhead of switching between fibers is low, rarely usedrarely used 34
SchedulingScheduling
Priority based, roundPriority based, round--robin on each priority levelrobin on each priority levelDefault time quantum on user system 20ms and server 180 msDefault time quantum on user system 20ms and server 180 ms
Threads must execute the scheduler code when they block,Threads must execute the scheduler code when they block,quantum expires or they signal an objectquantum expires or they signal an objectScheduling code is also run as DPC after I/O and timerScheduling code is also run as DPC after I/O and timerinterruptsinterrupts
35
Memory managementMemory management
36
Virtual address space layoutVirtual address space layout
37
Physical MemoryPhysical Memory ManagementManagement
38
Some of the major fields in the page frame data base for a valid pageSome of the major fields in the page frame data base for a valid page
39
Windows: page framesWindows: page frames
(Fig 11.36 [Tane08])
File system (and I/O)File system (and I/O)
40
NTFS:NTFS: FeaturesFeatures
Recovers from systemRecovers from systemfailures and disk errorsfailures and disk errors
LFSLFS log filelog file
Access controlAccess controlsecurity descriptor (pääsylistat)security descriptor (pääsylistat)
Allow large disks and filesAllow large disks and filesFAT32 has onlyFAT32 has only 223232 blocks,blocks,large allocation tablelarge allocation table
File objects are pairsFile objects are pairs (value,(value,attribute)attribute)Allows indexing for faster fileAllows indexing for faster fileaccessaccess
Block,Block, clusterclusterOne or more contiguousOne or more contiguoussectors (f.eg. 512 Bsectors (f.eg. 512 B -- 4 KB)4 KB)Unit for allocation andUnit for allocation andaccountingaccounting
Partition,Partition, volumevolumeLogical part of the physicalLogical part of the physicaldisk.disk.Has its own file systemHas its own file system
41
NTFSNTFS--partitionpartition
42
Boot sectorBoot sectorBoot information, partition and file system structureBoot information, partition and file system structureLocation of MFTLocation of MFT
MFTMFTInformation about files, folders and free blocksInformation about files, folders and free blocks
System Files (~ 1MB)System Files (~ 1MB)Dublicate of the early part of MFTDublicate of the early part of MFTLog for recoveryLog for recovery , bittimap free/allocated blocks, attribute, bittimap free/allocated blocks, attributedefinitionsdefinitions
FileFile AreaArea
(Fig. 12.17 [Stal 05])
43
NTFSNTFS –– MFT (Master File Table)MFT (Master File Table)
Linear sequence of 1Linear sequence of 1--KB MFT recordsKB MFT recordsDescribes files or directories, one per file or directoryDescribes files or directories, one per file or directoryContains sequence of (attribute header, value) pairsContains sequence of (attribute header, value) pairs
variablevariable--size of the record usedsize of the record used
1616 first records reserved for metadatafirst records reserved for metadataNames start with dollar sign $Names start with dollar sign $
For small files, the MFT record containsFor small files, the MFT record contains also the dataalso the dataFor large files, the data on a separate storage areaFor large files, the data on a separate storage area
MFTMFT--record has the sequence of block numbersrecord has the sequence of block numbersCan continue in other (even several) MFT recordsCan continue in other (even several) MFT records
44
[Tane 01]
45
[Tane 01]
MFT RecordMFT Record
46[Tane 01]
47
[Tane 01]
MFT record for small directoryMFT record for small directory
In small directories, the MFT index of files in sequential orderIn small directories, the MFT index of files in sequential orderIn large directories, the MFT index of files in BIn large directories, the MFT index of files in B--treetree
Search for name no longer sequentialSearch for name no longer sequential
48(Fig. 11-45 [Tane08])
49
[Tane 01]
50
[Tane 01]
NTFS:NTFS: RecoveryRecoveryLog FSLog FS
Journaling, log all changes, only structural changesJournaling, log all changes, only structural changesCreated in file cache as one transactionCreated in file cache as one transaction
Change disk contentChange disk contenti.i. Create the the log entry in file cacheCreate the the log entry in file cacheii.ii. Make the file change in file cache firstMake the file change in file cache firstiii.iii. Store the log enry to disk from file cacheStore the log enry to disk from file cacheiv.iv. Write the file changed from cache to diskWrite the file changed from cache to diskv.v. CommitCommit the changesthe changes
If the system crash happens before all changes are written toIf the system crash happens before all changes are written todisk, the reboot candisk, the reboot can rollbackrollback to consistent situation using theto consistent situation using theloglog
No guarantee that the file content remainsNo guarantee that the file content remainsSystem structure remains coherentSystem structure remains coherent 51
Fig 12.18 [Stal 05]
”transaction”
”transaction”
52
(Fig 12.18 [Stal05])
Top Related