NTFS forensics

27
Page 1 of 27 NTFS Forensics: A Programmers View of Raw Filesystem Data Extraction Jason Medeiros Grayscale Research 2008

Transcript of NTFS forensics

Page 1: NTFS forensics

Page 1 of 27

NTFS Forensics: A Programmers View of Raw Filesystem Data Extraction

Jason Medeiros Grayscale Research 2008

Page 2: NTFS forensics

Table of Contents: Document Introduction .................................................................................................................... 3 Section 1: Information Pertaining To This Document...................................................................... 4 Section 1.1: Intended Audience....................................................................................................... 4 Section 1.2: Conventions in This Document ................................................................................... 4 Section 1.3: The Evolution of NTFS ................................................................................................ 5 Section 1.4: Features ...................................................................................................................... 5 Section 1.5: File Systems and Binary Trees ................................................................................... 6 Section 2: Accessing Data Through Raw Reads ............................................................................ 7 Section 2.1: Beginning The Process Of Extraction ......................................................................... 7 Figure 2.1 Opening The Physical Drive With Read Permissions .................................................... 7 Figure 2.2 Partition Structure........................................................................................................... 8 Section 2.2: Identifying a Partition Type......................................................................................... 8 Figure 2.3 Example Different Partition Types.................................................................................. 9 Section 2.3: The NTFS Boot Sector ................................................................................................ 9 Figure 2.4 NTFS Boot Sector Layout ............................................................................................ 10 Section 2.4: Master File Table (MFT)............................................................................................ 11 Figure 2.5 MFT File Entry Header ................................................................................................. 12 Section 2.5: Metafiles .................................................................................................................... 13 Figure 2.6 MFT Special Files......................................................................................................... 13 Section 2.6: MFT Record Attributes .............................................................................................. 14 Figure 2.7 MFT Record Attribute Type Table................................................................................ 14 Figure 2.8 MFT Record Attribute Header ...................................................................................... 15 Section 2.7: Extracting Resident Attribute Data ............................................................................ 17 Figure 2.9 Resident Attribute Structure ......................................................................................... 18 Section 2.8: Extracting Non-Resident Attribute Data .................................................................... 18 Figure 2.10 Nonresident Attribute Structure.................................................................................. 18 Figure 2.11 A Theoretical Length Offset Union In C ..................................................................... 20 Section 2.9: Directories and the NTFS B+ Tree............................................................................ 22 Figure 2.12 INDX Record Standard Header.................................................................................. 23 Figure 2.13 Individual INDX Record Entry Structure..................................................................... 24 Section 3: Simplified Theory of Data Extraction ............................................................................ 26 Appendix 1. Citations and References ......................................................................................... 27

Page 2 of 27

Page 3: NTFS forensics

Document Introduction The purpose of this document is to attempt, in a straightforward technical manner, to describe the process of extracting data raw from an NTFS file system from a physical device. The platform of discussion for this particular paper centers on Win32 platforms running on NTFS version 3.0 or above.

Some of the concepts discussed in this document are rather difficult to understand without a proper background in programming. That being said, section 3 of this document still contains a very simplified seven step logical breakdown of the steps required for NTFS file system data extraction. For understanding the theory without the practical programming knowledge, consult section 3 for a high-level overview.

It should be known that this paper does not intend to describe all

functional parts of the NTFS file system. For details on portions of the filesystem which are not discussed with in this paper itself a developer would be best suited to read the "ntfsdoc.pdf" file listed in the citations and references section of this document. The “ntfsdoc.pdf” document was generated by the developers responsible for the Linux NTFS drivers, and is not fully completed at the time of this document however it still maintains a collection of information more useful to a developer utilizing NTFS, than any other documentation freely available. Additionally on the same topic, for additional algorithmic information detailing how NTFS operations are performed, reading the NTFS Linux driver source code provides an excellent reference for advanced developers. The author would almost go as far as to consider reading relevant portions of the Linux drivers as a prerequisite to this paper, if only to make the paper more understandable.

Page 3 of 27

Page 4: NTFS forensics

Section 1: Information Pertaining To This Document

Section 1.1: Intended Audience

The audience for this paper should have a solid background in programming and critical thinking. So long as the programmer has an understanding of how to read and write files on a filesystem using a programming language with disk access, this paper should be understandable in the purpose of building raw NTFS extraction engines. This is not to say that this paper will be a simple experience for a novice programmer. The full extraction, from start to finish, of NTFS data takes a significant number of calculations to simply reach the data to be extracted, let alone the several more required to extract nonresident data from the physical device. However, the reward of accomplishing such a task affords a developer the ability to access data which is typically locked even by the kernel (e.g. the page file). Several forensics toolkits utilize this technology for that exact purpose, this paper attempts to document the process in full.

Section 1.2: Conventions in This Document

All code examples within this document are written utilizing the C

programming language. Before continuing any further it should be said that without knowledge of the C programming language it will be very easy to get lost within the concepts described by this document.

Additionally all C structures in this document should be considered to be

packed to a one byte alignment using the #pragma pack keyword. GNU Linux

Page 4 of 27

Page 5: NTFS forensics

can enable this functionality utilizing structure attributes. For a good example of how to do this in Linux examine the source code (specifically the headers) found in the NTFS -- 3 G. driver package.

Section 1.3: The Evolution of NTFS

Over the years there have been several iterations of the NTFS file system,

each of which bringing new features to end-users and application designers. The first iteration version 1.0 was released in 1993 and was utilized by NT 3.1 as its primary filesystem. Version 1.1 was released in 1994 with the release of NT 3.5. The third iteration version 1.2 was an update that was available for NT 3.51 and NT 4.0 mid-1995 and mid-1996. The newer Windows versions from Windows 2000 and above are utilizing either version 3.0 or version 3.1 as their core file systems. Version 3.1 as of the writing of this paper is the current version of NTFS.

Section 1.4: Features

NTFS has several features that make it a capable and robust journaling

filesystem. Features such as alternate data streams, quotas, sparse files, re-parse points, volume mount points, directory junctions, hard links, hierarchical storage management, file encryption and compression, and very high performance make NTFS a well-suited choice for various intensive filesystem tasks.

Page 5 of 27

Page 6: NTFS forensics

Section 1.5: File Systems and Binary Trees

No matter which filesystem is being discussed, they all rely on the efficiency of binary trees to make ease of complicated hierarchical storage requirements. The reason for this is that binary trees afford the file system developer the simplest and most efficient solution to the problem at hand.

Algorithmically NTFS utilizes a B+ tree in order to maintain a perfect

balance between of efficiency in both large file operations and small file operations. As a primer before reading this document consider reading the Wikipedia entry on B+ trees, as it will provide valuable insight into what theoretical data structures you will be encountering within this paper. A link to the entry at Wikipedia can be found in the citations and references section of this paper.

Page 6 of 27

Page 7: NTFS forensics

Section 2: Accessing Data Through Raw Reads

Section 2.1: Beginning The Process Of Extraction

Opening a disk for raw reading and writing is as simple as opening the device \\.\PhysicalDriveN where N represents the number of the drive you want to open for raw reading and writing. Once you open the device, navigating it becomes the next obstacle. A physical disc can hold multiple different types of partitions. Even though you're only interested as far as this paper is concerned in the content of NTFS partitions some care should be taken to detail the steps it takes to identify other partition types on a physical disk so that we can identify which partition is truly NTFS. Figure 2.1 below shows how to use the create file API routine in order to access the drive for physical reading.

Figure 2.1 Opening The Physical Drive With Read Permissions

HANDLE hd = CreateFile(“\\\\.\\PhysicalDrive0”, GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, 0, NULL);

Note: Before we continue it is important to know that when reading from an NTFS drive, if you do not read in increments of 512 bytes (or the disk sector size) the drive reads will fail.

The first sector on the primary disk is the master boot record. This record is 512 bytes long and contains an area for boot code, an optional disk signature, and a table of primary partitions with a maximum entry count of 4. This structure is not defined in this document however the Wikipedia entry on the topic has a full synopsis of all members of the MBR structure.

Page 7 of 27

Page 8: NTFS forensics

To access partition information set the file pointer of the open drive to the

hexadecimal offset 0x1BE (offset to partition table in MBR). This moves us to the beginning of our partition table on the disc. The structure which represents what our partition table looks like is shown in the structure 2.2 below.

Figure 2.2 Partition Structure typedef struct _PARTITION { BYTE chBootInd; BYTE chHead; BYTE chSector; BYTE chCylinder; BYTE chType; // <- partition type identifier BYTE chLastHead; BYTE chLastSector; BYTE chLastCylinder; DWORD dwRelativeSector; DWORD dwNumberSectors; }PARTITION, *P_PARTITION;

The partition table by itself is nothing more than a contiguous sequence of

partition structures one following the other. Multiple partition table entries can exist on the same host disk, so being able to sort through them becomes important especially in the case of file system forensics. Identifying partitions is as simple as looking up the partition type member.

Section 2.2: Identifying a Partition Type Looking at the structure of figure 2.2 you can see the parameter chType

which is used to identify a partition type. NTFS has a partition type number designation of 0x07. If you match the chType variable from a partition table entry with this number you’ve identified a NTFS partition.

Page 8 of 27

Page 9: NTFS forensics

Figure 2.3 Example Different Partition Types

Value Type 0x00 Empty 0x07 NTFS 0x83 Linux Native 0x82 Linux Swap or Solaris X86

Note: A full list of available partition type can be found at the link below. http://www.win.tue.nl/~aeb/partitions/partition_types-1.html

Section 2.3: The NTFS Boot Sector

As soon as you've identified an NTFS partition you can follow the “relative sector” number offset to the NTFS boot sector itself. Multiply the sector number by the size of a sector, and you have the correct offset from which to set the read file pointer. Figure 2.4 below describes the aspects of our boot sector.

Page 9 of 27

Page 10: NTFS forensics

Figure 2.4 NTFS Boot Sector Layout typedef struct _NTFS_PART_BOOT_SECTOR { char chJumpInstruction[3]; char chOemID[4]; char chDummy[4]; struct NTFS_BPB { WORD wBytesPerSec; BYTE uchSecPerClust; WORD wReservedSec; BYTE uchReserved[3]; WORD wUnused1; BYTE uchMediaDescriptor; WORD wUnused2; WORD wSecPerTrack; WORD wNumberOfHeads; DWORD dwHiddenSec; DWORD dwUnused3; DWORD dwUnused4; LONGLONG n64TotalSec; LONGLONG n64MFTLogicalClustNum; // ( cluster for MFT LONGLONG n64MFTMirrLogicalClustNum; int nClustPerMFTRecord; int nClustPerIndexRecord; LONGLONG n64VolumeSerialNum; DWORD dwChecksum; } bpb; char chBootstrapCode[426]; WORD wSecMark; } NTFS_BOOT_SECTOR, *P_NTFS_BOOT_SECTOR;

The NTFS boot sector contains important internal pieces of information

which can be used for the intent of disk forensics. For one it contains the bootstrap code which is used to start Windows as well as a jump instruction that is used to start the operating system. Within the NTFS boot sector you can find what is known as the BPB or bios parameter block. The bios parameter block serves as a description of the physical medium being used to store the NTFS file system itself. In Figure 2.4 above, the BPB is shown as a substructure inside the NTFS boot sector, as that is how it is placed in the boot sector during drive reading.

Page 10 of 27

Page 11: NTFS forensics

Section 2.4: Master File Table (MFT)

Perhaps the most important element found inside the BPB is the cluster

number of the Master file table. NTFS uses this Master file table as a directory of sorts, with its internal binary tree referencing the MFT similar to looking up phone numbers in a phone book if you knew a correct page number in the phone book for the person you were attempting to find. In reality a Master file table is simply a series of clusters on the physical disk which holds file entries that can be looked up by a "phone number" of sorts. This theoretical phone number, in practice, is known as the MFT record number. Each record number corresponds to one file entry, and each of these entries represents one file in the file system. In the body of each MFT file entry, information about the file is stored in a series of file attributes, stored in variably sized and labeled attribute structures.

To find the start of the MFT, first find the number of bytes per cluster by multiplying the number of sectors per cluster times the number of bytes per sector. Go ahead and put this number on the side we will be using it later. Next will be using large integer arithmetic in order to determine where our proper offset to the MFT is. Take the number of bytes per sector and multiply that first by the NT relative sector found inside the partition table, take that number and put it aside with the other result.

The next part of the calculation requires us to multiply the NT relative

sector, which can be found in the partition table, with the number of bytes per sector which we can find in the BIOS parameter block in the NTFS boot sector. This effectively grows our large integer to a number that borders the start of our NT relative data. The MFT offset is relative to this position.

To complete the calculation take the number of bytes per cluster, which

was calculated earlier, and multiply it by the logical cluster number of the MFT

Page 11 of 27

Page 12: NTFS forensics

which can be found inside the BIOS parameter block. The result of this calculation can be added to the start of the NT relative data offset which we have already calculated earlier in order to create an offset that we can use easily with the SetFilePointer() Microsoft API routine to advance the file pointer to the correct offset.

Figure 2.5 below describes the Master file table entry structure format that is used with NTFS. Each structure starts off with a four byte file signature “FILE” in ASCII at the start of each record, which makes record identification easy. Each record is 1024 bytes, or the typical size of two sectors on the system.

Figure 2.5 MFT File Entry Header typedef struct _NTFS_MFT_FILE_ENTRY_HEADER { char fileSignature[4]; WORD wFixupOffset; WORD wFixupSize; LONGLONG n64LogSeqNumber; WORD wSequence; WORD wHardLinks; WORD wAttribOffset; WORD wFlags; DWORD dwRecLength; DWORD dwAllLength; LONGLONG n64BaseMftRec; WORD wNextAttrID; WORD wFixupPattern; DWORD dwMFTRecNumber; } NTFS_MFT_FILE_ENTRY_HEADER, *P_NTFS_MFT_FILE_ENTRY_HEADER;

Page 12 of 27

Page 13: NTFS forensics

Section 2.5: Metafiles Certain files in the MFT file table have been designated by the operating system as metafiles which are of special-purpose. For example consider that the MFT itself has its own MFT entry. Various metafiles contain important information about the system. Figure 1.6 below shows a full listing of MFT metafiles. This paper primarily focuses on the MFT metafiles, specifically the root directory metafile and the MFT metafile, for the forensic process of recovery of raw file system data.

Figure 2.6 MFT Special Files

MFT record number File Name Special file purpose

0 $MFT

The Master file table used to describe the positions, names, timestamps, and all other pertinent pieces of information useful in the identification and lookup of individual files.

1 $MFTMirr A stored copy of the first four MFT record entries, which are used to restore a partition in the case of repair.

2 $LogFile Transaction log of the file system, used to record changes for file system journaling.

3 $Volume Contains various data elements, such as the volume object identifier (VOI), volume label, volume flags, and file system version.

4 $AttrDef A table of NTFS specific attributes, data structure sizes, and other useful information.

5 . Root directory.

6 $Bitmap Bit field representing if a particular cluster on the volume is either used or free.

7 $Boot Volume boot record.

Page 13 of 27

Page 14: NTFS forensics

8 $BadClus

A file which contains all the clusters marked as having bad sectors. This file simplifies cluster management by the chkdsk utility, both as a place to put newly discovered bad sectors, and for identifying unreferenced clusters.

9 $Secure Access control list database which reduces overhead having many identical ACLs stored with each file, by uniquely storing these ACLs in this database only.

10 $UpCase A table of unicode uppercased characters for ensuring case insensitivity in Win32 and DOS namespaces.

11 $Extend A filesystem directory containing various optional extensions, such as $Quota, $ObjId, $Reparse or $UsnJrnl.

Section 2.6: MFT Record Attributes

In order for the operating system to determine one MFT entry from another it uses what are known as attribute headers. The table below shows each different hexadecimal value which correspond to a record attribute type.

Figure 2.7 MFT Record Attribute Type Table

Attribute Name Hexidecimal Value Unused 0x00

Standard Information 0x10 File Name 0x30 Object ID 0x40

Security Descriptor 0x50 Volume Name 0x60

Volume Information 0x70 Data 0x80

Index Root 0x90 Index Allocation 0xa0

Bitmap 0xb0 Reparse Point 0xc0 EA Information 0xd0

EA 0xe0 Property Set 0xf0

Logged Utility Stream 0x100

Page 14 of 27

Page 15: NTFS forensics

First User Defined Attribute 0x1000 End of Attributes (records) 0xffffffff

To find the entry type of an attribute header, read in one (typically 1024

byte) MFT record from the start of file entry in the MFT into an MFT file structure as shown in figure 2.5. Next use the attribute offset member of that structure as an offset into the MFT record to the appropriate position for reading attributes. Attributes are to be read in one at a time as they are variable in length, content, purpose, and data type.

Figure 2.8 MFT Record Attribute Header typedef struct _NTFS_ATTRIBUTE { DWORD dwType; DWORD dwFullLength; BYTE uchNonResFlag; BYTE uchNameLength; WORD wNameOffset; WORD wFlags; WORD wID; union ATTR { struct RESIDENT { DWORD dwLength; WORD wAttrOffset; BYTE uchIndexedTag; BYTE uchPadding; } Resident; struct NONRESIDENT { LONGLONG n64StartVCN; LONGLONG n64EndVCN; WORD wDatarunOffset; WORD wCompressionSize; BYTE uchPadding[4]; LONGLONG n64AllocSize; LONGLONG n64RealSize; LONGLONG n64StreamSize; } NonResident; } Attr; } _NTFS_ATTRIBUTE, *P_NTFS_ATTRIBUTE;

Page 15 of 27

Page 16: NTFS forensics

After reading in an attribute, during extraction of its relevant data, it is first important to understand that the attribute header and any data header are completely separate. The attribute header is there to help the programmer find the relevant data for an attribute, not to be the relevant data header itself.

It is also important to be aware that the MFT record attribute data is stored

in two ways, both as resident and nonresident data. Resident data is literally resident within the MFT record itself. If this is the case, which it typically is for smaller files under 1024 bytes, the data for this file fits within the actual MFT record entry in the MFT table.

On the contrary, if an attribute is marked as nonresident it means that it is stored in a sequence of runs (clusters) elsewhere on the disk. At this point take note of the internal ATTR union in figure 2.8, depending on the value of the uchNonResFlag member of that structure, either one union value or the other is chosen. If the flag is not true the data is resident in the MFT record and if it is true the data is nonresident and stored elsewhere on the disk in a series of runs representing multiple clusters.

In order to properly operate on NTFS attributes, they must first be

extracted from the clusters on where they reside. The next section of this document describes how to extract attribute data from the file system. It should be noted that the data attribute literally contains information that in the next section can be used for extracting resident or nonresident attribute data.

Page 16 of 27

Page 17: NTFS forensics

Section 2.7: Extracting Resident Attribute Data

Extracting resident data from a disk is a relatively simple procedure when compared to nonresident data. In order to extract resident data you first have to determine that the data you are trying to store is in fact resident in the MFT entry. The uchNonResFlag member inside of the attribute header will indicate whether or not a records data is resident or not.

Before we continue make sure you have handy the number of bytes per

cluster. If you haven't calculated this value previously, examine your NTFS boot sector structure and multiply the two values found in the BIOS parameter block that reflect the sectors per cluster and the bytes per sector. The result of this multiplication is the number of bytes per cluster which we will be using later in this document.

To read actual data which is marked resident in a MFT entry, start from

the beginning of your MFT record entry and move the byte index of the record itself, to the attribute headers "resident" union structure. Next extract an offset from that structure, for a correct offset appropriate for reading in an attribute header. Now that you have the correct offset, set the correct offset position within a MFT record buffer so that data can begin to be copied.

The number of bytes to read from this position is found in the union

resident structure itself. Figure 2.9 below describes the resident attribute structure, and shows that the first member is a length field. Reading from the correct position, at a maximum distance of that length value, resident data can be extracted into a buffer for use by a developer.

Page 17 of 27

Page 18: NTFS forensics

Figure 2.9 Resident Attribute Structure struct RESIDENT { DWORD dwLength; WORD wAttrOffset; BYTE uchIndexedTag; BYTE uchPadding; } Resident;

Section 2.8: Extracting Non-Resident Attribute Data

The extraction of nonresident data is a bit more difficult than resident data to extract due to the additional calculations required to simply find the location of the clusters holding the data. This is because of the fact that instead of being stored resident in the header, the data is stored in one or multiple separate "runs" (clusters) on the disk.

Below in figure 2.10. the nonresident attribute structure is defined. Remember that this nonresident attribute structure is really a union with the resident attribute structure, and remember again that both of these union structures are part of the overlying NTFS attributes structure for MFT entries.

Figure 2.10 Nonresident Attribute Structure struct NONRESIDENT { LONGLONG n64StartVCN; LONGLONG n64EndVCN; WORD wDatarunOffset; WORD wCompressionSize; // compression unit size BYTE uchPadding[4]; LONGLONG n64AllocSize; LONGLONG n64RealSize; LONGLONG n64StreamSize; } NonResident;

Page 18 of 27

Page 19: NTFS forensics

Understand that data stored nonresident from its MFT entry can, and often

will achieve sizes much greater then there is physical memory available to the application. This makes reading data from physical runs into allocated buffers an iterative process which usually ends in a sort of bucket brigade with the final bucket typically being another out file. Due to this, when reading data which is nonresident, it is very important to check the size. The real size of a file with all of its runs combined is stored in a large integer value (eight bytes) as a member inside of the nonresident structure (e.g. n64RealSize).

At this point it becomes impossible to continue extraction without making a few calculations. First, for ease of use store the nonresident structures data run offset member in an easy to reference local variable, or make note of it because it will be used in calculations shortly. For the purpose of clarification, when we make our first read from our record buffer we will be first reading from this position.

Begin extraction by creating a one byte value which will represent our length offset size, the first byte from our read position should now be copied into this variable. Using the variable you created to hold the nonresident structures, data run offset earlier increment it by one byte, as this will be our next read position.

Using the byte we just saved, we will need to split it into 4 bit pieces. The one byte itself represents a bit field where the top four bits represent a length, and the last four bits represent an offset. Calculating these two values can be done by creating a split union typedef for a one byte value which can then be used as a typecast when representing the length/offset bit-field. Figure 2.11 below shows how a union structure would look if created in this fashion in the C programming language.

Page 19 of 27

Page 20: NTFS forensics

Figure 2.11 A Theoretical Length Offset Union In C typedef struct _LEN_OFFS_BITFIELD { union { BYTE val; struct { unsigned char offs:4; unsigned char len: 4; } bitfield; }; } LEN_OFFS_BITFIELD, *P_LEN_OFFS_BITFIELD;

Development Note: when using the offset member of the bit-field you have to bit shift four bits to the right in order to get the correct offset value unless using the union in figure 2.11.

To continue, create an eight byte value which will hold a copied record length for our first data run. The four bit length field in our bit field union contains the length in bytes from which to copy the full length of the data run. This length will never exceed eight bytes in length, but can often be much shorter than eight bytes. This dynamic sizing makes knowing the value in the bit field extremely important, so that a programmer will not copy in bytes which were not intended and skew results. Copy in the data from the current record position at the specific length, and advance the current read position by that same length to move to the next read offset.

From the newly set read position another read with a possible length of eight bytes will be read; create another large integer value for the purpose of storing this data. Using the offset member in the bit field shown in figure 2.11 as a length parameter, copy in the data from the current read position into our new variable using memcpy or a similar routine. Now that you've read in your offset length, advance your read position by the offset member in the bit field to position the correct placement in the read buffer.

Page 20 of 27

Page 21: NTFS forensics

To maintain usability during the extraction of data runs (clusters), a new large integer value should be created. Since this is the first run being extracted, add the value of the extracted eight byte offset to the newly created large integer value. Maintaining this offset will allow us to read multiple different data runs much easier in the long run.

Once you have found the correct NT relative sector offset in bytes, add the calculated logical cluster number for the data run to this offset to find our correct read position. The number of bytes which we will be read, should be read in increments of the systems cluster size. You should have calculated this value earlier in this document and put it aside.

Begin reading clusters one by one until you read the correct number of

bytes which is defined by our previous length calculation. Once that is finished, you have completed the extraction of the first data run. During this process you can either read the data directly into a file, or into a memory buffer. Authors Note: Preferably, due to the potential of large files when reading from a data attribute, it's usually best to read this attribute type directly into a file.

In order to extract multiple data runs, the same steps can be repeated from the next MFT records read position. Typically for loops, or while loops which keep track of our attribute position within the MFT buffer, are best suited for this purpose.

Page 21 of 27

Page 22: NTFS forensics

Section 2.9: Directories and the NTFS B+ Tree

In the world of NTFS everything is a file, and that includes directories.

When people discuss NTFS in terms of a high-performance filesystem is typically because of the way it's implemented. When people talk about B+ trees in terms of the NTFS file system they're really talking about directories and how directories reference data in the Master file table. You see a B+ tree is really almost precisely the same as it looks in Explorer, where every directory can contain files or other directories and if other directories are contained, those directories can contain more directories and so on.

In order first to continue it is important to know that one of the metafiles in

the MFT reflects the root system directory. This root system directory is the true B+ tree utilized by NTFS.

In order to determine if an MFT record entry is a directory the simplest

way to find this information is to look at record attribute headers, and examine them one by one until the end of records with the intent of looking for an attribute of type "index root". If you find this index root attribute, you are currently looking at the MFT entry for a directory. You can test this theory on the root directory in order to verify this principle.

The actual contents of a directory entry in regard to files, are stored within

a separate attribute header within the same MFT record. The index allocation attribute of an MFT record starts with a simple header then, stores contiguously in 4096 byte allocations, the full directory contents as a sequence of INDX records. Figure 2.12 shown below contains the structure of the header found at the beginning of each 4096 byte INDX record.

Page 22 of 27

Page 23: NTFS forensics

Figure 2.12 INDX Record Standard Header typedef struct _NTATTR_STANDARD_INDEX_HEADER { char magicNumber[4]; unsigned short updateSeqOffs; unsigned short sizeOfUpdateSequenceNumberInWords; LONGLONG logFileSeqNum; LONGLONG vcnOfINDX; DWORD indexEntryOffs; DWORD sizeOFEntries; DWORD sizeOfEntryAlloc; BYTE flags; BYTE padding[3]; unsigned short updateSeq; } NTATTR_STANDARD_INDEX_HEADER, *P_NTATTR_STANDARD_INDEX_HEADER;

In the standard index header the four byte structure element,

magicNumber contains the actual ASCII string "INDX" as a marker to denote the actual beginning of the index standard header. This makes it simple to find INDX records during application testing. It is important to know that INDX records are the actual B+ tree entries themselves fundamentally.

Each full INDX record as was said previously is 4096 bytes on a typical

system, each starting with a header, followed by individual INDX record entries. Figure 2.13 below shows the structure of an individual INDX record entry as it would appear when encountered in the index allocation after the INDX header on the raw disk within an INDX record.

Page 23 of 27

Page 24: NTFS forensics

Figure 2.13 Individual INDX Record Entry Structure typedef struct _NTATTR_INDEX_RECORD_ENTRY { LONGLONG mftReference; unsigned short sizeOfIndexEntry; unsigned short filenameOffset; unsigned short flags; char padding[2]; LONGLONG mftFileReferenceOfParent; LONGLONG creationTime; LONGLONG lastModified; LONGLONG lastModifiedForFileRecord; LONGLONG lastAccessTime; LONGLONG allocatedSizeOfFile; LONGLONG realFileSize; LONGLONG fileFlags; BYTE fNameLength; BYTE filenameNamespace; } _NTATTR_INDEX_RECORD_ENTRY, *P_NTATTR_INDEX_RECORD_ENTRY;

The first member of each INDX record entry contains the MFT record

number, which corresponds to the actual MFT record in the MFT itself. In order to translate this MFT record number to the actual record entry in the MFT, multiply the value by the size of an entry (typically 1024 bytes) and use that as an offset from the start of the MFT to find the record. The MFT when searching for records through offsets, has to be the complete MFT extracted from all runs (clusters) on the disc, unless the developer only wants to search through entries stored in the first data run of the MFT (e.g. metafiles).

When searching through INDX records for a particular file, accessing the

file name is of some importance. The file name character string (in Unicode), of the file for which each Index record entry belongs, can be found directly after the end of the Index record entry structure. Parts of the Index record entry structure itself contain information about the dynamic length of this Unicode string as well as a locale for the file name type (e.g. POSIX, Win32).

Page 24 of 27

Page 25: NTFS forensics

For navigating INDX records without having to go through the trouble of the calculation of elements and sizes for members, the INDX record contains a member element for the "size of the index entry". Retrieving the correct offset to the next INDX record is as simple as adding your current offset for reading with the "size of the index entry" member.

Finding the end of the index entries can be done by taking the previous

read offset and verifying that it is not larger than the INDX standard index header "size of entries" member. For loops or while loops are traditionally the best logical programming constructs for executing a search through a sequence of INDX records.

That is all there truly is to it, a balanced tree in the case of NTFS can be

witnessed as an INDX record which is a directory. Since each directory can have sub nodes, which are directories, the tree can branch effectively. This is the principle behind the B+ binary tree as implemented by Microsoft for the NTFS file system.

Page 25 of 27

Page 26: NTFS forensics

Section 3: Simplified Theory of Data Extraction

The following seven steps are the general steps that need to be taken in order to extract data from files on the raw file system. Each of these steps are documented in reliable detail in the above section two. All steps documented have been used practically to perform this task in production environments.

1. Find the partition entry which holds the NTFS partition

2. Navigate to the beginning of the Master File Table

3. Find the root directory metafile entry in the MFT and extract its index allocation attribute data.

4. Process the INDX records found within the index allocation attribute data

one by one recursively until you find a file that matches the one you are looking for.

5. In the index entry that matched, find the MFT record number and move to

that record position within the MFT.

6. Record the MFT entry and process its standard attribute headers one by one until the data attribute is encountered.

7. Use the process of attribute data extraction in order to retrieve the data

attribute, which contains the contents of the file that is being accessed.

Page 26 of 27

Page 27: NTFS forensics

Appendix 1. Citations and References Linux NTFS Driver Project. "NTFS documentation", Richard Russon and Yuval Fledel 2005 http://data.linux-ntfs.org/ntfsdoc.pdf Wikipedia entry on NTFS "NTFS", http://en.wikipedia.org/wiki/NTFS Wikipedia Entry On The BIOS Parameter Block "BIOS Parameter Block", http://en.wikipedia.org/wiki/BIOS_parameter_block Wikipedia Entry on the Master Boot Record "Master Boot Record", http://en.wikipedia.org/wiki/Master_boot_record Additional Research Note: for additional information on NTFS which was not utilized in this document but that could still be potentially useful, see the "external links section" at the bottom of the NTFS Wikipedia entry.

Page 27 of 27