Core Topics in Windows Driver Development

Toby Opferman

(PDF & CHM Version Edited by Maximus Byamukama)

PART 1: INTRODUCTION

This tutorial will attempt to describe how to write a simple device driver for Windows NT. There are various resources and tutorials on the internet for writing device drivers; however, they are somewhat scarce compared to those for writing a "hello world" GUI program for Windows. This makes the search for information on starting to write device drivers a bit harder. You may think that if there’s already one tutorial, why do you need more? The answer is that more information is always better, especially when you are first beginning to understand a concept. It is always good to see information from different perspectives. People write differently and describe certain pieces of information in a different light depending on how familiar they are with a certain aspect or how they think it should be explained. This being the case, I would recommend that anyone who wants to write device drivers not stop here or at any other single source. Always find a variety of samples and code snippets and research the differences. Sometimes there are bugs and things omitted. Sometimes things are done that aren’t necessary, and sometimes the information is incorrect or just incomplete.

This tutorial will describe how to create a simple device driver, dynamically load and unload it, and finally talk to it from user mode.

Creating a Simple Device Driver

What is a subsystem?

I need to establish some common ground before we begin to explain how to write a device driver. The starting point for this article will be the compiler. The compiler and linker generate a binary in a format that the Operating System understands. In Windows, this format is PE, the Portable Executable format. In this format, there is an idea called a subsystem. A subsystem, along with other options specified in the PE header, describes how to load an executable, including where its entry point is.

Many people use the VC++ IDE to simply create a project with some default preset options for the compiler’s (and linker’s) command line. This is why a lot of people may not be familiar with this concept, even though they are most likely already using it if they have ever written Windows applications. Have you ever written a console application? Have you ever written a GUI application for Windows? These are different subsystems in Windows. Both will generate a PE binary with the appropriate subsystem information. This is also why a console application uses main whereas a WINDOWS application uses WinMain. When you choose one of these project types, VC++ simply creates a project with /SUBSYSTEM:CONSOLE or /SUBSYSTEM:WINDOWS.

If you accidentally choose the wrong project, you can simply change this in the linker options menu rather than needing to create a new project.

Is there a point to all of this? Yes: a driver is simply linked using a different subsystem called NATIVE. See the documentation for the /SUBSYSTEM linker option on MSDN.
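To make the idea concrete, here is a minimal sketch in C that reads the Subsystem field from a PE image that has already been loaded into a memory buffer. It relies on the published PE layout (the e_lfanew offset at 0x3C in the DOS header, the "PE\0\0" signature, a 20-byte COFF file header, and the Subsystem field at offset 68 of the optional header, which is the same for PE32 and PE32+). The constant values mirror IMAGE_SUBSYSTEM_NATIVE, IMAGE_SUBSYSTEM_WINDOWS_GUI and IMAGE_SUBSYSTEM_WINDOWS_CUI from winnt.h; the function and constant names are hypothetical, and only minimal bounds checking is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the IMAGE_SUBSYSTEM_* constants in winnt.h. */
enum {
    SUBSYS_UNKNOWN     = 0,
    SUBSYS_NATIVE      = 1, /* drivers (and native applications) */
    SUBSYS_WINDOWS_GUI = 2, /* /SUBSYSTEM:WINDOWS, entry point WinMain */
    SUBSYS_WINDOWS_CUI = 3  /* /SUBSYSTEM:CONSOLE, entry point main */
};

/* Little-endian readers, so the sketch does not depend on host byte order. */
static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns the Subsystem field of a PE image held in memory, or -1 if the
   buffer does not look like a PE image. Hypothetical helper. */
int pe_subsystem(const uint8_t *image, size_t size)
{
    const size_t coff_header_size = 20;
    const size_t subsystem_offset = 68; /* same for PE32 and PE32+ */
    uint32_t pe_offset;
    size_t field;

    assert(image != NULL);
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return -1;

    pe_offset = read_u32(image + 0x3C); /* e_lfanew */
    if (pe_offset > size - 4 ||
        image[pe_offset] != 'P' || image[pe_offset + 1] != 'E' ||
        image[pe_offset + 2] != 0 || image[pe_offset + 3] != 0)
        return -1;

    field = (size_t)pe_offset + 4 + coff_header_size + subsystem_offset;
    if (field + 2 > size)
        return -1;

    return read_u16(image + field);
}
```

Run over a console program, a GUI program and a .sys driver, a helper like this should report 3, 2 and 1 respectively.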

The Driver’s main

After the compiler is set up with the appropriate options, it’s probably good to start thinking about the entry point to a driver. The first section lied a little bit about the subsystem. NATIVE can also be used to run user-mode applications, which define an entry point called NtProcessStartup. This is the default entry point when NATIVE is specified, in the same way that WinMain and main are found when the linker creates an application. You can override the default entry point with your own simply by using the -entry:<functionname> linker option. If we know we want this to be a driver, we simply need to write an entry point whose parameter list and return type match those of a driver. The system will then load the driver when we install it and tell the system that it is a driver.

The name we use can be anything. We can call it BufferFly() if we want. The most common practice among driver developers and at Microsoft is to use the name DriverEntry for the initial entry point. This means we add -entry:DriverEntry to the linker’s command line options. If you are using the DDK, this is done for you when you specify DRIVER as the type of executable to build. The DDK contains an environment with preset options in the common make file directory, which makes building simpler since it specifies the default options. The driver developer can then override these settings in the make file or simply use them as a convenience. This is essentially how DriverEntry became the somewhat official name for driver entry points.

Remember, DLLs are also linked specifying WINDOWS as the subsystem, but they have an additional switch called /DLL. Similarly, there is a switch for drivers: /DRIVER:WDM (which also sets NATIVE behind the scenes), as well as /DRIVER:UPONLY, which means the driver cannot be loaded on a multi-processor system.

The linker builds the final binary. The options in the PE header, together with how the binary is being loaded (run as an EXE through the loader, loaded by LoadLibrary, or loaded as a driver), define how the loading system behaves. The loading system performs some level of verification, for example that the image being loaded is indeed supposed to be loaded in this manner. In some cases, startup code is even added to the binary that executes before your entry point is reached (WinMainCRTStartup calling WinMain after initializing the CRT, for example). Your job is simply to write the application based on how you want it to be loaded and then set the correct linker options so the linker knows how to properly create the binary. There are various resources on the details of the PE format which you should be able to find if you are interested in investigating this area further.

The options we will set for the linker will end up being the following:

/SUBSYSTEM:NATIVE /DRIVER:WDM -entry:DriverEntry

Before creating the DriverEntry

There are some things we need to go over before we simply sit down and write the DriverEntry . I know that a lot of people simply want to jump right into writing the driver and seeing it work. This is generally the case in most programming scenarios as you usually just take the code, change it around, compile it, and test it out. If you remember back to when you were first learning Windows development, it was probably the same way.

Your application probably didn’t work right away; it probably crashed or just disappeared. This was a lot of fun and you probably learned a lot, but with a driver, the adventure is a little different. Not knowing what to do can end with a blue screen, and if your driver is loaded at boot and executes that code, you now have a problem. Hopefully, you can boot in safe mode or restore a previous hardware configuration. That being the case, we have a few things to go over before you write the driver, to help you understand what you are doing before you actually do it.

The first rule of thumb is: do not just take a driver and compile it with some of your changes. If you do not understand how the driver works or how to program correctly in this environment, you are likely to cause problems. Drivers can corrupt the integrity of the whole system, and they can have bugs that only show up in rare circumstances. Application programs can have bugs that behave the same way, but not with the same root cause. As an example, there are times when you cannot access memory that is pageable. If you know how virtual memory works, you know that the Operating System removes pages from memory to pull in pages that are needed, and this is how more applications can run than would physically fit in the machine’s memory. There are times, however, when pages cannot be read into memory from disk. At these times, drivers can only access memory that cannot be paged out.

Where am I going with this? Well, if you allow a driver that runs under these constraints to access pageable memory, it may not crash, since the Operating System usually tries to keep all pages in memory as long as possible. If you close an application that was running, its pages may still be in memory, for example! This is why a bug like this may go undetected (unless you use tools such as Driver Verifier) and may eventually trap. When it does, if you do not understand basic concepts like this, you will be lost as to what the problem is and how to fix it.

There are a lot of concepts behind everything that will be described in this document. On IRQL alone, there is a twenty-page document you can find on MSDN. There’s an equally large document on IRPs. I will not attempt to duplicate this information or point out every little detail. What I will attempt to do is give a basic summary and point you in the direction of where to find more information. It’s important to at least know that these concepts exist and to understand the basic idea behind them before writing the driver.

What is IRQL?

IRQL stands for Interrupt Request Level. A processor executes code in a thread at a particular IRQL, and the IRQL of the processor determines how that thread may be interrupted. The thread can only be interrupted by code that needs to run at a higher IRQL on the same processor. Interrupts requiring the same or a lower IRQL are masked off, so only interrupts requiring a higher IRQL are available for processing. In a multi-processor system, each processor operates independently at its own IRQL.
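The masking rule can be modelled in a few lines of ordinary C. This is a user-mode sketch, not kernel code: the values 0, 1 and 2 match PASSIVE_LEVEL, APC_LEVEL and DISPATCH_LEVEL in the WDK headers, while the single DIRQL value, the two-processor array and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char IRQL; /* the WDK's KIRQL is also an unsigned char */

enum {
    PASSIVE_LEVEL_SIM  = 0,
    APC_LEVEL_SIM      = 1,
    DISPATCH_LEVEL_SIM = 2,
    DIRQL_SIM          = 5  /* assumption: one example device IRQL */
};

#define NUM_CPUS 2

/* Each processor tracks its own current IRQL independently. */
static IRQL cpu_irql[NUM_CPUS];

/* An interrupt is delivered on a processor only if it needs a strictly
   higher IRQL than that processor's current one; requests at the same or a
   lower IRQL stay masked until the processor's IRQL drops. */
bool interrupt_allowed(int cpu, IRQL request)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    return request > cpu_irql[cpu];
}

void set_cpu_irql(int cpu, IRQL level)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    cpu_irql[cpu] = level;
}
```

With processor 0 at DISPATCH_LEVEL and processor 1 at PASSIVE_LEVEL, an APC-level request is masked on the first but deliverable on the second.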

There are four IRQL levels you will generally deal with: Passive, APC, Dispatch and DIRQL. Kernel APIs documented in MSDN generally have a note specifying the IRQL at which you need to be running in order to use the API. The higher the IRQL, the fewer APIs are available for use. The documentation on MSDN also defines what IRQL the processor will be running at when a particular entry point of the driver is called. DriverEntry, for example, is called at PASSIVE_LEVEL.

PASSIVE_LEVEL

This is the lowest IRQL. No interrupts are masked off, and this is the level at which a thread executing in user mode runs. Pageable memory is accessible.

APC_LEVEL

On a processor running at this level, only APC level interrupts are masked. This is the level at which Asynchronous Procedure Calls occur. Pageable memory is still accessible. When an APC occurs, the processor is raised to APC level, which in turn prevents other APCs from occurring. A driver can manually raise its IRQL to APC level (or any other level) in order to synchronize with APCs, for example, since APCs can’t be invoked while you are already at APC level. Some APIs can’t be called at APC level because APCs are disabled, which in turn may prevent some I/O completion APCs from running.

DISPATCH_LEVEL

A processor running at this level has DPC level interrupts and lower masked off. Pageable memory cannot be accessed, so all memory being accessed must be non-paged. If you are running at Dispatch Level, the set of APIs you can use shrinks greatly, since you can only deal with non-paged memory.

DIRQL (Device IRQL)

Generally, higher-level drivers do not deal with IRQLs at this level, but at DIRQL all interrupts at this level or lower are masked off and do not occur. DIRQL is actually a range of IRQLs, and it is how the system determines which devices have priority over other devices.
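The per-level rules can be condensed into a similar user-mode model. Both pieces are hypothetical: pageable_access_ok encodes the rule that pageable memory may only be touched below DISPATCH_LEVEL, and sim_raise_irql/sim_lower_irql imitate the contract of KeRaiseIrql and KeLowerIrql (raising must never lower the IRQL, and the previous level is handed back so it can be restored). They only track a number; nothing is actually masked.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char IRQL;

enum { PASSIVE = 0, APC = 1, DISPATCH = 2 }; /* values as in the WDK */

/* Page faults can only be serviced below DISPATCH_LEVEL, so pageable memory
   is only safe to touch at PASSIVE_LEVEL or APC_LEVEL. */
bool pageable_access_ok(IRQL current)
{
    return current < DISPATCH;
}

/* Simulated IRQL of the current processor. */
static IRQL current_irql = PASSIVE;

/* Like KeRaiseIrql: the new level must not be below the current one, and
   the old level is returned through *old so the caller can restore it. */
void sim_raise_irql(IRQL new_irql, IRQL *old)
{
    assert(new_irql >= current_irql);
    *old = current_irql;
    current_irql = new_irql;
}

/* Like KeLowerIrql: restores a previously saved, lower or equal level. */
void sim_lower_irql(IRQL old)
{
    assert(old <= current_irql);
    current_irql = old;
}

IRQL sim_get_irql(void)
{
    return current_irql;
}
```

A real driver pairs the calls the same way: it saves the old IRQL when raising and restores exactly that value when lowering.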

In this driver, we will basically only be working at PASSIVE_LEVEL, so we won’t have to worry about the gotchas. However, it is necessary for you to be aware of what IRQL is, if you intend to continue writing device drivers.

For more information on IRQLs and thread scheduling, refer to the MSDN documentation on the subject.

What is an IRP?

IRP stands for I/O Request Packet, and it is passed down from driver to driver in the driver stack. This data structure allows drivers to communicate with each other and to request work to be done by a driver. The I/O manager or another driver may create an IRP and pass it down to your driver. The IRP includes information about the operation that is being requested.

A description of the IRP data structure can be found on MSDN.

The description and usage of an IRP can go from simple to complex very easily, so we will only describe, in general, what an IRP will mean to you. There is an article on MSDN that describes in much more detail (about twenty pages) exactly what an IRP is and how to handle one.

The IRP also contains a list of sub-requests, known as IRP stack locations. Each driver in the device stack generally has its own sub-request describing how it should interpret the IRP. This data structure is the IO_STACK_LOCATION and is described on MSDN.

As an analogy for the IRP and IO_STACK_LOCATION, imagine three people who do different jobs: carpentry, plumbing and welding. If they were going to build a house, they could share a common overall design and perhaps a common set of tools, such as a tool box with power drills and so on. The common tools and the overall design of the house would be the IRP. Each of them also has an individual piece of work to do to make this happen. For example, the plumber needs the plans for where to put the pipe, how much pipe he has, and so on. These could be interpreted as his IO_STACK_LOCATION, as his specific job is the piping. The carpenter could be building the framework for the house, and the details of that would be in his IO_STACK_LOCATION. So, while the entire IRP is a request to build a house, each person in the stack has their own job, defined by their IO_STACK_LOCATION, to make this happen. Once everyone has completed their job, they then complete the IRP.
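The relationship between an IRP and its stack locations can be sketched with simplified structures. Everything here is hypothetical: MINI_IRP and MINI_STACK_LOCATION are stand-ins for IRP and IO_STACK_LOCATION with only a couple of fields, and the index arithmetic merely imitates the idea behind IoGetCurrentIrpStackLocation and IoGetNextIrpStackLocation: each driver reads its own location, fills in the one below it, and passes the IRP down.

```c
#include <assert.h>
#include <string.h>

#define MAX_STACK 4

/* Per-driver part of the request (stand-in for IO_STACK_LOCATION). */
typedef struct {
    const char *job;        /* what this layer is asked to do */
    unsigned long length;   /* e.g. Parameters.Write.Length */
} MINI_STACK_LOCATION;

/* Shared part of the request (stand-in for IRP). */
typedef struct {
    int stack_count;        /* one location per driver in the stack */
    int current;            /* index of the current driver's location */
    MINI_STACK_LOCATION stack[MAX_STACK];
} MINI_IRP;

void mini_irp_init(MINI_IRP *irp, int stack_count)
{
    assert(stack_count > 0 && stack_count <= MAX_STACK);
    memset(irp, 0, sizeof *irp);
    irp->stack_count = stack_count;
    irp->current = stack_count - 1; /* the top driver uses the last slot */
}

MINI_STACK_LOCATION *mini_current_location(MINI_IRP *irp)
{
    return &irp->stack[irp->current];
}

/* The slot the next-lower driver will see; NULL at the bottom of the stack. */
MINI_STACK_LOCATION *mini_next_location(MINI_IRP *irp)
{
    return irp->current > 0 ? &irp->stack[irp->current - 1] : NULL;
}

/* Hand the IRP to the next-lower driver. */
void mini_pass_down(MINI_IRP *irp)
{
    assert(irp->current > 0);
    irp->current--;
}
```

In the house analogy, the shared MINI_IRP is the overall plan, while each worker only looks at the location the IRP currently points to.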

The device driver we will be building will not be that complex and will basically be the only driver in the stack.

Things to Avoid

There are a lot of pitfalls you will need to avoid, but they are mostly unrelated to our simple driver. To be better informed, however, look up the list of "things to avoid" when it comes to driver development.

Create the DriverEntry routine

There is much more to explain; however, I think it’s time we simply started to develop the driver and explain as we go. It is hard to digest theory, or even how code is supposed to work, without actually doing anything. You need some hands-on experience so you can bring these ideas out of space and into reality.

The prototype for the DriverEntry is the following.

NTSTATUS DriverEntry(PDRIVER_OBJECT pDriverObject, PUNICODE_STRING pRegistryPath);

The DRIVER_OBJECT is a data structure used to represent this driver. The DriverEntry routine uses it to register the driver’s other entry points for handling specific I/O requests. This object also has a pointer to a DEVICE_OBJECT, which is a data structure that represents a particular device. A single driver may actually handle multiple devices, and as such, the DRIVER_OBJECT maintains a linked list of all the devices this particular driver services requests for. We will simply be creating one device.
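The driver-to-device linkage can be pictured with a small model. MINI_DRIVER and MINI_DEVICE are hypothetical stand-ins that borrow the field names DeviceObject and NextDevice from the real DRIVER_OBJECT and DEVICE_OBJECT; this sketch simply pushes each new device at the head of the list. The teardown loop shows why code that deletes several devices must read NextDevice before freeing the current one.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct MINI_DEVICE {
    struct MINI_DEVICE *NextDevice; /* next device owned by the same driver */
    int id;
} MINI_DEVICE;

typedef struct {
    MINI_DEVICE *DeviceObject;      /* head of the driver's device list */
} MINI_DRIVER;

/* Create a device and link it at the head of the driver's list. */
MINI_DEVICE *mini_create_device(MINI_DRIVER *driver, int id)
{
    MINI_DEVICE *dev;

    assert(driver != NULL);
    dev = malloc(sizeof *dev);
    if (dev == NULL)
        return NULL;
    dev->id = id;
    dev->NextDevice = driver->DeviceObject;
    driver->DeviceObject = dev;
    return dev;
}

int mini_count_devices(const MINI_DRIVER *driver)
{
    int count = 0;
    const MINI_DEVICE *d;
    for (d = driver->DeviceObject; d != NULL; d = d->NextDevice)
        count++;
    return count;
}

/* Delete every device: save NextDevice before freeing the current node. */
void mini_delete_all_devices(MINI_DRIVER *driver)
{
    MINI_DEVICE *dev = driver->DeviceObject;
    while (dev != NULL) {
        MINI_DEVICE *next = dev->NextDevice;
        free(dev);
        dev = next;
    }
    driver->DeviceObject = NULL;
}
```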

The Registry Path is a string which points to the location in the registry where the information for the driver was stored. The driver can use this location to store driver specific information.

The next part is to actually put things in the DriverEntry routine. The first thing we will do is create the device. You may be wondering how we are going to create a device and what type of device we should create. This is generally because a driver is usually associated with hardware, but that is not always the case. There are a variety of different types of drivers which operate at different levels; not all drivers work or interface directly with hardware. Generally, you maintain a stack of drivers, each with a specific job to do. The highest level driver is the one that communicates with user mode, and the lowest level drivers generally just talk to other drivers and hardware. There are network drivers, display drivers, file system drivers, etc., and each has its own stack of drivers. Each place in the stack breaks up a request into a more generic or simpler request for the lower level driver to service. The highest level drivers are the ones that expose themselves to user mode, and unless they are a special device with a particular framework (like display drivers), they generally behave like other drivers, differing only in the types of operations they implement.

As an example, take the hard disk drive. The driver that communicates with user mode does not talk directly to hardware. The high level driver simply manages the file system itself and where to put things. It then tells the lower level driver where it wants to read from or write to on the disk, and that driver may or may not talk directly to hardware. There may be another layer which then passes that request to the actual hardware driver, which physically reads or writes a particular sector on the disk and returns the result to the higher level. The highest level may interpret the data as file data, but the lowest level driver may be very simple and only decide when to service each request based on where the read/write head is located on the disk. It can determine which sector read requests to service, but it has no idea what the data is and does not interpret it.
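The layering in the hard disk example can be reduced to two cooperating functions. This is a hypothetical user-mode sketch, not driver code: the "file system" layer turns a byte offset into a sector number, assuming 512-byte sectors and a file stored contiguously from a given starting sector, and the "disk" layer serves whole sectors from an in-memory array without knowing what the bytes mean.

```c
#include <assert.h>
#include <string.h>

#define SECTOR_SIZE 512
#define DISK_SECTORS 8

/* Lowest layer: only knows about sectors, never about files. */
static unsigned char disk[DISK_SECTORS][SECTOR_SIZE];

int disk_read_sector(unsigned sector, unsigned char *out)
{
    if (sector >= DISK_SECTORS)
        return -1;
    memcpy(out, disk[sector], SECTOR_SIZE);
    return 0;
}

/* Higher layer: knows where the file lives (assumed contiguous from
   file_start_sector) and translates a file offset into a sector request. */
int fs_read_byte(unsigned file_start_sector, unsigned long offset,
                 unsigned char *value)
{
    unsigned char sector_buf[SECTOR_SIZE];
    unsigned sector = file_start_sector + (unsigned)(offset / SECTOR_SIZE);

    assert(value != NULL);
    if (disk_read_sector(sector, sector_buf) != 0)
        return -1;
    *value = sector_buf[offset % SECTOR_SIZE];
    return 0;
}
```

Only the upper function knows that the bytes belong to a file; the lower one would serve the same sector to anyone who asked.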

Let’s take a look at the first part of our DriverEntry.

NTSTATUS DriverEntry(PDRIVER_OBJECT pDriverObject, PUNICODE_STRING pRegistryPath)
{
    NTSTATUS NtStatus = STATUS_SUCCESS;
    UINT uiIndex = 0;
    PDEVICE_OBJECT pDeviceObject = NULL;
    UNICODE_STRING usDriverName, usDosDeviceName;

    DbgPrint("DriverEntry Called \r\n");

    /* NT device name, and the DOS name used later for the symbolic link */
    RtlInitUnicodeString(&usDriverName, L"\\Device\\Example");
    RtlInitUnicodeString(&usDosDeviceName, L"\\DosDevices\\Example");

    /* No device extension (0 bytes), generic device type, non-exclusive */
    NtStatus = IoCreateDevice(pDriverObject, 0,
                              &usDriverName,
                              FILE_DEVICE_UNKNOWN,
                              FILE_DEVICE_SECURE_OPEN,
                              FALSE, &pDeviceObject);

    /* The code that follows should only run if NT_SUCCESS(NtStatus) */

The first thing you will notice is the DbgPrint function. It works just like printf, except that it prints messages to the debugger or debug output window. You can get a tool called DBGVIEW from www.sysinternals.com, which will display all of these messages.

You will then notice that we use a function called RtlInitUnicodeString, which initializes a UNICODE_STRING data structure. This data structure contains three fields. The first, Length, is the size of the current Unicode string in bytes; the second, MaximumLength, is the maximum size in bytes that the string can occupy in its buffer; and the third, Buffer, is a pointer to the Unicode characters. This structure is used to describe Unicode strings and is common in drivers. The one thing to remember with UNICODE_STRING is that the string is not required to be NULL terminated, since there is a size field in the structure! This causes problems for people new to driver development, as they assume a UNICODE_STRING is NULL terminated and end up blue-screening the system. Most Unicode strings passed into your driver will not be NULL terminated, so this is something you need to be aware of.
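The pitfall is easier to see with a user-mode mirror of the structure. MY_UNICODE_STRING below copies the layout idea of UNICODE_STRING (Length and MaximumLength in bytes, plus a buffer of 16-bit characters). my_init_unicode_string imitates what RtlInitUnicodeString does with a NULL-terminated source, and us_equals_ascii is a hypothetical helper that, like correct driver code, trusts Length instead of looking for a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t WCHAR16; /* Windows WCHAR is 16 bits wide */

typedef struct {
    uint16_t Length;        /* bytes in use, excluding any terminator */
    uint16_t MaximumLength; /* bytes available in Buffer */
    WCHAR16 *Buffer;        /* NOT guaranteed to be NULL terminated */
} MY_UNICODE_STRING;

/* Imitates RtlInitUnicodeString for a NULL-terminated source. */
void my_init_unicode_string(MY_UNICODE_STRING *us, WCHAR16 *source)
{
    size_t chars = 0;
    if (source != NULL)
        while (source[chars] != 0)
            chars++;
    us->Buffer = source;
    us->Length = (uint16_t)(chars * sizeof(WCHAR16));
    us->MaximumLength = source ? (uint16_t)(us->Length + sizeof(WCHAR16)) : 0;
}

/* Compares against an ASCII string using Length only, never a terminator. */
int us_equals_ascii(const MY_UNICODE_STRING *us, const char *ascii)
{
    size_t chars = us->Length / sizeof(WCHAR16);
    size_t i;
    for (i = 0; i < chars; i++) {
        if (ascii[i] == '\0' ||
            us->Buffer[i] != (WCHAR16)(unsigned char)ascii[i])
            return 0;
    }
    return ascii[chars] == '\0';
}
```

A string whose Buffer holds the characters A, B, C, D but whose Length is 4 describes only "AB"; code that searched for a terminator would read past it, and with no terminator at all, past the end of the buffer.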

Devices have names just like anything else. They are generally named \Device\<somename>, and this is the string we created to pass into IoCreateDevice. We will get to the second string, \DosDevices\Example, later, as it’s not used in the driver yet. To IoCreateDevice, we pass in the driver object, a pointer to the Unicode string containing the name we want to give the device, a device type of FILE_DEVICE_UNKNOWN since it’s not associated with any particular type of device, and a pointer to receive the newly created device object. The parameters are explained in more detail in the IoCreateDevice documentation.

For the second parameter we passed 0; this parameter specifies the number of bytes to allocate for the device extension. The device extension is a data structure, defined by the driver writer, that is unique to each device. This is how you can extend the information associated with a device and create device contexts in which to store instance data. We will not be using this in this example.
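Conceptually, the device extension is just extra memory that comes with the device object, reachable through its DeviceExtension pointer. The sketch below imitates that in user mode with a single calloc; MINI_DEVICE_OBJECT, EXAMPLE_EXTENSION and mini_io_create_device are hypothetical names, and the real IoCreateDevice does considerably more (naming, security, linking the device to its driver).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned long Flags;
    void *DeviceExtension;   /* in this sketch, points just past the object */
} MINI_DEVICE_OBJECT;

/* Per-device context a driver might keep (hypothetical example). */
typedef struct {
    unsigned long writes_seen;
    char last_message[32];
} EXAMPLE_EXTENSION;

/* Round the object size up to 16 bytes so the extension that follows it is
   suitably aligned (calloc already returns maximally aligned memory). */
#define EXT_OFFSET ((sizeof(MINI_DEVICE_OBJECT) + 15u) & ~(size_t)15u)

/* Allocate the object and its zeroed extension in one block. */
MINI_DEVICE_OBJECT *mini_io_create_device(size_t extension_size)
{
    MINI_DEVICE_OBJECT *dev = calloc(1, EXT_OFFSET + extension_size);
    if (dev == NULL)
        return NULL;
    dev->DeviceExtension =
        extension_size ? (unsigned char *)dev + EXT_OFFSET : NULL;
    return dev;
}

void mini_io_delete_device(MINI_DEVICE_OBJECT *dev)
{
    free(dev); /* the extension goes away with the object */
}
```

Because the extension lives and dies with the device object, it is a natural place for per-device state that every dispatch routine can reach from the DEVICE_OBJECT it receives.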

Now that we have successfully created our \Device\Example device, we need to set up the Driver Object to call into our driver when certain requests are made. These requests are called IRP Major requests. There are also Minor requests, which are sub-requests of these and can be found in the stack location of the IRP.

The following code populates certain requests:

for(uiIndex = 0; uiIndex <= IRP_MJ_MAXIMUM_FUNCTION; uiIndex++)
{
    /* MajorFunction has IRP_MJ_MAXIMUM_FUNCTION + 1 entries, hence <= */
    pDriverObject->MajorFunction[uiIndex] = Example_UnSupportedFunction;
}

pDriverObject->MajorFunction[IRP_MJ_CLOSE]          = Example_Close;
pDriverObject->MajorFunction[IRP_MJ_CREATE]         = Example_Create;
pDriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = Example_IoControl;
pDriverObject->MajorFunction[IRP_MJ_READ]           = Example_Read;
pDriverObject->MajorFunction[IRP_MJ_WRITE]          = USE_WRITE_FUNCTION;

We populate the Create, Close, IoControl, Read and Write entries. What do these refer to? When communicating with a user-mode application, certain APIs result in requests being sent directly to the driver, along with their parameters!

• CreateFile -> IRP_MJ_CREATE
• CloseHandle -> IRP_MJ_CLEANUP & IRP_MJ_CLOSE
• WriteFile -> IRP_MJ_WRITE
• ReadFile -> IRP_MJ_READ
• DeviceIoControl -> IRP_MJ_DEVICE_CONTROL

One subtlety is that IRP_MJ_CLOSE is not necessarily called in the context of the process that created the handle. If you need to perform process-related cleanup, then you need to handle IRP_MJ_CLEANUP as well.
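The dispatch table is simply an array of function pointers indexed by the major function code. The user-mode sketch below mirrors that idea; the numeric codes are the IRP_MJ_* values from wdm.h, while the handler names, mini_dispatch and the simulated CloseHandle are hypothetical. It also shows what happens to a request the driver did not register: IRP_MJ_CLEANUP, sent when the last handle is closed and ahead of IRP_MJ_CLOSE, lands in the default handler.

```c
#include <assert.h>

/* Major function codes, as defined in wdm.h. */
#define IRP_MJ_CREATE            0x00
#define IRP_MJ_CLOSE             0x02
#define IRP_MJ_READ              0x03
#define IRP_MJ_WRITE             0x04
#define IRP_MJ_DEVICE_CONTROL    0x0e
#define IRP_MJ_CLEANUP           0x12
#define IRP_MJ_MAXIMUM_FUNCTION  0x1b

typedef const char *(*MINI_DISPATCH)(void);

static const char *unsupported(void) { return "unsupported"; }
static const char *on_create(void)   { return "create"; }
static const char *on_close(void)    { return "close"; }
static const char *on_write(void)    { return "write"; }

/* One slot per major function, IRP_MJ_MAXIMUM_FUNCTION included. */
static MINI_DISPATCH major_function[IRP_MJ_MAXIMUM_FUNCTION + 1];

void mini_driver_entry(void)
{
    int i;
    for (i = 0; i <= IRP_MJ_MAXIMUM_FUNCTION; i++)
        major_function[i] = unsupported;  /* default for everything */
    major_function[IRP_MJ_CREATE] = on_create;
    major_function[IRP_MJ_CLOSE]  = on_close;
    major_function[IRP_MJ_WRITE]  = on_write;
}

const char *mini_dispatch(int major)
{
    assert(major >= 0 && major <= IRP_MJ_MAXIMUM_FUNCTION);
    return major_function[major]();
}

/* Simulated CloseHandle: cleanup first, then close. */
void mini_close_handle(const char *out[2])
{
    out[0] = mini_dispatch(IRP_MJ_CLEANUP);
    out[1] = mini_dispatch(IRP_MJ_CLOSE);
}
```

The initialization loop uses <= because the array has IRP_MJ_MAXIMUM_FUNCTION + 1 entries; stopping at < would leave the last slot (IRP_MJ_PNP, 0x1b) unset.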

So as you can see, when a user-mode application uses these functions, it calls into your driver. You may be wondering why the user-mode APIs say "file" when they don’t really mean a file. These APIs can talk to any device that exposes itself to user mode; they are not only for accessing files. In the last piece of this article, we will write a user-mode application to talk to our driver, and it will simply call CreateFile, WriteFile and CloseHandle. That’s how simple it is. USE_WRITE_FUNCTION is a constant I will explain later.

The next piece of code is pretty simple, it’s the driver unload function.

pDriverObject->DriverUnload = Example_Unload;

You can technically omit this function, but if you want to unload your driver dynamically, it must be specified. If you do not specify this function, the system will not allow your driver to be unloaded once it is loaded.

The code after this actually uses the DEVICE_OBJECT, not the DRIVER_OBJECT. These two data structures can get a little confusing, since they both start with D and end with _OBJECT, so it’s easy to confuse which one we’re using.

pDeviceObject->Flags |= IO_TYPE;
pDeviceObject->Flags &= (~DO_DEVICE_INITIALIZING);

We are simply setting the flags. IO_TYPE is actually a constant which defines the type of I/O we want to do (I defined it in example.h). I will explain this in the section on handling user-mode write requests.

The DO_DEVICE_INITIALIZING flag tells the I/O Manager that the device is being initialized and not to send any I/O requests to the driver. For devices created in the context of DriverEntry, clearing it is not needed, since the I/O Manager will clear this flag once DriverEntry is done. However, if you create a device in any function outside of DriverEntry, you need to manually clear this flag for any device you create with IoCreateDevice, since the flag is set by IoCreateDevice itself. We cleared it here just for fun, even though we weren’t required to.
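The two flag statements are ordinary bit operations. This sketch uses the values that DO_BUFFERED_IO, DO_DIRECT_IO and DO_DEVICE_INITIALIZING have in wdm.h (0x4, 0x10 and 0x80); the two helper functions are hypothetical and only show how the bits combine for each of the three IO_TYPE choices.

```c
#include <assert.h>

/* Values as in wdm.h. */
#define DO_BUFFERED_IO          0x00000004UL
#define DO_DIRECT_IO            0x00000010UL
#define DO_DEVICE_INITIALIZING  0x00000080UL

/* Applies the same two statements as the driver: OR in the I/O method,
   then clear the "still initializing" bit that IoCreateDevice set. */
unsigned long finish_device_flags(unsigned long flags, unsigned long io_type)
{
    flags |= io_type;
    flags &= ~DO_DEVICE_INITIALIZING;
    return flags;
}

/* Names the buffer-passing method the flags select for reads and writes. */
const char *io_method(unsigned long flags)
{
    if (flags & DO_DIRECT_IO)
        return "direct";
    if (flags & DO_BUFFERED_IO)
        return "buffered";
    return "neither";
}
```

With IO_TYPE defined as 0, only the initializing bit changes, which is why the Neither method needs no flag at all.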

The last piece of our driver uses both of the Unicode strings we defined above: \Device\Example and \DosDevices\Example.

IoCreateSymbolicLink(&usDosDeviceName, &usDriverName);

IoCreateSymbolicLink does just that, it creates a Symbolic Link in the object manager. To view the object manager, you may download my tool QuickView, or go to www.sysinternals.com and download WINOBJ. A Symbolic Link simply maps a DOS Device Name to an NT Device Name. In this example, Example is our DOS Device Name and \Device\Example is our NT Device Name.

To put this into perspective, different vendors have different drivers, and each driver is required to have its own name. You cannot have two drivers with the same NT Device name. Say you have a memory stick that presents itself to the system as a new drive letter, using any available letter such as E:. Now say you remove the memory stick and map a network drive to E:. Applications can talk to E: the same way in both cases; they do not care whether E: is a CD-ROM, floppy disk, memory stick or network drive. How is this possible? Well, the driver needs to be able to interpret the requests and either handle them itself, as in the case of a network redirector, or pass them down to the appropriate hardware driver. This is done through symbolic links. E: is a symbolic link. The network mapped drive may map E: to \Device\NetworkRedirector and the memory stick may map E: to \Device\FujiMemoryStick, for example.

This is how applications can be written using a commonly defined name that can be abstracted to point to any device driver able to handle the requests. There are no rules here; we could actually map \Device\Example to E:. We can do whatever we wish, but in the end, applications will try to use the device according to the kind of device they expect, and the driver needs to respond and act accordingly. This means supporting the IOCTLs commonly used by those devices, as applications will try to use them. COM1, COM2, etc. are all examples of this. COM1 is a DOS name which is mapped to the NT Device name of a driver that handles serial requests. This doesn’t even need to be a real physical serial port!

So we have defined Example as a DOS Device which points to \Device\Example. In the section on communicating with user mode, we will learn more about how to use this mapping.
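The object manager’s symbolic links behave much like a name-to-name table. The sketch below is a hypothetical user-mode model, not the object manager API: mini_create_symlink and mini_delete_symlink play the roles of IoCreateSymbolicLink and IoDeleteSymbolicLink, and mini_resolve follows a link to its NT device name. The E: example fits naturally: the same DOS name can be re-pointed at a different device once the old link is removed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINKS 8

typedef struct {
    const char *link;    /* e.g. "\\DosDevices\\Example" */
    const char *target;  /* e.g. "\\Device\\Example" */
} MINI_SYMLINK;

static MINI_SYMLINK links[MAX_LINKS];

static MINI_SYMLINK *find_link(const char *name)
{
    for (size_t i = 0; i < MAX_LINKS; i++)
        if (links[i].link != NULL && strcmp(links[i].link, name) == 0)
            return &links[i];
    return NULL;
}

/* Returns -1 if the link name already exists or the table is full. */
int mini_create_symlink(const char *link, const char *target)
{
    if (find_link(link) != NULL)
        return -1;
    for (size_t i = 0; i < MAX_LINKS; i++) {
        if (links[i].link == NULL) {
            links[i].link = link;
            links[i].target = target;
            return 0;
        }
    }
    return -1;
}

void mini_delete_symlink(const char *link)
{
    MINI_SYMLINK *entry = find_link(link);
    if (entry != NULL)
        entry->link = entry->target = NULL;
}

/* Returns the NT device name a DOS name points to, or NULL. */
const char *mini_resolve(const char *link)
{
    MINI_SYMLINK *entry = find_link(link);
    return entry != NULL ? entry->target : NULL;
}
```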

Create the Unload Routine

The next piece of code we will look at is the unload routine. This is required in order to be able to unload the device driver dynamically. This section will be a bit smaller as there is not much to explain.

VOID Example_Unload(PDRIVER_OBJECT DriverObject)
{
    UNICODE_STRING usDosDeviceName;

    DbgPrint("Example_Unload Called \r\n");

    RtlInitUnicodeString(&usDosDeviceName, L"\\DosDevices\\Example");
    IoDeleteSymbolicLink(&usDosDeviceName);

    IoDeleteDevice(DriverObject->DeviceObject);
}

You can do whatever you wish in your unload routine. This unload routine is very simple, it just deletes the symbolic link we created and then deletes the only device that we created which was \Device\Example.

Creating the IRP_MJ_WRITE

The rest of the functions should be self-explanatory, as they don’t do anything. This is why I am only choosing to explain the Write routine. If this article is well received, I may write a second tutorial on implementing the IO Control function.

If you have used WriteFile and ReadFile, you know that you simply pass a buffer of data to write data to a device or read data from a device. These parameters are sent to the device in the IRP, as we explained previously. There is more to the story, though, as there are actually three different methods that the I/O Manager can use to marshal this data before giving the IRP to the driver. That also means that the way the data is marshaled determines how the driver’s Read and Write functions need to interpret it.

The three methods are Direct I/O, Buffered I/O and Neither.

#ifdef __USE_DIRECT__
#define IO_TYPE DO_DIRECT_IO
#define USE_WRITE_FUNCTION Example_WriteDirectIO
#endif

#ifdef __USE_BUFFERED__
#define IO_TYPE DO_BUFFERED_IO
#define USE_WRITE_FUNCTION Example_WriteBufferedIO
#endif

#ifndef IO_TYPE
#define IO_TYPE 0
#define USE_WRITE_FUNCTION Example_WriteNeither
#endif

The code was written so that if you define __USE_DIRECT__ in the header, then IO_TYPE is DO_DIRECT_IO and USE_WRITE_FUNCTION is Example_WriteDirectIO. If you define __USE_BUFFERED__ in the header, then IO_TYPE is DO_BUFFERED_IO and USE_WRITE_FUNCTION is Example_WriteBufferedIO. If you define neither __USE_DIRECT__ nor __USE_BUFFERED__, then IO_TYPE is defined as 0 (neither) and the write function is Example_WriteNeither.

We will now go over each type of I/O.

Direct I/O

The first thing I will do is simply show you the code for handling direct I/O.

NTSTATUS Example_WriteDirectIO(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    NTSTATUS NtStatus = STATUS_SUCCESS;
    PIO_STACK_LOCATION pIoStackIrp = NULL;
    PCHAR pWriteDataBuffer;

    DbgPrint("Example_WriteDirectIO Called \r\n");

    /*
     * Each time the IRP is passed down
     * the driver stack a new stack location is added
     * specifying certain parameters for the IRP to the driver.
     */
    pIoStackIrp = IoGetCurrentIrpStackLocation(Irp);
    Irp->IoStatus.Information = 0;

    /* MdlAddress is NULL for a zero-length write. */
    if(pIoStackIrp && Irp->MdlAddress)
    {
        pWriteDataBuffer = MmGetSystemAddressForMdlSafe(Irp->MdlAddress, NormalPagePriority);

        if(pWriteDataBuffer)
        {
            /*
             * We need to verify that the string
             * is NULL terminated. Bad things can happen
             * if we access memory not valid while in the Kernel.
             */
            if(Example_IsStringTerminated(pWriteDataBuffer, pIoStackIrp->Parameters.Write.Length))
            {
                /* Never pass caller data as the format string. */
                DbgPrint("%s", pWriteDataBuffer);
            }

            Irp->IoStatus.Information = pIoStackIrp->Parameters.Write.Length;
        }
    }

    /* A dispatch routine must complete (or pend) every IRP it receives. */
    Irp->IoStatus.Status = NtStatus;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);

    return NtStatus;
}

The entry point simply receives the device object for the device to which this request is being sent. If you recall, a single driver can create multiple devices, even though we have only created one. The other parameter, as mentioned before, is an IRP!

The first thing we do is call IoGetCurrentIrpStackLocation, which simply provides us with our IO_STACK_LOCATION. In our example, the only parameter we need from this is the length of the buffer provided to the driver, which is at Parameters.Write.Length.
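All three write routines call Example_IsStringTerminated, whose body is not shown in this section. A plausible minimal version, assumed here rather than taken from the article, scans at most Length bytes for a terminating NULL so that DbgPrint never reads past the end of the buffer. It is written as portable C under the hypothetical name is_string_terminated so it can be tried outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a NULL byte occurs within the first `length` bytes, i.e.
   the buffer holds a complete C string that can be printed safely.
   Hypothetical reconstruction of Example_IsStringTerminated. */
bool is_string_terminated(const char *buffer, size_t length)
{
    size_t i;
    assert(buffer != NULL || length == 0);
    for (i = 0; i < length; i++) {
        if (buffer[i] == '\0')
            return true;
    }
    return false;
}
```

The search is bounded by the length from the stack location; calling strlen on the buffer instead would defeat the purpose of the check.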

The way direct I/O works is that it provides you with an MdlAddress, which points to a Memory Descriptor List (MDL). This is a description of the user-mode addresses and how they map to physical addresses. The function we then call is MmGetSystemAddressForMdlSafe, passing it Irp->MdlAddress. This gives us a system virtual address which we can then use to read the memory.

The reasoning behind this is that some drivers do not always process a user-mode request in the context of the thread, or even the process, in which it was issued. If you process a request in a different thread running in another process context, you cannot read user-mode memory across process boundaries. You should know this already: two running applications can’t simply read or write each other’s memory without Operating System support.

So, this simply maps the physical pages used by the user mode process into system memory. We can then use the returned address to access the buffer passed down from user mode.

This method is generally used for larger buffers, since it does not require the memory to be copied. Its only real downside is that the user-mode buffers stay locked in memory until the IRP is completed, a cost that is usually worth paying for large transfers, which is why direct I/O is generally more useful for larger buffers.
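How many pages an MDL has to describe and lock depends on both the buffer’s size and where it starts within a page. The helper below computes that count the way the WDK macro ADDRESS_AND_SIZE_TO_SPAN_PAGES does, assuming 4 KB pages as on x86 and x64; it is a user-mode illustration with a hypothetical name, not a call into the memory manager.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KB pages */

/* Number of pages touched by [va, va + size): the offset within the first
   page plus the size, rounded up to whole pages. */
uintptr_t span_pages(uintptr_t va, uintptr_t size)
{
    uintptr_t byte_offset = va & (SKETCH_PAGE_SIZE - 1);
    return (byte_offset + size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

Even a 32-byte buffer that straddles a page boundary spans two pages, while a 1 MB buffer at a page-aligned address spans 256.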

Buffered I/O

The first thing I will do is simply show you the code for handling buffered I/O.

NTSTATUS Example_WriteBufferedIO(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    NTSTATUS NtStatus = STATUS_SUCCESS;
    PIO_STACK_LOCATION pIoStackIrp = NULL;
    PCHAR pWriteDataBuffer;

    DbgPrint("Example_WriteBufferedIO Called \r\n");

    /*
     * Each time the IRP is passed down
     * the driver stack a new stack location is added
     * specifying certain parameters for the IRP to the driver.
     */
    pIoStackIrp = IoGetCurrentIrpStackLocation(Irp);
    Irp->IoStatus.Information = 0;

    if(pIoStackIrp)
    {
        pWriteDataBuffer = (PCHAR)Irp->AssociatedIrp.SystemBuffer;

        if(pWriteDataBuffer)
        {
            /*
             * We need to verify that the string
             * is NULL terminated. Bad things can happen
             * if we access memory not valid while in the Kernel.
             */
            if(Example_IsStringTerminated(pWriteDataBuffer, pIoStackIrp->Parameters.Write.Length))
            {
                /* Never pass caller data as the format string. */
                DbgPrint("%s", pWriteDataBuffer);
            }

            Irp->IoStatus.Information = pIoStackIrp->Parameters.Write.Length;
        }
    }

    /* A dispatch routine must complete (or pend) every IRP it receives. */
    Irp->IoStatus.Status = NtStatus;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);

    return NtStatus;
}

As mentioned above, the idea is to pass data down to the driver in a form that can be accessed from any context, such as another thread in another process. The other reason is that the copy is made in non-paged memory, so the driver can also read it at raised IRQL.

The reason you may need to access memory outside the current process context is that some drivers create threads in the SYSTEM process. They then defer work to this process either asynchronously or synchronously. A driver at a higher level than your driver may do this or your driver itself may do it.

The downside of using Buffered I/O is that the I/O Manager allocates non-paged memory and performs a copy, which adds overhead to every read and write sent to the driver. This is one of the reasons it is best used for smaller buffers. The upside is that the user mode pages do not need to be locked in memory as they are with Direct I/O. The other problem with using it for larger buffers is that, since it allocates non-paged memory, it would need to allocate a large contiguous block of non-paged memory.

Neither Buffered nor Direct

The first thing I will do is show you the code for handling neither Buffered nor Direct I/O.

NTSTATUS Example_WriteNeither(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    NTSTATUS NtStatus = STATUS_SUCCESS;
    PIO_STACK_LOCATION pIoStackIrp = NULL;
    PCHAR pWriteDataBuffer;

    DbgPrint("Example_WriteNeither Called \r\n");

    /*
     * Each time the IRP is passed down
     * the driver stack a new stack location is added
     * specifying certain parameters for the IRP to the driver.
     */
    pIoStackIrp = IoGetCurrentIrpStackLocation(Irp);

    if(pIoStackIrp)
    {
        /*
         * We need this in an exception handler or else we could trap.
         */
        __try {

            ProbeForRead(Irp->UserBuffer,
                         pIoStackIrp->Parameters.Write.Length,
                         TYPE_ALIGNMENT(char));
            pWriteDataBuffer = Irp->UserBuffer;

            if(pWriteDataBuffer)
            {
                /*
                 * We need to verify that the string
                 * is NULL terminated. Bad things can happen
                 * if we access memory not valid while in the Kernel.
                 */
                if(Example_IsStringTerminated(pWriteDataBuffer, pIoStackIrp->Parameters.Write.Length))
                {
                    /* Never pass data from user mode as the format string. */
                    DbgPrint("%s", pWriteDataBuffer);
                }
            }

        } __except( EXCEPTION_EXECUTE_HANDLER ) {

            NtStatus = GetExceptionCode();
        }
    }

    return NtStatus;
}

In this method, the driver accesses the user mode address directly. The I/O Manager does not copy the data and does not lock the user mode pages in memory; it simply gives the driver the address of the user mode buffer.

The upside of this is that no data is copied, no memory is allocated, and no pages are locked into memory. The downside is that you must process the request in the context of the calling thread so that you access the user mode address space of the correct process. Another downside is that the process itself can, on another thread, change the protection on the pages, free the memory, and so on. This is why you generally want to use the ProbeForRead and ProbeForWrite functions and surround all the code in an exception handler. There is no guarantee that the pages remain valid at any given moment; you can only attempt to make sure they are before you read or write. This buffer is stored at Irp->UserBuffer.
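As a rough model of what a probe checks, the sketch below tests the two things ProbeForRead and ProbeForWrite are documented to verify: alignment, and that the whole range lies within user space without wrapping around. It is portable C with hypothetical names; the limit 0x7FFF0000 is an assumed value typical of 32 bit Windows, and the real routines raise an exception rather than returning a code. Note what it cannot check: whether the pages are actually valid, which is why the access itself still belongs inside __try.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; the real routines raise exceptions instead. */
enum probe_result { PROBE_OK = 0, PROBE_MISALIGNED, PROBE_OUT_OF_RANGE };

/* Assumption: first address that is no longer user space (typical 32 bit value). */
#define SKETCH_USER_PROBE_LIMIT ((uint64_t)0x7FFF0000u)

/*
 * Models the two checks: 'alignment' must be a power of two, and
 * [address, address + length) must end at or below the user-space limit
 * without wrapping. A zero length is accepted without any checks.
 * Passing does NOT mean the memory is committed or will stay valid.
 */
static enum probe_result sketch_probe(uint64_t address, uint64_t length, uint64_t alignment)
{
    if (length == 0) {
        return PROBE_OK;
    }
    if ((address & (alignment - 1)) != 0) {
        return PROBE_MISALIGNED;
    }
    if (address + length < address || address + length > SKETCH_USER_PROBE_LIMIT) {
        return PROBE_OUT_OF_RANGE;
    }
    return PROBE_OK;
}
```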

What’s this #pragma stuff?

These directives simply tell the compiler and linker which section to place the code in and what options to set on its pages. DriverEntry, for example, is placed in the INIT section, which is discardable. This is because you only need that function during initialization.

Homework!

Your homework is to create the Read routines for each type of I/O processing. You can use the Write routines as reference to figure out what you need to do.

Dynamically Loading and Unloading the Driver

Many tutorials explain the registry at this point; however, I have chosen not to do so yet. There is a simple user mode API that you can use to load and unload the driver without having to do anything else. This is what we will use for now.


int _cdecl main(void)
{
    SC_HANDLE hSCManager;   /* service handles are SC_HANDLE, not HANDLE */
    SC_HANDLE hService;
    SERVICE_STATUS ss;

    hSCManager = OpenSCManager(NULL, NULL, SC_MANAGER_CREATE_SERVICE);

    printf("Load Driver\n");

    if(hSCManager)
    {
        printf("Create Service\n");

        hService = CreateService(hSCManager, "Example",
                                 "Example Driver",
                                 SERVICE_START | DELETE | SERVICE_STOP,
                                 SERVICE_KERNEL_DRIVER,
                                 SERVICE_DEMAND_START,
                                 SERVICE_ERROR_IGNORE,
                                 "C:\\example.sys",
                                 NULL, NULL, NULL, NULL, NULL);

        if(!hService)
        {
            /* Assume the service already exists and open it instead. */
            hService = OpenService(hSCManager, "Example",
                                   SERVICE_START | DELETE | SERVICE_STOP);
        }

        if(hService)
        {
            printf("Start Service\n");

            StartService(hService, 0, NULL);
            printf("Press Enter to close service\r\n");
            getchar();
            ControlService(hService, SERVICE_CONTROL_STOP, &ss);

            DeleteService(hService);

            CloseServiceHandle(hService);
        }

        CloseServiceHandle(hSCManager);
    }

    return 0;
}

This code will load the driver and start it. We load the driver with SERVICE_DEMAND_START, which means the driver must be started explicitly. It will not start automatically on boot; that way we can test it, and if we blue-screen, we can fix the issue without having to boot into safe mode.

This program will simply pause. You can then run the application that talks to the driver in another window. The code above should be easy to understand; note that you need to copy the driver to C:\example.sys in order to use it. If CreateService fails, the code assumes the service has already been created and opens it instead. We then start the service and pause. Once you press Enter, we stop the service, delete it from the list of services, and exit. This is very simple code and you can modify it to serve your purposes.

Communicating to the Device Driver

The following is the code that communicates to the driver.

int _cdecl main(void)
{
    HANDLE hFile;
    DWORD dwReturn;

    hFile = CreateFile("\\\\.\\Example",
                       GENERIC_READ | GENERIC_WRITE, 0, NULL,
                       OPEN_EXISTING, 0, NULL);

    /* CreateFile reports failure with INVALID_HANDLE_VALUE, not NULL. */
    if(hFile != INVALID_HANDLE_VALUE)
    {
        WriteFile(hFile, "Hello from user mode!",
                  sizeof("Hello from user mode!"), &dwReturn, NULL);

        CloseHandle(hFile);
    }

    return 0;
}

This is probably simpler than you thought. If you compile the driver three times, once for each of the three I/O methods, the message sent down from user mode should be printed in DBGVIEW. As you can see, you simply need to open the DOS device name using \\.\<DosName>. The NT device name can also be reached from user mode through \\.\GLOBALROOT\Device\<Nt Device Name>. Once you have a handle to the device, you can call WriteFile, ReadFile, CloseHandle and DeviceIoControl on it. If you want to experiment, simply perform actions and use DbgPrint to show which code is being executed in your driver.

Conclusion

This article showed a simple example of how to create a driver, install it, and access it from a simple user mode application. You may use the associated source files to change and experiment. If you wish to write drivers, it's best to read up on many of the basic concepts of drivers, especially some of the ones linked

PART 2: IMPLEMENTING IOCTLS

Introduction

This is the second tutorial of the Writing Device Drivers series. There seems to be a lot of interest in the topic, so this article will pick up where the first left off. The main focus of these articles is to build up, little by little, the knowledge needed to write device drivers. We will build on the same example source code used in part one, expanding it to include Read functionality, handle Input/Output Controls (also known as IOCTLs), and learn a bit more about IRPs.

F.A.Q.

Before we begin with this article, here is a small list of some frequently asked questions that we can clear up.

Where do I get the DDK?

Microsoft allows MSDN subscribers to download the DDK from their website. If you do not subscribe, they sometimes allow new DDKs to be openly downloaded by the public for a certain period of time. At the time of this article, no DDKs are available for download, so if you are not a subscriber, you can request they mail you the DDK CD for the cost of shipping and handling. You can order the DDK from here.

Can I include windows.h in my driver?

You cannot mix the Windows SDK header files with the Windows DDK header files. They have definitions that conflict, and you will have trouble getting the code to compile. Sometimes there are user mode applications which like to include part of the DDK. These generally have to copy the type definitions they need from the DDK or SDK directly into their own files. The other popular approach, where possible, is to separate the files by DDK and SDK usage so that each .c file can include the appropriate headers without conflict.

Can I implement x type driver like this?

This is the general framework upon which almost all drivers in Windows are built. Drivers do not have to control hardware, and, as mentioned in the first tutorial, there is usually a stack of drivers. If you are looking to implement a specific type of driver, this is a starting point for understanding how drivers work in general. The differences then become how you advertise your device to the system, which IOCTLs you implement, which drivers you communicate with underneath your driver, and any additional pieces you are required to implement, such as supporting drivers or even user mode components. You will also want to read information specific to that type of driver on MSDN, in the DDK and in other places. Sometimes there are other frameworks which encapsulate most of what we are doing here, making such drivers easier to write.

Can I use the C or C++ runtime in a driver?

You should avoid using these in a driver and instead use the equivalent kernel mode APIs. The Kernel Run Time Library documentation also includes a subtopic on Safe String Functions. When programming in the kernel, there are some pitfalls you need to be aware of, and if you never look up the real kernel API, you may never learn about them, since you would never have read its "remarks" section, for example. The kernel API documentation also tells you at which IRQL you can call each function. It is a lot safer and in your best interest to avoid the standard run time, as doing so will save you time tracking down bugs and making simple, common mistakes in your code.

Implementing the ReadFile

The first article left this as homework, so even if you have not completed your homework, here are the answers. As discussed previously, there are three types of I/O: Direct, Buffered and Neither. I have implemented all three in the example driver. The difference is that instead of reading from the memory, we write to it. I will not explain the three types of I/O again, as they work the same way as in the write case. What I will explain is the new functionality I have added: return values!

In the WriteFile implementation, we didn't need to worry about the return value. Proper implementations should always inform the user mode application how much data was written; however, I omitted this detail for simplicity at the time. It becomes essential with the ReadFile implementation, not only to properly inform the user mode application but also to let the I/O Manager know.

If you recall how Buffered I/O works, for example, the memory buffer is created in another location and the user mode memory is copied into it. If we want to read data from the driver, the I/O Manager needs to know how much memory to copy from this temporary buffer to the real user mode memory location! If we don't tell it, no memory will be copied and the user mode application will not get any data!
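The following user mode simulation illustrates that copy-back step under simplified assumptions: after a successful buffered read, the I/O Manager copies IoStatus.Information bytes from the system buffer to the caller's buffer. The structure and function names are hypothetical stand-ins, and treating only a zero status as success is a simplification; the real copy happens inside the I/O Manager.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two IO_STATUS_BLOCK fields used here. */
struct sketch_io_status {
    long   status;       /* 0 plays the role of STATUS_SUCCESS */
    size_t information;  /* bytes the driver says it produced */
};

/*
 * Conceptually what happens when a buffered read completes: copy
 * 'information' bytes (capped at the caller's buffer size) from the
 * system buffer to the user buffer. If the driver leaves information
 * at 0, nothing is copied, even if it filled the system buffer.
 */
static size_t sketch_complete_buffered_read(const struct sketch_io_status *iosb,
                                            const char *system_buffer,
                                            char *user_buffer,
                                            size_t user_length)
{
    size_t to_copy;

    if (iosb->status != 0) {
        return 0;   /* simplification: any non-zero status copies nothing */
    }
    to_copy = iosb->information < user_length ? iosb->information : user_length;
    memcpy(user_buffer, system_buffer, to_copy);
    return to_copy;
}
```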

NTSTATUS Example_ReadDirectIO(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    NTSTATUS NtStatus = STATUS_BUFFER_TOO_SMALL;
    PIO_STACK_LOCATION pIoStackIrp = NULL;
    PCHAR pReturnData = "Example_ReadDirectIO - Hello from the Kernel!";
    UINT dwDataSize = sizeof("Example_ReadDirectIO - Hello from the Kernel!");
    UINT dwDataRead = 0;
    PCHAR pReadDataBuffer;

    DbgPrint("Example_ReadDirectIO Called \r\n");

    /*
     * Each time the IRP is passed down the driver stack a
     * new stack location is added
     * specifying certain parameters for the IRP to the
     * driver.
     */
    pIoStackIrp = IoGetCurrentIrpStackLocation(Irp);

    if(pIoStackIrp && Irp->MdlAddress)
    {
        pReadDataBuffer = MmGetSystemAddressForMdlSafe(Irp->MdlAddress, NormalPagePriority);

        if(pReadDataBuffer && pIoStackIrp->Parameters.Read.Length >= dwDataSize)
        {
            /*
             * We use "RtlCopyMemory" in the kernel instead
             * of memcpy.
             * RtlCopyMemory *IS* memcpy, however it's best
             * to use the wrapper in case this changes in the future.
             */
            RtlCopyMemory(pReadDataBuffer, pReturnData, dwDataSize);
            dwDataRead = dwDataSize;
            NtStatus = STATUS_SUCCESS;
        }
    }

Implementing Return Values

The return value is implemented using the IO_STATUS_BLOCK of the IRP. This contains a few data members whose use varies depending on the major function being implemented. In the major functions we are implementing, Status is equal to the return code and Information contains the number of bytes read or written. Looking at the new code, you will also notice that we are now calling IoCompleteRequest. What does this all mean?

IoCompleteRequest should always be called by the driver once it has finished processing the IRP. We were not doing this in the previous example, and although the request may appear to work in some cases, relying on that is incorrect: a dispatch routine that does not pass the IRP down to another driver, or mark it pending, must complete it itself. The DDK documentation on IRP handling has more information.


    Irp->IoStatus.Status = NtStatus;
    Irp->IoStatus.Information = dwDataRead;

    IoCompleteRequest(Irp, IO_NO_INCREMENT);

    return NtStatus;
}

The second parameter of IoCompleteRequest specifies the priority boost to give the thread waiting for this IRP to complete. For example, perhaps the thread has been waiting a long time for a network operation. The boost helps the scheduler run this thread again sooner than it would have if the thread simply went back into the ready queue without one. Put simply, it is a hint to the scheduler to re-run the thread waiting for this I/O sooner; IO_NO_INCREMENT, used above, gives no boost.

Stricter Parameter Validation and Error Checking

The code now implements a little more error checking and parameter validation than it previously did. This is something you want to ensure in your driver: a user mode application should not be able to send invalid memory locations, etc., to the driver and blue screen the system. The driver should also return more useful errors to the user mode application instead of just STATUS_SUCCESS all the time. We need to inform the user mode process if it needs to send us more data, or help it determine exactly what went wrong. You like APIs for which you can call GetLastError to see why they failed, or whose return value tells you how to fix your code. If your driver simply returns failure, or worse, success all the time, it becomes harder to make your application work properly with the driver.
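A sketch of that idea: validate each parameter and map each failure to its own NTSTATUS rather than a blanket success or failure. The three status values are the standard ones from ntstatus.h and the success test mirrors the DDK's NT_SUCCESS macro, but they are redefined under hypothetical SKETCH_ names so the code compiles without the DDK; the validate_request routine and its rules are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t SKETCH_NTSTATUS;   /* NTSTATUS is a signed 32 bit value */

#define SKETCH_STATUS_SUCCESS           ((SKETCH_NTSTATUS)0x00000000u)
#define SKETCH_STATUS_INVALID_PARAMETER ((SKETCH_NTSTATUS)0xC000000Du)
#define SKETCH_STATUS_BUFFER_TOO_SMALL  ((SKETCH_NTSTATUS)0xC0000023u)

/* Same test as NT_SUCCESS: success and informational codes are >= 0. */
#define SKETCH_NT_SUCCESS(s) ((SKETCH_NTSTATUS)(s) >= 0)

/*
 * Hypothetical validation for a request that carries a NUL-terminated
 * string and expects 'needed' bytes back. Each failure gets a distinct
 * status, so the caller can tell "bad input" from "bigger buffer, please".
 */
static SKETCH_NTSTATUS validate_request(const char *input, size_t input_length,
                                        size_t output_length, size_t needed)
{
    if (input == NULL || input_length == 0 ||
        memchr(input, '\0', input_length) == NULL) {
        return SKETCH_STATUS_INVALID_PARAMETER;
    }
    if (output_length < needed) {
        return SKETCH_STATUS_BUFFER_TOO_SMALL;
    }
    return SKETCH_STATUS_SUCCESS;
}
```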

Input/Output Controls (IOCTL)

The IOCTL is used more as a means of communication between the driver and the application than simply reading or writing data. Generally, the driver exports a number of IOCTLs and defines the data structures used in this communication. These data structures should not contain pointers, since the I/O Manager cannot interpret them; all data should be contained in the same block. If you need pointers, you can instead store offsets into the block that point past the end of the fixed-size data, so the driver can easily find this information. Remember, however, that the driver can read user mode data as long as it is running in the context of the process. So, it is possible to implement pointers to memory, but the driver would then need to copy the pages or lock them in memory itself (basically implementing buffered or direct I/O from within the driver, which can be done). The user mode process uses the DeviceIoControl API to perform this communication.
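One way to apply the offset idea, sketched with a hypothetical EXAMPLE_NAME_REQUEST layout: a fixed header followed by variable-length data in the same block, with the header recording where the data starts relative to the beginning of the block. Because the offset is relative, the block still makes sense after the I/O Manager copies it somewhere else, which a raw pointer would not. The receiving side validates the offset and length against the number of bytes it actually received before touching the data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical IOCTL input layout: fixed header, then the name bytes. */
typedef struct EXAMPLE_NAME_REQUEST {
    uint32_t Flags;
    uint32_t NameOffset;   /* offset of the name from the start of the block */
    uint32_t NameLength;   /* length of the name in bytes, no terminator */
} EXAMPLE_NAME_REQUEST;

/* Sender side: packs header + name into 'block'. Returns total size, or 0 if it does not fit. */
static size_t pack_name_request(void *block, size_t block_size, uint32_t flags, const char *name)
{
    size_t name_length = strlen(name);
    size_t total = sizeof(EXAMPLE_NAME_REQUEST) + name_length;
    EXAMPLE_NAME_REQUEST header;

    if (total > block_size) {
        return 0;
    }
    header.Flags = flags;
    header.NameOffset = (uint32_t)sizeof(EXAMPLE_NAME_REQUEST);
    header.NameLength = (uint32_t)name_length;
    memcpy(block, &header, sizeof(header));
    memcpy((char *)block + header.NameOffset, name, name_length);
    return total;
}

/* Receiver side: bounds-check offset and length, then point into the block. */
static const char *unpack_name(const void *block, size_t received, uint32_t *name_length)
{
    EXAMPLE_NAME_REQUEST header;

    if (received < sizeof(header)) {
        return NULL;
    }
    memcpy(&header, block, sizeof(header));
    if (header.NameOffset < sizeof(header) || header.NameOffset > received ||
        header.NameLength > received - header.NameOffset) {
        return NULL;
    }
    *name_length = header.NameLength;
    return (const char *)block + header.NameOffset;
}
```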

Defining the IOCTL

The first thing we need to do is define the IOCTL code to be used between the application and the driver. I will essentially be summarizing this article on MSDN here. First, to relate the IOCTL to something in user mode, you may think of it as a Windows Message. It's simply a value used by the driver to implement some requested function with predefined input and output values. There is a little more to this value than a Windows Message, however. The IOCTL defines the access required in order to issue the IOCTL as well as the method to be used when transferring data between the driver and the application.

The IOCTL is a 32 bit number. The two lowest bits (0 and 1) define the transfer type, which can be METHOD_OUT_DIRECT, METHOD_IN_DIRECT, METHOD_BUFFERED or METHOD_NEITHER.

The next set of bits, 2 to 13, defines the Function Code. The high bit of this field (bit 13) is referred to as the custom bit. It is used to distinguish user-defined IOCTLs from system-defined ones. This means that function codes 0x800 and greater are custom defined, similar to how WM_USER works for Windows Messages.

The next two bits (14 and 15) define the access required to issue the IOCTL. This is how the I/O Manager can reject IOCTL requests if the handle has not been opened with the correct access. Examples of access types are FILE_READ_DATA and FILE_WRITE_DATA.

The remaining bits (16 to 31) represent the device type the IOCTLs are written for. The high bit (bit 31) again marks user-defined values.

There is a macro we can use to define our IOCTLs quickly: CTL_CODE. I have used it in public.h to define four IOCTLs, each using a different transfer method.

/*
 * IOCTL's are defined by the following bit layout.
 * [Common |Device Type|Required Access|Custom|Function Code|Transfer Type]
 *   31     30 ... 16       15 ... 14     13     12 ... 2      1 ... 0
 *
 *   Common          -  1 bit.  This is set for user-defined
 *                      device types.
 *   Device Type     -  This is the type of device the IOCTL
 *                      belongs to.  This can be user defined
 *                      (Common bit set).  This must match the
 *                      device type of the device object.
 *   Required Access -  FILE_READ_DATA, FILE_WRITE_DATA, etc.
 *                      This is the required access for the
 *                      device.
 *   Custom          -  1 bit.  This is set for user-defined
 *                      IOCTL's.  This is used in the same
 *                      manner as "WM_USER".
 *   Function Code   -  The function code, defined by the
 *                      system or by the user (custom bit set).
 *   Transfer Type   -  METHOD_IN_DIRECT, METHOD_OUT_DIRECT,
 *                      METHOD_NEITHER, METHOD_BUFFERED.  This is
 *                      the data transfer method to be used.
 */

#define IOCTL_EXAMPLE_SAMPLE_DIRECT_IN_IO   \
        CTL_CODE(FILE_DEVICE_UNKNOWN,       \
                 0x800,                     \
                 METHOD_IN_DIRECT,          \
                 FILE_READ_DATA | FILE_WRITE_DATA)

#define IOCTL_EXAMPLE_SAMPLE_DIRECT_OUT_IO  \
        CTL_CODE(FILE_DEVICE_UNKNOWN,       \
                 0x801,                     \
                 METHOD_OUT_DIRECT,         \
                 FILE_READ_DATA | FILE_WRITE_DATA)

#define IOCTL_EXAMPLE_SAMPLE_BUFFERED_IO    \
        CTL_CODE(FILE_DEVICE_UNKNOWN,       \
                 0x802,                     \
                 METHOD_BUFFERED,           \
                 FILE_READ_DATA | FILE_WRITE_DATA)

#define IOCTL_EXAMPLE_SAMPLE_NEITHER_IO     \
        CTL_CODE(FILE_DEVICE_UNKNOWN,       \
                 0x803,                     \
                 METHOD_NEITHER,            \
                 FILE_READ_DATA | FILE_WRITE_DATA)

The above displays how we defined our IOCTLs.
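To check the bit layout, the sketch below recomputes the four IOCTL values in portable C. CTL_CODE is redefined locally with the same shifts as the DDK macro, and the constants use their DDK values, so the sketch compiles without the DDK. FILE_READ_DATA and FILE_WRITE_DATA happen to have the same values as FILE_READ_ACCESS and FILE_WRITE_ACCESS, the names the DDK usually pairs with CTL_CODE. The ioctl_* decoding helpers are hypothetical.

```c
#include <stdint.h>

/* Values as defined in the DDK headers, repeated so no DDK is needed. */
#define FILE_DEVICE_UNKNOWN  0x00000022u
#define METHOD_BUFFERED      0u
#define METHOD_IN_DIRECT     1u
#define METHOD_OUT_DIRECT    2u
#define METHOD_NEITHER       3u
#define FILE_READ_DATA       0x0001u
#define FILE_WRITE_DATA      0x0002u

/* Same shifts as the DDK's CTL_CODE macro. */
#define CTL_CODE(DeviceType, Function, Method, Access) \
    (((DeviceType) << 16) | ((Access) << 14) | ((Function) << 2) | (Method))

/* Hypothetical helpers that pull each field back out of a 32 bit IOCTL. */
static uint32_t ioctl_device_type(uint32_t code) { return code >> 16; }          /* bits 31-16 */
static uint32_t ioctl_access(uint32_t code)      { return (code >> 14) & 0x3u; }  /* bits 15-14 */
static uint32_t ioctl_function(uint32_t code)    { return (code >> 2) & 0xFFFu; } /* bits 13-2  */
static uint32_t ioctl_method(uint32_t code)      { return code & 0x3u; }          /* bits 1-0   */

/* Custom bit: bit 13 of the IOCTL, i.e. function codes 0x800 and above. */
static int ioctl_is_custom(uint32_t code) { return (ioctl_function(code) & 0x800u) != 0; }
```

For example, IOCTL_EXAMPLE_SAMPLE_DIRECT_IN_IO works out to 0x0022E001: device type 0x22, access 3, function 0x800 (custom bit set) and transfer type 1.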

Implementing the IOCTL

The first thing that needs to happen is essentially a switch statement that distributes the IOCTL to the appropriate implementation. This is the same thing a window procedure does to dispatch Windows messages. There is no such thing as a "def IOCTL proc", though!

The Parameters.DeviceIoControl.IoControlCode member of the IO_STACK_LOCATION contains the IOCTL code being invoked. The following code is essentially a switch statement which dispatches each IOCTL to its implementation.

NTSTATUS Example_IoControl(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    NTSTATUS NtStatus = STATUS_NOT_SUPPORTED;
    PIO_STACK_LOCATION pIoStackIrp = NULL;
    UINT dwDataWritten = 0;

    DbgPrint("Example_IoControl Called \r\n");

    pIoStackIrp = IoGetCurrentIrpStackLocation(Irp);

    if(pIoStackIrp) /* Should Never Be NULL! */
    {
        switch(pIoStackIrp->Parameters.DeviceIoControl.IoControlCode)
        {
            case IOCTL_EXAMPLE_SAMPLE_DIRECT_IN_IO:
                 NtStatus = Example_HandleSampleIoctl_DirectInIo(Irp, pIoStackIrp, &dwDataWritten);
                 break;

            case IOCTL_EXAMPLE_SAMPLE_DIRECT_OUT_IO:
                 NtStatus = Example_HandleSampleIoctl_DirectOutIo(Irp, pIoStackIrp, &dwDataWritten);
                 break;

            case IOCTL_EXAMPLE_SAMPLE_BUFFERED_IO:
                 NtStatus = Example_HandleSampleIoctl_BufferedIo(Irp, pIoStackIrp, &dwDataWritten);
                 break;

            case IOCTL_EXAMPLE_SAMPLE_NEITHER_IO:
                 NtStatus = Example_HandleSampleIoctl_NeitherIo(Irp, pIoStackIrp, &dwDataWritten);
                 break;
        }
    }

    Irp->IoStatus.Status = NtStatus;
    Irp->IoStatus.Information = dwDataWritten;

    IoCompleteRequest(Irp, IO_NO_INCREMENT);

    return NtStatus;
}

If you understand the ReadFile and WriteFile implementations, these simply implement both in one call. This obviously doesn't have to be the case; IOCTLs can be used to only read data, only write data, or not send any data at all but simply inform or instruct the driver to perform an action.

METHOD_x_DIRECT

The METHOD_IN_DIRECT and METHOD_OUT_DIRECT can essentially be explained at the same time. They are basically the same. The INPUT buffer is passed in using "BUFFERED" implementation. The output buffer is passed in using the MdlAddress as explained in the Read/Write implementations. The difference between "IN" and "OUT" is that with "IN", you can use the output buffer to pass in data! The "OUT" is only used to return data. The driver example we have doesn't use the "IN" implementation to pass in data, and essentially the "OUT" and "IN" implementations are the same in the example. Since this is the case, I will just show you the "OUT" implementation.

NTSTATUS Example_HandleSampleIoctl_DirectOutIo(PIRP Irp, PIO_STACK_LOCATION pIoStackIrp, UINT *pdwDataWritten)
{
    NTSTATUS NtStatus = STATUS_UNSUCCESSFUL;
    PCHAR pInputBuffer;
    PCHAR pOutputBuffer;
    UINT dwDataRead = 0, dwDataWritten = 0;
    PCHAR pReturnData = "IOCTL - Direct Out I/O From Kernel!";
    UINT dwDataSize = sizeof("IOCTL - Direct Out I/O From Kernel!");

    DbgPrint("Example_HandleSampleIoctl_DirectOutIo Called \r\n");

    /*
     * METHOD_OUT_DIRECT
     *
     *    Input Buffer  = Irp->AssociatedIrp.SystemBuffer
     *    Output Buffer = Irp->MdlAddress
     *
     *    Input Size  = Parameters.DeviceIoControl.InputBufferLength
     *    Output Size = Parameters.DeviceIoControl.OutputBufferLength
     *
     * What's the difference between METHOD_IN_DIRECT && METHOD_OUT_DIRECT?
     *
     * The function which we implemented METHOD_IN_DIRECT
     * is actually *WRONG*!!!! We are using the output buffer
     * as an output buffer! The difference is that METHOD_IN_DIRECT creates
     * an MDL for the output buffer with *READ* access so the user mode
     * application can send large amounts of data to the driver for reading.
     *
     * METHOD_OUT_DIRECT creates an MDL for the output buffer with *WRITE*
     * access so the user mode application can receive large amounts of
     * data from the driver!
     *
     * In both cases, the input buffer is in the same place, the SystemBuffer.
     * There is a lot of confusion, as people think that the MdlAddress
     * contains the input buffer; this is not true in either case.
     */
    pInputBuffer = Irp->AssociatedIrp.SystemBuffer;
    pOutputBuffer = NULL;

    if(Irp->MdlAddress)
    {
        pOutputBuffer = MmGetSystemAddressForMdlSafe(Irp->MdlAddress, NormalPagePriority);
    }

    if(pInputBuffer && pOutputBuffer)
    {
        /*
         * We need to verify that the string
         * is NULL terminated. Bad things can happen
         * if we access memory not valid while in the Kernel.
         */
        if(Example_IsStringTerminated(pInputBuffer,
                                      pIoStackIrp->Parameters.DeviceIoControl.InputBufferLength,
                                      &dwDataRead))
        {
            DbgPrint("UserModeMessage = '%s'", pInputBuffer);
            DbgPrint("%i >= %i",
                     pIoStackIrp->Parameters.DeviceIoControl.OutputBufferLength,
                     dwDataSize);

            if(pIoStackIrp->Parameters.DeviceIoControl.OutputBufferLength >= dwDataSize)
            {
                /*
                 * We use "RtlCopyMemory" in the kernel instead of memcpy.
                 * RtlCopyMemory *IS* memcpy, however it's best to use the
                 * wrapper in case this changes in the future.
                 */
                RtlCopyMemory(pOutputBuffer, pReturnData, dwDataSize);
                *pdwDataWritten = dwDataSize;
                NtStatus = STATUS_SUCCESS;
            }
            else
            {
                *pdwDataWritten = dwDataSize;
                NtStatus = STATUS_BUFFER_TOO_SMALL;
            }
        }
    }

    return NtStatus;
}

As homework, see if you can change the "IN" method to work correctly. Pass input data through the output buffer and display it.

METHOD_BUFFERED

The METHOD_BUFFERED implementation does essentially the same thing as the Read and Write implementations. The I/O Manager allocates a buffer whose size is the larger of the input and output buffer sizes, and copies the input buffer into it. Before you return, you simply copy the return data into the same buffer. You report the number of bytes to return in the IO_STATUS_BLOCK, and the I/O Manager copies that much data into the output buffer.
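A user mode simulation of that round trip, with hypothetical names throughout: one allocation sized to the larger of the two lengths, the input copied in, the handler overwriting the same memory with its reply, and only the reported number of bytes copied back. It also shows why a METHOD_BUFFERED handler has to finish with its input before writing its output, since both share one buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler type: gets the shared buffer, returns bytes produced. */
typedef size_t (*sketch_handler)(char *system_buffer, size_t in_len, size_t out_len);

/* Example handler: writes its reply over the same buffer that held the input. */
static size_t reply_handler(char *system_buffer, size_t in_len, size_t out_len)
{
    static const char reply[] = "IOCTL - Buffered I/O From Kernel!";

    (void)in_len;   /* a real handler would validate and consume its input first */
    if (out_len < sizeof(reply)) {
        return 0;
    }
    memcpy(system_buffer, reply, sizeof(reply));   /* overwrites the input */
    return sizeof(reply);
}

/*
 * Conceptual METHOD_BUFFERED round trip. Returns the number of bytes copied
 * to 'out' (the role IoStatus.Information plays), or 0 if allocation fails.
 */
static size_t sketch_method_buffered(const void *in, size_t in_len,
                                     void *out, size_t out_len, sketch_handler handler)
{
    size_t system_len = in_len > out_len ? in_len : out_len;
    char *system_buffer = malloc(system_len ? system_len : 1);
    size_t information;

    if (system_buffer == NULL) {
        return 0;
    }
    memcpy(system_buffer, in, in_len);                    /* input copied in */
    information = handler(system_buffer, in_len, out_len);
    if (information > out_len) {
        information = out_len;                           /* never overrun the caller */
    }
    memcpy(out, system_buffer, information);             /* reply copied out */
    free(system_buffer);
    return information;
}
```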

NTSTATUS Example_HandleSampleIoctl_BufferedIo(PIRP Irp, PIO_STACK_LOCATION pIoStackIrp, UINT *pdwDataWritten)
{
    NTSTATUS NtStatus = STATUS_UNSUCCESSFUL;
    PCHAR pInputBuffer;
    PCHAR pOutputBuffer;
    UINT dwDataRead = 0, dwDataWritten = 0;
    PCHAR pReturnData = "IOCTL - Buffered I/O From Kernel!";
    UINT dwDataSize = sizeof("IOCTL - Buffered I/O From Kernel!");

    DbgPrint("Example_HandleSampleIoctl_BufferedIo Called \r\n");

    /*
     * METHOD_BUFFERED
     *
     *    Input Buffer  = Irp->AssociatedIrp.SystemBuffer
     *    Output Buffer = Irp->AssociatedIrp.SystemBuffer
     *
     *    Input Size  = Parameters.DeviceIoControl.InputBufferLength
     *    Output Size = Parameters.DeviceIoControl.OutputBufferLength
     *
     * Since they both use the same location, the buffer allocated
     * by the I/O Manager is the size of the larger value
     * (Output vs. Input).
     */
    pInputBuffer = Irp->AssociatedIrp.SystemBuffer;
    pOutputBuffer = Irp->AssociatedIrp.SystemBuffer;

    if(pInputBuffer && pOutputBuffer)
    {
        /*
         * We need to verify that the string
         * is NULL terminated. Bad things can happen
         * if we access memory not valid while in the Kernel.
         */
        if(Example_IsStringTerminated(pInputBuffer,
                                      pIoStackIrp->Parameters.DeviceIoControl.InputBufferLength,
                                      &dwDataRead))
        {
            DbgPrint("UserModeMessage = '%s'", pInputBuffer);
            DbgPrint("%i >= %i",
                     pIoStackIrp->Parameters.DeviceIoControl.OutputBufferLength,
                     dwDataSize);

            if(pIoStackIrp->Parameters.DeviceIoControl.OutputBufferLength >= dwDataSize)
            {
                /*
                 * We use "RtlCopyMemory" in the kernel instead of memcpy.
                 * RtlCopyMemory *IS* memcpy, however it's best to use the
                 * wrapper in case this changes in the future.
                 */
                RtlCopyMemory(pOutputBuffer, pReturnData, dwDataSize);
                *pdwDataWritten = dwDataSize;
                NtStatus = STATUS_SUCCESS;
            }
            else
            {
                *pdwDataWritten = dwDataSize;
                NtStatus = STATUS_BUFFER_TOO_SMALL;
            }
        }
    }

    return NtStatus;
}

METHOD_NEITHER

This is also the same as implementing neither I/O. The original user mode buffers are passed into the driver.

NTSTATUS Example_HandleSampleIoctl_NeitherIo(PIRP Irp, PIO_STACK_LOCATION pIoStackIrp, UINT *pdwDataWritten)
{
    NTSTATUS NtStatus = STATUS_UNSUCCESSFUL;
    PCHAR pInputBuffer;
    PCHAR pOutputBuffer;
    UINT dwDataRead = 0, dwDataWritten = 0;
    PCHAR pReturnData = "IOCTL - Neither I/O From Kernel!";
    UINT dwDataSize = sizeof("IOCTL - Neither I/O From Kernel!");

    DbgPrint("Example_HandleSampleIoctl_NeitherIo Called \r\n");

    /*
     * METHOD_NEITHER
     *
     *    Input Buffer  = Parameters.DeviceIoControl.Type3InputBuffer
     *    Output Buffer = Irp->UserBuffer
     *
     *    Input Size  = Parameters.DeviceIoControl.InputBufferLength
     *    Output Size = Parameters.DeviceIoControl.OutputBufferLength
     */
    pInputBuffer = pIoStackIrp->Parameters.DeviceIoControl.Type3InputBuffer;
    pOutputBuffer = Irp->UserBuffer;

    if(pInputBuffer && pOutputBuffer)
    {
        /*
         * We need this in an exception handler or else we could trap.
         */
        __try {

            ProbeForRead(pInputBuffer,
                         pIoStackIrp->Parameters.DeviceIoControl.InputBufferLength,
                         TYPE_ALIGNMENT(char));

            /*
             * We need to verify that the string
             * is NULL terminated. Bad things can happen
             * if we access memory not valid while in the Kernel.
             */
            if(Example_IsStringTerminated(pInputBuffer,
                                          pIoStackIrp->Parameters.DeviceIoControl.InputBufferLength,
                                          &dwDataRead))
            {
                DbgPrint("UserModeMessage = '%s'", pInputBuffer);

                ProbeForWrite(pOutputBuffer,
                              pIoStackIrp->Parameters.DeviceIoControl.OutputBufferLength,
                              TYPE_ALIGNMENT(char));

                if(pIoStackIrp->Parameters.DeviceIoControl.OutputBufferLength >= dwDataSize)
                {
                    /*
                     * We use "RtlCopyMemory" in the kernel instead of memcpy.
                     * RtlCopyMemory *IS* memcpy, however it's best to use the
                     * wrapper in case this changes in the future.
                     */
                    RtlCopyMemory(pOutputBuffer, pReturnData, dwDataSize);
                    *pdwDataWritten = dwDataSize;
                    NtStatus = STATUS_SUCCESS;
                }
                else
                {
                    *pdwDataWritten = dwDataSize;
                    NtStatus = STATUS_BUFFER_TOO_SMALL;
                }
            }

        } __except( EXCEPTION_EXECUTE_HANDLER ) {

            NtStatus = GetExceptionCode();
        }
    }

    return NtStatus;
}
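The buffer locations stated in the comments of the three handlers above can be collected into a single lookup table keyed by the transfer type in the low two bits of the IOCTL. This is only a summary of those comments expressed as portable C data; the structure and function names are hypothetical, and in a real driver the buffers come from the IRP and its IO_STACK_LOCATION.

```c
#include <stddef.h>

/* Transfer type values, as stored in the two low bits of an IOCTL. */
enum transfer_type { XFER_BUFFERED = 0, XFER_IN_DIRECT = 1, XFER_OUT_DIRECT = 2, XFER_NEITHER = 3 };

struct buffer_locations {
    const char *input;    /* where the handler finds the input buffer */
    const char *output;   /* where the handler finds the output buffer */
};

/* One entry per transfer type, taken from the handler comments above. */
static const struct buffer_locations locations[] = {
    [XFER_BUFFERED]   = { "Irp->AssociatedIrp.SystemBuffer", "Irp->AssociatedIrp.SystemBuffer" },
    [XFER_IN_DIRECT]  = { "Irp->AssociatedIrp.SystemBuffer", "Irp->MdlAddress (MDL with read access)" },
    [XFER_OUT_DIRECT] = { "Irp->AssociatedIrp.SystemBuffer", "Irp->MdlAddress (MDL with write access)" },
    [XFER_NEITHER]    = { "Parameters.DeviceIoControl.Type3InputBuffer", "Irp->UserBuffer" },
};

/* Picks the table entry from the transfer type bits of an IOCTL code. */
static const struct buffer_locations *locations_for_ioctl(unsigned long code)
{
    return &locations[code & 0x3u];
}
```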

Calling DeviceIoControl

This is a very simple implementation.

ZeroMemory(szTemp, sizeof(szTemp));
DeviceIoControl(hFile, IOCTL_EXAMPLE_SAMPLE_DIRECT_IN_IO,
                "** Hello from User Mode Direct IN I/O",
                sizeof("** Hello from User Mode Direct IN I/O"),
                szTemp, sizeof(szTemp), &dwReturn, NULL);
printf("%s\n", szTemp);   /* never pass received data as the format string */

ZeroMemory(szTemp, sizeof(szTemp));
DeviceIoControl(hFile, IOCTL_EXAMPLE_SAMPLE_DIRECT_OUT_IO,
                "** Hello from User Mode Direct OUT I/O",
                sizeof("** Hello from User Mode Direct OUT I/O"),
                szTemp, sizeof(szTemp), &dwReturn, NULL);
printf("%s\n", szTemp);

ZeroMemory(szTemp, sizeof(szTemp));
DeviceIoControl(hFile, IOCTL_EXAMPLE_SAMPLE_BUFFERED_IO,
                "** Hello from User Mode Buffered I/O",
                sizeof("** Hello from User Mode Buffered I/O"),
                szTemp, sizeof(szTemp), &dwReturn, NULL);
printf("%s\n", szTemp);

ZeroMemory(szTemp, sizeof(szTemp));
DeviceIoControl(hFile, IOCTL_EXAMPLE_SAMPLE_NEITHER_IO,
                "** Hello from User Mode Neither I/O",
                sizeof("** Hello from User Mode Neither I/O"),
                szTemp, sizeof(szTemp), &dwReturn, NULL);
printf("%s\n", szTemp);

System Memory Layout

This is probably a good time to look at what the Windows memory layout looks like. To show how this works, we first need to look at how Intel processors implement virtual memory. I will explain the general implementation, as there are a few variations of how this can be implemented. This process is called Virtual Address Translation. The following is an excerpt from another document that I have been writing on debugging.

Virtual Address Translation

All segment registers become selectors in protected mode. To get more familiar with how the x86 operates, we will go over the paging mechanism as an overview and not in detail. This is not a systems programming guide.

There are other registers in the CPU which point to descriptor tables. These tables define certain system attributes that we will not go into in detail. Instead, we will discuss the process of converting a virtual address into a physical address. The descriptor table can define an offset which is then added to the virtual address. If paging is not enabled, adding these two gives you the physical address. If paging is enabled, you instead get a linear address, which is then converted to a physical address using page tables.

There is a paging mechanism called Physical Address Extension (PAE), which was originally introduced with the Pentium Pro. This mechanism allows page tables to reference physical addresses of up to 36 bits. However, virtual addresses are still 32 bits, so while you can access up to 36 bits of physical RAM, you can only address 4 GB at a time without reloading the page tables. This paging mechanism is not what we will be discussing here, but it is very similar.

The normal 32 bit paging works as follows. A CPU register called CR3 points to the base of the Page Directory Table. The diagram below displays how the paging mechanism works. Notice that the location of the physical page does not need to be linear with the virtual address or even with the previous page table entry. The blue lines are involved in the example translation and the black lines are further examples of how the page tables could be set up.

Each entry in the Page Directory Table points to a Page Table, and each Page Table Entry points to the beginning of a page in physical RAM. While Windows and most other operating systems generally use 4 KB pages, the CPU also supports large pages: 4 MB pages with normal 32-bit paging, or 2 MB pages when PAE is enabled.

For 4 KB pages, the entire process consists of the following steps.

1. The selector points to an entry in the descriptor table.

2. The base address from the descriptor table entry is added to the offset portion of the virtual address, creating a linear address.

3. Bits 31-22 of the linear address index into the Page Directory Table pointed to by CR3.

4. The selected Page Directory entry points to the base of a Page Table, which is then indexed by bits 21-12 of the linear address to retrieve a Page Table Entry.

5. Aside from containing information such as whether the page is present in memory or paged out to disk, the Page Table Entry points to the base location of the page in physical memory.

6. The remaining bits of the linear address, bits 11-0, are added to the start of the physical page to create the final physical address.
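To make steps 3 through 6 concrete, here is a minimal sketch that splits a 32-bit linear address into its three fields for normal (non-PAE) 4 KB paging. The function and type names (split_linear_address, physical_address, LINEAR_PARTS) are hypothetical, and the page base passed in stands in for whatever frame the Page Table Entry would actually point to; no real page tables are walked here.

```c
#include <stdint.h>

/* Fields of a 32-bit linear address under normal 4 KB paging:
 *   bits 31-22: index into the Page Directory Table (1024 entries)
 *   bits 21-12: index into the selected Page Table   (1024 entries)
 *   bits 11-0 : byte offset within the 4 KB physical page          */
typedef struct {
    uint32_t dir_index;
    uint32_t table_index;
    uint32_t offset;
} LINEAR_PARTS;

static LINEAR_PARTS split_linear_address(uint32_t linear)
{
    LINEAR_PARTS p;
    p.dir_index   = (linear >> 22) & 0x3FFu;  /* step 3: top 10 bits    */
    p.table_index = (linear >> 12) & 0x3FFu;  /* step 4: middle 10 bits */
    p.offset      = linear & 0xFFFu;          /* step 6: low 12 bits    */
    return p;
}

/* Step 6: combine the page base (low 12 bits are always zero for a 4 KB
 * page) with the offset. page_base is a stand-in for the PTE's frame. */
static uint32_t physical_address(uint32_t page_base, LINEAR_PARTS p)
{
    return (page_base & 0xFFFFF000u) | p.offset;
}
```

For example, the linear address 0x80401234 selects Page Directory entry 0x201 and Page Table entry 0x001, with an offset of 0x234 into the page. Because each index is 10 bits, every table holds 1024 four-byte entries, which is exactly one 4 KB page.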

Windows Implementation

If you generally ignore the implementation of the descriptor tables, the address translation should be quite simple to follow. The address is just divided into sections which help index into memory tables that eventually point to the location of a physical page. The last index simply indexes into that physical page.

Windows implements essentially three separate layers of virtual address ranges. The first would be the user-mode addresses. These addresses are essentially unique per-process. That is, each process will have its own memory addresses in this range. Of course, there are optimizations such as different page tables pointing to the same physical memory location in order to share code and not duplicate memory that is essentially static.

The second range of addresses is session space. If you have used Fast User Switching or Terminal Services, you know that each user essentially gets their own desktop. Certain drivers run in what is called Session Space, which is memory that is unique per session. This memory holds things like the display driver, win32k.sys, and some printer drivers. This is one reason why windows do not span sessions; for example, you cannot call FindWindow and find a window on another user's desktop.

The last is the range of addresses known as System Space. This is memory that is shared throughout the entire system and accessible anywhere. This is where our driver lies and where most drivers lie.

So, what happens? Whenever the processor switches to a thread that belongs to a different process, CR3 is reloaded with a pointer to that process's page directory. Each process has its own page directory, and this is essentially how Windows isolates processes from one another. The page directories are set up so that processes in the same session map the same session space, and all processes on the system map system space. The only address range that is truly unique per process is essentially the user-mode range.

The /PAE Switch

This switch enables Physical Address Extension (PAE). It means the OS can map 36-bit physical addresses into the 32-bit virtual address space. This doesn't mean that a process can access more than 4 GB of memory at the same time; it means that physical memory above 4 GB can be mapped into a 32-bit address space, so a process can still reach it. It also means the OS can make use of machines with more than 4 GB of physical memory. So while one process may not address more than 4 GB at once, the OS can manage memory so that more pages are kept in RAM at the same time.

There are also special APIs that an application can use to manage memory itself and use more than 4 GB of memory. These are called AWE, or Address Windowing Extensions. You can find more information on these on MSDN.

The /3GB Switch

There is a switch that you may have heard about called the /3GB switch. It allows user mode to have 3 GB of address space. Normally, the 4 GB range is divided in two: 2 GB of address space for user mode and 2 GB for kernel mode. This means that user-mode addresses do not have the high bit (bit 31) set, while kernel-mode addresses do. For example, 0x78000000 is a user-mode address while 0x82000000 is a kernel-mode address. Setting the /3GB switch allows user-mode processes to use more address space, but leaves the kernel with less. There are upsides and downsides to this.

The general upsides of doing this are as follows:

1. Applications requiring a lot of memory will be able to function better if they are built to take advantage of this (they must be marked as large-address aware). There would be less swapping to and from disk if they use the extra user-mode memory to cache data.

The general downsides of doing this are as follows:

1. There is much less kernel-mode address space available, so applications or operations that require a lot of kernel memory may not be able to run.

2. Applications and drivers that check the high bit (bit 31) and use this to determine kernel mode memory versus user mode memory will not function properly.
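The second downside is easy to reproduce. Below is a minimal user-mode sketch contrasting a naive bit-31 test with a comparison against the highest user-mode address actually in effect. The function names are hypothetical, and the boundary values 0x7FFEFFFF (default) and 0xBFFEFFFF (/3GB) are given as typical 32-bit Windows values for illustration; a real driver would typically compare against MM_HIGHEST_USER_ADDRESS rather than hard-coding either.

```c
#include <stdint.h>
#include <stdbool.h>

/* Naive check: only correct for the default 2 GB / 2 GB split. */
static bool is_kernel_address_naive(uint32_t addr)
{
    return (addr & 0x80000000u) != 0;   /* bit 31 set => "kernel" */
}

/* Boundary check: anything above the highest user-mode address is kernel.
 * highest_user_address is supplied by the caller (illustrative values). */
static bool is_kernel_address(uint32_t addr, uint32_t highest_user_address)
{
    return addr > highest_user_address;
}
```

With the default split, both checks agree on 0x78000000 (user) and 0x82000000 (kernel). Under /3GB, however, 0x90000000 is a valid user-mode address, yet the naive check still reports it as kernel memory, which is exactly the failure described above.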

Conclusion

In this article, we have learned a bit more about communications with user mode processes. We learned how to implement the ReadFile and DeviceIoControl APIs. We also learned about completing IRPs and returning status to user mode. We also learned about creating IOCTLs, and finally, we saw how memory is mapped in Windows.

In the next article, we may be using this information we just learned to implement something a little more fun, communications between two processes using the driver!

PART 3: INTRODUCTION TO DRIVER CONTEXTS

Introduction

This is the third edition of the Writing Device Drivers articles. The first article helped to simply get you acquainted with device drivers and a simple framework for developing a device driver for NT. The second tutorial attempted to show how to use IOCTLs and display what the memory layout of Windows NT is. In this edition, we will go into the idea of contexts and pools. The driver we write today will also be a little more interesting as it will allow two user mode applications to communicate with each other in a simple manner. We will call this the poor man’s pipes implementation.

What is a Context?

This is a generic question, and if you program in Windows, you should understand the concept. In any case, here is a brief overview as a refresher. A context is a user-defined data structure (the users being developers) whose contents the underlying architecture knows nothing about. What the architecture does do is pass this context around on the user's behalf, so in an event-driven architecture you do not need global variables or extra work to determine which object, instance, or data structure a request is being issued for.

In Windows, some examples of using contexts are SetWindowLong with GWL_USERDATA, EnumWindows, CreateThread, etc. These all allow you to pass in a context that your application can use to distinguish between multiple instances while using only one implementation of the function.
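The idea can be shown without any Windows headers. The sketch below uses a hypothetical enumerator, enumerate_items, that plays the role of the "architecture": it never looks inside the void pointer it is given and simply hands it back to the callback, in the same spirit as EnumWindows passing its lParam back to the EnumWindowsProc callback. COUNT_CONTEXT and count_above are likewise illustrative names.

```c
#include <stddef.h>

/* The callback returns nonzero to continue, zero to stop. */
typedef int (*ITEM_CALLBACK)(int item, void *ctx);

/* The "architecture": it knows nothing about ctx, it only passes it along. */
static void enumerate_items(const int *items, size_t count,
                            ITEM_CALLBACK cb, void *ctx)
{
    for (size_t i = 0; i < count; i++) {
        if (!cb(items[i], ctx))
            break;
    }
}

/* User-defined context: each caller gets its own, so no globals are needed. */
typedef struct {
    int threshold;
    int matches;
} COUNT_CONTEXT;

static int count_above(int item, void *ctx)
{
    COUNT_CONTEXT *c = (COUNT_CONTEXT *)ctx;   /* recover our own type */
    if (item > c->threshold)
        c->matches++;
    return 1;                                  /* keep enumerating */
}
```

Two callers can run the same callback over the same data with different COUNT_CONTEXT instances, say thresholds of 4 and 15, and each gets its own independent count. That is exactly what a device context gives each device object later in this article.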

Device Context

If you recall, in the first article we learned how to create a device object for our driver. The driver object contains information related to the physical instance of the driver in memory. There is only one per driver binary, and it contains things such as the function entry points for that binary. Multiple devices can be associated with the same binary, since we can simply call IoCreateDevice to create any number of devices handled by a single driver object. This is why all entry points are passed a device object instead of a driver object: so you can determine which device the function is being invoked for. Each device object points back to its driver object (the DriverObject member), so you can still relate back to it.

NtStatus = IoCreateDevice(pDriverObject, sizeof(EXAMPLE_DEVICE_CONTEXT),
                          &usDriverName, FILE_DEVICE_UNKNOWN,
                          FILE_DEVICE_SECURE_OPEN, FALSE, &pDeviceObject);
...
/*
 * Per-Device Context, User Defined
 */
pExampleDeviceContext = (PEXAMPLE_DEVICE_CONTEXT)pDeviceObject->DeviceExtension;
KeInitializeMutex(&pExampleDeviceContext->kListMutex, 0);
pExampleDeviceContext->pExampleList = NULL;

The IoCreateDevice function takes a parameter for the size of a device extension. The I/O manager allocates that many bytes for you and stores a pointer to them in the DeviceExtension member of the device object; this block of memory is the user-defined context.

You can then create your own data structure to be passed around and used with each device object. If you define multiple devices for a single driver, you may want to have a single shared member among all your device contexts as the first member so you can quickly determine which device this function is being invoked for. The device represents the \Device\Name.
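Here is a minimal sketch of the "shared first member" suggestion, written as plain C so it can be compiled outside the kernel. All the type names and the DEVICE_KIND values are hypothetical. In a driver, the pointer passed to device_kind would be pDeviceObject->DeviceExtension; because the common header sits at offset 0 of every context, it can be read before the full type of the extension is known.

```c
#include <stddef.h>

/* Hypothetical kinds of device a single driver might create. */
typedef enum { DEVICE_KIND_CONTROL = 1, DEVICE_KIND_PIPE = 2 } DEVICE_KIND;

/* Shared header: must be the FIRST member of every device context. */
typedef struct {
    DEVICE_KIND Kind;
} COMMON_DEVICE_HEADER;

typedef struct {
    COMMON_DEVICE_HEADER Header;   /* offset 0 */
    unsigned long OpenCount;
} CONTROL_DEVICE_CONTEXT;

typedef struct {
    COMMON_DEVICE_HEADER Header;   /* offset 0 */
    unsigned long BufferSize;
} PIPE_DEVICE_CONTEXT;

/* C guarantees a pointer to a struct also points to its first member,
 * so the header can be read through an untyped extension pointer. */
static DEVICE_KIND device_kind(void *extension)
{
    return ((COMMON_DEVICE_HEADER *)extension)->Kind;
}
```

A dispatch routine can then switch on device_kind(pDeviceObject->DeviceExtension) and cast the extension to the matching context type.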

The context will generally contain any lists that need to be searched for this device, or the attributes and locks for this device. An example of data that would be global per device is the free space on a disk drive. If you have three devices, each representing a particular disk drive image, the attributes that are particular to a certain device would be global for every instance (open handle) of that device. For example, the volume name, free space, used space, etc. would be per device but global for all instances of the device.

Resource Context

A device can also be opened with a longer string that specifies a particular resource managed by the device itself. In the case of a file system, you would be specifying a file name and file path. For example, the device could be opened using \Device\A\Program Files\myfile.txt. The driver may then want to allocate a context that is global for all processes that open this particular resource. In the file system example, items that may be global for an instance of a file could be certain cached items such as the file size, file attributes, etc. These would be unique per file but shared among all handles open on that file.

{
    /*
     * We want to use the unicode string that was used to open the driver to
     * identify "pipe contexts" and match up applications that open the same name.
     * We do this by keeping a global list of all open instances in the device
     * extension context.
     * We then use reference counting so we only remove an instance from the list
     * after all instances have been deleted. We also put this in the FsContext
     * of the FILE_OBJECT so all IRPs will be returned with it and we can easily use
     * the context without searching for it.
     */
    if(RtlCompareUnicodeString(&pExampleList->usPipeName, &pFileObject->FileName, TRUE) == 0)
    {
        bNeedsToCreate = FALSE;
        pExampleList->uiRefCount++;
        pFileObject->FsContext = (PVOID)pExampleList;

Instance Context

This is the most specific context that you may want to create: it is unique for every single handle created on the system. So if process 1 and process 2 each open a handle to the same file, their resource context may be the same, but their instance contexts will be different. A simple example of an item that may be unique to each instance is the file pointer. Although both processes have opened the same file, they may not be reading it from the same location. That means each open handle to the file must maintain its own context data that remembers the current position for that particular handle.

Instance contexts, like any context, can always have pointers back to resource contexts and device contexts, just as the device object has a pointer back to the driver object. These can be used where necessary to avoid lookup tables and list searches for the appropriate context.
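A minimal sketch of such a chain of back pointers follows. The three structure names and their fields are hypothetical and are not taken from the example driver; they only show how an instance context can reach its resource and device contexts with two pointer dereferences instead of a search.

```c
/* Hypothetical contexts, one per scope. */
typedef struct DEVICE_CTX {
    const char *DeviceName;            /* e.g. the \Device\Name            */
} DEVICE_CTX;

typedef struct RESOURCE_CTX {
    DEVICE_CTX   *Device;              /* back pointer: resource -> device */
    const char   *ResourceName;        /* shared by all handles on a name  */
    unsigned long RefCount;
} RESOURCE_CTX;

typedef struct INSTANCE_CTX {
    RESOURCE_CTX      *Resource;       /* back pointer: instance -> resource */
    unsigned long long FilePosition;   /* unique per open handle             */
} INSTANCE_CTX;

/* Walk back up the chain without any lookup table. */
static DEVICE_CTX *device_of(const INSTANCE_CTX *inst)
{
    return inst->Resource->Device;
}
```

Two handles opened on the same name would share one RESOURCE_CTX (and its RefCount) while each keeps its own FilePosition in its INSTANCE_CTX.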

The Big Picture

The following diagram outlines the big picture and the relationships just described. This should help you visualize how you may want to structure relationships within your driver. The context relationships can be structured any way you want; this is just an example. You can even create contexts beyond the three mentioned here, with scopes that you define yourself.

Our Implementation

The implementation that will be used in our driver is to have a device context and a resource context. We do not need instance contexts for what we are doing.

We first create a device extension using IoCreateDevice. This data structure is used to maintain the list of resource contexts, so that every call to Create can be associated with the proper resource context.

The second part of the implementation is simply creating resource contexts. On Create, we first search the list to determine whether the resource already exists. If it does, we simply increment its reference count and associate it with that handle instance. If it does not, we create a new one and add it to the list.

The Close has the opposite operation. We will simply decrement the reference count and if it reaches 0, we then remove the resource context from the list and delete it.
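The Create and Close logic can be modelled in user mode to check the bookkeeping. The sketch below is an assumption-laden stand-in, not the driver's code: the RESOURCE type and the resource_open/resource_close names are hypothetical, calloc/free stand in for pool allocation, and there is no locking, whereas the driver protects its list with kListMutex. Names are compared case-insensitively to mirror RtlCompareUnicodeString with its last parameter set to TRUE.

```c
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

typedef struct RESOURCE {
    struct RESOURCE *Next;
    char             Name[64];
    unsigned         RefCount;
} RESOURCE;

/* Case-insensitive comparison, like RtlCompareUnicodeString(..., TRUE). */
static int names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* both strings must end together */
}

/* Create: reuse a resource with the same name, or add a new one. */
static RESOURCE *resource_open(RESOURCE **list, const char *name)
{
    for (RESOURCE *r = *list; r; r = r->Next) {
        if (names_equal(r->Name, name)) {
            r->RefCount++;      /* another handle shares this context */
            return r;
        }
    }
    RESOURCE *r = calloc(1, sizeof(*r));
    if (!r)
        return NULL;
    strncpy(r->Name, name, sizeof(r->Name) - 1);
    r->RefCount = 1;
    r->Next = *list;            /* push onto the head of the list */
    *list = r;
    return r;
}

/* Close: drop a reference; unlink and free the context at zero. */
static void resource_close(RESOURCE **list, RESOURCE *res)
{
    if (--res->RefCount != 0)
        return;
    for (RESOURCE **pp = list; *pp; pp = &(*pp)->Next) {
        if (*pp == res) {
            *pp = res->Next;
            free(res);
            return;
        }
    }
}
```

Opening "\HELLO" and then "\hello" returns the same context with a reference count of 2, while "\Temp" gets its own. Only when the last handle on a name is closed does the context leave the list.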

The IRP's current IO_STACK_LOCATION provides a pointer to the FILE_OBJECT (the FileObject member), which we can use as a handle instance. The FILE_OBJECT contains two fields we can use to store contexts, FsContext and FsContext2, and we simply use one of them (FsContext) to store our resource context. We could also use these to store instance contexts if we chose to. Drivers of certain classes may have rules about what these fields are used for, but we are developing this driver outside of any framework and there are no other drivers to communicate with. This means we are free to do whatever we want, but if you implement a driver of a particular class, you should check which fields are available to you.

To associate resources, we simply use the name string passed in when the device is opened. An application appends its own string to the end of our device name to select a resource. If two applications open the same resource string, they are associated and share the same resource context. The resource context we create simply maintains its own lock and a circular buffer. Because this circular buffer resides in kernel memory (system space), it is accessible in the context of any process. Thus, we can copy memory from one process and give it to another.

Memory Pools

In this driver, we finally start to allocate memory. In a driver, memory is allocated from pools, and you choose which pool to allocate from. In user mode, you allocate memory from a heap. In this respect, they are essentially the same: there is a manager that keeps track of the allocations and provides you with the memory. In user mode, however, while there can be multiple heaps, they are essentially the same type of memory. Also, in user mode, the heaps used by a process are accessible only by that process; two processes do not share the same heap.

pExampleList = (PEXAMPLE_LIST)ExAllocatePoolWithTag(NonPagedPool, sizeof(EXAMPLE_LIST), EXAMPLE_POOL_TAG);
if(pExampleList)
{

In the kernel, things change a little. There are two basic types of pools, paged and non-paged. The paged pool is memory that can be paged out to disk and should only be accessed at IRQL < DISPATCH_LEVEL, as explained in the first tutorial. Non-paged memory is different; you can access it at any IRQL because it is never paged out to disk. There are things to be aware of, though: you don't want to consume too much non-paged pool, for the obvious reason that it is always resident in physical memory and you will start to run out of it.

The pools are also shared among all drivers. That being the case, there is something you can do to help debug pool issues, and that is specifying a pool tag. This is a four-byte identifier that is put in the pool header of your allocated memory. That way, if, say, you overwrite the end of your allocation and all of a sudden the file system driver crashes, you can look at the pool block preceding the memory that became invalid, see your driver's tag, and realize that your driver may have corrupted the next pool entry. This is the same concept as in user mode, and you can enable heap tagging there as well. You generally want to pick a unique name to identify your driver's memory. The tag is also usually written backwards in the source code as a multi-character constant: the compiler puts the first character in the high-order byte, and because x86 stores values little-endian, the tag then reads forwards when its bytes are displayed as characters in the debugger.
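The byte-order reasoning behind reversed tags can be checked with a small sketch. MAKE_POOL_TAG and tag_byte are hypothetical helpers, and "Expl" is an illustrative tag, not the value of EXAMPLE_POOL_TAG. The macro places the first character in the low-order byte, so the tag's bytes read "Expl" in little-endian memory. With MSVC, the reversed multi-character literal 'lpxE' evaluates to the same value, 0x6C707845, which is why tags are conventionally written backwards.

```c
#include <stdint.h>

/* Build a tag whose bytes read a, b, c, d in little-endian memory. */
#define MAKE_POOL_TAG(a, b, c, d)                                   \
    ((uint32_t)(uint8_t)(a)         | ((uint32_t)(uint8_t)(b) << 8) | \
     ((uint32_t)(uint8_t)(c) << 16) | ((uint32_t)(uint8_t)(d) << 24))

/* Byte i of the tag as it is laid out in little-endian memory. */
static char tag_byte(uint32_t tag, int i)
{
    return (char)((tag >> (8 * i)) & 0xFFu);
}
```

MAKE_POOL_TAG('E', 'x', 'p', 'l') yields 0x6C707845, whose lowest-addressed byte is 'E', so the tag appears as "Expl" when the pool header is dumped as characters.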

In our driver, we allocate from the non-paged pool simply because we have a KMUTEX inside the data structure. We could have allocated this separately and maintained a pointer here, but for simplicity, we simply have one allocation. KMUTEX objects must be in non-paged memory.

Kernel Mutexes

In this article, we start to get into creating objects you may already be familiar with in user mode. The mutex is actually the same in the kernel as it was when you used it in user mode. In fact, each process actually has what is called a handle table which is simply a mapping between user mode handles and kernel objects. When you create a mutex in user mode, you actually get a mutex object created in the kernel and this is exactly what we are creating today.

The one difference we need to establish is that the mutex we create in the kernel is not a handle but a data structure (a KMUTEX) used directly by the kernel, and it must be in non-paged memory. The parameters for waiting on a mutex, however, are a little more complicated than we are used to.

The MSDN documentation for KeWaitForMutexObject mentions that it is simply a macro for KeWaitForSingleObject.

So, what do the parameters mean? They are explained on MSDN, but here is a summary.

The first one is obvious: it's the actual mutex object. The second parameter, the wait reason, is a little stranger; it's either UserRequest or Executive. UserRequest essentially means the thread is waiting on behalf of a user request, and Executive means it is waiting on behalf of the system itself; drivers generally use Executive. This is simply an informational field: if something queries why this thread is waiting, this is what is returned. It doesn't actually affect what the API does.

The third parameter specifies the wait mode: KernelMode or UserMode. Drivers will generally use KernelMode. If you wait in UserMode, your kernel stack could be paged out while you wait, so you cannot wait on objects or use data that reside on the stack.

The fourth parameter, Alertable, specifies whether the thread is alertable while it waits. If it is TRUE, the wait can be interrupted by an alert or a user APC, and the API returns STATUS_ALERTED or STATUS_USER_APC instead of acquiring the object.

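The last parameter is the timeout, a pointer to a LARGE_INTEGER. If you wanted to set a timeout, the code would be the following:

LARGE_INTEGER TimeOut;

TimeOut.QuadPart = 10000000L;            /* one second in 100-nanosecond units */
TimeOut.QuadPart *= NumberOfSeconds;
TimeOut.QuadPart = -(TimeOut.QuadPart);  /* negative means relative time */

The timeout is expressed in 100-nanosecond units. A negative value specifies a time relative to now, which is what we want here; a positive value would be interpreted as an absolute system time. Passing NULL means wait indefinitely.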
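The unit conversion is where timeout bugs usually creep in, so here is a small sketch of the arithmetic. The helper names and constants are hypothetical; in a driver, the returned value would be assigned to TimeOut.QuadPart and &TimeOut passed as the last argument of KeWaitForMutexObject. Plain int64_t is used here so the sketch compiles without the DDK headers.

```c
#include <stdint.h>

/* Wait timeouts are expressed in 100-nanosecond units. */
#define HUNDRED_NS_PER_SECOND      10000000LL
#define HUNDRED_NS_PER_MILLISECOND 10000LL

/* Negative values mean "relative to now". */
static int64_t relative_timeout_seconds(int64_t seconds)
{
    return -(seconds * HUNDRED_NS_PER_SECOND);
}

static int64_t relative_timeout_ms(int64_t ms)
{
    return -(ms * HUNDRED_NS_PER_MILLISECOND);
}
```

A 5-second relative wait is therefore -50,000,000 and a 250 ms wait is -2,500,000. Forgetting the minus sign turns the value into an absolute time in the distant past, so the wait would time out immediately.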

Our implementation takes a simple approach and specifies KernelMode, non-alertable, and no timeout (NULL).

NtStatus = KeWaitForMutexObject(&pExampleDeviceContext->kListMutex, Executive, KernelMode, FALSE, NULL);
if(NT_SUCCESS(NtStatus))
{

You can find detailed information about how mutexes work in the MSDN documentation.

Poor Man’s Pipes Implementation

The project represents a very simple implementation. In this section, we will evaluate how the driver operates and some things to think about on how we could improve the implementation. We will also cover how to use the example driver.

Security

This is very simple: there is none! The driver sets absolutely no security, so we don't care who writes to or reads from any buffer. Since we don't care, any process can use this IPC to communicate, regardless of its user or privilege level.

The question then becomes: does this really matter? I am not a security expert, but to me it really depends. If your intention is to allow anyone to use this, then you may not want to implement security. If you want to restrict access to SYSTEM processes only, or to disallow IPC across users, then this is something to consider. There are cases for both. Another possibility is that you don't care about the user but only want two specific processes to be able to communicate, and no others. In this situation, perhaps you want to set up some type of registration or security so that only the appropriate processes can open handles, and the application then dictates what security it wants on its pipes. You could also have a model where you don't use names at all but rather do per-instance handling. In that case, applications might be required to duplicate handles into other processes, for example.

Circular Buffer

The circular buffer is a simple implementation; it never blocks a read or write and will simply ignore extra data. The buffer size is also not configurable so the application is stuck with the value we hard-coded.

Does this need to be the case? Definitely not. As we saw in Part 2, we can create our own IOCTLs to issue requests to the driver. An IOCTL could be implemented to configure the driver, for example to set how big the buffer should be. Another option is how a full buffer is handled. Some circular buffers wrap around and overwrite old data with new data, so this could be a flag that chooses whether to ignore new data or overwrite existing data with it.

The circular buffer implementation is not driver specific so I will not be going over its implementation in detail.
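For readers who want a concrete picture anyway, here is a minimal user-mode sketch of a fixed-size circular buffer with the two full-buffer policies discussed above. It is not the example driver's implementation: the RING type, the 8-byte RING_SIZE, and the Overwrite flag are hypothetical, and there is no locking (in the driver, the resource context's lock would guard these calls).

```c
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8   /* hypothetical fixed size */

typedef struct {
    char   Data[RING_SIZE];
    size_t Head;        /* index of the next byte to read           */
    size_t Count;       /* number of bytes currently stored          */
    int    Overwrite;   /* 0: drop new data when full; 1: drop oldest */
} RING;

/* Never blocks; returns how many bytes were accepted. */
static size_t ring_write(RING *r, const char *src, size_t len)
{
    size_t written = 0;
    for (size_t i = 0; i < len; i++) {
        if (r->Count == RING_SIZE) {
            if (!r->Overwrite)
                break;                            /* ignore extra data */
            r->Head = (r->Head + 1) % RING_SIZE;  /* discard oldest byte */
            r->Count--;
        }
        r->Data[(r->Head + r->Count) % RING_SIZE] = src[i];
        r->Count++;
        written++;
    }
    return written;
}

/* Never blocks; may return 0 when the buffer is empty. */
static size_t ring_read(RING *r, char *dst, size_t len)
{
    size_t n = 0;
    while (n < len && r->Count > 0) {
        dst[n++] = r->Data[r->Head];
        r->Head = (r->Head + 1) % RING_SIZE;
        r->Count--;
    }
    return n;
}
```

Writing "ABCDEFGHIJ" into an empty 8-byte ring accepts only "ABCDEFGH" when Overwrite is 0, matching the ignore-extra-data behavior of the example driver. With Overwrite set to 1, all ten bytes are accepted and a subsequent read returns "CDEFGHIJ".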

Graphical Flow of Example

This is a simple illustration of the flow of this example. The CreateFile() API references this object using the symbolic link "Example". The I/O manager maps the DOS device name to the NT device "\Device\Example" and appends any string we put beyond this name (like "\TestPipe"). We receive the IRP created by the I/O manager, and we first use the device string to look up whether we have already created a resource context. If so, we add a reference and store our resource context in the FileObject of the I/O stack location. If not, we need to create it first.

As a quick reference, though, the FILE_OBJECT will actually contain only the extra part of the name, such as "\TestPipe". Here is an example from a handle opened with the name "HELLO":

kd> dt _FILE_OBJECT ff6f3ac0
   +0x000 Type              : 5
   +0x002 Size              : 112
   +0x004 DeviceObject      : 0x80deea48
   +0x008 Vpb               : (null)
   +0x00c FsContext         : (null)
   +0x010 FsContext2        : (null)
   +0x014 SectionObjectPointer : (null)
   +0x018 PrivateCacheMap   : (null)
   +0x01c FinalStatus       : 0
   +0x020 RelatedFileObject : (null)
   +0x024 LockOperation     : 0 ''
   +0x025 DeletePending     : 0 ''
   +0x026 ReadAccess        : 0 ''
   +0x027 WriteAccess       : 0 ''
   +0x028 DeleteAccess      : 0 ''
   +0x029 SharedRead        : 0 ''
   +0x02a SharedWrite       : 0 ''
   +0x02b SharedDelete      : 0 ''
   +0x02c Flags             : 2
   +0x030 FileName          : _UNICODE_STRING "\HELLO"
   +0x038 CurrentByteOffset : _LARGE_INTEGER 0x0
   +0x040 Waiters           : 0
   +0x044 Busy              : 0
   +0x048 LastLock          : (null)
   +0x04c Lock              : _KEVENT
   +0x05c Event             : _KEVENT
   +0x06c CompletionContext : (null)

This is a simple illustration of how the ReadFile operation works. Since we associated our own context with the FILE_OBJECT, we do not need to perform lookups and can simply access the appropriate circular buffer when we do the read.

This is a simple illustration of how the WriteFile operation works. Since we associated our own context with the FILE_OBJECT, we do not need to perform lookups and can simply access the appropriate circular buffer when we do the write.

On close, we simply dereference the resource context. If its reference count reaches 0, we remove it from the global list and delete it; otherwise, we do nothing more. One thing to remember is that this is a simple illustration and we are actually handling IRP_MJ_CLOSE, not IRP_MJ_CLEANUP. This code could have been put in either one, since what we are doing does not interact with the user-mode application. However, if we were freeing resources that must be released in the context of the application, we would need to move this to IRP_MJ_CLEANUP instead, because IRP_MJ_CLOSE is not guaranteed to run in the context of the process. In that sense, this illustration is closer to how an IRP_MJ_CLEANUP would be handled.

Although MSDN states that IRP_MJ_CLOSE is not called in the context of the process, it sometimes is: the stack trace below shows it being called in the context of the application. If you debug, see this, and conclude that you can simply ignore the warning on MSDN, I would think again. There is a reason it is documented that way even if it does not always behave that way. The other side of the coin is that even if it does behave this way today, things can change in the future, since the documented behavior is what you can rely on. As a general rule, do not see something behave one way and expect it always to be the case. There is a document on MSDN, Handling IRPs, that describes the behavior of IRP_MJ_CLOSE and IRP_MJ_CLEANUP.

THREAD ff556020  Cid 0aa4.0b1c  Teb: 7ffde000 Win32Thread: 00000000 RUNNING on processor 0
IRP List:
    ffa1b6b0: (0006,0094) Flags: 00000404  Mdl: 00000000
Not impersonating
DeviceMap                 e13b0d20
Owning Process            ff57d5c8       Image:         usedriver3.exe
Wait Start TickCount      26769661       Ticks: 0
Context Switch Count      33
UserTime                  00:00:00.0000
KernelTime                00:00:00.0015
Start Address kernel32!BaseProcessStartThunk (0x77e4f35f)
*** WARNING: Unable to verify checksum for usedriver3.exe
*** ERROR: Module load completed but symbols could not be loaded for usedriver3.exe
Win32 Start Address usedriver3 (0x00401172)
Stack Init faa12000 Current faa11c4c Base faa12000 Limit faa0f000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 2
ChildEBP RetAddr
faa11c70 804e0e0d example!Example_Close (FPO: [2,0,2]) (CONV: stdcall) [.\functions.c @ 275]
faa11c80 80578ce9 nt!IofCallDriver+0x3f (FPO: [0,0,0])
faa11cb8 8057337c nt!IopDeleteFile+0x138 (FPO: [Non-Fpo])
faa11cd4 804e4499 nt!ObpRemoveObjectRoutine+0xde (FPO: [Non-Fpo])
faa11cf0 8057681a nt!ObfDereferenceObject+0x4b (FPO: [EBP 0xfaa11d08] [0,0,0])
faa11d08 8057687c nt!ObpCloseHandleTableEntry+0x137 (FPO: [Non-Fpo])
faa11d4c 805768c3 nt!ObpCloseHandle+0x80 (FPO: [Non-Fpo])
faa11d58 804e7a8c nt!NtClose+0x17 (FPO: [1,0,0])
faa11d58 7ffe0304 nt!KiSystemService+0xcb (FPO: [0,0] TrapFrame @ faa11d64)
0012fe24 77f42397 SharedUserData!SystemCallStub+0x4 (FPO: [0,0,0])
0012fe28 77e41cb3 ntdll!ZwClose+0xc (FPO: [1,0,0])
0012fe30 0040110d kernel32!CloseHandle+0x55 (FPO: [1,0,0])
WARNING: Stack unwind information not available. Following frames may be wrong.
0012ff4c 00401255 usedriver3+0x110d
0012ffc0 77e4f38c usedriver3+0x1255
0012fff0 00000000 kernel32!BaseProcessStart+0x23 (FPO: [Non-Fpo])

Using the Example

The example is split into two new user-mode processes, usedriver2 and usedriver3. usedriver2 lets you type in data and sends it to the driver. usedriver3 lets you press Enter and reads data from the driver. Note that, the way it is currently implemented, if usedriver3 reads multiple strings at once, you will only see the first string displayed.

There is one parameter that needs to be provided: the name of the resource to open. This is an arbitrary name that simply allows the driver to tie handle instances together so multiple applications can share data at the same time! For example:

usedriver2 HELLO
usedriver3 HELLO
usedriver2 Temp
usedriver3 Temp

These commands open \Device\Example\HELLO and \Device\Example\Temp, and each application talks to the other application that opened the same name. The current implementation matches resource names case-insensitively. It's very simple to change this: the last parameter of RtlCompareUnicodeString (CaseInSensitive) specifies whether strings are compared case-sensitively or case-insensitively.

Building the Examples

This is something that I have not gone into in previous articles. The projects included with these articles can be unzipped using the directory structure in the ZIP itself. There are makefiles included in the project so you can simply do “nmake clean” then nmake to build these binaries.

The makefiles may need to be changed to point to the location of your DDK (which you can order from Microsoft for the cost of shipping and handling). These makefiles point to C:\NTDDK\xxx; just change this to your location. If you do not have nmake in your path, make sure that the Visual Studio environment is set up in your command prompt: go to the binaries directory of Visual Studio and run VCVARS32.BAT.

There may be an error when it attempts to use rebase. These makefiles were simply copied from other projects so the rebase is actually not necessary. It was actually only being used before to strip out debug symbols. The error can be fixed by either removing the rebase sequence from the makefile or by creating the SYMBOLS directory under the BIN directory. The reason rebase is complaining is simply because the directory does not exist.

Conclusion

In this article, we learned a bit more about user-mode and kernel-mode interactions and how to create a very simple IPC. We learned about creating contexts in device drivers as well as how to allocate memory and use synchronization objects in the kernel.

PART 4: INTRODUCTION TO DEVICE STACKS

Introduction

This is the fourth edition of the Writing Device Drivers articles. This article will introduce the idea of device stacks and how devices interact with each other. We will use the previously created example device driver to demonstrate this topic. To do this, we will introduce the idea of a filter driver, which we will create and attach to our own driver's device stack.

What is a Device Stack?

A stack is general terminology that can be envisioned as a pile of objects that just sit on top of each other. There is also an algorithm implementation that defines a stack as a method to store temporary objects in which the last object in is the first object out (also known as LIFO). Both descriptions are related. However, a device stack is not an algorithm nor does it have anything to do with temporary objects. Thus the simple description of a pile of objects that simply sit on top of each other is more related.

The best example of a device stack is a stack of plates. The plates sit on top of each other just like a stack of devices. The other detail to remember is that we say device stack, not driver stack. Recall from the third tutorial that a single driver can actually implement multiple devices. This means that a stack of devices could all be implemented in a single physical driver. This article and many others do, however, use device and driver interchangeably, even though they are separate but related entities.

Filter Drivers

This is a very commonly used buzzword, and I'm sure just about anyone who programs has heard of it. A filter driver is a driver that attaches to the top of a stack of devices in order to filter requests to a device before they reach it.

You may assume that all devices in a device stack are filters except for the last one, but this is not the case. Which devices other than filters make up a stack generally depends on the architecture of that particular device. For example, you usually have higher-level drivers near the top of the stack. In the most general case, these higher-level drivers communicate and interact with user-mode requests. The devices in the stack break the request down for the next-level device until the last device in the chain processes the request. Near the bottom of the device stack lie the lower-level drivers, such as miniport drivers, which may communicate with the actual hardware.

The best example could be that of the file system. The higher level drivers maintain the notion of files and file system. They understand where the files are stored on the disk perhaps. The lower level drivers know nothing of files and simply understand requests to read sectors on a disk. They also understand how to queue these requests and optimize disk seeks but they have no knowledge of what is actually on the disk or how to interpret the data.

Every filter device that attaches to a device stack is put at the top. This means that if another filter device attaches to the device stack after yours, it is now on top of you. You are never guaranteed to be at the top of the stack.
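To make the "always on top" rule concrete, here is a small user-mode model of a device stack. It is a sketch, not kernel code: device_node, top_of_stack and attach_to_stack are hypothetical names. The model assumes two things about attaching: the new device lands above whatever is currently at the top, and it is given one more I/O stack location than the device beneath it (the StackSize that IoAllocateIrp later needs).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-mode model of a device object, not the real DEVICE_OBJECT. */
typedef struct device_node {
    const char *name;
    int stack_size;                 /* I/O stack locations an IRP needs from here down */
    struct device_node *attached;   /* device attached directly on top of this one */
} device_node;

/* Follow the "attached" links up to the device currently on top. */
static device_node *top_of_stack(device_node *dev)
{
    while (dev->attached != NULL)
        dev = dev->attached;
    return dev;
}

/* Attach 'filter' above whatever is currently on top of the stack that
 * contains 'target'. Returns the device the filter actually sits on, which
 * is where it must pass IRPs down to. */
static device_node *attach_to_stack(device_node *filter, device_node *target)
{
    device_node *lower = top_of_stack(target);

    assert(filter->attached == NULL);
    lower->attached = filter;
    filter->stack_size = lower->stack_size + 1;  /* one more location for the filter */
    return lower;
}
```

If two filters both name the same target, the second one lands on top of the first. This is why a filter saves the device it was given at attach time (pNextDeviceInChain in the example) instead of assuming it sits directly on \Device\Example.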

To attach to a device stack we will use the following API.

RtlInitUnicodeString(&usDeviceToFilter, L"\\Device\\Example");

NtStatus = IoAttachDevice(pDeviceObject, &usDeviceToFilter, &pExampleFilterDeviceContext->pNextDeviceInChain);

This API actually opens a handle to the device in order to attach, and then closes the handle. By the time the API closes the handle our driver is already attached to the device stack, so we must ensure that IRP_MJ_CLEANUP and IRP_MJ_CLOSE are handled correctly and do not cause a problem, since they will be called!

There are a few other APIs. One is IoAttachDeviceToDeviceStack, which attaches to a device object you already have; IoAttachDevice essentially performs the same attach after opening a handle to the device.

IRP Handling

The next thing we need to talk about further is IRP handling. The IRP is created and sent to the first device in the device stack. This device can then process the IRP and complete it or pass it down to the next device in the stack. The general rules of an IRP are that when you receive the IRP you own it. If you then pass it down to the next device you no longer own it and can no longer access it. The last device to process the IRP must complete it.

In this example we will be creating IRPs simply for demonstration purposes. The demonstration is quite simple: we will be sending IRPs to our own driver. Some aspects are omitted from our implementation, and some things are done in a non-standard fashion, simply because we control all end points. This is a very simple demonstration. Owning all end points allows us to be more flexible in what we actually implement, since we are in total control and can ensure that nothing goes wrong.

There are a number of simple steps to follow when creating an IRP. Depending on how the IRP is handled these can vary a little, but we will go over a very simple case step by step.

Step One: Create the IRP

The obvious first step is to create an IRP. This is very simple: you can use a function named IoAllocateIrp. The following is a simple code example using the API.

MyIrp = IoAllocateIrp(pFileObject->DeviceObject->StackSize, FALSE);

There are other APIs and macros which can also create an IRP for you. These are quicker ways to create the IRP and set its parameters. The one thing to watch out for is to make sure that the function you use to create the IRP can be called at the IRQL level you will be using. The other thing to check is who is allowed to free the IRP: whether the I/O Manager will manage and free the IRP or whether you have to do it yourself.

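The following is an example of one that also sets some parameters for us.

MyIrp = IoBuildAsynchronousFsdRequest(IRP_MJ_INTERNAL_DEVICE_CONTROL, pTopOfStackDevice, NULL, 0, &StartOffset, &StatusBlock);

Be aware that the DDK documents IoBuildAsynchronousFsdRequest as accepting only IRP_MJ_READ, IRP_MJ_WRITE, IRP_MJ_FLUSH_BUFFERS, IRP_MJ_SHUTDOWN and IRP_MJ_PNP. For an internal device control request, IoBuildDeviceIoControlRequest with its InternalDeviceIoControl parameter set to TRUE is the documented builder.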

Step Two: Set the Parameters

This step depends on what functionality you want. You would need to set up the FILE_OBJECT, the IO_STACK_LOCATION and everything else. In our example we cheat: we don't provide a FILE_OBJECT and we set minimal parameters. Why? This is just a simple example and we own all end points, so we can essentially do whatever we want with the parameters. However, if you read up on the IRP_MJ_xxx codes and the specific functionality for that driver, such as IOCTLs, you will know what you need to set when sending IRPs around. We really should comply with these rules as well so that other drivers could talk to us, but I have tried to keep this example very simple.

The following code is how we set our IRP parameters.

PIO_STACK_LOCATION pMyIoStackLocation = IoGetNextIrpStackLocation(MyIrp);

pMyIoStackLocation->MajorFunction = IRP_MJ_INTERNAL_DEVICE_CONTROL;
pMyIoStackLocation->Parameters.DeviceIoControl.IoControlCode = IOCTL_CREATE_NEW_RESOURCE_CONTEXT;

/*
 * METHOD_BUFFERED
 *
 *    Input Buffer  = Irp->AssociatedIrp.SystemBuffer
 *    Output Buffer = Irp->AssociatedIrp.SystemBuffer
 *
 *    Input Size    = Parameters.DeviceIoControl.InputBufferLength
 *    Output Size   = Parameters.DeviceIoControl.OutputBufferLength
 *
 * Since we are now doing the same job as the I/O Manager,
 * we follow the rules: our IOCTL specified METHOD_BUFFERED.
 */
pMyIoStackLocation->Parameters.DeviceIoControl.InputBufferLength  = sizeof(FILE_OBJECT);
pMyIoStackLocation->Parameters.DeviceIoControl.OutputBufferLength = 0;

/*
 * This is not really how you use IOCTLs, but this is simply an example
 * using an existing implementation. We simply set our File Object as the
 * System Buffer. The IOCTL handler will then know it's a pFileObject and
 * implement the code that we had here previously.
 */
MyIrp->AssociatedIrp.SystemBuffer = pFileObject;
MyIrp->MdlAddress = NULL;

As you notice, we set the SystemBuffer to point to our File Object. This is not exactly how we really should have done this. We should have allocated a buffer and copied the data there. That way we could safely have the I/O Manager free the buffer or we could have freed the buffer when we destroy the IRP. Instead though, we did this quick example and we simply don’t allow the I/O Manager to free the IRP and we don’t free the SystemBuffer obviously.
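The METHOD_BUFFERED rule mentioned in the code comments is not a separate setting: it is encoded in the low two bits of the IOCTL code itself. The portable sketch below re-creates the bit layout used by the DDK's CTL_CODE macro (device type in bits 16-31, required access in bits 14-15, function number in bits 2-13, transfer method in bits 0-1). The names make_ctl_code and ctl_method are our own; a real driver would use CTL_CODE and the METHOD_xxx constants from the DDK headers.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as METHOD_BUFFERED, METHOD_IN_DIRECT, METHOD_OUT_DIRECT, METHOD_NEITHER. */
enum { M_BUFFERED = 0, M_IN_DIRECT = 1, M_OUT_DIRECT = 2, M_NEITHER = 3 };

/* Same bit layout as CTL_CODE(DeviceType, Function, Method, Access). */
static uint32_t make_ctl_code(uint32_t device_type, uint32_t function,
                              uint32_t method, uint32_t access)
{
    assert(device_type <= 0xFFFFu && function < 0x1000u && method < 4u && access < 4u);
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}

/* The two low bits tell the I/O Manager how to describe the caller's buffers. */
static uint32_t ctl_method(uint32_t code)
{
    return code & 3u;
}
```

For example, device type 0x22 (FILE_DEVICE_UNKNOWN) with function 0x800 and METHOD_BUFFERED gives the familiar value 0x222000, and ctl_method recovers METHOD_BUFFERED from it.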

Step Three: Send the IRP down

You need to send the IRP down to the driver. To do this, you simply pass the DEVICE_OBJECT and the IRP to the IoCallDriver API. You can essentially use whatever DEVICE_OBJECT you have. However, if you want to start at the top of the device stack, it's best to find the top level device object using an API such as IoGetRelatedDeviceObject. In our example, one call gets the top level device first and the other simply uses the Device Object we already have. If you read the debug output, you will notice that the IRP sent directly to pFileObject->DeviceObject does not go through the filter driver. This is because IoCallDriver is very simple: it just takes the Device Object and calls the appropriate dispatch function of the driver that owns it.

NtStatus = IoCallDriver(pFileObject->DeviceObject, MyIrp);
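The point that IoCallDriver does not search for the top of the stack can be modeled in a few lines of portable C. This is a sketch under simplifying assumptions, not kernel code: irp, driver, device, call_driver, example_read and filter_read are hypothetical stand-ins for IRP, DRIVER_OBJECT, DEVICE_OBJECT, IoCallDriver and the two drivers' read dispatch routines, and the major function code is kept directly in the irp instead of in an I/O stack location.

```c
#include <assert.h>
#include <stddef.h>

enum { MJ_READ = 0, MJ_MAX = 1 };

typedef struct irp {
    int major;          /* stands in for the stack location's MajorFunction */
    int status;
    int completed;
    int filter_saw_it;
} irp;

struct device;
typedef int (*dispatch_fn)(struct device *dev, irp *request);

typedef struct driver {
    dispatch_fn major_function[MJ_MAX];  /* like DRIVER_OBJECT.MajorFunction */
} driver;

typedef struct device {
    driver *owner;              /* like DEVICE_OBJECT.DriverObject */
    struct device *next_lower;  /* saved by a filter when it attached */
} device;

/* Model of IoCallDriver: no stack walking, just a table lookup on the
 * driver that owns the device it was handed. */
static int call_driver(device *dev, irp *request)
{
    return dev->owner->major_function[request->major](dev, request);
}

/* Bottom driver: the last device to handle the IRP completes it. */
static int example_read(device *dev, irp *request)
{
    (void)dev;
    request->status = 0;
    request->completed = 1;
    return request->status;
}

/* Filter: looks at the request, then passes it to the device below. */
static int filter_read(device *dev, irp *request)
{
    assert(dev->next_lower != NULL);
    request->filter_saw_it = 1;
    return call_driver(dev->next_lower, request);
}
```

Handing a request to the filter's device sends it through the filter and down to the example driver, while handing it straight to the example driver's device skips the filter entirely, which matches the difference visible in the debug output.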

Step Four: Process & Clean up the IRP

The one thing we did before sending the IRP down was to set a Completion Routine. This is a routine that gets notified when the IRP has been completed. We can do a few things at this point: we can let the IRP continue so that we can process its parameters, or we can destroy it. We can also let the I/O Manager free it. Which of these is correct depends on how you created the IRP. To answer the question of "who should free it", read the DDK documentation for the API you used to allocate it. Implementing the wrong method can lead to disaster!

This is a simple example and we simply free it ourselves.

IoSetCompletionRoutine(MyIrp, Example_SampleCompletionRoutine, NULL, TRUE, TRUE, TRUE);

...

NTSTATUS Example_SampleCompletionRoutine(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
    DbgPrint("Example_SampleCompletionRoutine \n");

    IoFreeIrp(Irp);

    /* The IRP has been freed, so tell the I/O Manager not to touch it again. */
    return STATUS_MORE_PROCESSING_REQUIRED;
}

You may sometimes see code that checks for STATUS_PENDING and waits on an event. In our case we own all end points, so this will not happen in this simple example, which is why some of these details are omitted for simplicity. In the next articles we will expand on these ideas and fill in the missing pieces. It's important to digest one piece at a time.

Handling IRPs in your driver

Once you get an IRP, you own that IRP. You can do whatever you want with it. If you process it you must then either complete it when you are done or pass it down to another driver. If you pass it down to another driver you must forget about it. The driver you passed it to is now responsible for completing it.

The example filter driver we have implemented though is a bit different. It wants to process the parameters after we have provided the example driver with the IRP. To do this we must catch the completion and stop it from being completed. This is because we know the lower level driver should and will complete it. So, by setting our own completion routine, we can stop this. This is done with the following code.

pIoStackIrp = IoGetCurrentIrpStackLocation(Irp);

IoCopyCurrentIrpStackLocationToNext(Irp);
IoSetCompletionRoutine(Irp, (PIO_COMPLETION_ROUTINE)ExampleFilter_CompletionRoutine, NULL, TRUE, TRUE, TRUE);

/*
 * IoCallDriver() simply calls the appropriate entry point in the driver
 * object associated with the device object. This is how drivers are
 * basically "chained" together: they must know that there are lower
 * drivers so they can perform the appropriate action and send down the IRP.
 *
 * They do not have to send the IRP down; they could simply process it
 * completely themselves if they wish.
 */
NtStatus = IoCallDriver(pExampleFilterDeviceContext->pNextDeviceInChain, Irp);

/*
 * Please note that our implementation here is a simple one. We do not take
 * into account PENDING IRPs or anything complicated. We assume that once we
 * get to this location the IRP has already been completed and our completion
 * routine was called, or it wasn't completed and we are still able to
 * complete it here. Our completion routine makes sure that the IRP is still
 * valid here.
 */
if(NT_SUCCESS(NtStatus))
{
    /*
     * Data was read?
     */
    if(Irp->IoStatus.Information)
    {
        /*
         * Our filter device is dependent upon the compilation settings of
         * how we compiled example.sys. That means we need to dynamically
         * figure out if we're using Direct, Buffered or Neither.
         */
        if(DeviceObject->Flags & DO_BUFFERED_IO)
        {
            DbgPrint("ExampleFilter_Read - Use Buffered I/O \r\n");

            /*
             * Implementation for Buffered I/O
             */
            pReadDataBuffer = (PCHAR)Irp->AssociatedIrp.SystemBuffer;

            if(pReadDataBuffer && pIoStackIrp->Parameters.Read.Length > 0)
            {
                ExampleFilter_FixNullString(pReadDataBuffer, (UINT)Irp->IoStatus.Information);
            }
        }
        else
        {
            if(DeviceObject->Flags & DO_DIRECT_IO)
            {
                DbgPrint("ExampleFilter_Read - Use Direct I/O \r\n");

                /*
                 * Implementation for Direct I/O
                 */
                if(pIoStackIrp && Irp->MdlAddress)
                {
                    pReadDataBuffer = MmGetSystemAddressForMdlSafe(Irp->MdlAddress, NormalPagePriority);

                    if(pReadDataBuffer && pIoStackIrp->Parameters.Read.Length)
                    {
                        ExampleFilter_FixNullString(pReadDataBuffer, (UINT)Irp->IoStatus.Information);
                    }
                }
            }
            else
            {
                DbgPrint("ExampleFilter_Read - Use Neither I/O \r\n");

                /* Implementation for Neither I/O */
                __try {
                    if(pIoStackIrp->Parameters.Read.Length > 0 && Irp->UserBuffer)
                    {
                        ProbeForWrite(Irp->UserBuffer, pIoStackIrp->Parameters.Read.Length, TYPE_ALIGNMENT(char));
                        pReadDataBuffer = Irp->UserBuffer;

                        ExampleFilter_FixNullString(pReadDataBuffer, (UINT)Irp->IoStatus.Information);
                    }
                } __except( EXCEPTION_EXECUTE_HANDLER ) {
                    NtStatus = GetExceptionCode();
                }
            }
        }
    }
}

/*
 * Complete the IRP
 */
Irp->IoStatus.Status = NtStatus;
IoCompleteRequest(Irp, IO_NO_INCREMENT);

...

NTSTATUS ExampleFilter_CompletionRoutine(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
    DbgPrint("ExampleFilter_CompletionRoutine Called \r\n");

    /*
     * We need to return "STATUS_MORE_PROCESSING_REQUIRED" so that we can use
     * the IRP in our driver. If we complete it here we would not be able to
     * use it, because the IRP would be completed. This also means that our
     * driver must complete the IRP itself, since it has not been completed yet.
     */
    return STATUS_MORE_PROCESSING_REQUIRED;
}

The IRP will then not be completed, because we told the I/O Manager that more processing needs to be done. Now we can manipulate the IRP after IoCallDriver returns; however, we must complete it ourselves when we are done, because we stopped its completion. Remember that our example does not take STATUS_PENDING into account because we own all end points and we are trying to keep this example as simple as possible.
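Why returning STATUS_MORE_PROCESSING_REQUIRED hands the IRP back to the filter can be seen in a simplified model of completion. This is a sketch, not the I/O Manager's code: complete_request, filter_completion and passive_completion are hypothetical, each driver's routine is kept at its own level rather than in the next stack location, and the InvokeOnSuccess/InvokeOnError/InvokeOnCancel flags are ignored. Completion walks from the bottom of the stack to the top and stops at the first routine that asks for more processing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for STATUS_CONTINUE_COMPLETION and
 * STATUS_MORE_PROCESSING_REQUIRED. */
enum { CONTINUE_COMPLETION = 0, MORE_PROCESSING = 1 };

#define MAX_LOCATIONS 4

typedef struct irp irp;
typedef int (*completion_fn)(irp *request, void *context);

struct irp {
    int count;                             /* stack levels in use */
    completion_fn routine[MAX_LOCATIONS];  /* routine[i] belongs to level i, 0 = top */
    void *context[MAX_LOCATIONS];
    int finished;                          /* 1 once completion reached the top */
};

/* Model of IoCompleteRequest: call completion routines from the bottom of
 * the stack upward. MORE_PROCESSING stops the walk, and the IRP now
 * belongs to the driver whose routine returned it. */
static void complete_request(irp *request)
{
    assert(request->count <= MAX_LOCATIONS);
    for (int i = request->count - 1; i >= 0; i--) {
        if (request->routine[i] != NULL &&
            request->routine[i](request, request->context[i]) == MORE_PROCESSING)
            return;
    }
    request->finished = 1;
}

/* Like ExampleFilter_CompletionRoutine: note the call and keep the IRP. */
static int filter_completion(irp *request, void *context)
{
    (void)request;
    *(int *)context += 1;
    return MORE_PROCESSING;
}

/* A routine that lets completion continue upward. */
static int passive_completion(irp *request, void *context)
{
    (void)request;
    *(int *)context += 1;
    return CONTINUE_COMPLETION;
}
```

When the filter's routine stops the walk, nothing above it runs and the IRP is left unfinished, which is why the filter's dispatch routine must call IoCompleteRequest itself after IoCallDriver returns.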

The Filter Example

The example filter driver in this article attaches itself to the device stack that we created in article 3. If you remember that implementation, we were able to communicate between two user mode applications. One problem with it is that, if you typed in a number of strings, the user mode application printed only one string even though it may have read three. This could easily have been fixed in the user mode application, but how much fun would that be?

Instead we have created a filter driver that simply intercepts the IRP after the read and manipulates the data returned in it. It replaces all the NULL terminators in the string with spaces and then NULL terminates the end of the string. It's obviously not a perfect example, since we overwrite the last character without even checking whether we need to, but this is just a simple example.
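The body of ExampleFilter_FixNullString is not listed in this article, so the following is only a guess at its logic based on the description above. fix_null_string is a hypothetical, portable C version: it turns every embedded NULL terminator into a space and then terminates the buffer at its last byte, taking the same shortcut of overwriting that byte unconditionally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch in the spirit of ExampleFilter_FixNullString.
 * 'length' is the number of bytes the read returned (IoStatus.Information). */
static void fix_null_string(char *buffer, size_t length)
{
    if (buffer == NULL || length == 0)
        return;

    /* Turn every embedded terminator into a space so all strings print. */
    for (size_t i = 0; i < length; i++) {
        if (buffer[i] == '\0')
            buffer[i] = ' ';
    }

    /* Terminate the result, overwriting the last byte without checking it. */
    buffer[length - 1] = '\0';
}
```

Three strings read back to back, such as "one", "two" and "six" each followed by a terminator, come out as the single printable string "one two six".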

These examples just do the minimum necessary, so they work and try not to trap (in the simplest case). I would rather provide some explanation with a simple example than a full fledged example with all the bells and whistles. Those can already be found in the DDK and long articles on MSDN which explain everything all at once.

Using the example

To use the example you simply do the same as you did with article 3. The only difference is that, there is now another loader program that you can run after you have already loaded example.sys. This one will load examplefilter.sys and it will attach to example.sys. The user mode programs can run with or without examplefilter.sys. You can run it both ways and see the differences. Entry points all have debug statements so you can follow the code paths.

Conclusion

In this article we learned a little more about IRP handling (for the purpose of understanding device stacks) and device stacks. We also learned how to implement a very simple filter driver. In each article we will attempt to build upon these basic ideas, so that we can further understand how drivers work and how to develop drivers.

The next article in the series will attempt to combine everything learned over these 4 articles and further explain IRP Handling.

PART 5: INTRODUCTION TO THE TRANSPORT DEVICE INTERFACE

Introduction

Welcome to the fifth installment of the driver development series. The title of this article is a little bit misleading. Yes, we will be writing a TDI Client for demonstration purposes; however, that is not the main goal of this tutorial. The main goal is to further explore how to handle and interact with IRPs, including how to queue IRPs and how to handle their cancellation. The real title of this article should be "Introduction to IRP Handling", but that is not as catchy a title! Also, it's not a complete fib: we will be doing all of this while implementing a TDI Client driver as the demonstration, so I actually have to explain how that part is implemented as well. The supplied example is a very simple client/server chat program which we will use to explore how to handle IRPs.

Sockets Refresher

We will start off with something that you probably already know. If you don't, you may want to read some other articles on the subject first. Even so, I have supplied this quick refresher course as well as example source showing how to use Winsock.

What is IP?

IP or "Internet Protocol" is essentially a protocol used to send data or packets between two computers. This protocol does not need any setup and only requires that, each machine on the network have a unique "IP Address". The "IP Address" can then be used to route packets between communication end points. This protocol provides routing but it does not provide reliability. Packets sent only by IP can arrive corrupted, out of order or not at all. There are however other protocols implemented on top of IP which provide these features. The "IP" Protocol lies at the Network Layer in the OSI model.

What is TCP?

TCP, the "Transmission Control Protocol", sits on top of the "IP" protocol; the pair is commonly referred to as "TCP/IP". The "IP" layer provides the routing and the "TCP" layer provides reliable, sequenced, uncorrupted delivery of data. To distinguish between multiple TCP connections on a machine, each end point is identified by a TCP port number (together with its IP Address). In this manner multiple applications, or even the same application, can open communication pipelines and the underlying transport will be able to correctly route the data between each pair of end points. The "TCP" protocol lies at the Transport Layer of the OSI model. Other protocols, such as FTP and HTTP, sit on top of TCP at the "Application Layer" of the OSI model.

Protocol Layering

In some sense any part of the communications stack can be replaced by an "equivalent" protocol. If FTP, for example, requires reliable transport and routing, then sitting on top of any protocol which provides these would still work. In that example, if an application used "SPX" instead of "TCP/IP", it shouldn't make a difference. In the same sense, if "TCP" or some implementation of "TCP" sat on top of an unreliable protocol like "IPX", it should work. I say "some implementation" because whether it works depends on how dependent the upper protocol is on the actual implementation and inner workings of the protocol beneath it.

What are sockets?

A "socket" is generally referred to as a communications end point as implemented by a "sockets" library. The "sockets" library API was generally written to be a simple way (and portable in some cases) to implement networking applications from user mode. There are a few flavors of socket APIs but in Windows we use "WINSOCK". There are aspects of Winsock which can be implemented as portable (I once implemented a winsock application that was compiled on both Unix and Windows NT with minimal conflict but of course it was a very simple program) and there are others which are not directly portable.

Socket Server Application

The server side of a socket connection simply accepts incoming connections. Each new connection is given a separate handle so that the server can then communicate to each client individually. The following outlines the steps used in communications.

Step One: Create a Socket

The first step is to create a socket. The following code shows how to create a socket for streaming (TCP/IP).

hSocket = socket(PF_INET, SOCK_STREAM, 0);

if(hSocket == INVALID_SOCKET)
{
    /* Error */
}

This is then simply a handle to the network driver. You use this handle in other calls to the socket API.

Step Two: Bind the Socket

The second step is to bind the socket to a TCP port and IP Address. The following code demonstrates this. Our example simply uses a fixed number for the port; the port must be stored in network byte order, which is why it is passed through the htons macro.

SockAddr.sin_family      = PF_INET;
SockAddr.sin_port        = htons(4000);        /* Must be in NETWORK BYTE ORDER */
SockAddr.sin_addr.s_addr = htonl(INADDR_ANY);  /* Any local IP Address */

/*
 * BIND the Socket to a Port
 */
uiErrorStatus = bind(hSocket, (struct sockaddr *)&SockAddr, sizeof(SOCKADDR_IN));

if(uiErrorStatus == SOCKET_ERROR)
{
    /* Error */
}

This operation binds the socket handle to the port. You can specify the IP Address as well; using 0 (INADDR_ANY) simply lets the socket accept connections on any of the local machine's IP Addresses. You can also specify 0 for the port to bind to a random port. However, servers generally use a fixed port number since clients need to be able to find them, though there are exceptions.
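This is a small portable sketch (not Winsock code) of what htons does to the port: it stores the value most significant byte first, which is what "network byte order" (big-endian) means. port_to_network_bytes and port_from_network_bytes are hypothetical helpers for illustration only. On a little-endian x86 machine, 4000 (0x0FA0) sits in memory as the bytes A0 0F, so storing it in sin_port without htons would make the server listen on port 0xA00F (40975) instead.

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent of htons() expressed as bytes: high byte goes on the wire first. */
static void port_to_network_bytes(uint16_t port, unsigned char out[2])
{
    out[0] = (unsigned char)(port >> 8);    /* most significant byte */
    out[1] = (unsigned char)(port & 0xFF);  /* least significant byte */
}

/* Equivalent of ntohs() applied to the two bytes read from the wire. */
static uint16_t port_from_network_bytes(const unsigned char in[2])
{
    return (uint16_t)((in[0] << 8) | in[1]);
}
```

Because the helpers work on bytes rather than on the host's memory layout, they give the same result on little-endian and big-endian machines.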

Step Three: Listen on the Socket

This will put the socket into a listening state. The socket will be able to listen for connections after this call. The number specified is simply the back log of connections waiting to be accepted that this socket will allow.

if(listen(hSocket, 5) != 0) { /* Error */ }

Step Four: Accept Connections

The accept API will provide you with a new handle for each incoming connection. The following is a code example of using accept.

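uiLength = sizeof(NewClientSockAddr);   /* Must be set before calling accept */

if((hNewClient = accept(pServerInfo->hServerSocket, (struct sockaddr *)&NewClientSockAddr, &uiLength)) != INVALID_SOCKET)
{
    /* Use hNewClient to communicate with this client */
}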

The returned handle can then be used to send and receive data.

Step Five: Close the Socket

When you are done you need to close any and all handles just like anything else!

closesocket(hNewClient);

One extra detail has been omitted here: the select API, which is used to get notifications when a connection arrives and when data is available. This is simply a refresher; for further details you should consult a sockets tutorial or an API reference like MSDN.

Socket Client Application

The client side of a sockets connection simply connects to a server and then sends and receives data. The following steps break down how to set up this communication.

Step One: Create a Socket

The first step is to create a socket. The following code shows how to create a socket for streaming (TCP/IP).

hSocket = socket(PF_INET, SOCK_STREAM, 0);

if(hSocket == INVALID_SOCKET)
{
    /* Error */
}

This is then simply a handle to the network driver. You use this handle in other calls to the socket API.

Step Two: Connect to a Server

You need to set up the address and port of the server to connect to, and both must be in network byte order. You then call the connect API to establish a connection between the client and server.

pClientConnectInfo->SockHostAddress.sin_family = PF_INET;
pClientConnectInfo->SockHostAddress.sin_port   = htons(4000);  /* Network Byte Order! */

printf("Enter Host IP Address like: 127.0.0.1\n");
fgets(szHostName, 100, stdin);
szHostName[strcspn(szHostName, "\r\n")] = '\0';  /* Remove the newline fgets keeps */

pClientConnectInfo->SockHostAddress.sin_addr.s_addr = inet_addr(szHostName);  /* Network Byte Order! */

iRetVal = connect(hSocket, (LPSOCKADDR)&pClientConnectInfo->SockHostAddress, sizeof(SOCKADDR_IN));

if(iRetVal == SOCKET_ERROR)
{
    /* Error */
}
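fgets keeps the trailing newline in szHostName. The portable sketch below removes it and then parses a dotted-decimal address into four bytes in network order, which is roughly what inet_addr does for the common a.b.c.d form. strip_newline and parse_ipv4 are hypothetical helpers, not Winsock functions. This parser is deliberately strict and rejects the trailing newline; rather than relying on how a particular inet_addr implementation treats it, it is simpler to strip it first.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Remove the trailing newline (and carriage return) that fgets() keeps. */
static void strip_newline(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
}

/* Parse "a.b.c.d" into four bytes, most significant first (network order).
 * Returns 0 on success, -1 if the text is not a plain dotted quad. */
static int parse_ipv4(const char *text, unsigned char out[4])
{
    for (int part = 0; part < 4; part++) {
        unsigned value = 0;
        int digits = 0;

        while (isdigit((unsigned char)*text) && digits < 3) {
            value = value * 10 + (unsigned)(*text - '0');
            text++;
            digits++;
        }
        if (digits == 0 || value > 255)
            return -1;
        out[part] = (unsigned char)value;

        if (part < 3) {
            if (*text != '.')
                return -1;
            text++;  /* skip the dot */
        }
    }
    return *text == '\0' ? 0 : -1;  /* nothing may follow the last number */
}
```

With the newline stripped, "127.0.0.1" yields the bytes 127, 0, 0, 1 in that order, exactly the layout sin_addr expects on the wire.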

Step Three: Send and Receive Data

Once you are connected, you just need to send and receive data whenever you want, using the recv and send APIs.

iRetVal = send(hSocket, szBuffer, strlen(szBuffer), 0);

if(iRetVal == SOCKET_ERROR)
{
    /* Error */
}

...

iRetVal = recv(hSocket, szBuffer, 1000, 0);

if(iRetVal == 0 || iRetVal == SOCKET_ERROR)
{
    /* Error */
}

Please note that these examples may refer to sending and receiving strings, however any binary data can be sent.

Step Four: Close the Socket

When you are done you need to close any and all handles just like anything else!

closesocket(hSocket);

One extra detail has been omitted here: the select API, which is used to get notifications when data is available. This is simply a refresher and a lot of socket details have been omitted; for further details you should consult a sockets tutorial or an API reference like MSDN.

Transport Device Interface

The sockets primer was really to get you ready for the TDI API. The "Transport Device Interface" is a set of APIs which can be used by a driver to communicate with a Transport (Protocol) Driver such as TCP. The TCP driver implements this API set so that your driver can communicate with it. This is a little more complex than using sockets, and the documentation on MSDN can be more confusing than helpful, so we will go over all the steps needed to make a client side connection. Once you understand this, you should be able to use the API to perform other operations, such as creating a server.

The Architecture

The following diagram outlines the TDI/NDIS relationship. In general, TDI is a standard interface that transport/protocol driver developers can implement in their drivers. In this manner, developers who wish to use a protocol can program against one standard interface instead of a separate interface for each protocol they wish to support. This does not mean that protocol developers are limited to implementing only TDI; they can also implement any proprietary interface they wish on the top level of their driver. I am not an expert in NDIS, so I will leave these as simple explanations so that I hopefully won't get anything wrong! This is just "good to know" information anyway; we don't need to understand any of it to use the TDI Client Driver.

The Protocol drivers will talk to the NDIS interface API on the lower end of the driver. The job of the protocol driver is just that, to implement a protocol and talk with NDIS. The upper layer of the driver can be a proprietary interface or TDI or both. By the way, these are NOT "NDIS Clients". They do not exist. There are websites out there that have referred to these drivers as "NDIS Clients" and that's completely wrong. I once asked an NDIS expert about "NDIS Clients" and they didn't know what I was talking about!

The next layer is the intermediate drivers. These drivers can perform translation, packet scheduling or filtering of data.

The final layer is the NDIS miniport drivers. These drivers talk with the physical NIC device.

You can find more information on the TDI and NDIS architectures on MSDN.

Step One: Open a Transport Address

The first step is to create a handle to a "Transport Address". This requires using ZwCreateFile to create a handle to an instance of a "Transport Address". The "Transport Address" is the IP Address of the LOCAL MACHINE. This is NOT THE REMOTE MACHINE! You are allowed to bind to a specific IP address for the case where multiple IP Addresses are associated with the local machine, for example when multiple NICs are installed. You can also simply specify "0.0.0.0" to use any local address.

The method of opening this handle is a little obscure for those who are not used to developing drivers. You have to specify the "EA" or "Extended Attributes", which are then passed to the driver via IRP_MJ_CREATE! Yes, it is possible to pass parameters into the open other than by appending them to the end of the DOS Device Name (as we did in the previous article). You are also able to specify the local port at this time; if you are creating a server, this would be the time to specify the port. Since we are only implementing a client connection we don't care about the port, so it's left at 0.

The following code illustrates how to open a Transport Address.

NTSTATUS TdiFuncs_OpenTransportAddress(PHANDLE pTdiHandle, PFILE_OBJECT *pFileObject)
{
    NTSTATUS NtStatus = STATUS_INSUFFICIENT_RESOURCES;
    UNICODE_STRING usTdiDriverNameString;
    OBJECT_ATTRIBUTES oaTdiDriverNameAttributes;
    IO_STATUS_BLOCK IoStatusBlock;
    char DataBlob[sizeof(FILE_FULL_EA_INFORMATION) + TDI_TRANSPORT_ADDRESS_LENGTH + 300] = {0};
    PFILE_FULL_EA_INFORMATION pExtendedAttributesInformation = (PFILE_FULL_EA_INFORMATION)&DataBlob;
    UINT dwEASize = 0;
    PTRANSPORT_ADDRESS pTransportAddress = NULL;
    PTDI_ADDRESS_IP pTdiAddressIp = NULL;

    /*
     * Initialize the name of the device to be opened. ZwCreateFile takes an
     * OBJECT_ATTRIBUTES structure as the name of the device to open.
     * This is then a two step process.
     *
     *  1 - Create a UNICODE_STRING data structure from a unicode string.
     *  2 - Create a OBJECT_ATTRIBUTES data structure from a UNICODE_STRING.
     */
    RtlInitUnicodeString(&usTdiDriverNameString, L"\\Device\\Tcp");
    InitializeObjectAttributes(&oaTdiDriverNameAttributes, &usTdiDriverNameString,
                               OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, NULL, NULL);

    /*
     * The second step is to initialize the Extended Attributes data structure.
     *
     *  EaName        = TdiTransportAddress, 0, TRANSPORT_ADDRESS
     *  EaNameLength  = Length of TdiTransportAddress
     *  EaValueLength = Length of TRANSPORT_ADDRESS
     */
    RtlCopyMemory(&pExtendedAttributesInformation->EaName, TdiTransportAddress, TDI_TRANSPORT_ADDRESS_LENGTH);
    pExtendedAttributesInformation->EaNameLength  = TDI_TRANSPORT_ADDRESS_LENGTH;
    pExtendedAttributesInformation->EaValueLength = TDI_TRANSPORT_ADDRESS_LENGTH + sizeof(TRANSPORT_ADDRESS) + sizeof(TDI_ADDRESS_IP);
    pTransportAddress = (PTRANSPORT_ADDRESS)(&pExtendedAttributesInformation->EaName + TDI_TRANSPORT_ADDRESS_LENGTH + 1);

    /*
     * The number of transport addresses
     */
    pTransportAddress->TAAddressCount = 1;

    /*
     * This next piece will essentially describe what the transport being opened is.
     *  AddressType   = Type of transport
     *  AddressLength = Length of the address
     *  Address       = A data structure that is essentially related to the chosen AddressType.
     */
    pTransportAddress->Address[0].AddressType   = TDI_ADDRESS_TYPE_IP;
    pTransportAddress->Address[0].AddressLength = sizeof(TDI_ADDRESS_IP);
    pTdiAddressIp = (TDI_ADDRESS_IP *)&pTransportAddress->Address[0].Address;

    /*
     * The TDI_ADDRESS_IP data structure is essentially similar to the user
     * mode sockets data structure.
     *    sin_port
     *    sin_zero
     *    in_addr
     *
     * NOTE: This is the _LOCAL ADDRESS OF THE CURRENT MACHINE_. Just as with
     *       sockets, if you don't care what port you bind this connection to,
     *       then just use "0". If you also only have one network card interface,
     *       there's no reason to set the IP. "0.0.0.0" will simply use the
     *       current machine's IP. If you have multiple NICs or a reason to
     *       specify the local IP address then you must set TDI_ADDRESS_IP
     *       to that IP. If you are creating a server side component you may
     *       want to specify the port, however usually to connect to another
     *       server you really don't care what port the client is opening.
     */
    RtlZeroMemory(pTdiAddressIp, sizeof(TDI_ADDRESS_IP));

    dwEASize = sizeof(DataBlob);

    NtStatus = ZwCreateFile(pTdiHandle, FILE_READ_EA | FILE_WRITE_EA, &oaTdiDriverNameAttributes,
                            &IoStatusBlock, NULL, FILE_ATTRIBUTE_NORMAL, 0, FILE_OPEN_IF, 0,
                            pExtendedAttributesInformation, dwEASize);

    if(NT_SUCCESS(NtStatus))
    {
        NtStatus = ObReferenceObjectByHandle(*pTdiHandle, GENERIC_READ | GENERIC_WRITE, NULL,
                                             KernelMode, (PVOID *)pFileObject, NULL);

        if(!NT_SUCCESS(NtStatus))
        {
            ZwClose(*pTdiHandle);
        }
    }

    return NtStatus;
}

This is described on MSDN.
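The pointer arithmetic that locates pTransportAddress (EaName + TDI_TRANSPORT_ADDRESS_LENGTH + 1) follows from how an extended attribute is laid out: a small fixed header, then the EA name, then a terminating NULL that EaNameLength does not count, and then the value. The portable sketch below packs one entry into a byte buffer using a hypothetical mirror of the FILE_FULL_EA_INFORMATION layout (ea_entry and build_ea are our own names), so the offsets can be checked in user mode. It assumes the name "TransportAddress", whose 16 characters match TDI_TRANSPORT_ADDRESS_LENGTH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of FILE_FULL_EA_INFORMATION's field layout. */
typedef struct ea_entry {
    uint32_t next_entry_offset;
    uint8_t  flags;
    uint8_t  name_length;    /* length of the name, NOT counting its '\0' */
    uint16_t value_length;
    char     name[1];        /* name, then '\0', then the value bytes */
} ea_entry;

/* Pack one EA into 'buffer' field by field. Returns the bytes used, or 0 if
 * the name is too long or the buffer is too small. */
static size_t build_ea(unsigned char *buffer, size_t buffer_size,
                       const char *name, const void *value, uint16_t value_length)
{
    size_t name_length = strlen(name);
    size_t name_offset = offsetof(ea_entry, name);
    size_t needed = name_offset + name_length + 1 + value_length;
    uint8_t length8;

    if (name_length > 255 || needed > buffer_size)
        return 0;

    memset(buffer, 0, needed);
    length8 = (uint8_t)name_length;
    memcpy(buffer + offsetof(ea_entry, name_length), &length8, sizeof length8);
    memcpy(buffer + offsetof(ea_entry, value_length), &value_length, sizeof value_length);
    memcpy(buffer + name_offset, name, name_length + 1);                  /* name and its '\0' */
    memcpy(buffer + name_offset + name_length + 1, value, value_length);  /* value follows */
    return needed;
}
```

The value starts at name_offset + name_length + 1, the same "+ 1" the driver code uses to skip the NULL after the EA name.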

Step Two: Open a Connection Context

The second step is to open a Connection Context. This is the handle that you will actually use in all subsequent operations performed on this connection. This is also done with ZwCreateFile, and it is also performed on the same device, "\Device\Tcp". This device actually allows you to open three different kinds of handles: the transport address handle, the connection context handle and a control handle. A common mistake is to believe a handle open succeeded when the handle is actually open to the wrong kind of object! This is because the driver uses the "Extended Attributes" to determine which kind of handle is being opened. Apparently, if the driver doesn't recognize the EA name, it simply opens the default handle type, "Control"! This is documented in the description of the create request on MSDN.

The following code demonstrates opening up a connection context. Note that you can also specify a pointer value called a "CONNECTION_CONTEXT" which is just a pointer to user defined data. Later you may notice that some event callbacks will provide this pointer back to you. This is essentially what you can use this context value for.

NTSTATUS TdiFuncs_OpenConnection(PHANDLE pTdiHandle, PFILE_OBJECT *pFileObject)
{
    NTSTATUS NtStatus = STATUS_INSUFFICIENT_RESOURCES;
    UNICODE_STRING usTdiDriverNameString;
    OBJECT_ATTRIBUTES oaTdiDriverNameAttributes;
    IO_STATUS_BLOCK IoStatusBlock;
    char DataBlob[sizeof(FILE_FULL_EA_INFORMATION) + TDI_CONNECTION_CONTEXT_LENGTH + 300] = {0};
    PFILE_FULL_EA_INFORMATION pExtendedAttributesInformation = (PFILE_FULL_EA_INFORMATION)&DataBlob;
    UINT dwEASize = 0;

    /*
     * Initialize the name of the device to be opened. ZwCreateFile
     * takes an OBJECT_ATTRIBUTES structure as the name of the device
     * to open. This is a two-step process:
     *
     * 1 - Create a UNICODE_STRING data structure from a Unicode string.
     * 2 - Create an OBJECT_ATTRIBUTES data structure from the UNICODE_STRING.
     */
    RtlInitUnicodeString(&usTdiDriverNameString, L"\\Device\\Tcp");
    InitializeObjectAttributes(&oaTdiDriverNameAttributes,
                               &usTdiDriverNameString,
                               OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE,
                               NULL, NULL);

    /*
     * The second step is to initialize the extended attributes data structure.
     *
     *  EaName        = TdiConnectionContext, 0, your user-defined context data
     *                  (actually a pointer to it)
     *  EaNameLength  = Length of TdiConnectionContext
     *  EaValueLength = Entire length
     *
     * This example copies no context pointer after the name, so the value
     * bytes keep the zeros from the DataBlob initializer.
     */
    RtlCopyMemory(&pExtendedAttributesInformation->EaName, TdiConnectionContext, TDI_CONNECTION_CONTEXT_LENGTH);
    pExtendedAttributesInformation->EaNameLength  = TDI_CONNECTION_CONTEXT_LENGTH;
    pExtendedAttributesInformation->EaValueLength = TDI_CONNECTION_CONTEXT_LENGTH; /* Must be at least TDI_CONNECTION_CONTEXT_LENGTH */
    dwEASize = sizeof(DataBlob);

    NtStatus = ZwCreateFile(pTdiHandle, FILE_READ_EA | FILE_WRITE_EA, &oaTdiDriverNameAttributes,
                            &IoStatusBlock, NULL, FILE_ATTRIBUTE_NORMAL, 0, FILE_OPEN_IF, 0,
                            pExtendedAttributesInformation, dwEASize);

    if(NT_SUCCESS(NtStatus))
    {
        NtStatus = ObReferenceObjectByHandle(*pTdiHandle, GENERIC_READ | GENERIC_WRITE,
                                             NULL, KernelMode, (PVOID *)pFileObject, NULL);

        if(!NT_SUCCESS(NtStatus))
        {
            ZwClose(*pTdiHandle);
        }
    }

    return NtStatus;
}
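The extended-attribute buffer is easy to get wrong, so here is a user-mode C sketch of the layout that FILE_FULL_EA_INFORMATION describes: the EA name, a terminating NUL, and then the value bytes. The struct `EA_MODEL` and the function `build_conn_ctx_ea` are hypothetical stand-ins written for this illustration. They are not the WDK definitions. The name string is assumed to match `TdiConnectionContext` from tdi.h. Storing a `CONNECTION_CONTEXT` pointer as the value, with `EaValueLength` set to the pointer size, is assumed here as one common convention. The listing above instead leaves its value bytes zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of FILE_FULL_EA_INFORMATION's field order (illustration only). */
typedef struct EA_MODEL {
    uint32_t NextEntryOffset;
    uint8_t  Flags;
    uint8_t  EaNameLength;   /* name length, excluding the NUL */
    uint16_t EaValueLength;  /* number of value bytes after the NUL */
    char     EaName[1];      /* name, then NUL, then value bytes */
} EA_MODEL;

/* Assumed to match the TdiConnectionContext string in tdi.h. */
static const char kConnectionContextName[] = "ConnectionContext";
#define CONN_CTX_NAME_LEN (sizeof(kConnectionContextName) - 1)

/*
 * Writes one EA entry whose value is the pointer 'context'.
 * 'buf' must be suitably aligned for EA_MODEL.
 * Returns the number of bytes used, or 0 if 'buflen' is too small.
 */
static size_t build_conn_ctx_ea(void *buf, size_t buflen, const void *context)
{
    size_t name_off = offsetof(EA_MODEL, EaName);
    size_t needed = name_off + CONN_CTX_NAME_LEN + 1 + sizeof(context);
    EA_MODEL *ea = (EA_MODEL *)buf;
    unsigned char *name;

    if (buflen < needed) {
        return 0;
    }
    memset(buf, 0, needed);
    ea->EaNameLength = (uint8_t)CONN_CTX_NAME_LEN;
    ea->EaValueLength = (uint16_t)sizeof(context);

    name = (unsigned char *)buf + name_off;
    memcpy(name, kConnectionContextName, CONN_CTX_NAME_LEN);
    name[CONN_CTX_NAME_LEN] = '\0';                                   /* terminator */
    memcpy(name + CONN_CTX_NAME_LEN + 1, &context, sizeof(context));  /* value */
    return needed;
}
```

In the driver, the equivalent bytes would be passed to ZwCreateFile as its EaBuffer and EaLength arguments.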


Step Three: Associate The Transport Address and Connection Context

You need to associate the two handles, the transport address and the connection context, before you can perform any operations. This is done by sending an IOCTL to the device. As you may remember, to send an IOCTL we need to allocate an IRP, set the parameters, and send it to the device. The TDI header files simplify this by providing macros and other functions that do the work for you. TdiBuildInternalDeviceControlIrp is actually a macro that calls IoBuildDeviceIoControlRequest. Some of its parameters are ignored and serve only as documentation (such as the supplied IOCTL!). This API is simple and we use it here for demonstration purposes. However, other mechanisms for creating IRPs, such as IoAllocateIrp, have advantages, and these will be described later. The other macros we will be using simply set the parameters of the IO_STACK_LOCATION for the next lower driver.

One difference from what we talked about last time is the handling of "STATUS_PENDING". This will be discussed later in this tutorial.

The following code demonstrates how to do this.

NTSTATUS TdiFuncs_AssociateTransportAndConnection(HANDLE hTransportAddress, PFILE_OBJECT pfoConnection)
{
    NTSTATUS NtStatus = STATUS_INSUFFICIENT_RESOURCES;
    PIRP pIrp;
    IO_STATUS_BLOCK IoStatusBlock = {0};
    PDEVICE_OBJECT pTdiDevice;
    TDI_COMPLETION_CONTEXT TdiCompletionContext;

    KeInitializeEvent(&TdiCompletionContext.kCompleteEvent, NotificationEvent, FALSE);

    /*
     * The TDI device object is required to send these
     * requests to the TDI driver.
     */
    pTdiDevice = IoGetRelatedDeviceObject(pfoConnection);

    /*
     * Step 1: Build the IRP. TDI defines several macros and functions
     *         that can quickly create IRPs, etc. for various purposes.
     *         While this can be done manually, it's easiest to use the macros.
     *
     * http://msdn.microsoft.com/library/en-us/network/hh/network/
     *        34bldmac_f430860a-9ae2-4379-bffc-6b0a81092e7c.xml.asp?frame=true
     */
    pIrp = TdiBuildInternalDeviceControlIrp(TDI_ASSOCIATE_ADDRESS, pTdiDevice, pfoConnection,
                                            &TdiCompletionContext.kCompleteEvent, &IoStatusBlock);

    if(pIrp)
    {
        /*
         * Step 2: Add the correct parameters into the IRP.
         */
        TdiBuildAssociateAddress(pIrp, pTdiDevice, pfoConnection, NULL, NULL, hTransportAddress);

        NtStatus = IoCallDriver(pTdiDevice, pIrp);

        /*
         * If the status returned is STATUS_PENDING, the IRP will not be
         * completed synchronously; the driver has queued it for later
         * processing. That is fine, but this is a synchronous call, so we
         * do not want to return from this thread until the IRP has
         * completed. The event that we provided will be set when the IRP
         * completes.
         */
        if(NtStatus == STATUS_PENDING)
        {
            KeWaitForSingleObject(&TdiCompletionContext.kCompleteEvent, Executive, KernelMode, FALSE, NULL);

            /*
             * Find the status of the completed IRP.
             */
            NtStatus = IoStatusBlock.Status;
        }
    }

    return NtStatus;
}


Step Four: Connect

To create the client side of a TCP connection, we need to connect!

NTSTATUS TdiFuncs_Connect(PFILE_OBJECT pfoConnection, UINT uiAddress, USHORT uiPort)
{
    NTSTATUS NtStatus = STATUS_INSUFFICIENT_RESOURCES;
    PIRP pIrp;
    IO_STATUS_BLOCK IoStatusBlock = {0};
    PDEVICE_OBJECT pTdiDevice;
    TDI_CONNECTION_INFORMATION RequestConnectionInfo = {0};
    TDI_CONNECTION_INFORMATION ReturnConnectionInfo = {0};
    LARGE_INTEGER TimeOut = {0};
    UINT NumberOfSeconds = 60*3;
    char cBuffer[256] = {0};
    PTRANSPORT_ADDRESS pTransportAddress = (PTRANSPORT_ADDRESS)&cBuffer;
    PTDI_ADDRESS_IP pTdiAddressIp;
    TDI_COMPLETION_CONTEXT TdiCompletionContext;

    KeInitializeEvent(&TdiCompletionContext.kCompleteEvent, NotificationEvent, FALSE);

    /*
     * The TDI device object is required to send these
     * requests to the TDI driver.
     */
    pTdiDevice = IoGetRelatedDeviceObject(pfoConnection);

    /*
     * Step 1: Build the IRP. TDI defines several macros and functions
     *         that can quickly create IRPs, etc. for various purposes.
     *         While this can be done manually, it's easiest to use the macros.
     *
     * http://msdn.microsoft.com/library/en-us/network/hh/network/
     *        34bldmac_f430860a-9ae2-4379-bffc-6b0a81092e7c.xml.asp?frame=true
     */
    pIrp = TdiBuildInternalDeviceControlIrp(TDI_CONNECT, pTdiDevice, pfoConnection,
                                            &TdiCompletionContext.kCompleteEvent, &IoStatusBlock);

    if(pIrp)
    {
        /*
         * Step 2: Add the correct parameters into the IRP.
         */

        /*
         * Time out value: a negative value is a relative time in
         * 100-nanosecond units (10,000,000 per second).
         */
        TimeOut.QuadPart = 10000000L;
        TimeOut.QuadPart *= NumberOfSeconds;
        TimeOut.QuadPart = -(TimeOut.QuadPart);

        /*
         * Initialize the RequestConnectionInfo, which specifies
         * the address of the REMOTE computer.
         */
        RequestConnectionInfo.RemoteAddress = (PVOID)pTransportAddress;

        /*
         * The TRANSPORT_ADDRESS header up to the address bytes, plus one
         * TDI_ADDRESS_IP. (sizeof(PTRANSPORT_ADDRESS) would only be the
         * size of a pointer.)
         */
        RequestConnectionInfo.RemoteAddressLength = FIELD_OFFSET(TRANSPORT_ADDRESS, Address[0].Address) + sizeof(TDI_ADDRESS_IP);

        /*
         * The number of transport addresses.
         */
        pTransportAddress->TAAddressCount = 1;

        /*
         * This next piece describes the transport address being opened:
         *  AddressType   = Type of transport
         *  AddressLength = Length of the address
         *  Address       = A data structure that is specific
         *                  to the chosen AddressType.
         */
        pTransportAddress->Address[0].AddressType   = TDI_ADDRESS_TYPE_IP;
        pTransportAddress->Address[0].AddressLength = sizeof(TDI_ADDRESS_IP);
        pTdiAddressIp = (TDI_ADDRESS_IP *)&pTransportAddress->Address[0].Address;

        /*
         * The TDI_ADDRESS_IP data structure is essentially similar
         * to the user-mode sockets data structure:
         *    sin_port
         *    sin_zero
         *    in_addr
         */

        /*
         * Remember, these must be in NETWORK BYTE ORDER (big-endian).
         * The caller is expected to pass uiPort and uiAddress already converted.
         */

        /* Example: port 1494 = 0x05D6; on a little-endian CPU, pass 0xD605. */
        pTdiAddressIp->sin_port = uiPort;

        /* Example: 10.60.2.159 must be stored as the bytes 0A 3C 02 9F;
           on a little-endian CPU that is the ULONG 0x9F023C0A. */
        pTdiAddressIp->in_addr = uiAddress;

        TdiBuildConnect(pIrp, pTdiDevice, pfoConnection, NULL, NULL, &TimeOut,
                        &RequestConnectionInfo, &ReturnConnectionInfo);

        NtStatus = IoCallDriver(pTdiDevice, pIrp);

        /*
         * If the status returned is STATUS_PENDING, the IRP will not be
         * completed synchronously; the driver has queued it for later
         * processing. That is fine, but this is a synchronous call, so we
         * do not want to return from this thread until the IRP has
         * completed. The event that we provided will be set when the IRP
         * completes.
         */
        if(NtStatus == STATUS_PENDING)
        {
            KeWaitForSingleObject(&TdiCompletionContext.kCompleteEvent, Executive, KernelMode, FALSE, NULL);

            /*
             * Find the status of the completed IRP.
             */
            NtStatus = IoStatusBlock.Status;
        }
    }

    return NtStatus;
}
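Two details in this step are easy to get wrong: the sign of the timeout and the byte order of the address. The following portable C sketch computes a relative timeout in 100-nanosecond units. It also converts a host-order port and IPv4 address into values whose in-memory bytes are big-endian, without relying on the CPU's endianness. The helper names (`relative_timeout_100ns`, `to_net16`, `to_net32`, `ipv4_host`) are hypothetical. In a driver, you would assign the converted values to `sin_port` and `in_addr`, or use whatever byte-swapping routine your build already provides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Kernel timeouts are in 100-ns units; a negative value means "relative to now". */
static int64_t relative_timeout_100ns(uint32_t seconds)
{
    return -((int64_t)seconds * 10000000LL);
}

/* Returns a 16-bit value whose in-memory bytes are big-endian (like htons). */
static uint16_t to_net16(uint16_t host)
{
    unsigned char b[2] = { (unsigned char)(host >> 8), (unsigned char)host };
    uint16_t v;
    memcpy(&v, b, sizeof v);
    return v;
}

/* Returns a 32-bit value whose in-memory bytes are big-endian (like htonl). */
static uint32_t to_net32(uint32_t host)
{
    unsigned char b[4] = {
        (unsigned char)(host >> 24), (unsigned char)(host >> 16),
        (unsigned char)(host >> 8),  (unsigned char)host
    };
    uint32_t v;
    memcpy(&v, b, sizeof v);
    return v;
}

/* Host-order value of the dotted address a.b.c.d, e.g. 10.60.2.159 -> 0x0A3C029F. */
static uint32_t ipv4_host(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}
```

On a little-endian x86 or x64 machine, `to_net16(1494)` yields 0xD605 and `to_net32(ipv4_host(10, 60, 2, 159))` yields 0x9F023C0A. These match the examples in the comments of `TdiFuncs_Connect`. `relative_timeout_100ns(180)` gives the same -1,800,000,000 that the listing computes for its three-minute timeout.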


Step Five: Send and Receive Data

To send data you simply create a TDI_SEND IOCTL and pass it to the transport device. The following code implements the send:

NTSTATUS TdiFuncs_Send(PFILE_OBJECT pfoConnection, PVOID pData, UINT uiSendLength, UINT *pDataSent)
{
    NTSTATUS NtStatus = STATUS_INSUFFICIENT_RESOURCES;
    PIRP pIrp;
    IO_STATUS_BLOCK IoStatusBlock = {0};
    PDEVICE_OBJECT pTdiDevice;
    PMDL pSendMdl;
    TDI_COMPLETION_CONTEXT TdiCompletionContext;

    KeInitializeEvent(&TdiCompletionContext.kCompleteEvent, NotificationEvent, FALSE);

    /*
     * The TDI device object is required to
     * send these requests to the TDI driver.
     */
    pTdiDevice = IoGetRelatedDeviceObject(pfoConnection);
    *pDataSent = 0;

    /*
     * The send requires an MDL, which you may remember from DIRECT_IO.
     * This time nobody hands us an MDL, so we need to create one.
     */
    pSendMdl = IoAllocateMdl((PCHAR)pData, uiSendLength, FALSE, FALSE, NULL);

    if(pSendMdl)
    {
        __try {
            MmProbeAndLockPages(pSendMdl, KernelMode, IoModifyAccess);
        } __except (EXCEPTION_EXECUTE_HANDLER) {
            IoFreeMdl(pSendMdl);
            pSendMdl = NULL;
        }

        if(pSendMdl)
        {
            /*
             * Step 1: Build the IRP. TDI defines several macros and functions
             *         that can quickly create IRPs, etc. for various purposes.
             *         While this can be done manually, it's easiest to use
             *         the macros.
             */
            pIrp = TdiBuildInternalDeviceControlIrp(TDI_SEND, pTdiDevice, pfoConnection,
                                                    &TdiCompletionContext.kCompleteEvent, &IoStatusBlock);

            if(pIrp)
            {
                /*
                 * Step 2: Add the correct parameters into the IRP.
                 */
                TdiBuildSend(pIrp, pTdiDevice, pfoConnection, NULL, NULL, pSendMdl, 0, uiSendLength);

                NtStatus = IoCallDriver(pTdiDevice, pIrp);

                /*
                 * If the status returned is STATUS_PENDING, the IRP will not be
                 * completed synchronously; the driver has queued it for later
                 * processing. That is fine, but this is a synchronous call, so we
                 * do not want to return from this thread until the IRP has
                 * completed. The event that we provided will be set when the
                 * IRP completes.
                 */
                if(NtStatus == STATUS_PENDING)
                {
                    KeWaitForSingleObject(&TdiCompletionContext.kCompleteEvent, Executive, KernelMode, FALSE, NULL);
                }

                NtStatus = IoStatusBlock.Status;
                *pDataSent = (UINT)IoStatusBlock.Information;

                /*
                 * The I/O Manager will free the MDL when the IRP completes,
                 * so the following cleanup is not needed here:
                 *
                 * if(pSendMdl)
                 * {
                 *     MmUnlockPages(pSendMdl);
                 *     IoFreeMdl(pSendMdl);
                 * }
                 */
            }
            else
            {
                /*
                 * No IRP was built, so nobody else will release the MDL.
                 */
                MmUnlockPages(pSendMdl);
                IoFreeMdl(pSendMdl);
            }
        }
    }

    return NtStatus;
}

The same can be done for receiving data with TDI_RECEIVE; however, our implementation does not use it. Instead, TDI lets you register notification callbacks that tell you when data has arrived or when other events occur. This is what we have done, and the API wrapper that I implemented to register any event handler is as follows:

NTSTATUS TdiFuncs_SetEventHandler(PFILE_OBJECT pfoTdiFileObject, LONG InEventType, PVOID InEventHandler, PVOID InEventContext)
{
    NTSTATUS NtStatus = STATUS_INSUFFICIENT_RESOURCES;
    PIRP pIrp;
    IO_STATUS_BLOCK IoStatusBlock = {0};
    PDEVICE_OBJECT pTdiDevice;
    TDI_COMPLETION_CONTEXT TdiCompletionContext;

    KeInitializeEvent(&TdiCompletionContext.kCompleteEvent, NotificationEvent, FALSE);

    /*
     * The TDI device object is required to send these
     * requests to the TDI driver.
     */
    pTdiDevice = IoGetRelatedDeviceObject(pfoTdiFileObject);

    /*
     * Step 1: Build the IRP. TDI defines several macros and functions
     *         that can quickly create IRPs, etc. for various purposes.
     *         While this can be done manually, it's easiest to use the macros.
     */
    pIrp = TdiBuildInternalDeviceControlIrp(TDI_SET_EVENT_HANDLER, pTdiDevice, pfoTdiFileObject,
                                            &TdiCompletionContext.kCompleteEvent, &IoStatusBlock);

    if(pIrp)
    {
        /*
         * Step 2: Set the IRP parameters.
         */
        TdiBuildSetEventHandler(pIrp, pTdiDevice, pfoTdiFileObject, NULL, NULL,
                                InEventType, InEventHandler, InEventContext);

        NtStatus = IoCallDriver(pTdiDevice, pIrp);

        /*
         * If the status returned is STATUS_PENDING, the IRP will not be
         * completed synchronously; the driver has queued it for later
         * processing. That is fine, but this is a synchronous call, so we
         * do not want to return from this thread until the IRP has
         * completed. The event that we provided will be set when the IRP
         * completes.
         */
        if(NtStatus == STATUS_PENDING)
        {
            KeWaitForSingleObject(&TdiCompletionContext.kCompleteEvent, Executive, KernelMode, FALSE, NULL);

            /*
             * Find the status of the completed IRP.
             */
            NtStatus = IoStatusBlock.Status;
        }
    }

    return NtStatus;
}

The code that uses this API and implements the callback is as follows:

NtStatus = TdiFuncs_SetEventHandler(pTdiExampleContext->TdiHandle.pfoTransport,
                                    TDI_EVENT_RECEIVE,
                                    TdiExample_ClientEventReceive,
                                    (PVOID)pTdiExampleContext);

...

NTSTATUS TdiExample_ClientEventReceive(PVOID TdiEventContext,
                                       CONNECTION_CONTEXT ConnectionContext,
                                       ULONG ReceiveFlags,
                                       ULONG BytesIndicated,
                                       ULONG BytesAvailable,
                                       ULONG *BytesTaken,
                                       PVOID Tsdu,
                                       PIRP *IoRequestPacket)
{
    NTSTATUS NtStatus = STATUS_SUCCESS;
    UINT uiDataRead = 0;
    PTDI_EXAMPLE_CONTEXT pTdiExampleContext = (PTDI_EXAMPLE_CONTEXT)TdiEventContext;
    PIRP pIrp;

    DbgPrint("TdiExample_ClientEventReceive 0x%0x, %i, %i\n", ReceiveFlags, BytesIndicated, BytesAvailable);

    *BytesTaken = BytesAvailable;

    /*
     * This implementation is extremely simple. We do not queue
     * data if we do not have an IRP to put it in. We also
     * assume we always get the full data packet on every receive.
     * These are bells and whistles that can easily be added to
     * any implementation, but they would make the implementation
     * more complex and the underlying idea harder to follow. Since
     * they are essentially common-sense add-ons, they are ignored, and
     * only the general idea of how to queue IRPs and receive data
     * is implemented.
     */
    pIrp = HandleIrp_RemoveNextIrp(pTdiExampleContext->pReadIrpListHead);

    if(pIrp)
    {
        PIO_STACK_LOCATION pIoStackLocation = IoGetCurrentIrpStackLocation(pIrp);

        /*
         * Only BytesIndicated bytes are valid in Tsdu, and we cannot
         * copy more than the read buffer can hold.
         */
        uiDataRead = BytesIndicated > pIoStackLocation->Parameters.Read.Length ?
                     pIoStackLocation->Parameters.Read.Length : BytesIndicated;

        pIrp->Tail.Overlay.DriverContext[0] = NULL;
        RtlCopyMemory(pIrp->AssociatedIrp.SystemBuffer, Tsdu, uiDataRead);

        pIrp->IoStatus.Status = NtStatus;
        pIrp->IoStatus.Information = uiDataRead;

        IoCompleteRequest(pIrp, IO_NETWORK_INCREMENT);
    }

    /*
     * The I/O request can be used to receive the rest of the data.
     * We are not using it in this example, however, and will
     * assume that we always get all the data.
     */
    *IoRequestPacket = NULL;

    return NtStatus;
}

Don't be scared by HandleIrp_RemoveNextIrp; we will describe how to queue IRP requests later in this article.


Step Six: Disconnect

There is nothing special here: you simply disconnect the connection by sending the TDI_DISCONNECT IOCTL.

NTSTATUS TdiFuncs_Disconnect(PFILE_OBJECT pfoConnection)
{
    NTSTATUS NtStatus = STATUS_INSUFFICIENT_RESOURCES;
    PIRP pIrp;
    IO_STATUS_BLOCK IoStatusBlock = {0};
    PDEVICE_OBJECT pTdiDevice;
    TDI_CONNECTION_INFORMATION ReturnConnectionInfo = {0};
    LARGE_INTEGER TimeOut = {0};
    UINT NumberOfSeconds = 60*3;
    TDI_COMPLETION_CONTEXT TdiCompletionContext;

    KeInitializeEvent(&TdiCompletionContext.kCompleteEvent, NotificationEvent, FALSE);

    /*
     * The TDI device object is required to send
     * these requests to the TDI driver.
     */
    pTdiDevice = IoGetRelatedDeviceObject(pfoConnection);

    /*
     * Step 1: Build the IRP. TDI defines several macros and functions
     *         that can quickly create IRPs, etc. for various purposes.
     *         While this can be done manually, it's easiest to use the macros.
     */
    pIrp = TdiBuildInternalDeviceControlIrp(TDI_DISCONNECT, pTdiDevice, pfoConnection,
                                            &TdiCompletionContext.kCompleteEvent, &IoStatusBlock);

    if(pIrp)
    {
        /*
         * Step 2: Add the correct parameters into the IRP.
         */

        /*
         * Time out value (negative = relative, in 100-nanosecond units).
         */
        TimeOut.QuadPart = 10000000L;
        TimeOut.QuadPart *= NumberOfSeconds;
        TimeOut.QuadPart = -(TimeOut.QuadPart);

        TdiBuildDisconnect(pIrp, pTdiDevice, pfoConnection, NULL, NULL, &TimeOut,
                           TDI_DISCONNECT_ABORT, NULL, &ReturnConnectionInfo);

        NtStatus = IoCallDriver(pTdiDevice, pIrp);

        /*
         * If the status returned is STATUS_PENDING, the IRP will not be
         * completed synchronously; the driver has queued it for later
         * processing. That is fine, but this is a synchronous call, so we
         * do not want to return from this thread until the IRP has
         * completed. The event that we provided will be set when the IRP
         * completes.
         */
        if(NtStatus == STATUS_PENDING)
        {
            KeWaitForSingleObject(&TdiCompletionContext.kCompleteEvent, Executive, KernelMode, FALSE, NULL);

            /*
             * Find the status of the completed IRP.
             */
            NtStatus = IoStatusBlock.Status;
        }
    }

    return NtStatus;
}


Step Seven: Disassociate the Handles

This is very simple; we just implement another IOCTL call as follows.

NTSTATUS TdiFuncs_DisAssociateTransportAndConnection(PFILE_OBJECT pfoConnection)
{
    NTSTATUS NtStatus = STATUS_INSUFFICIENT_RESOURCES;
    PIRP pIrp;
    IO_STATUS_BLOCK IoStatusBlock = {0};
    PDEVICE_OBJECT pTdiDevice;
    TDI_COMPLETION_CONTEXT TdiCompletionContext;

    KeInitializeEvent(&TdiCompletionContext.kCompleteEvent, NotificationEvent, FALSE);

    /*
     * The TDI device object is required to send these requests to the TDI driver.
     */
    pTdiDevice = IoGetRelatedDeviceObject(pfoConnection);

    /*
     * Step 1: Build the IRP. TDI defines several macros and
     *         functions that can quickly create IRPs, etc. for
     *         various purposes. While this can be done manually,
     *         it's easiest to use the macros.
     */
    pIrp = TdiBuildInternalDeviceControlIrp(TDI_DISASSOCIATE_ADDRESS, pTdiDevice, pfoConnection,
                                            &TdiCompletionContext.kCompleteEvent, &IoStatusBlock);

    if(pIrp)
    {
        /*
         * Step 2: Add the correct parameters into the IRP.
         */
        TdiBuildDisassociateAddress(pIrp, pTdiDevice, pfoConnection, NULL, NULL);

        NtStatus = IoCallDriver(pTdiDevice, pIrp);

        /*
         * If the status returned is STATUS_PENDING, the IRP will not be
         * completed synchronously; the driver has queued it for later
         * processing. That is fine, but this is a synchronous call, so we
         * do not want to return from this thread until the IRP has
         * completed. The event that we provided will be set when the IRP
         * completes.
         */
        if(NtStatus == STATUS_PENDING)
        {
            KeWaitForSingleObject(&TdiCompletionContext.kCompleteEvent, Executive, KernelMode, FALSE, NULL);

            /*
             * Find the status of the completed IRP.
             */
            NtStatus = IoStatusBlock.Status;
        }
    }

    return NtStatus;
}


Step Eight: Close the Handles

This function is called on both handles, the Transport and the Connection Context.

NTSTATUS TdiFuncs_CloseTdiOpenHandle(HANDLE hTdiHandle, PFILE_OBJECT pfoTdiFileObject)
{
    NTSTATUS NtStatus = STATUS_SUCCESS;

    /*
     * De-reference the FILE_OBJECT and close the handle.
     */
    ObDereferenceObject(pfoTdiFileObject);
    ZwClose(hTdiHandle);

    return NtStatus;
}
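As a recap, the order in which this section uses the calls can be written as a small state machine. This is a hypothetical user-mode checker, not part of TDI. It only encodes the sequence followed here, with Steps One and Two folded into a single "open" and Step Eight treated as "close". It does not capture every rule that the transport itself enforces.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ST_CLOSED, ST_OPEN, ST_ASSOCIATED, ST_CONNECTED } conn_state;

typedef enum {
    OP_OPEN,          /* Steps One and Two: open address and connection context */
    OP_ASSOCIATE,     /* Step Three */
    OP_CONNECT,       /* Step Four */
    OP_SEND,          /* Step Five */
    OP_DISCONNECT,    /* Step Six */
    OP_DISASSOCIATE,  /* Step Seven */
    OP_CLOSE          /* Step Eight */
} conn_op;

/* Applies 'op' if it is legal in state '*s'; returns false (state unchanged) otherwise. */
static bool conn_step(conn_state *s, conn_op op)
{
    conn_state from, to;

    switch (op) {
    case OP_OPEN:         from = ST_CLOSED;     to = ST_OPEN;       break;
    case OP_ASSOCIATE:    from = ST_OPEN;       to = ST_ASSOCIATED; break;
    case OP_CONNECT:      from = ST_ASSOCIATED; to = ST_CONNECTED;  break;
    case OP_SEND:         from = ST_CONNECTED;  to = ST_CONNECTED;  break;
    case OP_DISCONNECT:   from = ST_CONNECTED;  to = ST_ASSOCIATED; break;
    case OP_DISASSOCIATE: from = ST_ASSOCIATED; to = ST_OPEN;       break;
    case OP_CLOSE:        from = ST_OPEN;       to = ST_CLOSED;     break;
    default:              return false;
    }
    if (*s != from) {
        return false;
    }
    *s = to;
    return true;
}
```

Teardown mirrors setup in reverse. In this model, you cannot close while still connected or associated.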


Other Resources

The TDI Interface will get a bit easier once you get familiar with it. One of the biggest things to get right when writing any driver is your IRP handling. TDI does seem a little bit more complex than sockets but it is a kernel interface.

If you have ever investigated TDI or NDIS you have probably run into Thomas Divine. If you are looking to purchase complex TDI or NDIS examples, you can find them and other resources on the website of his company. You can also find tutorials of his on various other websites.

IRP Handling

The last article touched on some very basic concepts of IRPs and how to handle them. To keep that article simple, it left large gaps in what was described. In this article we will pick up the pace and attempt to fill in as many of those gaps as we can. By now you should have enough exposure to driver development that this should not be difficult. However, it is a lot of information, and not all of it is in the example code. You will need to experiment with IRP handling yourself, since it is an essential part of developing a driver.

Driver Requests

When writing a driver, there are two different times that you will be exposed to IRPs: IRPs that are sent to your driver, and IRPs that you create to request processing from other drivers. Recall that there is a stack of drivers and that each driver in the stack has its own stack location in the IRP. Each time an IRP is sent down the stack, the current stack location of that IRP is advanced. When an IRP comes to your driver, you have a few choices.

Forward and Forget

You can forward the IRP to the next driver in the stack using IoCallDriver. This is what we did in the other driver tutorial: we forwarded the IRP and forgot about it. There was one problem, though: we didn't take STATUS_PENDING into account. STATUS_PENDING is how asynchronous operations are implemented. With it, the lower-level driver tells the caller that it is not finished with the IRP, and it may complete the IRP on a separate thread. The rule is that if you return STATUS_PENDING, you must also call IoMarkIrpPending before returning. This is a problem if you have forwarded the IRP to the next driver, because you are not allowed to touch it after the call! So you have essentially two choices. The first is to mark the IRP pending before forwarding it and always return STATUS_PENDING:

IoMarkIrpPending(Irp);
IoCopyCurrentIrpStackLocationToNext(Irp);   /* set up the lower driver's stack location */
IoCallDriver(pDeviceObject, Irp);
return STATUS_PENDING;

The second choice is to set a completion routine. We should remember those from the code in part 4; however, there we used them simply to stop the IRP from completing, by returning STATUS_MORE_PROCESSING_REQUIRED instead of STATUS_SUCCESS. Here, the completion routine propagates the pending state instead:

IoCopyCurrentIrpStackLocationToNext(Irp);
IoSetCompletionRoutine(Irp, CompletionRoutine, NULL, TRUE, TRUE, TRUE);
return IoCallDriver(pDeviceObject, Irp);

...

NTSTATUS CompletionRoutine(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
    /*
     * Propagate the pending flag up to our stack location.
     */
    if(Irp->PendingReturned)
    {
        IoMarkIrpPending(Irp);
    }

    return STATUS_SUCCESS;
}

You could again stop the processing here (by returning STATUS_MORE_PROCESSING_REQUIRED), and if you did, you would not need to call IoMarkIrpPending. The rule works in both directions. If you call IoMarkIrpPending, you must return STATUS_PENDING from your driver, and if you return STATUS_PENDING from your driver, you must call IoMarkIrpPending. Remember, though, that if you stop the completion processing, you must then complete the IRP yourself! We did this in part 4.

One thing to note is that if no completion routine is supplied, the I/O Manager may propagate this "IoMarkIrpPending" information for you. However, information on this subject is so scattered that you may not want to rely on it. It is safer to make sure that everything you do is correct yourself.

Forward and Post Process

This is what we actually did in Part 4, with one difference: we need to take the pending architecture into account. If the lower-level driver returns STATUS_PENDING, we need to wait until it completes the IRP. Once it has, we need to wake up our original thread so that it can do its post-processing and complete the IRP. As an optimization, we only set the event if pending was returned. There is no reason to add the overhead of setting and waiting on events if everything is being processed synchronously! The following is a code example of this.

KEVENT kCompleteEvent;

KeInitializeEvent(&kCompleteEvent, NotificationEvent, FALSE);
IoCopyCurrentIrpStackLocationToNext(Irp);
IoSetCompletionRoutine(Irp, CompletionRoutine, &kCompleteEvent, TRUE, TRUE, TRUE);

NtStatus = IoCallDriver(pDeviceObject, Irp);

if(NtStatus == STATUS_PENDING)
{
    KeWaitForSingleObject(&kCompleteEvent, Executive, KernelMode, FALSE, NULL);

    /*
     * Find the status of the completed IRP.
     */
    NtStatus = Irp->IoStatus.Status;
}

/*
 * Do post-processing.
 */
IoCompleteRequest(Irp, IO_NO_INCREMENT);

return NtStatus;

...

NTSTATUS CompletionRoutine(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
    /*
     * Only wake the dispatch thread if it is (or will be) waiting.
     */
    if(Irp->PendingReturned)
    {
        KeSetEvent((PKEVENT)Context, IO_NO_INCREMENT, FALSE);
    }

    /*
     * Keep ownership of the IRP; the dispatch routine completes it.
     */
    return STATUS_MORE_PROCESSING_REQUIRED;
}
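The event optimization can be modeled in user mode. In this hypothetical pthreads sketch, `completion_routine` plays the role of CompletionRoutine above. It signals the event only when the lower driver returned pending, and it always reports "more processing required", so the dispatch routine keeps the IRP and completes it after post-processing. The status names are stand-ins rather than the real NTSTATUS values, and a worker thread stands in for a lower driver that finishes the request later.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins for the NTSTATUS values named in the text (not their real numbers). */
enum { M_SUCCESS, M_PENDING, M_MORE_PROCESSING_REQUIRED };

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
    int             times_set;          /* instrumentation for the example */
} m_event;

typedef struct {
    int      status;                    /* plays Irp->IoStatus.Status */
    bool     pending_returned;          /* plays Irp->PendingReturned */
    m_event *context;                   /* completion-routine context */
} m_irp;

static void ev_set(m_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    e->times_set++;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

static void ev_wait(m_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled) {
        pthread_cond_wait(&e->cond, &e->lock);
    }
    pthread_mutex_unlock(&e->lock);
}

/* Completion routine: set the event only if the dispatch thread will wait. */
static int completion_routine(m_irp *irp)
{
    if (irp->pending_returned) {
        ev_set(irp->context);
    }
    return M_MORE_PROCESSING_REQUIRED;  /* we keep ownership of the IRP */
}

/* "Lower driver" finishing the request on another thread. */
static void *lower_driver_async(void *arg)
{
    m_irp *irp = arg;
    irp->status = M_SUCCESS;
    completion_routine(irp);
    return NULL;
}

/* Model of IoCallDriver to a lower driver that may or may not pend. */
static int call_lower_driver(m_irp *irp, bool pend, pthread_t *worker)
{
    if (!pend) {
        irp->status = M_SUCCESS;
        completion_routine(irp);        /* completes before the call returns */
        return irp->status;
    }
    irp->pending_returned = true;       /* like IoMarkIrpPending */
    pthread_create(worker, NULL, lower_driver_async, irp);
    return M_PENDING;
}

/* Dispatch: forward, wait only if pending, post-process, then complete. */
static int forward_and_post_process(bool pend, int *events_set)
{
    m_event ev;
    m_irp irp = { -1, false, &ev };
    pthread_t worker;
    int status;

    pthread_mutex_init(&ev.lock, NULL);
    pthread_cond_init(&ev.cond, NULL);
    ev.signaled = false;
    ev.times_set = 0;

    status = call_lower_driver(&irp, pend, &worker);
    if (status == M_PENDING) {
        ev_wait(&ev);
        pthread_join(worker, NULL);
        status = irp.status;            /* like reading Irp->IoStatus.Status */
    }
    /* ...post-processing and the final "IoCompleteRequest" would go here... */

    *events_set = ev.times_set;
    pthread_cond_destroy(&ev.cond);
    pthread_mutex_destroy(&ev.lock);
    return status;
}
```

In the synchronous case the event is never touched, and in the pending case it is set exactly once. That is the overhead saving the text describes.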

Queue and Pend

You have the option to queue the IRP and process it at a later time or on another thread. This is allowed since you own the IRP while it is at your driver's stack level. You have to take into account that the IRP can be canceled. If the IRP is canceled, you don't want to perform any processing, since the result will be thrown away. The other problem we want to solve is that if there are active IRPs associated with a process or thread, that process or thread cannot completely terminate until all of its active IRPs have been completed. This is very tricky, and documentation on how to do it is scarce. However, we will show you how to do it here.

Grab your lock

The first thing you need to do is acquire the spinlock that protects your IRP list. This synchronizes execution between your queuing logic and your cancel routine. There is also a system cancel spinlock, and in some cases it must be acquired, for example if you are using certain system-provided queuing mechanisms. However, the cancel spinlock is system wide, so ask yourself which is more likely: that another processor grabs your spinlock, or that it grabs the cancel spinlock? The cancel spinlock is far more likely to be contended, and that can be a performance hit. On a single-processor machine it obviously doesn't matter which one you use, but you should still use your own spinlock.

Set a Cancel Routine

Your cancel routine will also need to grab your spinlock to synchronize execution and remove IRPs from the list. Setting a cancel routine ensures that if this IRP is canceled, you know about it and can remove it from your IRP list. Remember, you STILL MUST COMPLETE THE IRP! There's no way around it. A canceled IRP doesn't simply disappear from under your feet. If it did, you'd be in big trouble whenever an IRP was canceled while you were processing it! The purpose of the cancel routine is simply to let a queued IRP be removed from the queue at any time, without any hassle, if it is canceled.

Check Cancel Flag

You then must check the cancel flag of the IRP. If it is not canceled then you will call IoMarkIrpPending and queue the IRP onto your linked list or whatever you have. You then must make sure that you return STATUS_PENDING from your driver.

If it has been canceled, we need to know whether your cancel routine was called. You find out by setting the cancel routine to NULL with IoSetCancelRoutine, which returns the previous cancel routine. If the return value is NULL, your cancel routine was called (the I/O Manager clears it before calling it). If the return value is not NULL, the cancel routine was not called, which just means that the IRP was canceled before you set the cancel routine.

You now have two choices; remember that only one location can complete the IRP. If the cancel routine was called but it only completes IRPs that it finds in your IRP list, then an IRP that is not in the list is yours to complete. If the cancel routine always completes the IRP, then you must not complete it. If the cancel routine was not called, then you obviously must complete it. No matter what happens, remember two things. First, somewhere in your driver you must complete this IRP. Second, you must never complete it twice!

The same applies when you remove an IRP from the list. You should always check that the IRP has not been canceled, and you set the cancel routine to NULL before removing the IRP to process it. That way, even if the IRP is canceled afterward, it doesn't matter; the result will just be thrown away. The best thing to do now is to look at the code.
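Before the driver listing that follows, the rules above can be exercised in user mode. These rules are: set the cancel routine under your own lock and then check the Cancel flag, clear the cancel routine before dequeuing, and complete each IRP exactly once. This is a hypothetical model, not the WDK API. `atomic_exchange` on `cancel_routine` plays the role of IoSetCancelRoutine, `cancel_irp` plays IoCancelIrp, a pthread mutex stands in for the spinlock, and the `completions` counter makes double completion visible. Like HandleIrp_AddIrp, the model completes an already-canceled IRP itself instead of queuing it, and its cancel routine only completes IRPs that it finds in the list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct m_irp m_irp;
typedef void (*m_cancel_fn)(m_irp *irp);

struct m_irp {
    atomic_bool          cancel;          /* plays Irp->Cancel */
    _Atomic(m_cancel_fn) cancel_routine;  /* plays Irp->CancelRoutine */
    int                  completions;     /* must end up exactly 1 */
    m_irp               *next;
};

static pthread_mutex_t g_list_lock = PTHREAD_MUTEX_INITIALIZER;  /* "your" lock */
static m_irp *g_front;

static void complete_irp(m_irp *irp) { irp->completions++; }

/* Unlinks irp if it is queued; the caller holds g_list_lock. */
static bool unlink_irp(m_irp *irp)
{
    for (m_irp **pp = &g_front; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == irp) {
            *pp = irp->next;
            return true;
        }
    }
    return false;
}

/* Cancel routine: completes the IRP only if it is still in the list. */
static void on_cancel(m_irp *irp)
{
    pthread_mutex_lock(&g_list_lock);
    bool was_queued = unlink_irp(irp);
    pthread_mutex_unlock(&g_list_lock);
    if (was_queued) {
        complete_irp(irp);
    }
}

/* Like IoCancelIrp: set the flag, then claim and call the cancel routine. */
static void cancel_irp(m_irp *irp)
{
    atomic_store(&irp->cancel, true);
    m_cancel_fn fn = atomic_exchange(&irp->cancel_routine, (m_cancel_fn)NULL);
    if (fn != NULL) {
        fn(irp);
    }
}

/* Returns true if queued (the dispatch routine would return STATUS_PENDING). */
static bool add_irp(m_irp *irp)
{
    pthread_mutex_lock(&g_list_lock);
    atomic_store(&irp->cancel_routine, on_cancel);     /* set the routine first */
    if (atomic_load(&irp->cancel)) {
        /* Not in the list, so on_cancel will never complete it: we do. */
        (void)atomic_exchange(&irp->cancel_routine, (m_cancel_fn)NULL);
        pthread_mutex_unlock(&g_list_lock);
        complete_irp(irp);
        return false;
    }
    irp->next = NULL;
    m_irp **pp = &g_front;
    while (*pp != NULL) {
        pp = &(*pp)->next;
    }
    *pp = irp;                                         /* append at the back */
    pthread_mutex_unlock(&g_list_lock);
    return true;
}

/* Returns the next live IRP (the caller completes it); completes canceled ones. */
static m_irp *remove_next_irp(void)
{
    m_irp *irp;
    pthread_mutex_lock(&g_list_lock);
    while ((irp = g_front) != NULL) {
        g_front = irp->next;
        (void)atomic_exchange(&irp->cancel_routine, (m_cancel_fn)NULL);
        if (!atomic_load(&irp->cancel)) {
            break;                                     /* live IRP: hand it out */
        }
        pthread_mutex_unlock(&g_list_lock);
        complete_irp(irp);                             /* canceled while queued */
        pthread_mutex_lock(&g_list_lock);
    }
    pthread_mutex_unlock(&g_list_lock);
    return irp;
}
```

A single-threaded run only checks the bookkeeping. The two orderings (store the routine, then read the flag; set the flag, then claim the routine) are what would keep "exactly once" true when cancellation happens on another thread.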

Irp->Tail.Overlay.DriverContext[0] = (PVOID)pTdiExampleContext->pWriteIrpListHead;
NtStatus = HandleIrp_AddIrp(pTdiExampleContext->pWriteIrpListHead, Irp, TdiExample_CancelRoutine, TdiExample_IrpCleanUp, NULL);

if(NT_SUCCESS(NtStatus))
{
    KeSetEvent(&pTdiExampleContext->kWriteIrpReady, IO_NO_INCREMENT, FALSE);
    NtStatus = STATUS_PENDING;
}

...

/**********************************************************************
 *
 *  HandleIrp_AddIrp
 *
 *    This function adds an IRP to the IRP list.
 *
 **********************************************************************/
NTSTATUS HandleIrp_AddIrp(PIRPLISTHEAD pIrpListHead, PIRP pIrp, PDRIVER_CANCEL pDriverCancelRoutine, PFNCLEANUPIRP pfnCleanUpIrp, PVOID pContext)
{
    NTSTATUS NtStatus = STATUS_UNSUCCESSFUL;
    KIRQL kOldIrql;
    PDRIVER_CANCEL pCancelRoutine;
    PIRPLIST pIrpList;

    pIrpList = (PIRPLIST)KMem_AllocateNonPagedMemory(sizeof(IRPLIST), pIrpListHead->ulPoolTag);

    if(pIrpList)
    {
        DbgPrint("HandleIrp_AddIrp Allocate Memory = 0x%0x \r\n", pIrpList);

        pIrpList->pContext = pContext;
        pIrpList->pfnCleanUpIrp = pfnCleanUpIrp;
        pIrpList->pIrp = pIrp;
        pIrpList->pfnCancelRoutine = pDriverCancelRoutine;

        /*
         * The first thing we need to do is acquire our spin lock.
         *
         * The reasons for this are:
         *
         *  1. All access to this list is synchronized, the obvious reason.
         *  2. This will synchronize adding this IRP to the
         *     list with the cancel routine.
         */
        KeAcquireSpinLock(&pIrpListHead->kspIrpListLock, &kOldIrql);

        /*
         * We will now attempt to set the cancel routine, which will be called
         * when (if) the IRP is ever canceled. This allows us to remove an IRP
         * from the queue that is no longer valid.
         *
         * A potential misconception is that if the IRP is canceled it is no
         * longer valid. This is not true; the IRP does not self-destruct.
         * The IRP is valid as long as it has not been completed. Once it
         * has been completed, it is no longer valid (while we own it). So,
         * while we own the IRP, we need to complete it at some point. The
         * reason for setting a cancel routine is to notice that the IRP has
         * been canceled, complete it immediately, and get rid of it. We don't
         * want to do processing for an IRP that has been canceled, as the
         * result will just be thrown away.
         *
         * So, if we remove an IRP from this list for processing and it's
         * canceled, the only problem is that we did processing on it. We
         * complete it at the end and there's no problem.
         *
         * There is a problem, however, if your code is written in a way
         * that allows your cancel routine to complete the IRP unconditionally.
         * This is fine as long as you have some type of synchronization,
         * since you DO NOT WANT TO COMPLETE AN IRP TWICE!
         */
        IoSetCancelRoutine(pIrp, pIrpList->pfnCancelRoutine);

        /*
         * We have set our cancel routine. Now, check if the IRP has
         * already been canceled.
         *
         * We must set the cancel routine before checking this to ensure
         * that once we queue the IRP, it will definitely be called if the
         * IRP is ever canceled.
         */
        if(pIrp->Cancel)
        {
            /*
             * If the IRP has been canceled, we can then check if our
             * cancel routine has been called.
             */
            pCancelRoutine = IoSetCancelRoutine(pIrp, NULL);

            /*
             * If pCancelRoutine == NULL, our cancel routine has been called.
             * If pCancelRoutine != NULL, our cancel routine has not been called.
             *
             * The I/O Manager will set the cancel routine to NULL
             * before calling the cancel routine.
             *
             * We have a decision to make here: we need to write the code
             * so that we only complete and clean up the IRP once.
             * We either allow the cancel routine to do it or we do it here.
             * We will already have to clean up the IRP here if
             * pCancelRoutine != NULL.
             *
             * The solution we are going with is that the cancel routine
             * only cleans up IRPs that are in the list. So, we will not
             * add any IRP to the list if it has already been canceled
             * by the time we get to this location.
             */
            KeReleaseSpinLock(&pIrpListHead->kspIrpListLock, kOldIrql);

            /*
             * We are going to allow the clean-up function to complete the IRP.
             */
            pfnCleanUpIrp(pIrp, pContext);

            DbgPrint("HandleIrp_AddIrp Complete Free Memory = 0x%0x \r\n", pIrpList);
            KMem_FreeNonPagedMemory(pIrpList);
        }
        else
        {
            /*
             * The IRP has not been canceled, so we can simply queue it!
             */
            pIrpList->pNextIrp = NULL;

            IoMarkIrpPending(pIrp);

            if(pIrpListHead->pListBack)
            {
                pIrpListHead->pListBack->pNextIrp = pIrpList;
                pIrpListHead->pListBack = pIrpList;
            }
            else
            {
                pIrpListHead->pListFront = pIrpListHead->pListBack = pIrpList;
            }

            KeReleaseSpinLock(&pIrpListHead->kspIrpListLock, kOldIrql);

            NtStatus = STATUS_SUCCESS;
        }
    }
    else
    {
        /*
         * We are going to allow the clean-up function to complete the IRP.
         */
        pfnCleanUpIrp(pIrp, pContext);
    }

    return NtStatus;
}

/**********************************************************************
 *
 *  HandleIrp_RemoveNextIrp
 *
 *    This function removes the next valid IRP.
 *
 **********************************************************************/
PIRP HandleIrp_RemoveNextIrp(PIRPLISTHEAD pIrpListHead)
{
    PIRP pIrp = NULL;
    KIRQL kOldIrql;
    PDRIVER_CANCEL pCancelRoutine;
    PIRPLIST pIrpListCurrent;

    KeAcquireSpinLock(&pIrpListHead->kspIrpListLock, &kOldIrql);

    pIrpListCurrent = pIrpListHead->pListFront;

    while(pIrpListCurrent && pIrp == NULL)
    {
        /*
         * To remove an IRP from the queue, we first want to
         * reset the cancel routine.
         */
        pCancelRoutine = IoSetCancelRoutine(pIrpListCurrent->pIrp, NULL);

        /*
         * The next phase is to determine if this IRP has been canceled.
         */
        if(pIrpListCurrent->pIrp->Cancel)
        {
            /*
             * We have been canceled, so we need to determine if our
             * cancel routine has already been called. pCancelRoutine
             * will be NULL if our cancel routine has been called.
             * It will not be NULL if our cancel routine has not been
             * called. However, we don't care in either case; we
             * will simply complete the IRP here, since we have to
             * implement at least that case anyway.
             *
             * Remove the IRP from the list.
             */
            pIrpListHead->pListFront = pIrpListCurrent->pNextIrp;

            if(pIrpListHead->pListFront == NULL)
            {
                pIrpListHead->pListBack = NULL;
            }

            KeReleaseSpinLock(&pIrpListHead->kspIrpListLock, kOldIrql);

            pIrpListCurrent->pfnCleanUpIrp(pIrpListCurrent->pIrp, pIrpListCurrent->pContext);

            DbgPrint("HandleIrp_RemoveNextIrp Complete Free Memory = 0x%0x \r\n", pIrpListCurrent);
            KMem_FreeNonPagedMemory(pIrpListCurrent);
            pIrpListCurrent = NULL;

            KeAcquireSpinLock(&pIrpListHead->kspIrpListLock, &kOldIrql);

            pIrpListCurrent = pIrpListHead->pListFront;
        }
        else
        {
            pIrpListHead->pListFront = pIrpListCurrent->pNextIrp;

            if(pIrpListHead->pListFront == NULL)
            {
                pIrpListHead->pListBack = NULL;
            }

            pIrp = pIrpListCurrent->pIrp;

            KeReleaseSpinLock(&pIrpListHead->kspIrpListLock, kOldIrql);

            DbgPrint("HandleIrp_RemoveNextIrp Complete Free Memory = 0x%0x \r\n", pIrpListCurrent);
            KMem_FreeNonPagedMemory(pIrpListCurrent);
            pIrpListCurrent = NULL;

            KeAcquireSpinLock(&pIrpListHead->kspIrpListLock, &kOldIrql);
        }
    }

    KeReleaseSpinLock(&pIrpListHead->kspIrpListLock, kOldIrql);

    return pIrp;
}

/**********************************************************************
 *
 *  HandleIrp_PerformCancel
 *
 *    This function removes the specified IRP from the list.
 *
 **********************************************************************/
NTSTATUS HandleIrp_PerformCancel(PIRPLISTHEAD pIrpListHead, PIRP pIrp)
{
    NTSTATUS NtStatus = STATUS_UNSUCCESSFUL;
    KIRQL kOldIrql;
    PIRPLIST pIrpListCurrent, pIrpListPrevious;

    KeAcquireSpinLock(&pIrpListHead->kspIrpListLock, &kOldIrql);

    pIrpListPrevious = NULL;
    pIrpListCurrent = pIrpListHead->pListFront;

    while(pIrpListCurrent && NtStatus == STATUS_UNSUCCESSFUL)
    {
        if(pIrpListCurrent->pIrp == pIrp)
        {
            if(pIrpListPrevious)
            {
                pIrpListPrevious->pNextIrp = pIrpListCurrent->pNextIrp;
            }

            if(pIrpListHead->pListFront == pIrpListCurrent)
            {
                pIrpListHead->pListFront = pIrpListCurrent->pNextIrp;
            }

            if(pIrpListHead->pListBack == pIrpListCurrent)
            {
                pIrpListHead->pListBack = pIrpListPrevious;
            }

            KeReleaseSpinLock(&pIrpListHead->kspIrpListLock, kOldIrql);

            NtStatus = STATUS_SUCCESS;

            /*
             * We are going to allow the clean-up function to complete the IRP.
             */
            pIrpListCurrent->pfnCleanUpIrp(pIrpListCurrent->pIrp, pIrpListCurrent->pContext);

            DbgPrint("HandleIrp_PerformCancel Complete Free Memory = 0x%0x \r\n", pIrpListCurrent);
            KMem_FreeNonPagedMemory(pIrpListCurrent);
            pIrpListCurrent = NULL;

            KeAcquireSpinLock(&pIrpListHead->kspIrpListLock, &kOldIrql);
        }
        else
        {
            pIrpListPrevious = pIrpListCurrent;
            pIrpListCurrent = pIrpListCurrent->pNextIrp;
        }
    }

    KeReleaseSpinLock(&pIrpListHead->kspIrpListLock, kOldIrql);

    return NtStatus;
}

/**********************************************************************
 *
 *  TdiExample_CancelRoutine
 *
 *    This function is called if the IRP is ever canceled:
 *
 *    CancelIo() from user mode, IoCancelIrp() from the kernel.
 *
 **********************************************************************/
VOID TdiExample_CancelRoutine(PDEVICE_OBJECT DeviceObject, PIRP pIrp)
{
    PIRPLISTHEAD pIrpListHead = NULL;

    /*
     * We must release the cancel spin lock.
     */
    IoReleaseCancelSpinLock(pIrp->CancelIrql);

    DbgPrint("TdiExample_CancelRoutine Called IRP = 0x%0x \r\n", pIrp);

    /*
     * We stored the IRPLISTHEAD context in our DriverContext on the IRP

* before adding it to the queue so it should n ot be NULL here. */ pIrpListHead = (PIRPLISTHEAD)pIrp->Tail.Overlay .DriverContext[0]; pIrp->Tail.Overlay.DriverContext[0] = NULL; /* * We can then just throw the IRP to the Perfor mCancel * routine since it will find it in the queue, remove it and * then call our clean up routine. Our clean u p routine * will then complete the IRP. If this does no t occur then * our completion of the IRP will occur in anot her context * since it is not in the list. */ HandleIrp_PerformCancel(pIrpListHead, pIrp); } /************************************************** ******************** * * TdiExample_IrpCleanUp * * This function is called to clean up the IRP i f it is ever * canceled after we have given it to the queuei ng routines. * ************************************************** ********************/ VOID TdiExample_IrpCleanUp(PIRP pIrp, PVOID pContex t) { pIrp->IoStatus.Status = STATUS_CANCELLED; pIrp->IoStatus.Information = 0; pIrp->Tail.Overlay.DriverContext[0] = NULL; DbgPrint("TdiExample_IrpCleanUp Called IRP = 0x %0x \r\n", pIrp); IoCompleteRequest(pIrp, IO_NO_INCREMENT); }

Alternatively, you can use cancel-safe IRP queues (the IoCsqXxx routines), which take care of the cancellation synchronization for you.
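Cancel-safe queues settle the cancellation race with the cancel-routine exchange itself: the dequeue path and the cancel path (IoCancelIrp) both swap the IRP's cancel routine for NULL, and only the side that gets the old, non-NULL routine back goes on to complete the IRP. The following is a minimal user-mode C11 model of that handshake under those assumptions, not kernel code; `model_irp`, `set_cancel_routine`, `dequeue_path_claims`, `cancel_path_claims` and `complete` are hypothetical names, and `atomic_exchange` stands in for IoSetCancelRoutine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a cancel routine is just a function pointer. */
typedef void (*cancel_fn)(void *irp);

typedef struct {
    _Atomic(cancel_fn) cancel_routine; /* models Irp->CancelRoutine */
    atomic_bool cancel;                /* models Irp->Cancel */
    int completions;                   /* how many times the IRP was completed */
} model_irp;

/* Placeholder cancel routine; the model only cares whether it is NULL. */
static void dummy_cancel(void *irp) { (void)irp; }

/* Models IoSetCancelRoutine: atomically swap in a new routine, return the old one. */
static cancel_fn set_cancel_routine(model_irp *irp, cancel_fn fn)
{
    return atomic_exchange(&irp->cancel_routine, fn);
}

/* Dequeue path: clear the routine before completing a dequeued IRP.
 * Getting NULL back means the cancel path already owns the IRP. */
static bool dequeue_path_claims(model_irp *irp)
{
    return set_cancel_routine(irp, NULL) != NULL;
}

/* Cancel path, roughly what IoCancelIrp does: set Cancel, then claim the
 * routine. Only a non-NULL result means the cancel routine gets to run. */
static bool cancel_path_claims(model_irp *irp)
{
    atomic_store(&irp->cancel, true);
    return set_cancel_routine(irp, NULL) != NULL;
}

/* Stands in for IoCompleteRequest; completing twice would be a bug. */
static void complete(model_irp *irp)
{
    irp->completions++;
    assert(irp->completions == 1);
}
```

Whichever order the two paths run in, exactly one of them claims the IRP, so it is completed exactly once.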

Process and Complete

This is where you simply process the request inline and complete it. As long as you don't return STATUS_PENDING, there is nothing more to track. This is what we have done with most of the driver requests in the previous tutorials: we process the request and, when we are done, we call IoCompleteRequest, which is mandatory.

Creating IRPs

The previous article gave only an extremely brief description of how to create and send IRPs. We will go over those steps again here in more detail, and we will also learn the differences between the APIs that we can use to create IRPs.

Step One: Create the IRP

As we already know, there are a few APIs that can be used to create an IRP; however, there are differences between them that we need to understand. The source in article 4 was very sloppy with IRP handling, simply to introduce IRPs without having to explain everything that we are explaining here.

There are Asynchronous IRPs and Synchronous IRPs. If you create an IRP using IoAllocateIrp or IoBuildAsynchronousFsdRequest, you have created an Asynchronous IRP. This means that you should set a completion routine, and when the IRP is completed you need to call IoFreeIrp. You are in control of these IRPs and you must handle them appropriately.

If you create an IRP using IoBuildDeviceIoControlRequest or IoBuildSynchronousFsdRequest, then you have created a Synchronous IRP. Remember, TdiBuildInternalDeviceControlIrp is a macro and creates a synchronous IRP. These IRPs are owned and managed by the I/O Manager! Do not free them! A common mistake I have seen in code on the internet is calling IoFreeIrp on them when something fails! These IRPs MUST be completed using IoCompleteRequest. If you pass such an IRP down with IoCallDriver, you do not need to complete it, as the driver below will do it for you. If you do intercept the IRP with a completion routine (by returning STATUS_MORE_PROCESSING_REQUIRED), though, you will need to call IoCompleteRequest after you are done with it.

Also, before you create an IRP, make sure that you understand the IRQL at which your code will be called. One benefit of IoAllocateIrp is that it can be used at DISPATCH_LEVEL, whereas IoBuildDeviceIoControlRequest cannot.

Step Two: Setup the IRP Parameters

This is very simple, and in the TDI example the macro TdiBuildSend shows us how to do it. We call IoGetNextIrpStackLocation and simply set the parameters in the stack location it returns. We also set the MDL (MdlAddress) and any other attributes we need on the IRP itself.
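To make Step Two concrete, here is a small user-mode model of the stack-location bookkeeping. The names (`model_irp`, `model_stack_location`, `init_irp`, `next_location`, `call_driver`) are hypothetical, and a fixed-size array stands in for the real IO_STACK_LOCATION array. As with a real IRP, a new IRP starts with its current location one past the last valid one, so the location you fill in through IoGetNextIrpStackLocation is exactly the one the lower driver sees as its current location once IoCallDriver advances the stack. The real IoCallDriver also calls the target's dispatch routine, which the model leaves out.

```c
#include <assert.h>

#define MAX_STACK 4

/* A tiny stand-in for IO_STACK_LOCATION. */
typedef struct {
    unsigned char major_function;  /* e.g. 0x0f = IRP_MJ_INTERNAL_DEVICE_CONTROL */
    unsigned long length;          /* stands in for Parameters.* */
} model_stack_location;

typedef struct {
    int stack_count;                       /* models Irp->StackCount */
    int current_location;                  /* models Irp->CurrentLocation, 1-based */
    model_stack_location stack[MAX_STACK]; /* location N is stack[N - 1] */
} model_irp;

/* A freshly allocated IRP starts one past its last valid location. */
static void init_irp(model_irp *irp, int stack_count)
{
    assert(stack_count > 0 && stack_count <= MAX_STACK);
    irp->stack_count = stack_count;
    irp->current_location = stack_count + 1;
}

/* Models IoGetNextIrpStackLocation: the location the lower driver will use. */
static model_stack_location *next_location(model_irp *irp)
{
    int next = irp->current_location - 1;
    assert(next >= 1);   /* running out of locations is a bug */
    return &irp->stack[next - 1];
}

/* Models only the stack bookkeeping of IoCallDriver, not the dispatch call. */
static model_stack_location *call_driver(model_irp *irp)
{
    model_stack_location *loc = next_location(irp);
    irp->current_location--;
    return loc;   /* what the lower driver sees as its current location */
}
```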

Step Four: Send to the driver stack

This is very simple and we have done it over and over again. We simply use IoCallDriver to send the IRP down the stack.

Step Five: Wait and Clean up

If the driver returned any status other than "STATUS_PENDING", you are done. If you created the IRP asynchronously, then either you freed the IRP in the completion routine, or your completion routine returned STATUS_MORE_PROCESSING_REQUIRED, in which case you finish processing it now and free it with IoFreeIrp.

If you created a synchronous IRP, either you let the I/O Manager handle it and you're done, or your completion routine returned STATUS_MORE_PROCESSING_REQUIRED, in which case you do that processing here and then call IoCompleteRequest.

If the status returned is "STATUS_PENDING", you now have a few choices: you can wait here for the IRP, or you can leave and handle its completion asynchronously. It all depends on your architecture. If you created the IRP as asynchronous, the completion routine you set must check whether the IRP was marked pending (Irp->PendingReturned) and only then set your event. That way you don't waste processing when there's no need. This is also why you don't wait on the event unless STATUS_PENDING was returned. Imagine how slow everything would be if all calls waited on the event no matter what!

If your IRP was created synchronously, the I/O Manager sets the event (the one passed to IoBuildDeviceIoControlRequest) for you. You don't need to do anything unless you want to return STATUS_MORE_PROCESSING_REQUIRED from the completion routine. Please read the section on "How Completion Works" to further understand what to do here.
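The "signal and wait only when STATUS_PENDING is involved" rule from Step Five can be checked with a small single-threaded model. Everything here is hypothetical (`model_irp`, `on_complete`, `lower_driver`, `finish_deferred`, `send_and_wait`): `event_signaled` stands in for a KEVENT, PendingReturned is set directly instead of through the stack location, and the deferred completion that would really happen on another thread is simply called before the "wait". Only the status values are taken from ntstatus.h.

```c
#include <assert.h>
#include <stdbool.h>

/* NTSTATUS values from ntstatus.h (kept unsigned in this model). */
#define STATUS_SUCCESS                  0x00000000u
#define STATUS_PENDING                  0x00000103u
#define STATUS_MORE_PROCESSING_REQUIRED 0xC0000016u

typedef unsigned int model_status;

typedef struct {
    bool pending_returned;   /* models Irp->PendingReturned */
    model_status status;     /* models Irp->IoStatus.Status */
    bool event_signaled;     /* models the caller's KEVENT */
} model_irp;

/* Completion routine for an IRP we allocated (an asynchronous IRP).
 * Signal only if the lower driver went pending; otherwise nobody waits. */
static model_status on_complete(model_irp *irp)
{
    if (irp->pending_returned)
        irp->event_signaled = true;           /* KeSetEvent(...) */
    return STATUS_MORE_PROCESSING_REQUIRED;   /* we keep the IRP and free it later */
}

/* Hypothetical lower driver: completes inline, or pends when defer is true. */
static model_status lower_driver(model_irp *irp, bool defer)
{
    if (defer) {
        irp->pending_returned = true;  /* simplified IoMarkIrpPending */
        return STATUS_PENDING;
    }
    irp->status = STATUS_SUCCESS;
    (void)on_complete(irp);            /* IoCompleteRequest calls our routine */
    return STATUS_SUCCESS;
}

/* The deferred completion, which would really run later on another thread. */
static void finish_deferred(model_irp *irp)
{
    irp->status = STATUS_SUCCESS;
    (void)on_complete(irp);
}

/* Caller side of Step Five; returns how many times it had to wait. */
static int send_and_wait(model_irp *irp, bool defer)
{
    int waits = 0;
    if (lower_driver(irp, defer) == STATUS_PENDING) {
        finish_deferred(irp);          /* stands in for the other thread */
        assert(irp->event_signaled);   /* KeWaitForSingleObject would return now */
        waits++;
    }
    /* Read irp->status here, then free the IRP with IoFreeIrp. */
    return waits;
}
```

In the inline case the event is never touched and the caller never waits, which is exactly the saving the text describes.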

Non-Paged Driver Code

If you remember, in the first tutorial we learned about #pragma and the ability to put our driver code into different sections. There was the INIT section, which was discardable, and the PAGE section, which put the code into a pageable area. What about code that acquires a spinlock? What do we do when the code has to be non-pageable? We simply don't specify the #pragma! By default, a loaded driver's code is in non-paged memory; we actually force code into paged memory with #pragma so that code that does not need to be non-paged does not permanently occupy physical memory.

If you look at the code, you will notice that some of the #pragma directives are commented out. These are the functions that need to be non-paged, because they use spinlocks and therefore run at IRQL > APC_LEVEL. I commented them out, rather than simply leaving them out, so that you would not think I had forgotten them and add them back! I wanted to show that leaving them out was a deliberate decision!

/* #pragma alloc_text(PAGE, HandleIrp_FreeIrpListWithCleanUp) */
/* #pragma alloc_text(PAGE, HandleIrp_AddIrp) */
/* #pragma alloc_text(PAGE, HandleIrp_RemoveNextIrp) */
#pragma alloc_text(PAGE, HandleIrp_CreateIrpList)
#pragma alloc_text(PAGE, HandleIrp_FreeIrpList)
/* #pragma alloc_text(PAGE, HandleIrp_PerformCancel) */

How Completion Works

Completion works like this: each device's stack location may have an associated completion routine. That completion routine is actually called for the driver above it, not for the current driver! The current driver already knows when it completes the IRP. So when a driver completes the IRP, the completion routine in the current stack location is read and, if one exists, it is called. Before it is called, the current IO_STACK_LOCATION is moved to point to the previous (higher) driver's location! This is important, as we will see in a minute. If that higher driver lets the completion continue and the IRP was pending, it must propagate the pending status up by calling "IoMarkIrpPending", as we mentioned before. This is because a driver that returns STATUS_PENDING must mark the IRP as pending. If it doesn't return the same status as the lower-level driver, it doesn't need to mark the IRP as pending. Perhaps it intercepted the STATUS_PENDING and waited for the completion. It could then stop the completion of the IRP and complete it again itself while returning a status other than STATUS_PENDING.

That is probably a bit confusing, so you may want to refer back to the discussion of how to "Forward and Post Process". Now, if your driver created the IRP, you must not mark the IRP as pending! You know why? Because you don't have an IO_STACK_LOCATION! You are not on the device's stack! You will actually start to corrupt memory if you do this! You have a few different choices here, and none of them involve calling "IoMarkIrpPending"!!!

You will notice that example code may actually show a completion routine calling "IoMarkIrpPending" even though it created the IRP! This is not what should happen. In fact, if you look at real code, when a Synchronous IRP is created the completion routine usually doesn't exist, or exists solely to return STATUS_MORE_PROCESSING_REQUIRED.

I implemented a completion routine in our TDI Client driver. We create synchronous IRPs there; however, check out the following bit of debugging:

kd> kb
ChildEBP RetAddr  Args to Child
fac8ba90 804e4433 00000000 80d0c9b8 00000000 netdrv!TdiFuncs_CompleteIrp [.\tdifuncs.c @ 829]
fac8bac0 fbb20c54 80d1d678 80d0c9b8 00000000 nt!IopfCompleteRequest+0xa0
fac8bad8 fbb2bd9b 80d0c9b8 00000000 00000000 tcpip!TCPDataRequestComplete+0xa4
fac8bb00 fbb2bd38 80d0c9b8 80d0ca28 80d1d678 tcpip!TCPDisassociateAddress+0x4b
fac8bb14 804e0e0d 80d1d678 80d0c9b8 c000009a tcpip!TCPDispatchInternalDeviceControl+0x9b
fac8bb24 fc785d65 ffaaa3b0 80db4774 00000000 nt!IofCallDriver+0x3f
fac8bb50 fc785707 ff9cdc20 80db4774 fc786099 netdrv!TdiFuncs_DisAssociateTransportAndConnection+0x94 [.\tdifuncs.c @ 772]
fac8bb5c fc786099 80db4774 ffaaa340 ff7d1d98 netdrv!TdiFuncs_FreeHandles+0xd [.\tdifuncs.c @ 112]
fac8bb74 804e0e0d 80d33df0 ffaaa340 ffaaa350 netdrv!TdiExample_CleanUp+0x6e [.\functions.c @ 459]
fac8bb84 80578ce9 00000000 80cda980 00000000 nt!IofCallDriver+0x3f
fac8bbbc 8057337c 00cda998 00000000 80cda980 nt!IopDeleteFile+0x138
fac8bbd8 804e4499 80cda998 00000000 000007dc nt!ObpRemoveObjectRoutine+0xde
fac8bbf4 8057681a ffb3e6d0 000007dc e1116fb8 nt!ObfDereferenceObject+0x4b
fac8bc0c 80591749 e176a118 80cda998 000007dc nt!ObpCloseHandleTableEntry+0x137
fac8bc24 80591558 e1116fb8 000007dc fac8bc60 nt!ObpCloseHandleProcedure+0x1b
fac8bc40 805916f5 e176a118 8059172e fac8bc60 nt!ExSweepHandleTable+0x26
fac8bc68 8057cfbe ffb3e601 ff7eada0 c000013a nt!ObKillProcess+0x64
fac8bcf0 80590e70 c000013a ffa25c98 804ee93d nt!PspExitThread+0x5d9
fac8bcfc 804ee93d ffa25c98 fac8bd48 fac8bd3c nt!PsExitSpecialApc+0x19
fac8bd4c 804e7af7 00000001 00000000 fac8bd64 nt!KiDeliverApc+0x1c3

kd> dds esp
fac8ba94  804e4433 nt!IopfCompleteRequest+0xa0
fac8ba98  00000000   ; This is the PDEVICE_OBJECT, it's NULL!!
fac8ba9c  80d0c9b8   ; This is IRP
fac8baa0  00000000   ; This is our context (NULL)

kd> !irp 80d0c9b8
Irp is active with 1 stacks 2 is current (= 0x80d0ca4c)
 No Mdl Thread ff7eada0:  Irp is completed.  Pending has been returned
     cmd  flg cl Device   File     Completion-Context
 [  f, 0]   0  0 80d1d678 00000000 fc786579-00000000
            \Driver\Tcpip     netdrv!TdiFuncs_CompleteIrp
                        Args: 00000000 00000000 00000000 00000000

If there's only 1 stack, how can it be on 2?

As you can see, we are at IO_STACK_LOCATION #2, which does not exist. So the IRP actually starts out at a stack location one past the last one that exists. If you remember, this is why we call IoGetNextIrpStackLocation to set the parameters! It means that if we called IoMarkIrpPending here, we would be accessing memory we shouldn't, because IoMarkIrpPending actually sets bits in the current IO_STACK_LOCATION! The other odd thing is that the device object is NULL. This is most likely because our stack location does not exist: we do not have an associated device object since we are not a part of this device stack. This is valid. By the way, the stack number may be incremented beyond the number of stack locations for the I/O Manager and for the originator of the request; it's just not valid to actually use those stack locations!
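The completion walk described in this section can be modeled in a few lines of user-mode C. This is a simplification of IoCompleteRequest under stated assumptions, with hypothetical names (`model_irp`, `complete_request`, `filter_routine`, `originator_routine`): each location may hold a completion routine set by the driver above it, the current location is advanced before that routine is called, a routine returning STATUS_MORE_PROCESSING_REQUIRED stops the walk, and when a location has no routine the pending flag is propagated on the driver's behalf. Running it for an IRP allocated with two stack locations leaves the current location at three, the same "one past the end" situation as the 1-stack, 2-current IRP in the debugger output, which is why the originator's routine must not touch a stack location.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STACK 4
#define STATUS_SUCCESS                  0x00000000u
#define STATUS_MORE_PROCESSING_REQUIRED 0xC0000016u

struct model_irp;
/* owner_location is the location of the driver that set the routine;
 * stack_count + 1 means "the originator", which owns no location. */
typedef unsigned int (*completion_fn)(struct model_irp *irp, int owner_location);

typedef struct {
    completion_fn completion;  /* set by the driver ABOVE this location */
    bool pending;              /* models SL_PENDING_RETURNED in Control */
} model_stack_location;

typedef struct model_irp {
    int stack_count;
    int current_location;      /* 1-based, like Irp->CurrentLocation */
    bool pending_returned;     /* models Irp->PendingReturned */
    model_stack_location stack[MAX_STACK];
    int calls;                 /* completion routines run (for the demo) */
} model_irp;

/* Simplified model of the upward walk in IoCompleteRequest. */
static void complete_request(model_irp *irp)
{
    while (irp->current_location <= irp->stack_count) {
        model_stack_location *loc = &irp->stack[irp->current_location - 1];
        irp->pending_returned = loc->pending;
        irp->current_location++;                 /* move up BEFORE calling */
        if (loc->completion != NULL) {
            if (loc->completion(irp, irp->current_location) ==
                STATUS_MORE_PROCESSING_REQUIRED)
                return;                          /* the routine's owner keeps the IRP */
        } else if (irp->pending_returned &&
                   irp->current_location <= irp->stack_count) {
            /* No routine: propagate pending on the driver's behalf. */
            irp->stack[irp->current_location - 1].pending = true;
        }
    }
}

/* A filter driver's routine: the usual
 * "if (Irp->PendingReturned) IoMarkIrpPending(Irp);" on its OWN location. */
static unsigned int filter_routine(model_irp *irp, int owner_location)
{
    irp->calls++;
    assert(owner_location <= irp->stack_count);  /* a filter owns a location */
    if (irp->pending_returned)
        irp->stack[owner_location - 1].pending = true;
    return STATUS_SUCCESS;
}

/* The originator's routine: its "current" location is one past the end
 * (and its device object is NULL, as in the debugger output above), so
 * marking the IRP pending here would write outside the stack array. */
static unsigned int originator_routine(model_irp *irp, int owner_location)
{
    irp->calls++;
    assert(owner_location == irp->stack_count + 1);
    return STATUS_MORE_PROCESSING_REQUIRED;
}
```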

Why STATUS_PENDING?

As if I haven't confused you enough already, we need to talk about STATUS_PENDING and IoMarkIrpPending. What's the point? We can process IRPs asynchronously, and the upper-level drivers and the I/O Manager need to know when that happens! The first part, returning STATUS_PENDING, is an optimization: if we want to wait, we ONLY do so for operations that actually went asynchronous. The second part is that IoMarkIrpPending is what actually propagates the "PendingReturned" status on the IRP. That way we can optimize the completion side as well: we don't always have to call KeSetEvent, only in the case where STATUS_PENDING was returned!

The other use is that a driver in the middle of the stack can change this status from STATUS_PENDING to STATUS_SUCCESS and not propagate the pending state all the way up the driver stack. Again, the optimizations come into play, and we don't have to do a lot of the extra handling that asynchronous operations require. Remember that the IRP has two code paths: the return value passed back up the stack, and the completion path, which may run on a different thread. So you can see why the two need to be synchronized and why this status must be propagated up both paths.

Overlapped I/O

The "STATUS_PENDING" architecture is essentially how Overlapped I/O is implemented. Just because the example source in this article uses ReadFileEx and WriteFileEx doesn't mean that ReadFile and WriteFile would not work here; they work too. If you look at the CreateFile call, I added a flag (FILE_FLAG_OVERLAPPED) to enable Overlapped I/O. If you remove this flag, the I/O Manager will actually block when the driver returns STATUS_PENDING rather than return to the application: it will wait on an event until the I/O is completed. This is essentially why the user mode application was implemented using asynchronous I/O. Give these different methods a try!

Other Resources

The following are other resources and articles on IRP Handling that you may want to refer to and read.

These are "cheat sheets" which simply show sample code on how to handle IRPs. I am skeptical of the information in Cheat Sheet 2 on IRP completion routines that mark synchronous IRPs as pending! Remember what I said: the IRP completion routine is called with the stack location of the device that set it, and allocating an IRP does not mean you are on the device stack! I have not tried the code myself, so I could be missing something in the implementation.

There are many other resources on the web, and the URLs I provided will probably be gone or moved someday!

Example Source

The example source will build six binaries as listed here.

CHATCLIENT.EXE    - Winsock Chat Client
CHATCLIENTNET.EXE - Lightbulb Chat Client
CHATSERVER.EXE    - Winsock Chat Server
DRVLOAD.EXE       - Example TDI Client Driver Loader
NETDRV.SYS        - Example TDI Client Driver
NETLIB.LIB        - Lightbulb Library

The TDI Client Driver can be used through a simple API set implemented in NETLIB.LIB. I named it the "LightBulb" API set as a play on "Sockets". There are essentially two clients: one uses Winsock and one uses LightBulbs, simply for example purposes.

Driver Architecture

The architecture of the driver is very simple. It simply queues all read and write IRPs. It has a special write thread that it creates in the system process. This is just to demonstrate queuing IRPs and performing Asynchronous operations. The call to write network data can return to user mode without having to wait for the data to be sent or having to copy the data. Reads work the same way: the IRPs are queued, and they are completed when the data receive callback occurs. The source is fully commented.

Building the Source

First, as always, make sure that all makefiles point to the location of your DDK. The current makefiles assume \NTDDK\INC at the root of the same drive that the source is on. Second, make sure that your Visual Studio environment variables are set up using VCVARS32.BAT.

I created a new makefile at the root of the "network" directory, which you can use to build all directories. The first command you can use is "nmake dir". It pre-creates all the directories needed to build the source, since the build can sometimes fail if those directories do not already exist. This command will fail if any of the directories already exists.

C:\Programming\development\DEBUG\private\src\drivers\network>nmake dir

Microsoft (R) Program Maintenance Utility   Version 6.00.8168.0
Copyright (C) Microsoft Corp 1988-1998. All rights reserved.

        mkdir ..\..\..\..\bin

The second thing that you can do is "nmake" or "nmake all" to build the sources. It will go into each directory and build all 6 binaries in the correct order.

C:\Programming\development\DEBUG\private\src\drivers\network>nmake

Microsoft (R) Program Maintenance Utility   Version 6.00.8168.0
Copyright (C) Microsoft Corp 1988-1998. All rights reserved.

        cd chatclient
        nmake

Microsoft (R) Program Maintenance Utility   Version 6.00.8168.0
Copyright (C) Microsoft Corp 1988-1998. All rights reserved.

        cl /nologo /MD /W3 /Oxs /Gz /Zi /I "..\..\..\..\inc" /D "WIN32" /D "_WINDOWS" /Fr.\obj\i386\\ /Fo.\obj\i386\\ /Fd.\obj\i386\\ /c .\client.c
client.c
        link.exe /LIBPATH:..\..\..\..\lib /DEBUG /PDB:..\..\..\..\..\bin\SYMBOLS\chatclient.PDB /SUBSYSTEM:CONSOLE /nologo kernel32.lib Advapi32.lib WS2_32.LIB /out:..\..\..\..\..\bin\chatclient.exe .\obj\i386\client.obj kernel32.lib Advapi32.lib WS2_32.LIB
        rebase.exe -b 0x00400000 -x ..\..\..\..\..\bin\SYMBOLS -a ..\..\..\..\..\bin\chatclient
REBASE: chatclient       - unable to split symbols (2)

The last option you have is "nmake clean", which goes into each directory and deletes the object files. This causes each project to be rebuilt the next time you type "nmake" or "nmake all". Of course, you can type "nmake" and "nmake clean" in any of the application directories as well; however, this is a convenient way to build all binaries at one time.

C:\Programming\development\DEBUG\private\src\drivers\network>nmake clean

Microsoft (R) Program Maintenance Utility   Version 6.00.8168.0
Copyright (C) Microsoft Corp 1988-1998. All rights reserved.

        cd chatclient
        nmake clean

Microsoft (R) Program Maintenance Utility   Version 6.00.8168.0
Copyright (C) Microsoft Corp 1988-1998. All rights reserved.

Deleted file - C:\Programming\development\DEBUG\private\src\drivers\network\chatclient\obj\i386\client.obj
Deleted file - C:\Programming\development\DEBUG\private\src\drivers\network\chatclient\obj\i386\client.sbr
Deleted file - C:\Programming\development\DEBUG\private\src\drivers\network\chatclient\obj\i386\vc60.pdb

Chat Server

The chat server is a very simple implementation. It simply accepts connections and puts these connections into a list. Any time it receives data from any client it simply broadcasts this to all other clients.

Chat Clients

There are two chat clients, but they are essentially implemented the same way. The only difference is that one talks to the Winsock API and the other uses our "LightBulb" API. These clients simply print any incoming data and send any data that the user types in. They are console applications, so whenever the user is typing input, incoming output is blocked until the user finishes typing.

Chat Protocol

The chat protocol is extremely simple. The first packet sent is the name of the client and is used to identify it to all other clients. The rest are simply broadcast as strings. There is no packet header, so the server and clients all assume that each piece of chat text sent will be read in one receive! This is extremely error-prone, because TCP is a byte stream that does not preserve message boundaries: one receive may return part of a message or several messages at once. It was just used as an example. To beef it up, you may want to consider actually creating a protocol!

Bugs!

There are three known bugs in the source code. Two of them are things simply left out of the implementation, and the other is something I noticed but didn't feel like fixing. This is example code; you are lucky it compiles! Have you ever seen books that give code you know would not compile? Well, here at least it works in the simplest of cases. The bugs are there for you to fix: I'll give some guidance, and you can get better acquainted with the code by fixing them. I did run some of the Driver Verifier tests on the source to make sure there were no blatantly obvious bugs, but there has not been extensive testing. Then again, this isn't commercial software. There could be other bugs; if you find any, see if you can fix them. If you need some help, let me know.

Bug One: TDI Client Detect Disconnect

There is no implementation to detect when the client disconnects from the server. If the server is aborted while the client is connected, the client simply does not know and continues to attempt to send data. The return value from TDI_SEND is ignored, and no event handler (such as TDI_EVENT_DISCONNECT) is registered to be notified of a disconnect. The implementation is simply not there. This is now your job: you must implement a method to detect when the connection has been disconnected. There are a variety of implementations that could do this.

Bug Two: No Protocol

There is no protocol implemented between the clients and server. A protocol should be implemented that does not rely on each read receiving an entire packet, and that is more flexible! Perhaps even add a simple file transfer!
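One common way to remove the "one send equals one receive" assumption is to put a length prefix in front of every message, so the receiver can reassemble messages however the bytes are split across receives. The following is a minimal sketch in portable C of such framing; the 2-byte big-endian header, the 512-byte limit and all names (`frame_encode`, `frame_feed`, `frame_decoder`, `msg_sink`, `sink_msg`) are assumptions of this example, not part of the chat source. The key design choice is that the decoder keeps partial data between calls, which the current clients and server do not do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MSG 512   /* assumed upper bound on one chat message */

/* Writes a 2-byte big-endian length followed by the payload.
 * Returns the number of bytes written, or 0 if it does not fit. */
static size_t frame_encode(const char *msg, size_t len,
                           unsigned char *out, size_t out_size)
{
    if (len > MAX_MSG || out_size < len + 2)
        return 0;
    out[0] = (unsigned char)(len >> 8);
    out[1] = (unsigned char)(len & 0xff);
    memcpy(out + 2, msg, len);
    return len + 2;
}

/* Receive-side state that survives between receives. */
typedef struct {
    unsigned char buf[MAX_MSG + 2];
    size_t have;                     /* bytes of the current frame so far */
} frame_decoder;

typedef void (*frame_cb)(const char *msg, size_t len, void *ctx);

static size_t frame_length(const frame_decoder *d)
{
    return ((size_t)d->buf[0] << 8) | d->buf[1];
}

/* Feeds the bytes from one receive, however they were split.
 * Calls on_msg once per complete frame; returns the number of frames
 * delivered, or -1 if a header announces an oversized message. */
static int frame_feed(frame_decoder *d, const unsigned char *data, size_t n,
                      frame_cb on_msg, void *ctx)
{
    int delivered = 0;
    while (n > 0) {
        size_t need;
        if (d->have < 2) {
            need = 2 - d->have;              /* still reading the header */
        } else {
            if (frame_length(d) > MAX_MSG)
                return -1;
            need = frame_length(d) + 2 - d->have;
        }
        size_t take = n < need ? n : need;
        memcpy(d->buf + d->have, data, take);
        d->have += take;
        data += take;
        n -= take;
        if (d->have >= 2) {
            size_t len = frame_length(d);
            if (len > MAX_MSG)
                return -1;
            if (d->have == len + 2) {        /* a whole frame has arrived */
                on_msg((const char *)d->buf + 2, len, ctx);
                d->have = 0;
                delivered++;
            }
        }
    }
    return delivered;
}

/* Example callback: remembers the last message and counts messages. */
typedef struct { char last[MAX_MSG + 1]; int count; } msg_sink;

static void sink_msg(const char *msg, size_t len, void *ctx)
{
    msg_sink *s = ctx;
    memcpy(s->last, msg, len);
    s->last[len] = '\0';
    s->count++;
}
```

Because the decoder only hands out whole frames, it no longer matters whether TCP delivers a message in one receive, in pieces, or glued to the next one.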

Bug Three: Incorrect Display

There is a bug that involves two connected clients. This bug occurs with either client implementation, TDI or Sockets. It occurs when one client starts typing a message but doesn't send it yet. The other client then sends five or so messages. When the first client finally sends its message, the message is corrupted: the name is overwritten with the data being sent. As a hint, you may want to investigate the data being sent and pay attention to the "\r\n" pairings.

Conclusion

This article implemented a simple chat program that used sockets and an implementation of a TDI Client. There was also a lot of information on how to handle IRPs, along with links to other locations to further your education. IRPs are the backbone of driver development, and understanding them is key to writing device drivers for Windows. Please remember that there is a lot of misinformation, missing information and bad examples out there, so make sure that you visit a few different sites and attempt a few techniques so that you can distinguish what is correct from what is incorrect.

PART 6: INTRODUCTION TO DISPLAY DRIVERS

Introduction

It has been a while since I updated this series, and I have found some free time to write the next article. In this article, we will take a look at how to write a simple display driver. A display driver is a special type of driver which fits into a framework that is unlike what we have talked about so far in this series.

The example driver for this article will show how to write a basic display driver which does not have any hardware associated with it. Instead, this display driver will render graphics to memory, and an application will be used to display those graphics. This method was demonstrated in an article I wrote for the C/C++ Users Journal; however, that article was about extending VMware to support multiple monitors. This article focuses only on display drivers themselves and does not use VMware; it requires just your local machine.

Display driver architecture

The first place to start is the display driver architecture as it is in Windows NT. I will note here that Windows Vista introduces a new display driver model known as LDDM (the Longhorn Display Driver Model, later renamed WDDM). It is essential for supporting the new Desktop Window Manager; however, Windows Vista still supports the old display driver model in conjunction with the old window manager. This article will not cover LDDM.

The display driver model consists of two pieces: the miniport driver and the display driver. The miniport driver is loaded into system space and is responsible for enumerating devices and managing device resources. The display driver is loaded into session space and is responsible for implementing the actual GDI graphics calls. It can implement these calls however it wants, either in software or by deferring them to the graphics card itself. The display driver has full control over how a line is drawn or how a transparency effect is implemented.

The following diagram shows the Windows display driver architecture:

The display miniport

The miniport driver is loaded into system space and is responsible for managing display device resources and enumerating devices. This driver, however, uses another driver, VIDEOPRT.SYS, as its framework. VIDEOPRT.SYS exports APIs which your driver will link against and use. Surprised a driver can export APIs? Don't be. Drivers use the PE format and have export and import tables. You can export APIs from your driver and allow other drivers to link against them, just like a DLL. In fact, all the APIs you use are simply exports of the kernel and other drivers that you link against.

I will note there is a slight difference between linking in kernel mode and in user mode. If a driver links against another driver that is not currently loaded into memory, that driver will be loaded into memory; however, its DriverEntry will not be called. DriverEntry is not called until the driver is loaded directly using ZwLoadDriver, loaded by the system, or loaded with the service API as we were shown previously. In any case, you can export APIs from one driver and link against and use those APIs from another driver. There is no general "GetProcAddress" in the kernel (MmGetSystemRoutineAddress only resolves routines exported by the kernel and the HAL), so you would need to write one.
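If you do write your own lookup, the core step is a binary search over the image's export names: the PE/COFF specification keeps the export name pointer table sorted lexically, and the parallel ordinal table maps each name to an index in the export address table. The user-mode sketch below models only that step over in-memory arrays. A real implementation would first locate IMAGE_EXPORT_DIRECTORY in the loaded image and follow its AddressOfNames, AddressOfNameOrdinals and AddressOfFunctions RVAs; `export_table`, `find_export` and the `Helper_*` exports are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*export_fn)(int);

/* In-memory stand-in for the three export arrays of a PE image:
 * names[] is sorted with strcmp, and ordinals[i] maps names[i]
 * to an index in functions[]. */
typedef struct {
    const char *const *names;
    const unsigned short *ordinals;
    const export_fn *functions;
    size_t number_of_names;
} export_table;

/* Binary search by name, as the sorted name table allows. */
static export_fn find_export(const export_table *t, const char *name)
{
    size_t lo = 0, hi = t->number_of_names;   /* search [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, t->names[mid]);
        if (cmp == 0)
            return t->functions[t->ordinals[mid]];
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;   /* not exported */
}

/* Hypothetical exports of some helper driver, for the example. */
static int Helper_Double(int x) { return 2 * x; }
static int Helper_Negate(int x) { return -x; }
```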

In any case, VideoPrt.SYS exports APIs which your miniport driver will call. This driver does a few things, one of which is to implement common code so that video driver writers do not need to rewrite the same code. This code includes video device enumeration between the WIN32 subsystem (WIN32K.SYS) and your miniport. VideoPrt.SYS also creates the device objects for the display, and when you call the initialization routine it thunks your driver object's entry points to point to VideoPrt.SYS!

The VideoPrt.SYS APIs all start with "VideoPort", and the first one you call is "VideoPortInitialize". If you notice, its first two arguments are the ones passed into your DriverEntry routine; however, it simply calls them "Context1" and "Context2", as if your video miniport driver were "special". Don't be fooled: this DriverEntry is the same as what we worked with before, and "Context1" is actually your driver object (and "Context2" your registry path). Once you pass your driver object to VideoPortInitialize, all the entry points into your driver are thunked to point to VideoPrt.SYS. Instead, you pass in different function pointers in "VIDEO_HW_INITIALIZATION_DATA", which VideoPrt.SYS will call when it needs to.

This means that you do not deal directly with IRPs in a video miniport. VideoPrt.SYS handles them, breaks them down, and then determines when you need to be informed about the data. Instead, you deal with what they call a "VRP", or "Video Request Packet". This is essentially a simplified, broken-down version of the IRP in a different data structure. You simply process it and return; there is no special completion handling of this data structure as there is with IRPs.
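The thunking idea can be illustrated with a toy model: the port driver records the miniport's callbacks, points every major-function entry of the driver object at its own dispatcher, and hands the miniport only a pared-down request. This is a user-mode sketch under those assumptions, not how VideoPrt.SYS is actually written; `model_driver_object`, `model_hw_init_data`, `port_initialize`, `port_dispatch`, `model_vrp`, `fake_start_io` and the IOCTL value 0x1234 are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MJ_COUNT          28    /* IRP_MJ_MAXIMUM_FUNCTION + 1 */
#define MODEL_MJ_DEVICE_CONTROL 0x0e  /* IRP_MJ_DEVICE_CONTROL */

/* A pared-down "video request packet" for this model only. */
typedef struct {
    unsigned long ioctl;
    unsigned long status;   /* 0 = NO_ERROR, 1 = ERROR_INVALID_FUNCTION */
} model_vrp;

typedef bool (*hw_start_io_fn)(void *hw_device_extension, model_vrp *vrp);

/* A one-field slice of VIDEO_HW_INITIALIZATION_DATA. */
typedef struct {
    hw_start_io_fn hw_start_io;
} model_hw_init_data;

typedef struct model_driver_object model_driver_object;
typedef unsigned long (*dispatch_fn)(model_driver_object *drv,
                                     unsigned char major, unsigned long ioctl);

struct model_driver_object {
    dispatch_fn major_function[MODEL_MJ_COUNT];  /* models DriverObject->MajorFunction */
    model_hw_init_data miniport;                 /* callbacks the port keeps */
};

/* The port's single dispatch routine: turns the request into a VRP. */
static unsigned long port_dispatch(model_driver_object *drv,
                                   unsigned char major, unsigned long ioctl)
{
    if (major != MODEL_MJ_DEVICE_CONTROL)
        return 0;   /* in this model the port handles everything else itself */
    model_vrp vrp = { ioctl, 0 };
    drv->miniport.hw_start_io(NULL, &vrp);
    return vrp.status;
}

/* Models the effect of VideoPortInitialize described above. */
static void port_initialize(model_driver_object *drv,
                            const model_hw_init_data *init)
{
    drv->miniport = *init;
    for (size_t i = 0; i < MODEL_MJ_COUNT; i++)
        drv->major_function[i] = port_dispatch;
}

/* The miniport's HwStartIO: it only ever sees VRPs, never IRPs. */
static bool fake_start_io(void *hw_device_extension, model_vrp *vrp)
{
    (void)hw_device_extension;
    vrp->status = (vrp->ioctl == 0x1234) ? 0 : 1;  /* 0x1234: hypothetical IOCTL */
    return true;
}
```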

The documentation specifies that you should only use the "VideoPort" APIs in a miniport; however, since this is also just a regular system-level driver, you can still link against any kernel API you wish, and I have done this before. This is not the case with the display driver itself, as we will see later.

Since we do not have any hardware, our miniport driver will be pretty thin and easy. The following code shows how the video miniport DriverEntry is constructed:

/**********************************************************************
 *
 *  DriverEntry
 *
 *    This is the entry point for this video miniport driver
 *
 **********************************************************************/
ULONG DriverEntry(PVOID pContext1, PVOID pContext2)
{
    VIDEO_HW_INITIALIZATION_DATA hwInitData;
    VP_STATUS vpStatus;

    /*
     * The Video Miniport is "technically" restricted to calling
     * "Video*" APIs.
     * There is a driver that encapsulates this driver by setting your
     * driver's entry points to locations in itself. It will then
     * handle your IRPs for you and determine which of the entry
     * points (provided below) into your driver should be called.
     * This driver however does run in the context of system memory
     * unlike the GDI component.
     */

    VideoPortZeroMemory(&hwInitData, sizeof(VIDEO_HW_INITIALIZATION_DATA));

    hwInitData.HwInitDataSize = sizeof(VIDEO_HW_INITIALIZATION_DATA);

    hwInitData.HwFindAdapter             = FakeGfxCard_FindAdapter;
    hwInitData.HwInitialize              = FakeGfxCard_Initialize;
    hwInitData.HwStartIO                 = FakeGfxCard_StartIO;
    hwInitData.HwResetHw                 = FakeGfxCard_ResetHW;
    hwInitData.HwInterrupt               = FakeGfxCard_VidInterrupt;
    hwInitData.HwGetPowerState           = FakeGfxCard_GetPowerState;
    hwInitData.HwSetPowerState           = FakeGfxCard_SetPowerState;
    hwInitData.HwGetVideoChildDescriptor = FakeGfxCard_GetChildDescriptor;

    vpStatus = VideoPortInitialize(pContext1, pContext2, &hwInitData, NULL);

    return vpStatus;
}

As I mentioned before, you simply pass the DriverObject directly through to the VideoPrt.SYS driver, as shown above. You also fill in a data structure containing entry points into your driver, which VideoPrt.SYS calls to perform various actions. "HwStartIO" is where you handle IOCTLs, and you can use IOCTLs to communicate between the display driver and the video miniport: the display driver simply calls "EngDeviceIoControl", and the IOCTL is handled in the miniport's HwStartIO.
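A HwStartIO routine is typically a switch on the VRP's IoControlCode that validates buffer sizes, fills in the output buffer, and reports the result in the status block. Below is a user-mode sketch with a simplified stand-in for VIDEO_REQUEST_PACKET that keeps its field names (IoControlCode, StatusBlock, InputBuffer, InputBufferLength, OutputBuffer, OutputBufferLength). The private IOCTL `IOCTL_FAKEGFX_QUERY_SIZE`, the `fake_*` and `model_*` types and the device extension layout are hypothetical; the error codes are the Win32 values that VP_STATUS uses, and the IOCTL is built with the usual CTL_CODE bit layout. In the real driver, the display driver would send this IOCTL with EngDeviceIoControl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Win32 error codes, which is what VP_STATUS values are. */
#define NO_ERROR                  0ul
#define ERROR_INVALID_FUNCTION    1ul
#define ERROR_INSUFFICIENT_BUFFER 122ul

/* The usual CTL_CODE layout, spelled out so the model is self-contained. */
#define MODEL_CTL_CODE(type, fn, method, access) \
    (((type) << 16) | ((access) << 14) | ((fn) << 2) | (method))
#define FILE_DEVICE_VIDEO 0x23ul
#define METHOD_BUFFERED   0ul
#define FILE_ANY_ACCESS   0ul

/* Hypothetical private IOCTL shared by the display driver and the miniport. */
#define IOCTL_FAKEGFX_QUERY_SIZE \
    MODEL_CTL_CODE(FILE_DEVICE_VIDEO, 0x800ul, METHOD_BUFFERED, FILE_ANY_ACCESS)

/* Simplified stand-ins for STATUS_BLOCK and VIDEO_REQUEST_PACKET. */
typedef struct { unsigned long Status; size_t Information; } model_status_block;
typedef struct {
    unsigned long IoControlCode;
    model_status_block *StatusBlock;
    void *InputBuffer;
    unsigned long InputBufferLength;
    void *OutputBuffer;
    unsigned long OutputBufferLength;
} model_vrp;

typedef struct { unsigned long width, height; } fake_size;
typedef struct { fake_size size; } fake_extension;  /* hypothetical device extension */

static bool fake_start_io(fake_extension *ext, model_vrp *vrp)
{
    switch (vrp->IoControlCode) {
    case IOCTL_FAKEGFX_QUERY_SIZE:
        if (vrp->OutputBufferLength < sizeof(fake_size)) {
            vrp->StatusBlock->Status = ERROR_INSUFFICIENT_BUFFER;
            vrp->StatusBlock->Information = 0;
            break;
        }
        memcpy(vrp->OutputBuffer, &ext->size, sizeof(fake_size));
        vrp->StatusBlock->Status = NO_ERROR;
        vrp->StatusBlock->Information = sizeof(fake_size);  /* bytes returned */
        break;
    default:
        vrp->StatusBlock->Status = ERROR_INVALID_FUNCTION;
        vrp->StatusBlock->Information = 0;
        break;
    }
    return true;  /* the request was handled, successfully or not */
}
```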

The following shows how I have implemented the video miniport functions:

/*#pragma alloc_text(PAGE, FakeGfxCard_ResetHW)       Cannot be Paged*/
/*#pragma alloc_text(PAGE, FakeGfxCard_VidInterrupt)  Cannot be Paged*/
#pragma alloc_text(PAGE, FakeGfxCard_GetPowerState)
#pragma alloc_text(PAGE, FakeGfxCard_SetPowerState)
#pragma alloc_text(PAGE, FakeGfxCard_GetChildDescriptor)
#pragma alloc_text(PAGE, FakeGfxCard_FindAdapter)
#pragma alloc_text(PAGE, FakeGfxCard_Initialize)
#pragma alloc_text(PAGE, FakeGfxCard_StartIO)

/*********************************************************************
 *
 *  FakeGfxCard_ResetHW
 *
 *     This routine would reset the hardware when a soft reboot is
 *     performed. Returning FALSE from this routine would force
 *     the HAL to perform an INT 10h and set Mode 3 (Text).
 *
 *     We are not real hardware so we will just return TRUE so the HAL
 *     does nothing.
 *
 *********************************************************************/
BOOLEAN FakeGfxCard_ResetHW(PVOID HwDeviceExtension, ULONG Columns, ULONG Rows)
{
    return TRUE;
}

/*********************************************************************
 *
 *  FakeGfxCard_VidInterrupt
 *
 *     Checks if its adapter generated an interrupt and dismisses it,
 *     or returns FALSE if it did not.
 *
 *********************************************************************/
BOOLEAN FakeGfxCard_VidInterrupt(PVOID HwDeviceExtension)
{
    return FALSE;
}

/*********************************************************************
 *
 *  FakeGfxCard_GetPowerState
 *
 *     Queries if the device can support the requested power state.
 *
 *********************************************************************/
VP_STATUS FakeGfxCard_GetPowerState(PVOID HwDeviceExtension, ULONG HwId,
                                    PVIDEO_POWER_MANAGEMENT VideoPowerControl)
{
    return NO_ERROR;
}

/*********************************************************************
 *
 *  FakeGfxCard_SetPowerState
 *
 *     Sets the power state.
 *
 *********************************************************************/
VP_STATUS FakeGfxCard_SetPowerState(PVOID HwDeviceExtension, ULONG HwId,
                                    PVIDEO_POWER_MANAGEMENT VideoPowerControl)
{
    return NO_ERROR;
}

/*********************************************************************
 *
 *  FakeGfxCard_GetChildDescriptor
 *
 *     Returns an identifier for any child device supported
 *     by the miniport.
 *
 *********************************************************************/
ULONG FakeGfxCard_GetChildDescriptor(PVOID HwDeviceExtension,
                                     PVIDEO_CHILD_ENUM_INFO ChildEnumInfo,
                                     PVIDEO_CHILD_TYPE pChildType,
                                     PVOID pChildDescriptor,
                                     PULONG pUId, PULONG pUnused)
{
    return ERROR_NO_MORE_DEVICES;
}

/*********************************************************************
 *
 *  FakeGfxCard_FindAdapter
 *
 *     This function performs initialization specific to devices
 *     maintained by this miniport driver.
 *
 *********************************************************************/
VP_STATUS FakeGfxCard_FindAdapter(PVOID HwDeviceExtension, PVOID HwContext,
                                  PWSTR ArgumentString,
                                  PVIDEO_PORT_CONFIG_INFO ConfigInfo,
                                  PUCHAR Again)
{
    return NO_ERROR;
}

/*********************************************************************
 *
 *  FakeGfxCard_Initialize
 *
 *     This initializes the device.
 *
 *********************************************************************/
BOOLEAN FakeGfxCard_Initialize(PVOID HwDeviceExtension)
{
    return TRUE;
}

/*********************************************************************
 *
 *  FakeGfxCard_StartIO
 *
 *     This routine executes requests on behalf of the GDI Driver
 *     and the system. The GDI driver is allowed to issue IOCTLs
 *     which would then be sent to this routine to be performed
 *     on its behalf.
 *
 *     We can add our own proprietary IOCTLs here to be processed
 *     from the GDI driver.
 *
 *********************************************************************/
BOOLEAN FakeGfxCard_StartIO(PVOID HwDeviceExtension,
                            PVIDEO_REQUEST_PACKET RequestPacket)
{
    RequestPacket->StatusBlock->Status      = 0;
    RequestPacket->StatusBlock->Information = 0;

    return TRUE;
}

Since I don't have any hardware, I simply implement enough of a miniport to keep the system happy. The only routine I would expect to use is "StartIO", if the display driver needed to access or perform an operation on the system that its limited API set cannot do. In this implementation, however, there is nothing for it to do. Remember, the main purpose of the miniport is to enumerate and manage hardware devices and resources. If you don't have any, that removes everything but the bare minimum needed to keep the driver model happy.

The display driver

The display driver links against WIN32K.SYS and is only allowed to call Eng* APIs. These APIs are found both in the kernel and in user mode; prior to NT4, display drivers ran in user mode. The same API set used by display drivers is also used by printer drivers, and conforming to it allows a display driver to be moved between user mode and kernel mode with minimal work.

The display driver, however, is not loaded into system memory but into session space. Session space is the kernel equivalent of process isolation: in user mode each process has its own virtual address space, and in the kernel each session has its own virtual address range. System space is the kernel memory that is global to all sessions.

A session is an instance of a logged-on user, containing its own Window Manager, desktop(s), shell and applications. This is most visible with Windows XP "Fast User Switching", which lets multiple users log on to a single machine. Each user actually runs in a separate session with its own range of kernel memory, known as session space.

This can be a problem when designing a video driver. It means you cannot simply pass arbitrary memory down to your miniport if the miniport may process that memory outside the context of the current session, for example by handing it to another thread that runs in the system process.

If the system process is not associated with your session, you will be accessing a different memory range than you think. When this happens, you get the "A driver has not been correctly ported to Terminal Services" blue screen.

The display driver is not like any of the drivers we have worked with so far. It is still in PE format, but it is not like the miniport, which is a normal kernel driver that simply links against a different framework. This driver cannot use kernel APIs by linking directly to them, and it should not use them anyway for the reason given above: if an API passes your memory outside of session space, you get a blue screen unless you make sure to pass only system memory. This is another reason to stick to the Eng* API set, although you could request a table of function pointers from the miniport driver; nothing actually prevents you from doing so.

In any case, the display driver behaves more like a DLL than a normal driver does, and it is essentially treated as one. Its framework is tied to WIN32K.SYS, which implements the Window Manager as well as GDI. The driver is compiled using "-entry:DrvEnableDriver@12 /SUBSYSTEM:NATIVE", where DrvEnableDriver is the display driver's entry point.

DrvEnableDriver

This is the initial entry point for a display driver, and it is not related to DriverEntry in any way. It receives a DRVENABLEDATA structure, which the driver fills in with a table of its entry points. Each table entry is an index value followed by a function pointer. The index value identifies the function type; for example, "INDEX_DrvCompletePDEV" specifies that the function pointer points to the driver's DrvCompletePDEV handler. Some functions are optional and some are required.
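The index/function-pointer pairing can be modelled in a few lines of portable C. This is only a sketch: SK_DRVFN, SK_PFN and the SK_INDEX_* values are hypothetical stand-ins for DRVFN, PFN and the INDEX_* constants in winddi.h (the numbers are arbitrary), and the linear lookup illustrates what such a table makes possible, not how WIN32K.SYS necessarily searches it.

```c
#include <stddef.h>

typedef void (*SK_PFN)(void);   /* stands in for PFN */

typedef struct {
    unsigned long iFunc;        /* stands in for DRVFN.iFunc (an INDEX_* value) */
    SK_PFN        pfn;          /* stands in for DRVFN.pfn (the driver handler) */
} SK_DRVFN;

/* Arbitrary, hypothetical index values. */
enum {
    SK_INDEX_DrvEnablePDEV   = 100,
    SK_INDEX_DrvCompletePDEV = 101,
    SK_INDEX_DrvRealizeBrush = 102
};

static int g_enableCalls = 0;
static void Sample_DrvEnablePDEV(void)   { g_enableCalls++; }
static void Sample_DrvCompletePDEV(void) { }

/* Only two entries; DrvRealizeBrush is deliberately left out, like the
 * commented-out entry in the sample driver's table. */
static const SK_DRVFN g_SketchFunctions[] = {
    { SK_INDEX_DrvEnablePDEV,   Sample_DrvEnablePDEV   },
    { SK_INDEX_DrvCompletePDEV, Sample_DrvCompletePDEV },
};

/* Returns the handler registered for iFunc, or NULL if the driver
 * did not supply one. */
SK_PFN Sketch_LookupDriverFunction(const SK_DRVFN *table, size_t count,
                                   unsigned long iFunc)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].iFunc == iFunc)
            return table[i].pfn;
    }
    return NULL;
}
```

A missing entry simply yields NULL, which is how an optional function the driver chose not to implement shows up in this model.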

This entry point is simply responsible for returning the list of your functions, although you may also do any initialization you need here. The following is the code from this article's sample display driver:

/*
 * Display Drivers provide a list of function entry points for specific GDI
 * tasks. These are identified by providing a pre-defined "INDEX" value
 * (pre-defined by Microsoft) followed by the function entry point. There
 * are levels of flexibility on which ones you are REQUIRED and which ones
 * are technically OPTIONAL.
 *
 */
DRVFN g_DrvFunctions[] =
{
    {   INDEX_DrvAssertMode,          (PFN) GdiExample_DrvAssertMode          },
    {   INDEX_DrvCompletePDEV,        (PFN) GdiExample_DrvCompletePDEV        },
    {   INDEX_DrvCreateDeviceBitmap,  (PFN) GdiExample_DrvCreateDeviceBitmap  },
    {   INDEX_DrvDeleteDeviceBitmap,  (PFN) GdiExample_DrvDeleteDeviceBitmap  },
    {   INDEX_DrvDestroyFont,         (PFN) GdiExample_DrvDestroyFont         },
    {   INDEX_DrvDisablePDEV,         (PFN) GdiExample_DrvDisablePDEV         },
    {   INDEX_DrvDisableDriver,       (PFN) GdiExample_DrvDisableDriver       },
    {   INDEX_DrvDisableSurface,      (PFN) GdiExample_DrvDisableSurface      },
    {   INDEX_DrvSaveScreenBits,      (PFN) GdiExample_DrvSaveScreenBits      },
    {   INDEX_DrvEnablePDEV,          (PFN) GdiExample_DrvEnablePDEV          },
    {   INDEX_DrvEnableSurface,       (PFN) GdiExample_DrvEnableSurface       },
    {   INDEX_DrvEscape,              (PFN) GdiExample_DrvEscape              },
    {   INDEX_DrvGetModes,            (PFN) GdiExample_DrvGetModes            },
    {   INDEX_DrvMovePointer,         (PFN) GdiExample_DrvMovePointer         },
    {   INDEX_DrvNotify,              (PFN) GdiExample_DrvNotify              },
//  {   INDEX_DrvRealizeBrush,        (PFN) GdiExample_DrvRealizeBrush        },
    {   INDEX_DrvResetPDEV,           (PFN) GdiExample_DrvResetPDEV           },
    {   INDEX_DrvSetPalette,          (PFN) GdiExample_DrvSetPalette          },
    {   INDEX_DrvSetPointerShape,     (PFN) GdiExample_DrvSetPointerShape     },
    {   INDEX_DrvStretchBlt,          (PFN) GdiExample_DrvStretchBlt          },
    {   INDEX_DrvSynchronizeSurface,  (PFN) GdiExample_DrvSynchronizeSurface  },
    {   INDEX_DrvAlphaBlend,          (PFN) GdiExample_DrvAlphaBlend          },
    {   INDEX_DrvBitBlt,              (PFN) GdiExample_DrvBitBlt              },
    {   INDEX_DrvCopyBits,            (PFN) GdiExample_DrvCopyBits            },
    {   INDEX_DrvFillPath,            (PFN) GdiExample_DrvFillPath            },
    {   INDEX_DrvGradientFill,        (PFN) GdiExample_DrvGradientFill        },
    {   INDEX_DrvLineTo,              (PFN) GdiExample_DrvLineTo              },
    {   INDEX_DrvStrokePath,          (PFN) GdiExample_DrvStrokePath          },
    {   INDEX_DrvTextOut,             (PFN) GdiExample_DrvTextOut             },
    {   INDEX_DrvTransparentBlt,      (PFN) GdiExample_DrvTransparentBlt      },
};

ULONG g_ulNumberOfFunctions = sizeof(g_DrvFunctions) / sizeof(DRVFN);

/*********************************************************************
 * DrvEnableDriver
 *
 *   This is the initial driver entry point. This is the "DriverEntry"
 *   equivalent for Display and Printer drivers. This function must
 *   return a function table that represents all the supported entry
 *   points into this driver.
 *
 *********************************************************************/
BOOL DrvEnableDriver(ULONG ulEngineVersion, ULONG ulDataSize,
                     DRVENABLEDATA *pDrvEnableData)
{
    BOOL bDriverEnabled = FALSE;

    /*
     * We only want to support versions > NT 4
     *
     */
    if(HIWORD(ulEngineVersion) >= 0x3 && ulDataSize >= sizeof(DRVENABLEDATA))
    {
        pDrvEnableData->iDriverVersion = DDI_DRIVER_VERSION;
        pDrvEnableData->pdrvfn         = g_DrvFunctions;
        pDrvEnableData->c              = g_ulNumberOfFunctions;
        bDriverEnabled                 = TRUE;
    }

    return bDriverEnabled;
}
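The version and size checks in DrvEnableDriver can be exercised outside the kernel with stand-in types. In this sketch SK_DRVENABLEDATA mirrors only the three DRVENABLEDATA fields the sample fills in, and the SK_DDI_DRIVER_VERSION_* values are my reading of the DDI_DRIVER_VERSION_* constants in winddi.h (check them against your DDK headers). The point it shows is that HIWORD(ulEngineVersion) >= 0x3 is what turns away an NT 4 engine.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_HIWORD(l) ((uint16_t)(((uint32_t)(l) >> 16) & 0xFFFFu))

/* Assumed to match DDI_DRIVER_VERSION_NT4 / _NT5 / _NT5_01. */
#define SK_DDI_DRIVER_VERSION_NT4    0x00020000u
#define SK_DDI_DRIVER_VERSION_NT5    0x00030000u
#define SK_DDI_DRIVER_VERSION_NT5_01 0x00030100u

typedef struct {
    uint32_t    iDriverVersion; /* DDI version the driver was built for */
    uint32_t    c;              /* number of table entries */
    const void *pdrvfn;         /* the function table */
} SK_DRVENABLEDATA;

/* Returns 1 (TRUE) and fills *pData only for an NT5-or-later engine
 * that passed a large enough structure. */
int Sketch_DrvEnableDriver(uint32_t ulEngineVersion, uint32_t ulDataSize,
                           SK_DRVENABLEDATA *pData,
                           const void *table, uint32_t count)
{
    if (pData == NULL ||
        SK_HIWORD(ulEngineVersion) < 0x3 ||
        ulDataSize < sizeof(SK_DRVENABLEDATA))
        return 0;

    pData->iDriverVersion = SK_DDI_DRIVER_VERSION_NT5; /* version this sketch reports */
    pData->pdrvfn         = table;
    pData->c              = count;
    return 1;
}
```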

DrvDisableDriver

This handler is called when the display driver is being unloaded. In it you can clean up anything you created in the DrvEnableDriver call. The following code is from the sample driver:

/*********************************************************************
 * GdiExample_DrvDisableDriver
 *
 *   This function is used to notify the driver when the driver is
 *   getting ready to be unloaded.
 *
 *********************************************************************/
VOID GdiExample_DrvDisableDriver(VOID)
{
    /*
     * No Clean up To Do
     */
}

DrvGetModes

The API called after the driver is loaded and enabled is DrvGetModes. It is used to query the modes supported by the device, and these modes populate the "Settings" tab of the "Display Properties" dialog. The operating system may cache the modes and treats them as a static list: although there are times and ways in which this API is called more than once, for the most part you should not consider the list dynamic.

The API is generally called twice: the first time it simply asks for the size required to store the modes, and the second time it passes a buffer of that size. The following code fragment is from the sample driver, which only supports 640x480x32:

/*********************************************************************
 * GdiExample_DrvGetModes
 *
 *   This API is used to enumerate display modes.
 *
 *   This driver only supports 640x480x32
 *
 *********************************************************************/
ULONG GdiExample_DrvGetModes(HANDLE hDriver, ULONG cjSize, DEVMODEW *pdm)
{
    ULONG ulBytesWritten = 0, ulBytesNeeded = sizeof(DEVMODEW);
    ULONG ulReturnValue;

    ENGDEBUGPRINT(0, "GdiExample_DrvGetModes\r\n", NULL);

    if(pdm == NULL)
    {
        /* First call: report the buffer size needed for all modes */
        ulReturnValue = ulBytesNeeded;
    }
    else if(cjSize < ulBytesNeeded)
    {
        /* Buffer cannot hold even one DEVMODEW; 0 reports failure */
        ulReturnValue = 0;
    }
    else
    {
        ulBytesWritten = sizeof(DEVMODEW);

        memset(pdm, 0, sizeof(DEVMODEW));
        memcpy(pdm->dmDeviceName, DLL_NAME, sizeof(DLL_NAME));

        pdm->dmSpecVersion      = DM_SPECVERSION;
        pdm->dmDriverVersion    = DM_SPECVERSION;
        pdm->dmDriverExtra      = 0;
        pdm->dmSize             = sizeof(DEVMODEW);
        pdm->dmBitsPerPel       = 32;
        pdm->dmPelsWidth        = 640;
        pdm->dmPelsHeight       = 480;
        pdm->dmDisplayFrequency = 75;
        pdm->dmDisplayFlags     = 0;
        pdm->dmPanningWidth     = pdm->dmPelsWidth;
        pdm->dmPanningHeight    = pdm->dmPelsHeight;
        pdm->dmFields           = DM_BITSPERPEL | DM_PELSWIDTH | DM_PELSHEIGHT |
                                  DM_DISPLAYFLAGS | DM_DISPLAYFREQUENCY;

        ulReturnValue = ulBytesWritten;
    }

    return ulReturnValue;
}
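The two-call protocol is easiest to see with more than one mode. The sketch below uses a trimmed, hypothetical SK_DEVMODE in place of DEVMODEW and adds a second, hypothetical 800x600 mode. A NULL buffer is answered with the number of bytes needed, a buffer too small for even one mode gets 0, and otherwise as many whole modes as fit are copied. Sketch_QueryModes plays the role of the caller.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed stand-in for DEVMODEW. */
typedef struct {
    unsigned short dmSize;
    unsigned long  dmBitsPerPel;
    unsigned long  dmPelsWidth;
    unsigned long  dmPelsHeight;
    unsigned long  dmDisplayFrequency;
} SK_DEVMODE;

/* The sample's 640x480x32 mode plus a hypothetical 800x600x32 mode. */
static const SK_DEVMODE k_modes[] = {
    { sizeof(SK_DEVMODE), 32, 640, 480, 75 },
    { sizeof(SK_DEVMODE), 32, 800, 600, 75 },
};

unsigned long Sketch_DrvGetModes(unsigned long cjSize, SK_DEVMODE *pdm)
{
    size_t count = sizeof k_modes / sizeof k_modes[0];

    if (pdm == NULL)                      /* size query */
        return (unsigned long)(count * sizeof(SK_DEVMODE));
    if (cjSize < sizeof(SK_DEVMODE))      /* too small for one mode */
        return 0;

    size_t fit = cjSize / sizeof(SK_DEVMODE);
    if (fit > count)
        fit = count;
    memcpy(pdm, k_modes, fit * sizeof(SK_DEVMODE));
    return (unsigned long)(fit * sizeof(SK_DEVMODE));
}

/* Caller side: ask for the size, allocate, then fetch the modes.
 * The caller frees the returned array. */
SK_DEVMODE *Sketch_QueryModes(unsigned long *pCount)
{
    unsigned long cb = Sketch_DrvGetModes(0, NULL);
    SK_DEVMODE *modes = malloc(cb);

    *pCount = 0;
    if (modes == NULL)
        return NULL;
    *pCount = Sketch_DrvGetModes(cb, modes) / sizeof(SK_DEVMODE);
    return modes;
}
```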

DrvEnablePDEV

Once a mode is chosen, this API is called to allow the driver to enable the "physical device". Its purpose is to let the display driver create its own private context, which will be passed into the other display driver entry points. This private context exists because a single display driver may handle multiple display devices and needs to distinguish one from another. The return value of this API is a pointer to the context, or instance, for the supplied display device.

The selected display setting is passed into this API via the DEVMODE parameter; however, the sample driver does not use it, since it is hard coded to set up 800x600x32 mode only.

Besides creating an instance structure, this API must at a minimum initialize the GDIINFO and DEVINFO data structures. These parameters are important: if you claim support for a feature that you do not actually support, you can get graphics corruption as a side effect, or even a blue screen. The next two parameters worth mentioning are hDev and hDriver. The hDriver parameter is a handle to the device object of the miniport driver associated with this display, and it can be used with APIs such as EngDeviceIoControl to communicate with the miniport driver.

The hDev parameter is GDI's handle to the device; however, since the device is still being created, it is not yet useful. It is recommended that you wait until the DrvCompletePDEV call before saving and using this handle. The following code is from the sample driver's DrvEnablePDEV:

/*********************************************************************
 * GdiExample_DrvEnablePDEV
 *
 *   This function will provide a description of the Physical Device.
 *   The data returned is a user defined data context to be used as a
 *   handle for this display device.
 *
 *   The hDriver is a handle to the miniport driver associated with
 *   this display device. This handle can be used to communicate to
 *   the miniport through APIs to send things like IOCTLs.
 *
 *********************************************************************/
DHPDEV GdiExample_DrvEnablePDEV(DEVMODEW *pdm, PWSTR pwszLogAddr, ULONG cPat,
                                HSURF *phsurfPatterns, ULONG cjCaps,
                                GDIINFO *pGdiInfo, ULONG cjDevInfo,
                                DEVINFO *pDevInfo, HDEV hdev,
                                PWSTR pwszDeviceName, HANDLE hDriver)
{
    PDEVICE_DATA pDeviceData = NULL;

    ENGDEBUGPRINT(0, "GdiExample_DrvEnablePDEV Enter \r\n", NULL);

    pDeviceData = (PDEVICE_DATA) EngAllocMem(0, sizeof(DEVICE_DATA), FAKE_GFX_TAG);

    if(pDeviceData)
    {
        memset(pDeviceData, 0, sizeof(DEVICE_DATA));
        memset(pGdiInfo, 0, cjCaps);
        memset(pDevInfo, 0, cjDevInfo);

        {
            pGdiInfo->ulVersion         = 0x5000;
            pGdiInfo->ulTechnology      = DT_RASDISPLAY;
            pGdiInfo->ulHorzSize        = 0;
            pGdiInfo->ulVertSize        = 0;
            pGdiInfo->ulHorzRes         = RESOLUTION_X;
            pGdiInfo->ulVertRes         = RESOLUTION_Y;
            pGdiInfo->ulPanningHorzRes  = 0;
            pGdiInfo->ulPanningVertRes  = 0;
            pGdiInfo->cBitsPixel        = 8;
            pGdiInfo->cPlanes           = 4;
            pGdiInfo->ulNumColors       = 20;
            pGdiInfo->ulVRefresh        = 1;
            pGdiInfo->ulBltAlignment    = 1;
            pGdiInfo->ulLogPixelsX      = 96;
            pGdiInfo->ulLogPixelsY      = 96;
            pGdiInfo->flTextCaps        = TC_RA_ABLE;
            pGdiInfo->flRaster          = 0;
            pGdiInfo->ulDACRed          = 8;
            pGdiInfo->ulDACGreen        = 8;
            pGdiInfo->ulDACBlue         = 8;
            pGdiInfo->ulAspectX         = 0x24;
            pGdiInfo->ulNumPalReg       = 256;
            pGdiInfo->ulAspectY         = 0x24;
            pGdiInfo->ulAspectXY        = 0x33;
            pGdiInfo->xStyleStep        = 1;
            pGdiInfo->yStyleStep        = 1;
            pGdiInfo->denStyleStep      = 3;
            pGdiInfo->ptlPhysOffset.x   = 0;
            pGdiInfo->ptlPhysOffset.y   = 0;
            pGdiInfo->szlPhysSize.cx    = 0;
            pGdiInfo->szlPhysSize.cy    = 0;

            pGdiInfo->ciDevice.Red.x            = 6700;
            pGdiInfo->ciDevice.Red.y            = 3300;
            pGdiInfo->ciDevice.Red.Y            = 0;
            pGdiInfo->ciDevice.Green.x          = 2100;
            pGdiInfo->ciDevice.Green.y          = 7100;
            pGdiInfo->ciDevice.Green.Y          = 0;
            pGdiInfo->ciDevice.Blue.x           = 1400;
            pGdiInfo->ciDevice.Blue.y           = 800;
            pGdiInfo->ciDevice.Blue.Y           = 0;
            pGdiInfo->ciDevice.AlignmentWhite.x = 3127;
            pGdiInfo->ciDevice.AlignmentWhite.y = 3290;
            pGdiInfo->ciDevice.AlignmentWhite.Y = 0;
            pGdiInfo->ciDevice.RedGamma         = 20000;
            pGdiInfo->ciDevice.GreenGamma       = 20000;
            pGdiInfo->ciDevice.BlueGamma        = 20000;
            pGdiInfo->ciDevice.Cyan.x           = 1750;
            pGdiInfo->ciDevice.Cyan.y           = 3950;
            pGdiInfo->ciDevice.Cyan.Y           = 0;
            pGdiInfo->ciDevice.Magenta.x        = 4050;
            pGdiInfo->ciDevice.Magenta.y        = 2050;
            pGdiInfo->ciDevice.Magenta.Y        = 0;
            pGdiInfo->ciDevice.Yellow.x         = 4400;
            pGdiInfo->ciDevice.Yellow.y         = 5200;
            pGdiInfo->ciDevice.Yellow.Y         = 0;
            pGdiInfo->ciDevice.MagentaInCyanDye   = 0;
            pGdiInfo->ciDevice.YellowInCyanDye    = 0;
            pGdiInfo->ciDevice.CyanInMagentaDye   = 0;
            pGdiInfo->ciDevice.YellowInMagentaDye = 0;
            pGdiInfo->ciDevice.CyanInYellowDye    = 0;
            pGdiInfo->ciDevice.MagentaInYellowDye = 0;

            pGdiInfo->ulDevicePelsDPI   = 0;
            pGdiInfo->ulPrimaryOrder    = PRIMARY_ORDER_CBA;
            pGdiInfo->ulHTPatternSize   = HT_PATSIZE_4x4_M;
            pGdiInfo->flHTFlags         = HT_FLAG_ADDITIVE_PRIMS;
            pGdiInfo->ulHTOutputFormat  = HT_FORMAT_32BPP;

            *pDevInfo = gDevInfoFrameBuffer;
            pDevInfo->iDitherFormat = BMF_32BPP;
        }

        pDeviceData->pVideoMemory = EngMapFile(L"\\??\\c:\\video.dat",
                                               RESOLUTION_X*RESOLUTION_Y*4,
                                               &pDeviceData->pMappedFile);
        pDeviceData->hDriver = hDriver;
        pDevInfo->hpalDefault = EngCreatePalette(PAL_BITFIELDS, 0, NULL,
                                                 0xFF0000, 0xFF00, 0xFF);
    }

    ENGDEBUGPRINT(0, "GdiExample_DrvEnablePDEV Exit \r\n", NULL);

    return (DHPDEV)pDeviceData;
}
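The EngCreatePalette(PAL_BITFIELDS, ...) call with the masks 0xFF0000, 0xFF00 and 0xFF describes how a 32bpp pixel is laid out: red in bits 16-23, green in bits 8-15 and blue in bits 0-7. The helpers below are a hypothetical sketch, assuming 8-bit channels and contiguous masks, of how a color is packed into and extracted from such a pixel.

```c
#include <stdint.h>

/* Number of zero bits below the lowest set bit of mask. */
static unsigned Sketch_MaskShift(uint32_t mask)
{
    unsigned shift = 0;
    if (mask == 0)
        return 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        shift++;
    }
    return shift;
}

/* Packs 8-bit r, g, b into a pixel described by three bitfield masks. */
uint32_t Sketch_PackPixel(uint8_t r, uint8_t g, uint8_t b,
                          uint32_t rMask, uint32_t gMask, uint32_t bMask)
{
    return (((uint32_t)r << Sketch_MaskShift(rMask)) & rMask) |
           (((uint32_t)g << Sketch_MaskShift(gMask)) & gMask) |
           (((uint32_t)b << Sketch_MaskShift(bMask)) & bMask);
}

/* Extracts one 8-bit channel selected by mask. */
uint8_t Sketch_ExtractChannel(uint32_t pixel, uint32_t mask)
{
    return (uint8_t)((pixel & mask) >> Sketch_MaskShift(mask));
}
```

With the sample's masks, Sketch_PackPixel(0x12, 0x34, 0x56, 0xFF0000, 0xFF00, 0xFF) yields 0x123456; swapping the red and blue masks would describe a BGR layout instead.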

DrvCompletePDEV

This call is made after DrvEnablePDEV to notify the display driver that the device is now complete. Its only parameters are the private data structure created in the enable call and the completed GDI handle for the device. Unless you have more initialization to do, you can generally just save the GDI handle and move on. The following is the code from the sample driver:

/*********************************************************************
 * GdiExample_DrvCompletePDEV
 *
 *   This is called to complete the process of enabling the device.
 *
 *********************************************************************/
void GdiExample_DrvCompletePDEV(DHPDEV dhpdev, HDEV hdev)
{
    PDEVICE_DATA pDeviceData = (PDEVICE_DATA)dhpdev;

    ENGDEBUGPRINT(0, "GdiExample_DrvCompletePDEV Enter \r\n", NULL);

    pDeviceData->hdev = hdev;

    ENGDEBUGPRINT(0, "GdiExample_DrvCompletePDEV Exit \r\n", NULL);
}
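The enable/complete split can be modelled in plain C. In this sketch SK_HANDLE, SK_HDEV and SK_DHPDEV are hypothetical stand-ins for HANDLE, HDEV and DHPDEV, and calloc/free replace EngAllocMem/EngFreeMem. Each Sketch_EnablePDEV call returns its own context, which is what lets one driver serve several displays, and the GDI handle is recorded only once Sketch_CompletePDEV supplies it.

```c
#include <stdlib.h>

typedef void *SK_HANDLE;  /* stands in for HANDLE hDriver */
typedef void *SK_HDEV;    /* stands in for HDEV */
typedef void *SK_DHPDEV;  /* stands in for DHPDEV: opaque to GDI */

/* Hypothetical per-display context, like the sample's DEVICE_DATA. */
typedef struct {
    SK_HANDLE     hDriver;  /* miniport handle, usable immediately */
    SK_HDEV       hdev;     /* GDI handle, valid after CompletePDEV */
    unsigned long width;
    unsigned long height;
} SK_DEVICE_DATA;

SK_DHPDEV Sketch_EnablePDEV(unsigned long width, unsigned long height,
                            SK_HANDLE hDriver)
{
    SK_DEVICE_DATA *p = calloc(1, sizeof *p);
    if (p == NULL)
        return NULL;          /* GDI treats a NULL DHPDEV as failure */
    p->hDriver = hDriver;
    p->hdev    = NULL;        /* not known yet */
    p->width   = width;
    p->height  = height;
    return (SK_DHPDEV)p;
}

void Sketch_CompletePDEV(SK_DHPDEV dhpdev, SK_HDEV hdev)
{
    ((SK_DEVICE_DATA *)dhpdev)->hdev = hdev;
}

void Sketch_DisablePDEV(SK_DHPDEV dhpdev)
{
    free(dhpdev);
}
```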

DrvDisablePDEV

This API is called when the PDEV is no longer needed and will be destroyed. If a surface is enabled, it is called after DrvDisableSurface. Our implementation is very simple and just cleans up what was created along with the private PDEV structure:

/*********************************************************************
 * GdiExample_DrvDisablePDEV
 *
 *   This is called to disable the PDEV we created.
 *
 *********************************************************************/
void GdiExample_DrvDisablePDEV(DHPDEV dhpdev)
{
    PDEVICE_DATA pDeviceData = (PDEVICE_DATA)dhpdev;

    ENGDEBUGPRINT(0, "GdiExample_DrvDisablePDEV\r\n", NULL);

    if(pDeviceData->pMappedFile)
    {
        EngUnmapFile(pDeviceData->pMappedFile);
    }

    EngFreeMem(dhpdev);
}

DrvEnableSurface

This API is called after the PDEV has been completed, to ask the display driver to create a surface. As noted in the comments below, you have two choices when creating a surface: you can create one that the display driver manages, or one that GDI manages for you. The following code chooses to manage its own device surface.

The purpose is to define a drawing surface that GDI can also draw onto. Display drivers have their own device surfaces and will therefore generally want to manage them. To do this, the driver must describe its surface in a way GDI understands so that GDI can draw on it. This means supplying the start address and the pitch (the number of bytes from one scan line to the next), since display memory is not necessarily a tightly packed linear buffer in every mode. In our case, we use the memory mapped file we created as our video memory:

/*********************************************************************
 * GdiExample_DrvEnableSurface
 *
 *   This API is used to enable the physical device surface.
 *
 *   You have two choices here.
 *
 *     1. Driver Manages its own surface
 *          EngCreateDeviceSurface - Create the handle
 *          EngModifySurface       - Let GDI Know about the object.
 *
 *     2. GDI Manages the surface
 *          EngCreateBitmap        - Create a handle in a format that
 *                                   GDI Understands
 *          EngAssociateSurface    - Let GDI Know about the object.
 *
 *********************************************************************/
HSURF GdiExample_DrvEnableSurface(DHPDEV dhpdev)
{
    HSURF hsurf = NULL;
    SIZEL sizl;
    PDEVICE_DATA pDeviceData = (PDEVICE_DATA)dhpdev;

    ENGDEBUGPRINT(0, "GdiExample_DrvEnableSurface\r\n", NULL);

    pDeviceData->pDeviceSurface =
        (PDEVICE_SURFACE)EngAllocMem(FL_ZERO_MEMORY, sizeof(DEVICE_SURFACE),
                                     FAKE_GFX_TAG);

    if(pDeviceData->pDeviceSurface)
    {
        sizl.cx = 800;
        sizl.cy = 600;

        hsurf = (HSURF)EngCreateDeviceSurface((DHSURF)pDeviceData->pDeviceSurface,
                                              sizl, BMF_32BPP);

        if(hsurf && !EngModifySurface(hsurf, pDeviceData->hdev,
                                      HOOK_FILLPATH | HOOK_STROKEPATH | HOOK_LINETO |
                                      HOOK_TEXTOUT | HOOK_BITBLT | HOOK_COPYBITS,
                                      MS_NOTSYSTEMMEMORY,
                                      (DHSURF)pDeviceData->pDeviceSurface,
                                      pDeviceData->pVideoMemory, 800*4, NULL))
        {
            EngDeleteSurface(hsurf);
            hsurf = NULL;
        }

        if(hsurf == NULL)
        {
            EngFreeMem(pDeviceData->pDeviceSurface);
            pDeviceData->pDeviceSurface = NULL;
        }
    }

    /* Remember the handle so DrvDisableSurface can delete it */
    pDeviceData->hsurf = hsurf;

    return hsurf;
}
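The pvScan0 and lDelta arguments given to EngModifySurface (here pVideoMemory and 800*4 = 3200 bytes) are all that is needed to locate any pixel. This sketch, assuming a 32bpp surface, computes a pixel address from a scan-0 pointer and a signed pitch. A pitch larger than width * 4 covers padded scan lines, and a negative pitch covers a bottom-up layout. Sketch_PixelAddress is a hypothetical helper, not a GDI API.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of pixel (x, y) on a 32bpp surface whose first scan line
 * starts at pvScan0 and whose scan lines are lDelta bytes apart. */
uint32_t *Sketch_PixelAddress(void *pvScan0, long lDelta, long x, long y)
{
    return (uint32_t *)((uint8_t *)pvScan0
                        + (ptrdiff_t)y * lDelta
                        + (ptrdiff_t)x * 4);
}
```

For the sample surface, pixel (x, y) lives at pVideoMemory + y * 3200 + x * 4, which is why the memory mapped file must be at least width * height * 4 bytes long.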

DrvDisableSurface

This API is called to destroy the drawing surface created in the DrvEnableSurface call. This is called before destroying the PDEV. The following is the code from the example program:

/*********************************************************************
 * GdiExample_DrvDisableSurface
 *
 *   This API is called to disable the GDI Surface.
 *
 *********************************************************************/
void GdiExample_DrvDisableSurface(DHPDEV dhpdev)
{
    PDEVICE_DATA pDeviceData = (PDEVICE_DATA)dhpdev;

    ENGDEBUGPRINT(0, "GdiExample_DrvDisableSurface\r\n", NULL);

    EngDeleteSurface(pDeviceData->hsurf);
    pDeviceData->hsurf = NULL;

    EngFreeMem(pDeviceData->pDeviceSurface);
    pDeviceData->pDeviceSurface = NULL;
}

Sequencing

So, let's go through this one more time for clarity.

• DrvEnableDriver: The driver is loaded.

• DrvGetModes: Get the buffer size to hold all supported display modes.

• DrvGetModes: Get the display modes.

• DrvEnablePDEV: Inform the display driver to initialize to a mode selected in the DEVMODE data structure and return an instance handle.

• DrvCompletePDEV: Inform the driver that the device initialization is complete.

• DrvEnableSurface: Get the driver to supply a drawing surface.

<GDI Calls>

• DrvDisableSurface: Destroy the drawing surface.

• DrvDisablePDEV: Destroy the instance structure.

• DrvDisableDriver: Unload the display driver.
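The ordering above can be written down as a small state machine, which is handy when instrumenting a driver to catch out-of-order calls. This is a simplified sketch: it assumes a single PDEV, ignores DrvAssertMode and any DrvGetModes calls after a PDEV exists, and all SK_* names are hypothetical.

```c
#include <stddef.h>

typedef enum {
    SK_UNLOADED, SK_DRIVER_ENABLED, SK_PDEV_ENABLED,
    SK_PDEV_COMPLETED, SK_SURFACE_ENABLED
} SK_STATE;

typedef enum {
    SK_DrvEnableDriver, SK_DrvGetModes, SK_DrvEnablePDEV, SK_DrvCompletePDEV,
    SK_DrvEnableSurface, SK_DrvDisableSurface, SK_DrvDisablePDEV,
    SK_DrvDisableDriver
} SK_CALL;

/* Returns the state after call c, or -1 if c is not legal in state s. */
int Sketch_Transition(SK_STATE s, SK_CALL c)
{
    switch (c) {
    case SK_DrvEnableDriver:   return s == SK_UNLOADED        ? (int)SK_DRIVER_ENABLED  : -1;
    case SK_DrvGetModes:       return s == SK_DRIVER_ENABLED  ? (int)SK_DRIVER_ENABLED  : -1;
    case SK_DrvEnablePDEV:     return s == SK_DRIVER_ENABLED  ? (int)SK_PDEV_ENABLED    : -1;
    case SK_DrvCompletePDEV:   return s == SK_PDEV_ENABLED    ? (int)SK_PDEV_COMPLETED  : -1;
    case SK_DrvEnableSurface:  return s == SK_PDEV_COMPLETED  ? (int)SK_SURFACE_ENABLED : -1;
    case SK_DrvDisableSurface: return s == SK_SURFACE_ENABLED ? (int)SK_PDEV_COMPLETED  : -1;
    case SK_DrvDisablePDEV:    return s == SK_PDEV_COMPLETED  ? (int)SK_DRIVER_ENABLED  : -1;
    case SK_DrvDisableDriver:  return s == SK_DRIVER_ENABLED  ? (int)SK_UNLOADED        : -1;
    }
    return -1;
}

/* Returns 1 if the calls form a legal load-to-unload sequence. */
int Sketch_IsValidSequence(const SK_CALL *calls, size_t n)
{
    int s = SK_UNLOADED;
    for (size_t i = 0; i < n; i++) {
        s = Sketch_Transition((SK_STATE)s, calls[i]);
        if (s < 0)
            return 0;
    }
    return s == SK_UNLOADED;
}
```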

So how does the drawing work?

The "GDI Calls" are essentially operations such as BitBlt being handled by your display driver, in this case in DrvBitBlt. You may notice that our driver doesn't implement any graphics operations itself. This is because we do not have hardware to accelerate drawing, and I decided it is a lot less work to just call the routines Windows provides, which already implement these features in software. As in the example, DrvBitBlt can simply be diverted to EngBitBlt. These routines render directly into our video buffer, which in our case is a memory mapped file.
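If you did want to implement a blit yourself rather than diverting to EngBitBlt, the core of a plain source copy between 32bpp surfaces is a row-by-row copy that honours each surface's pitch. This sketch is hypothetical: it is not EngBitBlt, and it ignores clipping, raster operations and color translation. When source and destination are the same buffer and the destination lies below the source, it copies rows bottom-up so overlapping moves do not overwrite rows before they are read; it assumes such overlapping copies pass the same base pointer and pitch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies a w x h block of 32bpp pixels from (sx, sy) in src to
 * (dx, dy) in dst. Pitches are in bytes. */
void Sketch_CopyRect32(uint8_t *dst, long dstPitch, long dx, long dy,
                       const uint8_t *src, long srcPitch, long sx, long sy,
                       long w, long h)
{
    int bottomUp = (dst == src) && (dy > sy);

    for (long i = 0; i < h; i++) {
        long row = bottomUp ? (h - 1 - i) : i;
        /* memmove also handles horizontal overlap within a row */
        memmove(dst + (dy + row) * dstPitch + dx * 4,
                src + (sy + row) * srcPitch + sx * 4,
                (size_t)w * 4);
    }
}
```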

You may be wondering how to get to your PDEV or your surface object from these Drv* calls. The SURFOBJ passed into these APIs contains both: they are the dhsurf and dhpdev members of the SURFOBJ structure. The dhsurf member is the handle your driver created, provided the SURFOBJ represents a device-managed surface. You can determine this by checking whether the SURFOBJ's iType member is STYPE_DEVICE.
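A small helper that returns the driver's own surface from a SURFOBJ only when it really is a device surface keeps the Drv* handlers honest. SK_SURFOBJ here is a hypothetical, trimmed stand-in for SURFOBJ carrying just dhsurf, dhpdev and iType, and the SK_STYPE_* values are placeholders for the winddi.h constants.

```c
#include <stddef.h>

typedef enum { SK_STYPE_BITMAP, SK_STYPE_DEVICE, SK_STYPE_DEVBITMAP } SK_STYPE;

typedef struct { int reserved; } SK_DEVICE_SURFACE; /* driver's surface data */
typedef struct { int reserved; } SK_DEVICE_DATA;    /* driver's PDEV */

typedef struct {
    void    *dhsurf;  /* driver handle; meaningful for device surfaces */
    void    *dhpdev;  /* the DHPDEV returned from DrvEnablePDEV */
    SK_STYPE iType;   /* kind of surface */
} SK_SURFOBJ;

/* Returns the driver's surface only for a device-managed surface. */
SK_DEVICE_SURFACE *Sketch_GetDeviceSurface(const SK_SURFOBJ *pso)
{
    if (pso == NULL || pso->iType != SK_STYPE_DEVICE)
        return NULL;
    return (SK_DEVICE_SURFACE *)pso->dhsurf;
}

/* Returns the driver's PDEV context (may be NULL). */
SK_DEVICE_DATA *Sketch_GetPDev(const SK_SURFOBJ *pso)
{
    return pso ? (SK_DEVICE_DATA *)pso->dhpdev : NULL;
}
```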

Display driver escape codes

In my earlier device driver tutorials, we learned that "DeviceIoControl" can be used from user mode to implement our own commands between an application and a driver. This is also possible with display drivers, but it works a little differently: instead of "IOCTLs", the commands are called "Escape Codes".

In user mode, you can send "Escape Codes" to the display driver using one of two methods. The first is ExtEscape, which simply sends the data you provide to the driver. Your display driver then handles it in its DrvEscape routine.

The second method is DrawEscape which can be handled in DrvDrawEscape in your driver. The difference is that DrawEscape allows you to provide a Window DC with your data and the clipping for that window will be provided to your driver. This allows you to easily implement extended drawing commands which can behave correctly in the windowing environment as your driver will be informed of the proper clipping area.
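On the driver side, DrvEscape is essentially a switch on the escape code. This sketch uses hypothetical stand-ins: SK_ESC_GET_FRAMEBUFFER_SIZE is an invented private escape, the SURFOBJ parameter is reduced to void *, and SK_QUERYESCSUPPORT uses the value of QUERYESCSUPPORT from wingdi.h, which applications send to ask whether a given escape is supported. Returning 0 for anything the driver does not handle follows the convention that 0 means "not supported".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_QUERYESCSUPPORT          8u        /* QUERYESCSUPPORT */
#define SK_ESC_GET_FRAMEBUFFER_SIZE 0x10001u  /* hypothetical private escape */

static const uint32_t k_width = 640, k_height = 480;

/* Parameter order follows DrvEscape: pso, iEsc, cjIn, pvIn, cjOut, pvOut. */
uint32_t Sketch_DrvEscape(void *pso, uint32_t iEsc,
                          uint32_t cjIn, const void *pvIn,
                          uint32_t cjOut, void *pvOut)
{
    (void)pso;

    switch (iEsc) {
    case SK_QUERYESCSUPPORT: {
        uint32_t queried;
        if (pvIn == NULL || cjIn < sizeof queried)
            return 0;
        memcpy(&queried, pvIn, sizeof queried);
        return (queried == SK_QUERYESCSUPPORT ||
                queried == SK_ESC_GET_FRAMEBUFFER_SIZE) ? 1u : 0u;
    }
    case SK_ESC_GET_FRAMEBUFFER_SIZE: {
        uint32_t bytes = k_width * k_height * 4; /* 32bpp */
        if (pvOut == NULL || cjOut < sizeof bytes)
            return 0;
        memcpy(pvOut, &bytes, sizeof bytes);
        return 1;
    }
    default:
        return 0; /* escape not supported */
    }
}
```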

OpenGL support

OpenGL support is provided through an "ICD", or "Installable Client Driver". This concept was originally created by SGI to improve the performance of OpenGL on Windows by letting the vendor implement the graphics pipeline completely. When OpenGL32.DLL is loaded, it asks the video driver for its ICD; if there is one, it is loaded into the process and OpenGL API calls are serviced by the ICD. The ICD is in full control of the graphics pipeline, so each vendor and driver version may have a different implementation.

The usual case is to buffer the OpenGL commands and flush them to the card using the ExtEscape API. The ICD kit is now maintained by Microsoft and it is not free if you wish to develop for it.

The other method of supporting OpenGL is through a "Mini Client Driver", or "MCD". This was Microsoft's original method of OpenGL support; it is similar to an ICD, but the MCD lives in the kernel. No driver vendor that I know of uses this method, and it is very slow, which is the reason for the ICD implementation.

DirectX support

In XPDM, DirectDraw support is provided by the GDI driver through the DrvEnableDirectDraw interface. The user mode portion and part of the kernel portion of the DirectX graphics pipeline are implemented by Microsoft-supplied system components. This API simply returns a list of callback interfaces that the DirectDraw layer in the kernel uses to perform specific actions on the hardware.

Direct3D is initialized through DrvGetDirectDrawInfo, in which the GDI driver claims support for Direct3D. The supplied callbacks are called several times to obtain the interfaces into the driver that implement the various features of Direct3D. This is described on MSDN.

What is a mirror driver?

A mirror driver is a poorly documented feature that lets you load a video driver that "mirrors" another display driver; that is, it receives the same calls as the display driver it is mirroring. A mirror driver is documented as not supporting DrvGetModes; however, if you do implement it, the returned modes will be cached and you cannot change them dynamically. Although I have heard that implementing DrvGetModes can help with loading and unloading the display driver on mode switches, I was unable to get this to work.

To load a mirror driver, the "Attach.ToDesktop" value in this device's registry key must be set to 1; you then call ChangeDisplaySettingsEx with "CDS_UPDATEREGISTRY" on the mirror driver. Next, set the mode you wish to switch to and call ChangeDisplaySettingsEx again on the mirror driver.

The mirror driver does not properly unload at a mode switch, and generally, if there are references to a drawing surface, the driver will not unload. So, in my experience, to get a mirror driver to mode switch you need an application that detects WM_DISPLAYCHANGE messages. You also need to set "Attach.ToDesktop" to 0 after you load the display driver. This helps unload the display driver, and on WM_DISPLAYCHANGE you can then go through the procedure to unload the mirror driver.

If you wish to unload the mirror driver immediately, without a display change, simply follow the same steps that loaded it: set "Attach.ToDesktop" to 0 and then perform the "CDS_UPDATEREGISTRY". You can then call "ChangeDisplaySettingsEx" again with no parameters to force unloading. Although this seems to work, everything again depends on references to the display surface, so if there are outstanding references to it, the driver will not be unloaded. The mirror driver sample in the DDK does not do all of this and has some missing pieces, such as not handling WM_DISPLAYCHANGE and not resetting the "Attach.ToDesktop" value after loading the mirror driver.

The example

The example driver in this article simply shares a memory mapped file between an application and the display driver. The display driver renders into the memory mapped file, and the application simply acts as a monitor, refreshing itself about 70 times a second. This is not efficient, but it is just an example. The display driver is installed as a regular hardware driver and is seen just as an ATI or NVIDIA driver would be.

To install the example, use the "Add New Hardware" wizard in the Control Panel. Select "Hardware is already installed" and "Manually select hardware from a list". In the list of devices, scroll down to the bottom and select "Add a new hardware device".

Then select "Have Disk" and locate the .INF file provided with this project. Scroll down the new list and select "Toby Opferman Sample Video Display".

During installation you will see a warning dialog; just select "Continue Anyway" unless you do not want to install the driver. Next, enable the second monitor using the third tab of the display settings. Run the monitor application provided with this article, and it will show the second monitor in its window.

Homework

Reading and seeing is a good way to learn; however, I believe you learn more if you actually try to do something! What I want you to do is take my example and add more display modes. This will require changes to the application: you can either make the application detect display changes through various methods, including WM_DISPLAYCHANGE, or simply require the user to restart the application, then prompt for or enumerate devices to get the new display settings and adjust the window appropriately.

Here is a little hint: when a new mode is selected, you do not always get DrvDisableSurface and DrvDisablePDEV followed by DrvEnablePDEV for the new setting. You may instead get DrvAssertMode, which is called to switch from one PDEV to another; it passes in a BOOL telling the driver to enable or disable the supplied PDEV.

Conclusion

This article showed how to create a very basic display driver that handles GDI commands. The display driver architecture covered here is only XPDM, not the new LDDM found in Windows Vista. This is also just the extreme basics of "where to get started". Even so, hopefully you have learned a little something about display drivers and the Windows operating system.
