Post on 24-Dec-2015
DirectX® And Streaming Video Drivers
  Jeff Noyle, Development Lead
  Gary Sullivan, Software Design Engineer
  William Messmer, Software Design Engineer
  Eric Rudolph, Software Design Engineer
  Microsoft Corporation
Speakers
  “DirectX Graphics Drivers,” Jeff Noyle, Lead Developer, DirectDraw®/Direct3D®, Microsoft Corporation
  “DirectX VA Video Acceleration Drivers,” Gary Sullivan, Software Design Engineer, DMD Video Services Group, Microsoft Corporation
  “Writing AVStream Minidrivers for Windows® XP,” William Messmer, Software Design Engineer, Digital Audio-Video, Microsoft Corporation
  “Testing Your WDM Driver with DirectShow®,” Eric Rudolph, SDE, DirectShow Editing Services, Microsoft
DirectX Graphics Drivers
  Jeff Noyle
  Development Lead, DirectDraw/Direct3D
  Microsoft Corporation
Prerequisites
  I’m assuming basic familiarity with DirectDraw and Direct3D concepts:
    System architecture
    Surfaces
    Page flipping
  The DDK can be hard to read
Agenda
  Single-source issues
  Windows 9x issues
  OS-independent issues
  DirectX 7.0 implementation details
  Changes in DirectX 8.0
  What can you do next?
Single-Source Issues
  Stuff you should know if you want one code-base to support Windows 9x OS versions and Windows NT® OS versions
Allocating System Memory Per-Surface
  (Do NOT use this process to allocate surface memory itself... See later)
  Normally system memory is charged against a particular process
    Can’t free it in some other process (as in the ctrl-alt-del mechanism)
  Use EngAllocPrivateUserMem and EngFreePrivateUserMem
    Uses DirectDraw object to locate proper process context
YUV/FOURCC Surfaces
  System memory YUV/FOURCC surfaces on NT systems
    DirectDraw kernel mode “pretends” that these surfaces are 8bpp RGB for the purposes of allocating memory
    DXTn:
      Height: height in 4x4 blocks
      Width: width in blocks * sizeof(block)
    You must undo these transformations at CreateSurface time
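A minimal sketch of undoing the DXTn transformation described above. It assumes the usual DXTn block sizes (8 bytes per 4x4 block for DXT1, 16 bytes for DXT2 through DXT5); the type and function names are hypothetical and not part of the DDK. The recovered dimensions are multiples of 4, so a texture whose original size was not a multiple of 4 comes back rounded up.

```c
#include <stdint.h>

/* Hypothetical result type: surface size in texels. */
typedef struct {
    uint32_t width;
    uint32_t height;
} TexelDims;

/* Bytes in one compressed 4x4 block: 8 for DXT1, 16 for DXT2..DXT5. */
static uint32_t dxtn_block_bytes(int dxt_level)
{
    return (dxt_level == 1) ? 8u : 16u;
}

/*
 * Hypothetical helper: convert the "pretend 8bpp" dimensions back to texels.
 *   pretend_width  = blocks across * sizeof(block)
 *   pretend_height = blocks down
 */
static TexelDims dxtn_undo_pretend_dims(uint32_t pretend_width,
                                        uint32_t pretend_height,
                                        int dxt_level)
{
    TexelDims d;
    d.width  = (pretend_width / dxtn_block_bytes(dxt_level)) * 4u;
    d.height = pretend_height * 4u;
    return d;
}
```

For example, a 256x256 DXT1 texture is 64x64 blocks, so it arrives as 512 wide by 64 high, and the helper maps that back to 256x256.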
YUV/FOURCC Surfaces
  NT kernel mode doesn’t understand any FOURCC formats, so:
    The driver must handle video memory allocation for these types
    The driver must handle Lock for these types
Windows 2000 Issue (Fixed In Windows XP)
  During allocation of an AGP surface...
  If the driver fails to allocate and:
    returns DDHAL_DRIVER_HANDLED
    AND sets an error code in ddRVal
    AND sets the surface’s lpVidMemHeap to non-zero
  Then the system will ignore the error
  So NULL the lpVidMemHeap on error!
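The failure path above can be sketched as follows. This is an illustration only: the structs, constants and allocator callback are simplified stand-ins invented for the example, not the real DDK definitions. Only the field names lpVidMemHeap and ddRVal come from the slide.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the DDK structures (hypothetical). */
typedef struct {
    void     *lpVidMemHeap;  /* heap the surface was allocated from */
    uintptr_t fpVidMem;      /* address of the surface memory */
} SurfaceGlobal;

typedef struct {
    SurfaceGlobal *surface;
    int32_t        ddRVal;   /* result code returned to DirectDraw */
} CreateSurfaceData;

#define SKETCH_DRIVER_HANDLED 1u    /* stand-in for DDHAL_DRIVER_HANDLED */
#define SKETCH_OK             0     /* stand-in for DD_OK */
#define SKETCH_ERR_NO_VIDMEM  (-1)  /* stand-in for a DirectDraw error code */

/* Hypothetical allocator callback: returns 0 when the heap is exhausted. */
typedef uintptr_t (*AgpAllocFn)(void *heap, size_t bytes);

/* Two toy allocators so the sketch can be exercised. */
static uintptr_t sketch_alloc_fail(void *heap, size_t bytes)
{
    (void)heap; (void)bytes;
    return 0;
}

static uintptr_t sketch_alloc_ok(void *heap, size_t bytes)
{
    (void)bytes;
    return (uintptr_t)heap + 0x1000u;
}

static uint32_t create_agp_surface(CreateSurfaceData *d, void *heap,
                                   size_t bytes, AgpAllocFn alloc)
{
    uintptr_t p;

    d->surface->lpVidMemHeap = heap;
    p = alloc(heap, bytes);
    if (p == 0) {
        /* The key step: clear the heap pointer before reporting the error. */
        d->surface->lpVidMemHeap = NULL;
        d->surface->fpVidMem = 0;
        d->ddRVal = SKETCH_ERR_NO_VIDMEM;
        return SKETCH_DRIVER_HANDLED;
    }
    d->surface->fpVidMem = p;
    d->ddRVal = SKETCH_OK;
    return SKETCH_DRIVER_HANDLED;
}
```

Clearing the pointer on every error path is harmless on Windows XP, so a single-source driver can do it unconditionally.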
Atomic Surface Creation
  On Windows 9x, drivers are given a list of surfaces
  On Windows NT, drivers are given surfaces one-at-a-time, unless:
    Driver reports GUID_NTPrivateDriverCaps and sets DDHAL_PRIVATECAP_ATOMICSURFACECREATION
Windows NT Extra
  You can use GUID_NTPrivateDriverCaps to request notification of primary surface creation:
    Set DDHAL_PRIVATECAP_NOTIFYPRIMARYCREATION
Windows 9x Issues
System-To-Video Blts
  To speed up some titles, implement system-to-video blts
    All you need to implement is SRCCOPY, no stretch
    But you should implement sub-rects
  DirectDraw assumes your driver requires system memory to be pagelocked during Blt
    If this is not true, set DDCAPS2_NOPAGELOCKREQUIRED
OS-Independent Issues
HeapVidMemAllocAligned
  It’s an “Eng” function in Windows NT versions
  It’s a ddraw.dll export in Windows 9x
  You can use this to allocate surface memory
    You must have passed the heap to DirectDraw previously
    You must fill in the fpHeapOffset, fpVidMem and lpVidMemHeap of the surface
Heap Offsets Explained
  [Diagram: a heap running from fpStart to fpEnd (fpEnd points TO the last byte), with address “0” marked and a surface inside the heap. The position of the surface is labelled as the return value from HeapVidMemAllocAligned (HVMAA) and as fpHeapOffset.]
  Return values from HeapVidMemAllocAligned are these offsets
  (Note: fpStart is set to 0x1000 by DirectDraw for AGP heaps)
DDSCAPS_VIDEOMEMORY
  Remember that this includes AGP unless combined with DDSCAPS_LOCALVIDMEM
  At GetAvailDriverMem time, a request that specifies DDSCAPS_VIDEOMEMORY (and not any explicit type: local or non-local) should include both types in the total
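A small sketch of the totalling rule above. The function name and its two byte-count parameters are hypothetical; the three flag values are written out as they appear in ddraw.h to the best of my knowledge, and a real driver would take them from that header instead.

```c
#include <stdint.h>

/* Flag values as in ddraw.h (verify against your copy of the header). */
#define DDSCAPS_VIDEOMEMORY    0x00004000u
#define DDSCAPS_LOCALVIDMEM    0x10000000u
#define DDSCAPS_NONLOCALVIDMEM 0x20000000u

/*
 * Hypothetical helper: how many bytes to count for a video-memory request.
 * With no explicit type, both local and non-local (AGP) memory are counted.
 */
static uint32_t avail_vidmem_total(uint32_t caps,
                                   uint32_t local_bytes,
                                   uint32_t nonlocal_bytes)
{
    uint32_t type = caps & (DDSCAPS_LOCALVIDMEM | DDSCAPS_NONLOCALVIDMEM);
    uint32_t total = 0;

    if ((caps & DDSCAPS_VIDEOMEMORY) == 0)
        return 0;  /* not a video-memory request; out of scope for this sketch */

    if (type == 0)
        type = DDSCAPS_LOCALVIDMEM | DDSCAPS_NONLOCALVIDMEM;

    if (type & DDSCAPS_LOCALVIDMEM)
        total += local_bytes;
    if (type & DDSCAPS_NONLOCALVIDMEM)
        total += nonlocal_bytes;
    return total;
}
```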
GetScanLine
  Implement this, if you can!
    DirectX 8.0 uses it a lot for presentation-Blt timing
    Set DDCAPS_READSCANLINE, so DirectX 8.0 knows
CreateSurfaceEx
  More on this later
  NEVER fail CreateSurfaceEx for system memory surfaces, even if you don’t understand the pixel format
    Just return DDHAL_DRIVER_HANDLED and DD_OK
    (Otherwise new system-memory formats used by the reference rasterizer can’t be created)
Alpha-In-The-Primary
  If your driver can do this in 32bpp:
    Create an A8R8G8B8 render target
    Blt that to the primary surface IGNORING the alpha channel
    (And stretch/shrink (please))
  Then you should set:
    DDHALINFO.vmiData.ddpfDisplay.dwFlags |= DDPF_ALPHAPIXELS
    DDHALINFO.vmiData.ddpfDisplay.dwRGBAlphaBitMask = 0xFF000000
Windowed Applications And Blt Queuing
  Don’t allow “many” presentation-blts in your queue
    That is, don’t allow a large latency between scheduling and retiring a presentation-blt
  WHQL enforces low latency for DirectX 8.0 drivers
    Check DDBLT_PRESENTATION, and don’t allow more than three
  More info in ddraw.h
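One way to enforce the limit above is a simple counter of presentation-blts that have been scheduled but not yet retired. This is a sketch under assumptions: the queue type and both functions are hypothetical, and the DDBLT_PRESENTATION value is written out from memory of ddraw.h and should be checked against the header. How the driver waits when the limit is reached is hardware-specific and left out.

```c
#include <stdint.h>

/* Value as I recall it from ddraw.h; verify against your header. */
#define DDBLT_PRESENTATION 0x10000000u

#define MAX_QUEUED_PRESENTS 3u

/* Hypothetical per-device state: presentation-blts queued but not retired. */
typedef struct {
    uint32_t pending;
} PresentQueue;

/*
 * Returns 1 if the blt may be queued now, 0 if the driver should first let
 * an older presentation-blt retire. Non-presentation blts are not limited.
 */
static int present_queue_try_submit(PresentQueue *q, uint32_t blt_flags)
{
    if ((blt_flags & DDBLT_PRESENTATION) == 0)
        return 1;
    if (q->pending >= MAX_QUEUED_PRESENTS)
        return 0;
    q->pending++;
    return 1;
}

/* Call when the hardware reports that a presentation-blt has completed. */
static void present_queue_retire(PresentQueue *q)
{
    if (q->pending > 0)
        q->pending--;
}
```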
DDBLT_WAIT And DDBLT_DONOTWAIT
  Drivers should never look at these
    They are set by the application/DirectDraw runtime
    They are handled by the DirectDraw runtime
    Sometimes DirectDraw spins, and wants to do that in user-mode
  Applies to DDFLIP_WAIT as well
DDBLT_ASYNC
  Ignore this flag
  Always perform your blts asynchronously, if possible
What Are DDROPS?
  We don’t know either
    An idea of the original designer of DirectDraw, but never implemented or specified
  In short: ignore!
Blt And YUV Surfaces
  DirectShow can gain performance benefits if it knows it can use Blt to copy Overlay surfaces
  Check to see if you can support DDCAPS2_COPYFOURCC
  This means you can SRCCOPY, no sub-rects, no stretch, no overlap between two FOURCC surfaces of the same type
Update Overlay, Etc.
  If multiple overlays are created, but you have hardware for only one:
    Succeed all CreateSurface calls
    Fail the UpdateOverlay call
Flip Flags
  DDFLIP_NOVSYNC
    This means: flip immediately; do not wait for vertical blank
    The hardware must be capable of re-latching the new primary surface address immediately, or at least on the next scanline
    In other words, don’t allow the remaining raster scans to read from the old back buffer
Flip Flags
  DDFLIP_INTERVALn
    Please don’t implement by busy-waiting in the driver
    But please do implement if your hardware can defer flips for n frames
Gamma Ramps
  DirectDraw and Direct3D’s gamma ramps are passed through the GDI DDI call SetDeviceGammaRamp
  This call is poorly prototyped
  This is the struct you will be passed:
    struct
    {
        WORD red[256];    // WORDs not BYTEs
        WORD green[256];
        WORD blue[256];
    };
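Because the ramp entries are 16-bit WORDs, hardware with an 8-bit lookup table has to narrow them. The sketch below assumes such hardware and keeps only the high byte of each entry; the struct tag and function name are hypothetical, and WORD is defined locally so the example stands alone.

```c
#include <stdint.h>

typedef uint16_t WORD;  /* local stand-in for the Windows typedef */

/* Same layout as the struct on the slide; the type name is hypothetical. */
typedef struct {
    WORD red[256];
    WORD green[256];
    WORD blue[256];
} GammaRamp16;

/* Narrow a 16-bit-per-entry ramp to 8 bits per entry (high byte only). */
static void gamma_ramp_to_8bit(const GammaRamp16 *in,
                               uint8_t r[256], uint8_t g[256], uint8_t b[256])
{
    int i;
    for (i = 0; i < 256; ++i) {
        r[i] = (uint8_t)(in->red[i]   >> 8);
        g[i] = (uint8_t)(in->green[i] >> 8);
        b[i] = (uint8_t)(in->blue[i]  >> 8);
    }
}
```

Treating the buffer as three arrays of 256 BYTEs instead would read only the first 768 of its 1536 bytes and produce a badly wrong ramp.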
DirectX 7.0 Implementation Details
Overview Of DirectX 7.0 Model
  Direct3D refers to surfaces via “handles”
  Driver keeps a look-up table indexed by handle
  Driver keeps everything it needs to know about a surface in this table
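A look-up table indexed by handle can be as simple as a growable array. The sketch below is a user-mode illustration with hypothetical names and fields; a real driver would use its own allocator and store whatever per-surface state its hardware needs. It copies surface information into the table instead of keeping pointers to DirectDraw’s structures.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-surface record: a copy of what the driver needs. */
typedef struct {
    uint32_t width;
    uint32_t height;
    uint32_t pitch;
    uint64_t vidmem_offset;
    int      in_use;
} DriverSurface;

typedef struct {
    DriverSurface *entries;
    size_t         count;
} SurfaceTable;

/* Grow the table so that 'handle' is a valid index. Returns 0 on failure. */
static int surface_table_reserve(SurfaceTable *t, uint32_t handle)
{
    size_t n;
    DriverSurface *p;

    if (handle < t->count)
        return 1;
    n = t->count ? t->count : 16;
    while (n <= handle)
        n *= 2;
    p = (DriverSurface *)realloc(t->entries, n * sizeof *p);
    if (p == NULL)
        return 0;
    memset(p + t->count, 0, (n - t->count) * sizeof *p);
    t->entries = p;
    t->count = n;
    return 1;
}

/* Store a copy of 'info' under 'handle'. Returns the entry or NULL. */
static DriverSurface *surface_table_set(SurfaceTable *t, uint32_t handle,
                                        const DriverSurface *info)
{
    if (!surface_table_reserve(t, handle))
        return NULL;
    t->entries[handle] = *info;
    t->entries[handle].in_use = 1;
    return &t->entries[handle];
}

/* Look a surface up by handle. Returns NULL for unknown handles. */
static DriverSurface *surface_table_lookup(const SurfaceTable *t, uint32_t handle)
{
    if (handle >= t->count || !t->entries[handle].in_use)
        return NULL;
    return &t->entries[handle];
}

static void surface_table_free(SurfaceTable *t)
{
    free(t->entries);
    t->entries = NULL;
    t->count = 0;
}
```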
CreateSurfaceEx
  Called after CreateSurface
  Assigns a Direct3D-allocated handle to the surface(s)
  Driver runs attachment lists, creates internal structures for each surface in list
CreateSurfaceEx Is Hard
  Driver has to run surface attachment list
  Z buffer might be attached, or separate surface
  Cubic Environment Maps are the hardest...
Cubemap Attachments (Abstract View)
  [Diagram: the cubemap faces (Positive X, Negative X, Positive Y, ...), each with its chain of mip sub-levels.]
Cubemaps (Struct View)
  [Diagram: attachment structures for a cubemap. Positive X has an lpAttachList with several lpLink / lpAtt.. entries; “+X Mip” has its own lpAttachList with lpLink / lpAtt.. entries; Negative X, “-X Mip” and Positive Y each have an lpAttachList as well.]
Drivers Cannot
  Keep pointers to DirectDraw’s surface structures in their own structures
    Flip confusion (explained later)
    Overhead
  Under DirectX 8.0, we don’t keep the DirectDraw structure
  ...So DirectX 8.0 drivers CAN’T store pointers: they will crash
Flip Confusion Explained
  [Diagram: the user-mode front-buffer and back-buffer structures, the handle each one carries (A or B), and the driver surfaces (A and B), shown before the flip and after the flip.]
  The user-mode structures now refer to different pieces of memory.
  => You cannot store pointers to the user-mode structs in the driver structs.
Aliasing: What It Is
  Video memory is a shared resource
  On mode switch, all must be given up
  But the application may be writing directly to video memory
  We re-map the application’s view of video memory to a dummy page, then allow the mode switch to proceed
  Only done at app’s request: DDLOCK_NOSYSLOCK
Aliasing: How It’s Done
  When the driver returns a pointer to video memory at CreateSurface time:
    The offset into the frame buffer is calculated, and then an equivalent aliased pointer is returned to the application
    If the pointer lies outside of video memory, no aliasing is done (we don’t know enough to do so)
Aliasing: How To Break It
  On Windows NT systems, the driver must NOT return a pointer outside of video memory at Lock time
    This pointer will not be aliased
    The application will crash if a mode switch happens
  Drivers should allocate system memory at CreateSurface time (PLEASE_ALLOC_USERMEM)
Changes For DirectX 8.0
Driver Capabilities Are Constant Across Modes
  This means everything in D3DCAPS8
  The caps are allowed to be “nothing” in some modes, e.g., 24bpp
  You are allowed to support different back buffer formats
    That is, the one that matches the front buffer
Pixel Formats In DirectX 8.0
  Goodbye DDPIXELFORMAT
  Hello D3DFORMAT
  All FOURCCs are D3DFORMATs
  D3DFMT has this form
    [Layout diagram over Byte 3, Byte 2, Byte 1, Byte 0: Vendor ID (0 = Microsoft; use your PCI Vendor ID), Format Number, Nonzero => FOURCC]
D3DFORMAT Examples
  D3DFMT_A1R5G5B5    0x00000019
  IHV-defined Format 0xACAT0001 (PCI ID 0xACAT, not FOURCC, format 1)
  FOURCC “UYVY”      0x59565955 (Byte 2 is non-zero)
IHV-Def’d Texture Formats
  Since Direct3D doesn’t understand these formats:
    These formats cannot be “managed”
    Applications can lock these surfaces directly
    (In fact this is the only way to fill such surfaces with data)
DirectX 8.0 Format Op-list
  The format op-list tells DirectX 8.0 everything about capabilities that vary with surface format
  For each format, the driver sets bits that indicate:
    Can Texture from this format
    Render to this format
    Switch display mode to this format
    Has caps in modes of this format
Format Op-List Tricks
  The runtime searches for the first entry that has all required capabilities
  Example: Application wishes to render to 565 texture
  Runtime will search for an Op-List entry with:
    D3DFORMAT_OP_TEXTURE | D3DFORMAT_OP_OFFSCREEN_RENDERTARGET
Format Op-List Tricks
  Driver A can render to 565 texture
  Sets this entry:
    Format = D3DFMT_R5G6B5
    Ops = D3DFORMAT_OP_TEXTURE | D3DFORMAT_OP_OFFSCREEN_RENDERTARGET
Format Op-List Tricks
  Driver B can NOT render and texture from the same surface, but can do both operations individually
  Sets TWO entries
    Format1 = D3DFMT_R5G6B5
    Ops1 = D3DFORMAT_OP_TEXTURE
    Format2 = D3DFMT_R5G6B5
    Ops2 = D3DFORMAT_OP_OFFSCREEN_RENDERTARGET
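The search rule can be illustrated with a few lines of C. This is a sketch of the matching logic only, not the runtime’s actual code: the struct, the function and the SKETCH_ constants are hypothetical, and the two op-bit values are placeholders chosen for the example (the format code 23 is D3DFMT_R5G6B5).

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FMT_R5G6B5                 23u     /* D3DFMT_R5G6B5 */
#define SKETCH_OP_TEXTURE                 0x0001u /* placeholder bit value */
#define SKETCH_OP_OFFSCREEN_RENDERTARGET  0x0008u /* placeholder bit value */

/* Hypothetical op-list entry: one format plus the operations it supports. */
typedef struct {
    uint32_t format;
    uint32_t ops;
} FormatOp;

/* Return the first entry for 'format' that has ALL of the required ops. */
static const FormatOp *find_format_op(const FormatOp *list, size_t count,
                                      uint32_t format, uint32_t required)
{
    size_t i;
    for (i = 0; i < count; ++i) {
        if (list[i].format == format &&
            (list[i].ops & required) == required)
            return &list[i];
    }
    return NULL;
}
```

With Driver A’s single entry, a request for texture plus render target finds a match. With Driver B’s two entries the same request finds nothing, because no single entry carries both bits, while a request for either operation alone still succeeds.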
What Can You Do Next?
  If you develop DX Graphics Drivers:
    You need a relationship with Microsoft’s DirectX team, and should contact IHV Program Manager: Michele Boland (MBoland@microsoft.com)
  Install and run against DEBUG runtimes
    Available in the DirectX SDK
    Will output debug messages for common errors
DirectX VA Video Acceleration Drivers
  Gary Sullivan
  GarySull@microsoft.com
  Software Design Engineer
  DMD Video Services Group
  Microsoft Corporation
Agenda
  DirectX VA design and status
  Current and future requirements and tests
  Future plans and potential extensions
  What can you do next?
DirectX VA Prime Directive
  Decouple software decoder operation from hardware accelerator design to achieve full interoperability
  [Diagram: DirectX VA shown with the video standards (MPEG-1, MPEG-2, MPEG-4, H.261, H.263++, any other) and the accelerator functions (Motion Comp, Inverse DCT, VLD).]
What Is DXVA? What Can It Achieve?
  Interoperable interface between video decoding software and advanced-capability graphics accelerators
  Increases video capability for the consumer’s PC
  Increases the demand for advanced graphics accelerators and video applications
  Decreases implementation effort for software decoder writers
  Decreases support burden for graphics accelerator companies
  Decreases testing burden for OEMs
DirectX VA General Status
  Spec went 1.0 with DirectX 8.0 Beta 2 (October ’00)
    See http://www.microsoft.com/hwdev/DirectX_VA
  OEMs love it: it enables separate WHQL qualification of decoders and drivers
  Software decoder companies are developing with it (Mediamatics, Intervideo, Ravisent, Cyberlink, MGI/Zoran, MbyN, …)
  Hardware accelerator companies are supporting it in drivers (ATI, Nvidia, Intel, SiS, S3, SiliconMotion, …)
DirectX VA Capabilities
  Emphasis on MPEG-2 and DVD “sub-picture”
  Support of all important video coding standards (H.261, H.263, MPEG-1, MPEG-2, MPEG-4)
    And some non-standard variations on the standards
  Alpha graphic blending (e.g., DVD subpicture)
  Three basic degrees of decoding configuration capability:
    Motion compensation on accelerator with host residual difference decoding
    Motion compensation and IDCT on accelerator
    Full raw bitstream decoding
  Externally-defined encryption support
How Does DXVA Operate?
  Operation with Windows 2000 Overlay Mixer (OVM) or new Windows XP Video Mixing Renderer (VMR)
  Requires DirectX 8.0 or Windows XP
  Decoders use it through existing Windows 2000 “IAMVideoAccelerator” API
  Drivers use it through corresponding Windows 2000 “MoComp” DDI
  DirectX VA specifies payload content of data buffers that previously had accelerator-specific formats
Host Versus Accelerator Functional Split
  Bitstream processing either on host or accelerator
  Accelerator handles the primary data flow and performs the intensive signal processing
  PCI/AGP is the bridge between the two
  Reconstruction loop maintained in graphics accelerator memory
  Host processing converts standard-specific streams into generic accelerator work units
Today’s DirectX VA
  [Block diagram: Compressed Video Source, Variable-Length Decoding, Residual Difference Decoding (IDCT), Motion Compensation, Sum & Clip, Frame Storage, OVM/VMR/3D; Graphic Source, Graphic Decoder, Graphic Blending.]
  (Content Protection Supported Outside of Scope)
Constrained Parameter Profiles
  Strategy is to define a general interface and a number of constrained-parameter profiles, with decoder data structure configuration settings
  Profiles defined:
    MPEG-2 Main Profile with and without DVD Subpicture
    Several H.263/MPEG-4 profiles
    MPEG-1
    H.261 with and without deblocking post-processing
Defined Buffer Types
  Picture-level decoding parameter buffers
  Buffers for bitstream decoding:
    Bitstream data buffers
    Bitstream slice control buffers
    Inverse quantization matrix buffers
  Buffers for macroblock-level decoding:
    Macroblock control buffers
    Residual difference data buffers
  Buffers for graphic blending:
    Alpha+YUV graphic buffers
    AI44 graphic buffers
    DVD DPXD graphic buffers
    DVD highlight definition buffers
    DVD display control command buffers
    Alpha blend combination buffers
  Deblocking filter control buffers
  Picture resampling buffers
  Read-back data buffers
DXVA Requirement Plans: Primary Goals
  Clear specification for MPEG-2 interoperability (and front-end DVD subpicture) is the primary goal
  Driver and decoder that claim video acceleration must support DXVA
  Specific “minimal interoperability set” for each defined profile
July ’01 Stated Requirements
  MPEG2_A and MPEG2_C required
  MPEG1_A required
  H263_A required (?!)
  Arithmetic accuracy required
  IDCT accuracy required
  Picture resolutions up to 720x576
  Uncompressed surface types must include NV12 in supported list
  Must have “front end” capability to convert to YUY2 from format in use
July ’01 Actual Tests
  StRowe test decoder developed
  Test driver also developed
  Released DCT400 driver tests cover MPEG2_A, _B, _C, _D profiles
  Pass/Fail based on MPEG2_A and _B
  Tests are currently of functional operation and visual performance
  Contact us (?!) if any test problems
  Don’t ship untested features (?!)
Structure Of Motion Comp Data
  All standards send only luma motion vectors, deriving chroma vectors from luma vectors
  Each standard derives chroma vectors in its own way
  Switches for configuring the motion comp are provided to minimize host “translation” requirements
  MPEG-2 Dual-Prime motion vectors derived on host
DXVA Macroblock Control Example
  /* Basic form for P and B pictures */
  typedef struct _DXVA_MBctrl_P_OffHostIDCT_1
  {
      WORD         wMBaddress;
      WORD         wMBtype;
      DWORD        dwMB_SNL;
      WORD         wPatternCode;
      UINT8        NumCoef[6];
      DXVA_MVvalue MVector[4];
  } DXVA_MBctrl_P_OffHostIDCT_1;
Structure Of Residual Data: Background (1 of 2)
  Things that vary within and across standards:
    Coefficient scan schemes
    Intra coefficient prediction schemes
    VLC schemes
    Inverse quantization schemes
    Mismatch-control schemes
    …
  These things need lots of logic – not always justified for accelerator implementation
Structure Of Residual Data: Background (2 of 2)
  Things that do not vary within and across standards
    IDCT definition
      Conformance rules may slightly differ – but multi-standard conformance not a big problem
    Many zero-valued coefficients
    Predicted-versus-Intra operation
    Only a few currently-specified inverse scans
Structure Of Residual Data: The Chosen Method
  Keep standard-specific issues on the host to the extent possible
  Support host-based or accelerator-based IDCT
  Send only non-zero coefficients
  Send index or run-length for coefficients
Residual Difference Example (Off-Host IDCT 16b TCOEFF)
typedef struct _DXVA_TCoefSingle
{
    WORD wIndexWithEOB;
    SHORT TCoefValue;
} DXVA_TCoefSingle, *LPDXVA_TCoefSingle;

/* Macros for Reading EOB and Index Values */
#define readDXVA_TCoefSingleIDX(ptr) ((ptr)->wIndexWithEOB >> 1)
#define readDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB & 1)

/* Macros for Writing EOB and Index Values */
#define writeDXVA_TCoefSingleIndexWithEOB(ptr, idx, eob) ((ptr)->wIndexWithEOB = ((idx) << 1) | (eob))
#define setDXVA_TCoefSingleIDX(ptr, idx) ((ptr)->wIndexWithEOB |= ((idx) << 1))
#define setDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB |= 1)
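To make "send only non-zero coefficients" with an index and an end-of-block flag concrete, here is a minimal host-side sketch built on the structure and macros above. The helper functions pack_block and unpack_block are hypothetical names invented for this example, WORD and SHORT are replaced by fixed-width stand-ins so the code compiles outside Windows, and the index is simply the position in the 64-entry array as the host holds it (whether that is scan order or raster order depends on the negotiated configuration).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Windows types (assumption: both are 16 bits wide). */
typedef uint16_t WORD;
typedef int16_t  SHORT;

typedef struct _DXVA_TCoefSingle
{
    WORD  wIndexWithEOB;   /* bits 15..1: coefficient index, bit 0: end of block */
    SHORT TCoefValue;
} DXVA_TCoefSingle;

#define readDXVA_TCoefSingleIDX(ptr) ((ptr)->wIndexWithEOB >> 1)
#define readDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB & 1)
#define writeDXVA_TCoefSingleIndexWithEOB(ptr, idx, eob) ((ptr)->wIndexWithEOB = ((idx) << 1) | (eob))
#define setDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB |= 1)

/* Hypothetical helper: emit one entry per non-zero coefficient.
 * Returns the number of entries written; the last entry carries EOB = 1.
 * An all-zero block yields 0 entries (such a block would not be sent). */
size_t pack_block(const SHORT coef[64], DXVA_TCoefSingle out[64])
{
    size_t n = 0;
    for (unsigned i = 0; i < 64; ++i) {
        if (coef[i] != 0) {
            writeDXVA_TCoefSingleIndexWithEOB(&out[n], i, 0);
            out[n].TCoefValue = coef[i];
            ++n;
        }
    }
    if (n > 0) {
        setDXVA_TCoefSingleEOB(&out[n - 1]);
    }
    return n;
}

/* Hypothetical helper: rebuild the 64-entry block, reading until EOB.
 * Requires at least one entry. Returns the number of entries consumed. */
size_t unpack_block(const DXVA_TCoefSingle *in, SHORT coef[64])
{
    size_t n = 0;
    int eob;
    for (unsigned i = 0; i < 64; ++i) {
        coef[i] = 0;
    }
    do {
        unsigned idx = (unsigned)readDXVA_TCoefSingleIDX(&in[n]);
        assert(idx < 64);
        coef[idx] = in[n].TCoefValue;
        eob = readDXVA_TCoefSingleEOB(&in[n]);
        ++n;
    } while (!eob);
    return n;
}
```

A block with three non-zero coefficients occupies three 4-byte entries instead of 64 coefficients, which is the point of the index-plus-EOB layout.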
Decoding Configurations (Part 1 of 2)
  Bitstream decoding vs. Host VLD
  Encryption:
    Bitstream data if bitstream decoding
    Macroblock control commands and/or residual difference data if Host VLD
    Type of encryption protocol supported
  For Host VLD:
    Host-based residual difference decoding versus accelerator-based residual difference decoding versus both
    Macroblock control commands in raster-scan order versus arbitrary order
Decoding Configurations (Part 2 of 2)
  For host-based residual difference decoding
    8b vs. 16b differences
    If 8b differences, overflow supported, or not
    If 8b differences, subtract second pass, or not
    Interleaved chroma or not
    Host clips range of data, or not
    Intra residuals unsigned, or not
  For accelerator-based difference decoding
    Specific IDCT support
    Inverse scan on host or accelerator
    Coefficients sent in groups of four, or singly
Alpha Blending Configurations
  AYUV alpha blend graphic loading
    AI44 or IA44 + palette or DPXD + Highlight or AYUV
  Alpha blend combination operation:
    Front-end versus back-end
    Picture resizing or not
    Only use picture destination area or not
    Graphic resizing or not
    Whole plane alpha or not
Longer Term Requirements
  Include H263_A, _B, _C in tested requirements
  Include mathematical motion comp and IDCT accuracy in tests
  Add speed performance testing
  Picture resolutions up to 1920x1088
  Six or more uncompressed surfaces
  Specific FOURCC surface types for uncompressed surfaces
Kill Superfluous Configs
  bConfigRasterOrder = 0
  bConfigResidDiffHost = 1
  (bConfigResid8Subtraction = 1 with bConfigSpatialResid8 = 1) or (bConfigResidDiffHost = 1 with (bConfigSpatialResid8 = 0 and bConfigSpatialHost8or9Clipping = 0))
  bConfigIntraResidUnsigned = 0
  bConfigSpatialResidInterleaved = 0
  bConfigHostInverseScan = 0
  bConfig4GroupedCoefs = 0
Enhance Blending Configs
  Eliminate duplication of AI44 & IA44 (bConfigDataType = 0 & 1)
  Require both AYUV and AI44/IA44 (bConfigDataType = 3 and 0/1)
  Require front-end blend (bConfigBlendType = 0)
  bConfigPictureResizing = 1
  bConfigOnlyUsePicDestRectArea = 0
  bConfigGraphicResizing = 1
  bConfigWholePlaneAlpha = 1
Hot Issue: WMV/H.263/MPEG-4
  Codecs beyond MPEG-2 need support
  H263_A profile needs:
    Different derivation of chroma motion
  H263_B profile needs:
    Rounding control
    Motion vectors over picture boundaries
    8x8 motion vectors
    Alternative inverse scan (or host inverse scan)
  H263_C profile needs:
    Deblocking filter support (also in H263_B?!)
Desirable Future Extensions
  De-interlacing
  Interoperable encryption / DRM
  Compressed-video encoding (including ME, DCT, and so on)
  Inverse-telecine
  Hue/contrast/brightness/gamma/color corrections
  Future decoding methods (MPEG-4v2, WMV, H.26L)
  Frame rate conversion
  Precise separable re-sampling
  Gen lock/frame rate synchronization
  TV out control
New GUIDs: Reducing Memory Use
  Add three new GUIDs to parallel MPEG2_A, MPEG2_B, and MPEG2_D
  Each new GUID adds raw bitstream decoding to the “minimal interoperability set” of the corresponding existing GUID
  A driver with raw bitstream support then need not allocate buffers for macroblock-level processing with these GUIDs
  Drivers could also decline to expose bitstream processing with existing GUIDs to save memory
Interoperable Encryption
  Define an interoperable encryption scheme
    Much like the old draft DXVA scheme
    Certificates for establishing trust (perhaps X.509 or something else rather than old draft scheme)
    RSA key exchange
    AES (RIJNDAEL) content encryption
Other In-Scope Additions
  Add new features for other codecs – WMV, H.26L, MPEG-4v2, etc.
    1/4-sample motion comp
    Added motion comp sizes and shapes
    New inverse transforms (e.g., 4x4)
    Fine granularity scalability
    Global motion comp
    Studio profile features
  More possible GUIDs for precise codec/configuration needs
New Video Building Blocks
  Deinterlacing
  Inverse Telecine
  Frame rate conversion
  Contrast/Brightness/Gamma/Color
  Precisely-specified resampling
  Video compression encoding
Deinterlace/Inverse Telecine
  Deinterlace is crucial
  Becoming a standard feature of high-end consumer TVs
  1080i in weave can look awful
  1080i in bob can look wrong too
  Deinterlace can be useful for either decoding or encoding
Hypothetical DXVA Structure
  [Block diagram; connections not recoverable from the extracted text. Block labels:]
    Interoperable DRM/Conditional Access/Content Protection/Encryption
    Today’s Scope of DirectX VA: Residual Difference Decoding (IDCT), Motion Compensation, Sum and Clip, Frame Storage
    Video Encoding: Uncompressed Video Source, Motion Estimation, Mode & Motion Vector Decision, Residual Difference Transform (DCT), Quantization, Variable Length Encoding
    De-interlace / Inverse Telecine
    Color Conversions & Adjustments
    Frame Rate Conversion
    Scaling
    OVM/VMR/3D
    Several blocks marked “?”
What Can You Do Next? (To All) Give Us Your Proposals
  About any difficulties/problems in design
  About encryption design
  About new in-scope feature needs
  About how to support new features
    Deinterlace/inverse telecine
    Encoding
    Frame rate conversion
    Contrast/Brightness/Gamma/Color
    Resampling
What Can You Do Next? (For Graphic Accelerator Designers)
  Make your MPEG-2 and DVD subpicture DXVA solution rock-solid, fully-tested with every available decoder, and frighteningly fast
  Fully support YUV surfaces as textures for input to 3-D
    Conversion to RGB, and so on
  Design maximal WMV/H.263/MPEG-4 feature support into your next generation
    But don’t expose them unless fully tested
  Move to the preferred configurations and uncompressed surface types
  Support new memory-conserving GUIDs
Writing AVStream Minidrivers For Windows XP
William Messmer, SDE, Digital Audio-Video, Microsoft Corporation
Agenda
  AVStream minidriver architecture
  When and why to use AVStream
  Exposing minidriver functionality
  Data processing
  Writing a minidriver: key issues and pitfalls
  Walk through sample code
  Common problems and mistakes
  DirectX 8.0 versus Windows XP
  What can you do next?
Why AVStream
  THE next generation class driver
    More efficient streaming
    Reduces the amount of minidriver code
    Simplifies development; faster to market
    One minidriver, one model – no more confusion over stream class versus port class
  New features, new technologies will only be supported in AVStream; stream and port class, however, are still supported!
When To Use AVStream
  BDA Drivers
  New Device Types
    Which are not already written to stream class or port class
  Combined A/V devices
  Kernel Software Transforms
    Audio Global Effects (GFX) Filters
  No necessity to port existing stream or port class drivers
Minidriver Architecture
  Functionality is exposed as a tree hierarchy described through static descriptors
    Device – described by Device Descriptor
    Filter Factory – creates a type of Filter
    Filter – described by Filter Descriptor
    Pin Factory – creates a type of Pin
    Pin – described by Pin Descriptor
  Functionality provided through static dispatch and automation tables
    Device Dispatch (Add Device)
    Pin Dispatch
    Pin Automation
    Filter Dispatch
    Filter Automation
Minidriver Architecture
  [Diagram of the object hierarchy. Objects, top to bottom, each level marked “>= 1”: Device, Filter Factory, Filter, Pin Factory, Pin]
  [Tables shown: Device Descriptor, Filter Descriptor, Pin Descriptor, Device Dispatch, Filter Dispatch, Pin Dispatch]
  [Dispatch routines shown: Filter Create, Pin Create]
  Key: Minidriver Provided Table, Public AVStream Construct, Private AVStream Construct, Minidriver Dispatch Routine
Exposing Minidrivers
  Expose your driver to AVStream
    Call KsInitializeDriver in DriverEntry passing your Device Descriptor
    Return the status from KsInitializeDriver
  AVStream handles PnP to get your driver set up; minidriver gets calls through device dispatch
  Filter Factories set up by AVStream during Add Device and Start Device
Exposing Minidrivers
  AVStream creates filters/pins based on descriptors
    Minidriver receives creation dispatch
    Creation dispatch associates minidriver specific context with object
    Object bags available as containers for dynamic memory like contexts
  AVStream handles cleanup of objects based on bags
    No forgetting to free dynamic memory
Minidriver Architecture
  Sample Code (Exposing Functionality)
Data Processing
  AVStream queues data/buffers
    Minidriver queues not necessary
    Cancellation handled in the queue
  Data exposed through two abstractions: stream pointers and process pins
    Stream pointers are robust and allow versatile queue management; typically used in hardware drivers
    Process pins work purely at a single buffer level making for very simple software transforms
Design Issues
  Two distinct ways to handle data processing
    Filter-Centric processing
      Specify filter process dispatch
    Pin-Centric processing
      Specify pin process dispatches
  The choice of which to use will influence design greatly
Filter-Centric Processing
  Filter is called to process data in a context where data is available on all required pins
  Typically used for software transforms
  Stream pointer use not required
  Processing based on an index of process pins
    Index/pins stable during processing
  Minidriver does transform, specifies how many bytes of each buffer used
Process Pins
  One per pin – points back to the pin
  Contains a stream pointer if needed
  Contains a buffer virtual address and size for data manipulation
  Informs the process routine of the pin’s relationships with other pins
    InPlaceCounterpart – other pin in an in-place transform pair
    CopySource – pin data is copied from
    DelegateBranch – pin that delegates frames (in the same pipe)
Transform Example
  [Animated diagram: INPUT queue with frames of 1920 and 1100 bytes (later 140); OUTPUT queue with a frame of 2880 bytes (later 960); exhausted frames shown as “Frame Gone”]
  1. Frame(s) arrive
  2. Filter is called to process. Filter sees two process pins: IN (1920 Bytes Data) and OUT (2880 Bytes Buffer)
  3. Process Pins Point to Buffers
  4. Filter performs transform; sets 1920 bytes used on input and output
  5. Filter is called back; more data to transform: IN (1100 Bytes Data) and OUT (960 Bytes Buffer)
  6. Process Repeats Similarly
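The byte counts in the transform example can be reproduced with a small user-mode model. This is only a sketch under stated assumptions: the transform is a plain 1:1 copy, each call consumes the smaller of the input and output sizes, and ModelProcessPin, model_filter_process, and model_advance are hypothetical names made up for this example (they are not the AVStream structures or routines, and model_advance merely stands in for what the class driver does between calls).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of what a process routine sees for one pin. */
typedef struct {
    unsigned char *data;        /* current position inside the frame */
    size_t bytes_available;     /* bytes left in the frame from that position */
    size_t bytes_used;          /* set by the process routine */
} ModelProcessPin;

/* Hypothetical filter-centric process routine for a 1:1 copy transform:
 * consume as much as both buffers allow and report the bytes used. */
void model_filter_process(ModelProcessPin *in, ModelProcessPin *out)
{
    size_t n = in->bytes_available < out->bytes_available
                   ? in->bytes_available
                   : out->bytes_available;
    memcpy(out->data, in->data, n);
    in->bytes_used = n;
    out->bytes_used = n;
}

/* Stand-in for the class driver: move past the bytes the filter reported. */
void model_advance(ModelProcessPin *pin)
{
    assert(pin->bytes_used <= pin->bytes_available);
    pin->data += pin->bytes_used;
    pin->bytes_available -= pin->bytes_used;
    pin->bytes_used = 0;
}
```

Run against the slide's numbers, the first call uses 1920 bytes on both pins and leaves 960 bytes of output buffer; the second call, with a 1100-byte input frame, uses 960 bytes and leaves 140 bytes of input for the next output frame.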
Pin-Centric Processing
  Each pin called to process data in a context independent of other pins
  Typically used for hardware drivers
  Data accessed through stream pointer abstraction
Stream Pointers
  Reference a single frame in a queue
  Hold that frame in the queue
  Can be in multiple states
    Locked – referenced data is safe to access; IRP cannot be cancelled
    Unlocked – not guaranteed to even reference data; IRP can be cancelled
  Can be cloned to create new pointers into the data stream
  Can schedule time-outs
Stream Pointers
  Contain two offsets into the data stream for ease of in-place use
  Address data at one of two granularities:
    Byte – access via virtual address
    Mapping – access via logical DMA address
      KSPIN_FLAG_GENERATE_MAPPINGS
  Minidriver usable context available per stream pointer
Stream Pointers And Queues
  [Diagram: three queues of four frames each, running from Oldest Frames to Newest Frames, each frame annotated with its reference count]
    Leading Edge only: counts (1) (0) (0) (0)
    Leading Edge and Trailing Edge: counts (1) (1) (1) (0)
    Leading Edge with Clones: counts (1) (2) (3) (1)
Direct DMA Example
  [Animated diagram: QUEUE of frames annotated with reference counts (1), (2), (1); completed frames shown as “Frame Gone”]
  1. Frame(s) arrive; minidriver called to process
  2. Processing routine acquires leading edge (KsPinGetLeadingEdgeStreamPointer)
  3. Leading edge is cloned (KsStreamPointerClone)
  4. DMA Hardware is programmed
  5. Leading edge is advanced
  6. Process may repeat for more frames
  7. Hardware interrupts for DMA completion
  8. ISR Schedules a DPC
  9. DPC releases the associated frames (KsStreamPointerDelete)
  10. May need to continue processing (KsPinAttemptProcessing)
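The reference counting behind the ten steps can be sketched as a toy user-mode model. It is not AVStream code: ModelQueue and every model_ function are hypothetical names for this example, and the model assumes only what the slides state, namely that the leading edge and each clone hold one reference on the frame they point at, and that a frame leaves the queue once the leading edge has passed it and no clone references it.

```c
#include <assert.h>

#define MODEL_MAX_FRAMES 8

typedef struct {
    int refs[MODEL_MAX_FRAMES];  /* stream pointers referencing each frame */
    int gone[MODEL_MAX_FRAMES];  /* 1 once the frame has left the queue */
    int count;                   /* frames that have arrived so far */
    int leading;                 /* leading edge position; == count means no frame */
} ModelQueue;

void model_init(ModelQueue *q)
{
    q->count = 0;
    q->leading = 0;
}

/* Step 1: a frame arrives; the leading edge picks it up if it had no frame. */
int model_frame_arrives(ModelQueue *q)
{
    assert(q->count < MODEL_MAX_FRAMES);
    int idx = q->count++;
    q->refs[idx] = (idx == q->leading) ? 1 : 0;
    q->gone[idx] = 0;
    return idx;
}

static void model_retire_if_unreferenced(ModelQueue *q, int idx)
{
    if (idx < q->leading && q->refs[idx] == 0) {
        q->gone[idx] = 1;
    }
}

/* Step 3: clone the leading edge; the clone pins the frame for the DMA. */
int model_clone_leading_edge(ModelQueue *q)
{
    assert(q->leading < q->count);
    q->refs[q->leading]++;
    return q->leading;
}

/* Step 5: advance the leading edge to the next frame, if there is one. */
void model_advance_leading_edge(ModelQueue *q)
{
    assert(q->leading < q->count);
    int old = q->leading++;
    q->refs[old]--;
    if (q->leading < q->count) {
        q->refs[q->leading]++;
    }
    model_retire_if_unreferenced(q, old);
}

/* Step 9: the DPC deletes the clone; the frame may now leave the queue. */
void model_delete_clone(ModelQueue *q, int idx)
{
    assert(idx >= 0 && idx < q->count && q->refs[idx] > 0);
    q->refs[idx]--;
    model_retire_if_unreferenced(q, idx);
}
```

In this model a frame stays in the queue while its DMA is outstanding even though the leading edge has moved on, which is why the clone has to be taken before the leading edge is advanced.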
Data Frame Control
  Held non-cancelable for a period
    Use locked stream pointers
    Consider stream pointer timeouts
  Can relinquish claim with callback
    Use unlocked stream pointers with a cancel callback
  Periodic access where frame can disappear between accesses
    Use unlocked stream pointers and lock periodically
Processing Decisions
  Filter-Centric
    All pins are involved in the decision
    Each pin type can have separate requirements
    One pin not fulfilling requirements will veto processing for the entire filter
  Pin-Centric
    Only one pin is involved in the decision
    Each pin type can have separate requirements which do not influence other pins
When Processing Happens
  Default case (no pin flags)
    Attempt made when frame arrives and leading edge points to no frame
  Attempt will succeed if
    Involved pin(s) are >= KSSTATE_PAUSE
    Involved pin(s) all have data
  Continuing processing
    STATUS_SUCCESS returned from dispatch and conditions still met
Adjusting Processing
  KSPIN_FLAG_
    _INITIATE_PROCESSING_ON_EVERY…
      Every frame arrival initiates
    _DO_NOT_INITIATE_PROCESSING
      No frame arrival initiates
    _PROCESS_IN_RUN_STATE_ONLY
      Pin must be in KSSTATE_RUN
    _FRAMES_NOT_REQUIRED…
      Data is not required on this pin
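The default rules and the four flag adjustments can be summarized as two predicates. The sketch below is a user-mode model only: the MODEL_ names, the ModelPin structure, and the flag bit values are invented for this example and do not correspond to the real KSPIN_FLAG_ constants; the logic follows the slide text and nothing more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the pin states, ordered so that ">=" comparisons work. */
typedef enum {
    MODEL_STATE_STOP,
    MODEL_STATE_ACQUIRE,
    MODEL_STATE_PAUSE,
    MODEL_STATE_RUN
} ModelState;

/* Hypothetical bit values standing in for the flags named on the slide. */
enum {
    MODEL_FLAG_INITIATE_ON_EVERY_ARRIVAL = 1,
    MODEL_FLAG_DO_NOT_INITIATE           = 2,
    MODEL_FLAG_RUN_STATE_ONLY            = 4,
    MODEL_FLAG_FRAMES_NOT_REQUIRED       = 8
};

typedef struct {
    ModelState state;
    bool has_data;
    unsigned flags;
} ModelPin;

/* Does a frame arriving on this pin trigger a processing attempt?
 * Default: only when the leading edge pointed to no frame before arrival. */
bool model_arrival_initiates(const ModelPin *pin, bool leading_edge_was_empty)
{
    assert(pin != NULL);
    if (pin->flags & MODEL_FLAG_DO_NOT_INITIATE) {
        return false;
    }
    if (pin->flags & MODEL_FLAG_INITIATE_ON_EVERY_ARRIVAL) {
        return true;
    }
    return leading_edge_was_empty;
}

/* Does a processing attempt succeed for the involved pins?
 * Any single pin that fails its requirements vetoes the attempt. */
bool model_attempt_succeeds(const ModelPin *pins, size_t count)
{
    assert(pins != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        ModelState required = (pins[i].flags & MODEL_FLAG_RUN_STATE_ONLY)
                                  ? MODEL_STATE_RUN
                                  : MODEL_STATE_PAUSE;
        if (pins[i].state < required) {
            return false;
        }
        if (!(pins[i].flags & MODEL_FLAG_FRAMES_NOT_REQUIRED) && !pins[i].has_data) {
            return false;
        }
    }
    return true;
}
```

For a pin-centric filter the array holds one pin; for a filter-centric filter it holds every involved pin, which is where the veto behavior comes from.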
Adjusting Processing
  Some mentioned flags useful for pin-centric
  Most flags useful for filter-centric where all pins are involved in the decision as to when to process data
  See the DDK for a complete description of flags
  Understand when processing happens based on your flags!
Adjusting Processing
  Processing can happen in a DPC!
    KSFILTER_FLAG_DISPATCH_LEVEL_PROCESSING
    KSPIN_FLAG_DISPATCH_LEVEL_PROCESSING
  Dispatch level processing still synchronized
    Processing mutex still held during dispatch level processing
    Can still be used to synchronize with processing
  Data manipulation (stream pointer) API fully dispatch level ready!
Walkthrough Sample Code
  Pin-centric sample code
Common Problems
  Internal mutexes are exposed
  Three mutex types in a hierarchy
    Device Mutex
    Filter Control Mutex
    Processing Mutex
  Some calls require mutexes held
    Sometimes AVStream holds the mutex for you; sometimes you must hold the mutex!
    See the DDK for this!
Common Problems
  Mutex Rules
    Do NOT take mutexes out of order: device then control then processing
    Do NOT take a mutex and call out – not for properties, not for anything!
    Walking the object hierarchy requires mutexes held:
      Device Mutex – device down to filter
      Filter Control Mutex – filter down to pins
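The ordering rule (device, then filter control, then processing) is easy to check mechanically in a debug build. The following is a minimal user-mode sketch of such a checker, not an AVStream facility: ModelMutex, ModelThread, model_acquire, and model_release are hypothetical names, and the checker only tracks which levels one thread holds.

```c
#include <assert.h>
#include <stdbool.h>

/* The three mutex levels, numbered in the order they must be taken. */
typedef enum {
    MODEL_MUTEX_DEVICE = 0,
    MODEL_MUTEX_FILTER_CONTROL = 1,
    MODEL_MUTEX_PROCESSING = 2,
    MODEL_MUTEX_COUNT = 3
} ModelMutex;

/* Which levels the current thread holds. */
typedef struct {
    bool held[MODEL_MUTEX_COUNT];
} ModelThread;

/* Returns true and records the acquisition if the order is respected.
 * Returns false if the same or a later level is already held, which
 * would mean taking mutexes out of order (or recursively). */
bool model_acquire(ModelThread *t, ModelMutex m)
{
    assert((int)m >= 0 && (int)m < MODEL_MUTEX_COUNT);
    for (int level = (int)m; level < MODEL_MUTEX_COUNT; ++level) {
        if (t->held[level]) {
            return false;
        }
    }
    t->held[m] = true;
    return true;
}

void model_release(ModelThread *t, ModelMutex m)
{
    assert((int)m >= 0 && (int)m < MODEL_MUTEX_COUNT);
    assert(t->held[m]);
    t->held[m] = false;
}
```

Taking the device mutex while already holding the processing mutex is rejected, while skipping levels (for example taking only the processing mutex) is allowed, since the rule constrains order rather than requiring every level.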
Common Problems
  Do not traverse the object tree (filters and pins) during processing!
    KsFilterGetFirstChildPin
    KsPinGetNextSiblingPin
  Pin-centric filters should not need to do this; filter-centric filters have the process pins index
DirectX 8.0 Versus Windows XP
  Mutexes in DirectX 8.0 are fast mutexes
    Certain APIs require mutexes held
    Client must be careful of when to acquire mutexes!
  Mutexes in Windows XP are full mutexes
    Completely backwards compatible with DirectX 8.0 drivers
    Fewer APIs require mutex acquisition
    Mutex acquisition more lenient
DirectX 8.0 Versus Windows XP
  New flags in Windows XP
    _SOME_FRAMES_REQUIRED…
      One or more pin instances of this type requires frames
      Can be programmatically done in DirectX 8.0
    _PROCESS_IF_ANY_IN_RUN_STATE
      One or more pin instances of this type must be >= KSSTATE_RUN; others must be >= KSSTATE_PAUSE
      Processing routine must check in DirectX 8.0
What Can You Do Next?
  Install the DirectX 8.0 or Windows XP DDK
  Try out the samples in the DDK
  Write AVStream minidrivers for new hardware!
Testing Your WDM Driver with DirectShow
Eric Rudolph
System Design Engineer
DirectShow Editing Services
Microsoft Corporation
Agenda
DirectShow supports capture from 1394, USB, analog video/audio, TV tuner, and custom devices
Demonstrate the use of the DirectShow-based generic graph editor, GraphEdt, as a WDM driver test tool
Walk through sample code that uses the GraphBuilder COM object
What tools exist to test your driver?
Included in DX8: GraphEdt.exe, a generic graph editor
Also in DX8: AmCap.exe, a simple capture application
New for Windows XP: Still Image devices show up in the shell (Explorer)
New for Windows XP: Movie Maker (on Start Menu)
GraphEdt Overview
Ships with DX8
Provides UI to build dataflow graphs and then uses DirectShow to run, pause, and stop the data
Views different filter categories
  Capture, compressor, crossbar, DMO, and so on
Connects different filters together
Accesses property pages
Writes out files
Controls 1394 devices
GraphEdt Filter Categories
Categories enable you to easily find a particular type of DirectShow filter
Many categories predefined in ksuuids.h & uuids.h
WDM drivers have many of their own categories
Capture devices can show up in both non-WDM and WDM categories
As you add/remove WDM devices, if they send device notifications, they will auto show/hide from category lists
GraphEdt Property Pages
The filter itself can expose multiple property pages
Each pin can expose 1 or more property pages
When you query an output pin’s property pages, you will see 1 extra page per pin which lists available output media connection types
Capture property pages are often exposed by capture applications (using standard DirectShow methods), so make them look nice!
Example property page
GraphEdt Property Pages And Media Types
Output pins provide one or more media types
Input pins normally do not provide a list of types, but instead accept types
When you render a pin, DirectShow will try to find appropriate filters to render it
When you try to connect two pins, DirectShow will try to find intermediate filters
The media types must agree between any output pin and its connected input pin
Buffers are also negotiated
The different media types the Indeo 5.11 decompressor provides
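The offer/accept model on this slide can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not DirectShow's actual connection logic: MediaType here is a hypothetical pair of strings (real DirectShow media types are described by AM_MEDIA_TYPE with GUIDs and a format block), and InputPin, SameType, and NegotiateMediaType are invented names. The output pin offers its types in order, and the first one the input pin accepts becomes the agreed type.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified media type for illustration only.
struct MediaType {
    std::string major;    // e.g. "Video"
    std::string subtype;  // e.g. "RGB24"
};

bool SameType(const MediaType& a, const MediaType& b) {
    return a.major == b.major && a.subtype == b.subtype;
}

// Hypothetical input pin: it does not enumerate types to its peer, it only
// answers whether a proposed type is acceptable.
struct InputPin {
    std::vector<MediaType> accepted;
    bool Accepts(const MediaType& candidate) const {
        for (const MediaType& mt : accepted) {
            if (SameType(mt, candidate)) return true;
        }
        return false;
    }
};

// Returns the index of the first offered type the input pin accepts,
// or -1 when the two pins share no media type.
int NegotiateMediaType(const std::vector<MediaType>& offered, const InputPin& input) {
    for (std::size_t i = 0; i < offered.size(); ++i) {
        if (input.Accepts(offered[i])) return static_cast<int>(i);
    }
    return -1;
}
```

A result of -1 in this sketch corresponds to the case on the slide where a direct connection fails and an intermediate filter would be needed between the two pins.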
Common Problems
Hot unplug while streaming
Device add/remove while streaming
Enter hibernation while streaming
Multiple camera enumeration
Multiple camera streaming (one driver, multiple devices)
Video shows up black or wrong
Changing display props while streaming
Overlay and DDraw issues
GraphEdt Demos Part 1
Capture from USB, both with 1 pin and with 2 pins (capture & preview)
DV capture and device control
Device Insertion / Removal and how the Graph refreshes
GraphEdt Demos Part 2
How to write AVI, WAV, and WM files
New Video Mixing Renderer has slightly different connection model than old Video Renderer
How to force a filter to produce a media type with a Type Enforcer
Timestamps are important!
Using .GRF files
Sample Code
Using the GraphBuilder COM Object
CaptureGraphBuilder makes connecting capture devices easy
See the AmCap sample code in the DX8/DirectShow SDK directory
Sample code walkthrough
What Can You Do Next?
Test your WDM drivers! Under many different conditions!
Read up on the DX8 docs, they’re great!
DirectShow contact: stanpenn@microsoft.com
Get on the DirectX A/V list