
Video/Graphics of Modern Desktop Board & its Linux programming

Dr A Sahu, Dept of Comp Sc & Engg.

IIT Guwahati

1

Outline
• Intel 945 Motherboard architecture
• GMCH
• ICH7 (8254, 8259, 8237)
• PCI and PCI Express
• Video RAM, built-in GPU
• DirectX, OpenGL, OpenCL
• Advanced GPUs from ATI and AMD
– Introduction to Nvidia CUDA programming

2

Intel 945 Express Chipset

[Block diagram of the chipset]
• Intel Pentium D Processor, connected to the North Bridge
• 82945 GMCH/MCH (North Bridge)
– DDR2 (two channels)
– Intel GMA 950 Graphics
– PCI Express* x16 Graphics
– Support for Media Ext Card
• 82801GR ICH7 (I/O Controller Hub 7, South Bridge)
– 4 Serial ATA ports
– Integrated Matrix Storage Technology
– 6 PCI slots
– BIOS support
– Intel HD Audio
– 8 high-speed USB ports
– 6 PCI Express* x1 slots
– Intel PRO 100/1000 LAN
– Intel Active Management Technology

3

82945 : GMCH/MCH

• Graphics and Memory Controller Hub
• Graphics Interface (GI) and PCI Express for graphics card support
• Host Interface (HI)
– Connects to the processor and supports HT, interrupt delivery, a 12-deep in-order queue, etc.
• System Memory Interface (SMI)
– Connected to two-channel DDR2
• Direct Media Interface (DMI)
– Connects to ICH7

4

82801: ICH7
• I/O Controller Hub version 7 (South Bridge)
• Enhanced DMA controller, interrupt controller, and timer
– Two cascaded 8259 PICs
– One 82C54 PIT
– One 8237 DMA
• Low Pin Count (LPC) interface
• PCI and PCI Express (PCI = Peripheral Component Interconnect)
• AC97 & HD Audio codec
• Serial Peripheral Interface (SPI) support
• Firmware support (BIOS)
• ACPI, SATA, USB

5

Introduction

• Peripherals: HD monitor
• Interfaces: intermediate hardware
– Nvidia GPU card
• Interfaces: intermediate software/program
– Nvidia GPU driver

[Diagram: Intel Pentium D Processor connected to the 82945 GMCH/MCH (North Bridge), which serves two DDR2 channels, Intel GMA 950 Graphics, PCI Express* x16 Graphics, and the Media Ext Card]

6

Migration from Char to Graphics/Video

• Char display (80x25 characters, 5x7 pixels each = 400x175)
• CRT Monitor (400x600, 640x480, 600x800)
• LCD Monitor (1024x768, 1280x1024, …)
• Graphics are visually more appealing
• Display Line, Circle, Rectangle, Curve, Polygon
– Characters are drawn using these primitives
– TrueType fonts

[Figure: a red arrow and a circle]

7

Multiplexed 1024x768 pixel display

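[Diagram: a 1024x768-pixel LCD (columns 0 to 1023, rows 0 to 767) scanned by a row counter and a column counter; the counter clock CLK must exceed 1024x768x50 Hz; the frame buffer supplies 8x3 = 24 bits (R, G, B) for each pixel]

Refresh the screen 50 times a second

8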
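The clock bound on the multiplexed-display slide (1024x768x50 Hz) can be checked with a few lines of C. This is only a sketch of the arithmetic; the function name `pixel_rate_hz` is made up for this illustration, and it ignores the blanking intervals that real display timings add.

```c
#include <stdint.h>

/* Minimum pixel rate for a display that redraws every pixel once per
 * refresh: width x height x refresh rate.
 * Hypothetical helper; blanking intervals are not modeled. */
uint64_t pixel_rate_hz(uint32_t width, uint32_t height, uint32_t refresh_hz)
{
    return (uint64_t)width * height * refresh_hz;
}
```

For the slide's values, `pixel_rate_hz(1024, 768, 50)` gives 39,321,600, so the row/column counters need a clock of roughly 39.3 MHz or more.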

Frame Buffer (24 Bit Pixel)

[Figure: pixels in the frame buffer correspond one-to-one to pixels on the screen, at 24 bits per pixel; the figure also shows a graphical representation of 24-bit color]

9

Graphics Cards
• GPU: a specialized processor that accelerates 3D or 2D graphics primitive operations
• Lots of floating-point operations
• Accelerates primitives
– Line, circle, polygon, mesh, projection, sphere, …

10

Graphics System

[Diagram: the graphics pipeline]
• Stages, in order: 3D application; 3D API (OpenGL or DirectX/3D); CPU-GPU boundary; GPU Command; Programmable Vertex Processor; Primitive Assembly; Rasterization and Interpolation; Programmable Fragment Processor; Raster Operations; Frame Buffer
• Data passed between stages: 3D API commands; GPU command and data stream; vertex index stream; pretransformed vertices; transformed vertices; assembled polygons, lines and points; pixel location stream; rasterized pretransformed fragments; transformed fragments; pixel updates

11

Graphics System

[Diagram: Vertices (x, y, z) enter Vertex Processing (the vertex shader), whose output feeds Pixel Processing (the pixel shader), which produces Pixels (R, G, B); the memory system holds the texture memory and the frame buffer]

12

Access to video memory

• We create a Linux device-driver that gives applications access to the graphics frame-buffer

• The frame buffer is accessed through the PCI Express slot

• Assume a graphics card is installed in your system

13

The role of a device-driver

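[Diagram: in user space, a user application calls the standard “runtime” libraries (call/ret); the libraries enter the operating system kernel in kernel space (syscall/sysret); the kernel calls the device-driver module (call/ret); the driver accesses the hardware device (in/out); RAM and i/o memory are also shown]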

A device-driver is a software module that controls a hardware device in response to OS kernel requests relayed, often, from an application

14

Raster Display Technology

The graphics screen is a two-dimensional array of picture elements (‘pixels’)

Each pixel’s color is an individually programmable mix of red, green, and blue

These pixels are redrawn sequentially, left-to-right, by rows from top to bottom

15

Special “dual-ported” memory

[Diagram: the CPU accesses both RAM (2048 MB) and VRAM (16 MB); the CRT is refreshed from the VRAM]

16

How much VRAM is needed?
• This depends on
– the total number of pixels
– the number of bits-per-pixel
• The total number of pixels
– Determined by the screen’s width and height
– 1280-by-960 = 1,228,800 pixels
• The number of bits-per-pixel (“color depth”) is a programmable parameter (varies from 1 to 32)
• Certain types of applications also need to use extra VRAM
– for multiple displays, or for “special effects” like computer game animations

17
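The VRAM estimate above is just pixels times bits-per-pixel, converted to bytes. A minimal sketch, assuming a single display surface with no padding between rows (the name `vram_bytes_needed` is invented for this example):

```c
#include <stdint.h>

/* Bytes of VRAM needed for one screen image:
 * width x height pixels, each 'bits_per_pixel' bits, rounded up to whole bytes.
 * Assumes rows are packed with no padding (real hardware may pad each row). */
uint64_t vram_bytes_needed(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    uint64_t total_bits = (uint64_t)width * height * bits_per_pixel;
    return (total_bits + 7) / 8;
}
```

For the 1280-by-960 example at 32 bits-per-pixel this gives 4,915,200 bytes, which fits comfortably in the 16 MB of VRAM mentioned earlier.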

How ‘truecolor’ works

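[Figure: one pixel stored as a longword, with blue in bits 0 to 7, green in bits 8 to 15, red in bits 16 to 23, and alpha in bits 24 to 31; the R, G and B components combine into the pixel’s color]

The intensity of each color-component within a pixel is an 8-bit value

Example component values from the figure: (0.5, 0, 1, 0) and (0, 0.5, 0)

Alpha represents a pre-multiplied value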

18

x86 uses “little-endian” order

[Figure: VRAM bytes feeding the video screen]
VRAM byte offset:  0  1  2  3  4  5  6  7  8  9  10
Color component:   B  G  R  A  B  G  R  A  B  G  R

“truecolor” graphics-modes use 4-bytes per picture-element
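The byte order above follows from storing an alpha/red/green/blue longword in little-endian order. The sketch below packs the four 8-bit components and then writes the bytes out explicitly, so it behaves the same on any host; `pack_argb` and `store_le32` are names chosen for this illustration.

```c
#include <stdint.h>

/* Pack four 8-bit components into one 32-bit truecolor pixel:
 * alpha in bits 31..24, red in 23..16, green in 15..8, blue in 7..0. */
uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Store a 32-bit value the way a little-endian CPU does:
 * least significant byte at the lowest address. */
void store_le32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value & 0xFF);          /* blue  */
    out[1] = (uint8_t)((value >> 8) & 0xFF);   /* green */
    out[2] = (uint8_t)((value >> 16) & 0xFF);  /* red   */
    out[3] = (uint8_t)((value >> 24) & 0xFF);  /* alpha */
}
```

Storing `pack_argb(a, r, g, b)` with `store_le32` leaves the bytes in memory as B, G, R, A, matching the VRAM layout on the slide.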

19

Some operating system issues

• Linux is a “protected-mode” operating system
• I/O devices normally are not directly accessible
• Linux on x86 platforms uses “virtual memory”
• Privileged software must “map” the VRAM
• A device-driver module is needed: ‘vram.c’
• We can compile it using: $ mmake vram
• Device-node: # mknod /dev/vram c 98 0
• Make it ‘writable’: # chmod a+w /dev/vram

20

Our ‘vram.c’ module

• It’s a character-mode Linux device-driver
• It implements four device-file ‘methods’:
– ‘read()’: lets a program read from video memory
– ‘write()’: lets a program write to video memory
– ‘llseek()’: lets a program ‘move’ the file’s pointer
– ‘mmap()’: lets a program ‘map’ vram to user-space

• It also implements a pseudo-file that lets users view the RADEON X300 graphics controller’s PCI Configuration Space parameter-values:

$ cat /proc/vram

21

What is PCI?

• It’s an acronym for “Peripheral Component Interconnect” and refers to a collection of industry standards for devices used in PCs

• An Intel-sponsored initiative (from 1992-9) having several ambitious goals:

– Reduce diversity inherent in legacy PC devices
– Improve speed and efficiency of data-transfers
– Eliminate (or reduce) platform dependencies
– Simplify adding/removing peripheral adapters
– Lower PC’s total consumption of electrical power

22

PCI Configuration Space

[Diagram: 64 doublewords in total]
• PCI Configuration Space Header (16 doublewords – fixed format)
• PCI Configuration Space Body (48 doublewords – variable format)

A non-volatile parameter-storage area for each PCI device-function

23

Example: Header Type 0

[Table: the 16 doublewords of a Type 0 header; each row lists its fields from bit 31 down to bit 0]
Dword 0:  Device ID | Vendor ID
Dword 1:  Status Register | Command Register
Dword 2:  Class Code (Class/SubClass/ProgIF) | Revision ID
Dword 3:  BIST | Header Type | Latency Timer | Cache Line Size
Dword 4:  Base Address 0
Dword 5:  Base Address 1
Dword 6:  Base Address 2
Dword 7:  Base Address 3
Dword 8:  Base Address 4
Dword 9:  Base Address 5
Dword 10: CardBus CIS Pointer
Dword 11: Subsystem Device ID | Subsystem Vendor ID
Dword 12: Expansion ROM Base Address
Dword 13: reserved | Capabilities Pointer
Dword 14: reserved
Dword 15: Maximum Latency | Minimum Grant | Interrupt Pin | Interrupt Line
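The Type 0 header layout can be written down as a C structure, which makes the byte offsets explicit. This is a sketch for illustration only: the struct and field names are invented here (the Linux kernel instead uses `PCI_*` offset constants from `<linux/pci_regs.h>`), and it relies on natural alignment producing no padding, which the static assertion checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout of the 64-byte (16-doubleword) Type 0 header.
 * Fields are listed from the lowest byte offset upward. */
struct pci_type0_header {
    uint16_t vendor_id;            /* dword 0, bits 15..0   */
    uint16_t device_id;            /* dword 0, bits 31..16  */
    uint16_t command;              /* dword 1, bits 15..0   */
    uint16_t status;               /* dword 1, bits 31..16  */
    uint8_t  revision_id;          /* dword 2, bits 7..0    */
    uint8_t  prog_if;              /* dword 2, bits 15..8   */
    uint8_t  sub_class;            /* dword 2, bits 23..16  */
    uint8_t  base_class;           /* dword 2, bits 31..24  */
    uint8_t  cache_line_size;      /* dword 3               */
    uint8_t  latency_timer;
    uint8_t  header_type;
    uint8_t  bist;
    uint32_t base_address[6];      /* dwords 4..9           */
    uint32_t cardbus_cis_pointer;  /* dword 10              */
    uint16_t subsystem_vendor_id;  /* dword 11              */
    uint16_t subsystem_device_id;
    uint32_t expansion_rom_base;   /* dword 12              */
    uint8_t  capabilities_pointer; /* dword 13              */
    uint8_t  reserved1[3];
    uint32_t reserved2;            /* dword 14              */
    uint8_t  interrupt_line;       /* dword 15              */
    uint8_t  interrupt_pin;
    uint8_t  minimum_grant;
    uint8_t  maximum_latency;
};

_Static_assert(sizeof(struct pci_type0_header) == 64,
               "Type 0 header must be exactly 16 doublewords");
```

On Linux, one way to fill such a structure from user space is to read the first 64 bytes of a device's `config` file under `/sys/bus/pci/devices/`; multi-byte fields are little-endian, so this direct overlay only works as-is on a little-endian host such as x86.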

24

Examples of VENDOR-IDs
• 0x8086 – Intel Corporation
• 0x1022 – Advanced Micro Devices, Inc
• 0x1002 – ATI Technologies, Inc (My office machine)
• 0x10EC – RealTek, Incorporated
• 0x10DE – Nvidia Corporation
• 0x10B7 – 3Com Corporation
• 0x101C – Western Digital, Inc
• 0x1014 – IBM Corporation
• 0x0E11 – Compaq Corporation
• 0x1057 – Motorola Corporation
• 0x106B – Apple Computers, Inc
• 0x5333 – S3, Inc

25

Examples of DEVICE-IDs

• 0x5347: ATI RAGE128 SG
• 0x4C58: ATI RADEON LX
• 0x5950: ATI RS480
• 0x436E: ATI IXP300 SATA
• 0x438C: ATI IXP600 IDE
• 0x5B60: ATI Radeon HD 3200 Graphics

See this Linux header-file for lots more examples: </usr/src/linux/include/linux/pci_ids.h>

26

Defined PCI Class Codes
• 0x00: Legacy Device (i.e., built before class-codes were defined)
• 0x01: Mass Storage controller
• 0x02: Network controller
• 0x03: Display controller
• 0x04: Multimedia device
• 0x05: Memory Controller
• 0x06: Bridge device
• 0x07: Simple Communications controller
• 0x08: Base System peripherals
• 0x09: Input device
• 0x0A: Docking stations
• 0x0B: Processors
• 0x0C: Serial Bus controllers
• 0x0D: Wireless controllers
• 0x0E: Intelligent I/O controllers
• 0x0F: Encryption/Decryption controllers
• 0x10: Satellite Communications controllers
• 0x11: Data Acquisition and Signal Processing controllers

27

Example of Sub-Class Codes

• Class Code 0x01: Mass Storage controller
– 0x00: SCSI controller
– 0x01: IDE controller
– 0x02: Floppy Disk controller
– 0x03: IPI controller
– 0x04: RAID controller
– 0x80: Other Mass Storage controller

28

Example of Sub-Class Codes

• Class Code 0x02: Network controller
– 0x00: Ethernet controller
– 0x01: Token Ring controller
– 0x02: FDDI controller
– 0x03: ATM controller
– 0x04: ISDN controller
– 0x80: Other Network controller

29

Example of Sub-Class codes

• Class Code 0x03: Display Controller
– 0x00: VGA-compatible controller
– 0x01: XGA controller
– 0x02: 3D controller
– 0x80: Other display controller
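The class and sub-class codes listed on these slides live in the upper three bytes of header doubleword 2, with the Revision ID in the low byte. A small sketch of decoding that doubleword follows; the struct and function names are hypothetical, and only the display sub-classes from the slide are named.

```c
#include <stdint.h>

/* The four byte-wide fields of PCI header doubleword 2. */
struct pci_class_info {
    uint8_t base_class;   /* bits 31..24, e.g. 0x03 = Display controller */
    uint8_t sub_class;    /* bits 23..16 */
    uint8_t prog_if;      /* bits 15..8  */
    uint8_t revision_id;  /* bits 7..0   */
};

/* Split the 32-bit value read from configuration offset 0x08. */
struct pci_class_info decode_class_dword(uint32_t dword2)
{
    struct pci_class_info info;
    info.base_class  = (uint8_t)(dword2 >> 24);
    info.sub_class   = (uint8_t)(dword2 >> 16);
    info.prog_if     = (uint8_t)(dword2 >> 8);
    info.revision_id = (uint8_t)dword2;
    return info;
}

/* Name a sub-class of base class 0x03, using the list from the slide. */
const char *display_subclass_name(uint8_t sub_class)
{
    switch (sub_class) {
    case 0x00: return "VGA-compatible controller";
    case 0x01: return "XGA controller";
    case 0x02: return "3D controller";
    case 0x80: return "Other display controller";
    default:   return "unknown";
    }
}
```

For example, a doubleword of 0x03000012 decodes to base class 0x03, sub-class 0x00 (a VGA-compatible display controller), programming interface 0x00, revision 0x12.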

30

Hardware details may differ
• Graphics controllers use vendor-specific mechanisms to perform similar operations
• There’s a common core of compatibility with IBM’s VGA (Video Graphics Array) developed in the mid-1980s

• But since IBM’s loss of market dominance, each manufacturer has added enhancements which employ incompatible programming interfaces

• You need a vendor’s manual! (Download from vendor site)

31

The ‘frame-buffer’

• Today’s PCI graphics systems all provide a dedicated amount of display memory to control the screen-image’s pixel-coloring

• But how much memory will vary with price
• And its location within the CPU’s physical address-space can’t be predicted because it depends upon what other PCI devices are installed (and mapped) during startup

32

The ‘base address’ fields

• The PCI Configuration Header has several so-called Base Address fields, and vendors use one of these to hold the frame-buffer’s starting address and to indicate how much vram the video controller can actually use

• The Linux kernel provides driver-writers with some convenient functions for getting the location and size of the frame-buffer

33

ATI Radeon uses Base Address 0
• Our ‘vram.c’ module’s initialization routine employs these kernel helper-functions:

#include <linux/pci.h>

struct pci_dev *devp;	// will point to a kernel data-structure

// get a pointer to the PCI device’s Linux data-structure
devp = pci_get_device( VENDOR_ID, DEVICE_ID, NULL );
if ( !devp ) return -ENODEV;	// device is not present

// get starting address and length for memory-resource 0
vram_base = pci_resource_start( devp, 0 );
vram_size = pci_resource_len( devp, 0 );

34

Reading from ‘vram’

• You can use our ‘fileview’ utility to see the current contents of the video frame-buffer

$ fileview /dev/vram

• Our ‘vram.c’ driver’s ‘read()’ method gets invoked when an application-program attempts to ‘read’ from the ‘/dev/vram’ device-file

• The read-method is implemented by our driver using ‘ioremap()’ (and ’iounmap()’) to temporarily map a 4KB-page of physical vram to the kernel’s virtual address-space
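Because the read-method maps only one 4KB page at a time, each call can transfer at most the bytes remaining in the current page (and never past the end of vram). The sketch below shows that clamping arithmetic in plain user-space C; the function name and the exact policy are assumptions for illustration, not the actual ‘vram.c’ source.

```c
#include <stddef.h>
#include <stdint.h>

#define VRAM_PAGE_SIZE 4096u   /* size of the page mapped per call */

/* How many bytes one read() call may copy, given the current file
 * position, the caller's requested count, and the total vram size.
 * Hypothetical helper illustrating the page-at-a-time approach. */
size_t read_chunk_len(uint64_t pos, size_t count, uint64_t vram_size)
{
    if (pos >= vram_size)
        return 0;                               /* at or past end-of-file */

    uint64_t left_in_vram = vram_size - pos;
    uint64_t left_in_page = VRAM_PAGE_SIZE - (pos % VRAM_PAGE_SIZE);
    uint64_t n = count;

    if (n > left_in_page) n = left_in_page;     /* stay inside the mapped page */
    if (n > left_in_vram) n = left_in_vram;     /* stay inside vram */
    return (size_t)n;
}
```

A caller that wants more data than one chunk simply calls read() again; the file position has advanced, so the next call maps the following page.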

35

I/O ‘memcpy()’ functions

• Linux provides a ‘platform-independent’ way to do copying from an i/o-device’s memory into an application’s buffer (or vice-versa):
– A ‘read’ copies from vram to a user’s buffer
	memcpy_fromio( buf, vaddr, len );
– A ‘write’ copies to vram from a user’s buffer
	memcpy_toio( vaddr, buf, len );

36

‘mmap()’

• This is a standard UNIX system-call that lets an application ‘map’ a file into its virtual address-space, where it can then treat the file as if it were an ordinary array

• See the man-page: $ man mmap
• This same system-call can also work on a device-file if that device’s driver provided ‘mmap()’ among its file-operations

37

The user-role

• In the application-program, six arguments get passed to the ‘mmap()’ library-function

void *mmap( void *baseaddress, size_t memorysize, int accessattributes, int flags, int filehandle, off_t offset );
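A minimal user-space sketch of the six-argument call is shown here. The helper name `map_file_shared` is invented for this example; it works on any file that supports ‘mmap()’, so it can be tried on an ordinary file before pointing it at a device-file such as ‘/dev/vram’ (which assumes the driver described in these slides is loaded).

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map 'length' bytes of the file at 'path', starting at 'offset'
 * (which must be a multiple of the page size), for reading and writing.
 * Returns NULL on failure. Hypothetical helper for illustration. */
void *map_file_shared(const char *path, size_t length, off_t offset)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL,                    /* let the kernel choose the address */
                   length,                  /* memory size */
                   PROT_READ | PROT_WRITE,  /* access attributes */
                   MAP_SHARED,              /* flags: writes reach the file/device */
                   fd,                      /* file handle */
                   offset);                 /* offset within the file */

    close(fd);   /* the mapping remains valid after the descriptor is closed */
    return (p == MAP_FAILED) ? NULL : p;
}
```

The returned pointer can be treated as an ordinary array; when finished, release it with `munmap(p, length)`.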

38

The driver-role

• In the kernel, those six arguments will get validated and processed, then the driver’s ‘mmap()’ callback-function will be invoked to supply missing information and perform further sanity-checks and do appropriate page-mapping actions:

int mmap( struct file *file, struct vm_area_struct *vma );

39

Our driver’s code

int mmap( struct file *file, struct vm_area_struct *vma )
{
	// extract the parameters we will need from the ‘vm_area_struct’
	unsigned long region_length = vma->vm_end - vma->vm_start;
	unsigned long region_origin = vma->vm_pgoff * PAGE_SIZE;
	unsigned long physical_addr = fb_base + region_origin;
	unsigned long user_virtaddr = vma->vm_start;

	// sanity check: mapped region cannot extend past end of vram
	if ( region_origin + region_length > fb_size ) return -EINVAL;

	// tell the kernel not to try ‘swapping out’ this region to the disk
	// (VM_RESERVED was removed in Linux 3.7; later kernels use VM_DONTEXPAND | VM_DONTDUMP)
	vma->vm_flags |= VM_RESERVED;

	// tell the kernel to exclude this region from any core dumps
	vma->vm_flags |= VM_IO;

40

Driver’s code continued

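	// invoke a helper-function that will set up the page-table entries
	// (the third argument is a page-frame number, i.e. the physical address divided by the page size)
	if ( remap_pfn_range( vma, user_virtaddr, physical_addr >> PAGE_SHIFT,
				region_length, vma->vm_page_prot ) ) return -EAGAIN;

	return 0;	// SUCCESS
}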

41

Demo: ‘rotation.cpp’

• This application-program will demonstrate use of our ‘vram.c’ device-driver’s ‘read()’, ‘write()’ and ‘llseek()’ methods (i.e., device-file operations)

• It will perform a rotation of the color-components (R,G,B) in every displayed ‘truecolor’ pixel:

[Diagram: R → G, G → B, B → R]

• After 3 rotations the screen will look normal again
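The per-pixel operation of such a demo can be sketched as below. This is not the ‘rotation.cpp’ source: the function name is invented, the pixel layout is assumed to be the alpha/red/green/blue longword described earlier, and the direction (old red moves to green, old green to blue, old blue to red) is one reading of the diagram.

```c
#include <stdint.h>

/* Rotate the color-components of one truecolor pixel:
 * new green = old red, new blue = old green, new red = old blue.
 * The alpha byte (bits 31..24) is left unchanged. */
uint32_t rotate_rgb(uint32_t pixel)
{
    uint32_t alpha = pixel & 0xFF000000u;
    uint32_t red   = (pixel >> 16) & 0xFFu;
    uint32_t green = (pixel >> 8) & 0xFFu;
    uint32_t blue  = pixel & 0xFFu;

    return alpha | (blue << 16) | (red << 8) | green;
}
```

Applying `rotate_rgb` three times returns every pixel to its original value, which is why the screen looks normal again after the third pass. A full program would read each pixel from ‘/dev/vram’, apply this function, seek back, and write the result.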

42

Thanks

43