Geometry Subsystem Design

Lan-Da Van (范倫達), Ph. D.

Department of Computer Science

National Chiao Tung University, Hsinchu, Taiwan

Fall, 2018

2018/9/10

Outline

• Geometry Subsystem

• Introduction to Shading Algorithms

• Proposed Low-Complexity Subdivision Algorithm

• Proposed Power-Area Efficient Geometry Engine

• Implementation and Comparison Results

• Summary

Geometry Subsystem

• Process “vertices”

• Transform from “world space” to “image space”

• Compute per-vertex lighting

• The front end of the 3D graphics pipeline

From http://www.hourences.com/tutorials-vtx-lighting/

Geometry Subsystem

3D Graphics System

Source: B.-S. Liang, Y.-C. Lee, W.-C. Yeh, C.-W. Jen, "Index rendering: hardware-efficient architecture for 3-D graphics in multimedia system," IEEE Trans. Multimedia, vol. 4, no. 3, pp. 343-360, Sep. 2002.

VLSI Signal Processing System Design Spectrum

System Level

Algorithm Level

Architecture Level

Circuit Level

Logic Level

Process Level

Introduction to Shading Algorithms

• Gouraud shading

– Per-vertex lighting

– Lower computation requirement

– Poor shading quality

• Phong shading

– Per-pixel lighting

– Huge computation requirement

– Smooth and more realistic highlight

• Phong reflection model:

$$I = k_a I_a + k_d I_d (N \cdot L) + k_s I_s (N \cdot H)^n$$

• Phong shading

– Has smooth and realistic specular highlights

– Computes the reflection model for every pixel in the polygon

– Requires much more computation than Gouraud shading

Shading algorithm | Phong shading | Gouraud shading
Number of lighting operations | 41,300 pixels | 6,200 vertices
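As a minimal sketch of the Phong reflection model above, the following Python function evaluates the intensity for one surface point. The function and parameter names are illustrative, the half vector H is taken as the normalized sum of the light and viewer directions, and the clamping of negative dot products to zero is an assumption that the slide does not state.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phong_intensity(normal, to_light, to_viewer, ka, kd, ks, ia, i_d, i_s, n):
    """I = ka*Ia + kd*Id*(N.L) + ks*Is*(N.H)^n for a single light source."""
    N = normalize(normal)
    L = normalize(to_light)
    V = normalize(to_viewer)
    # Half vector between the light and viewer directions.
    H = normalize(tuple(l + v for l, v in zip(L, V)))
    diffuse = max(dot(N, L), 0.0)        # assumption: clamp back-facing light
    specular = max(dot(N, H), 0.0) ** n  # assumption: clamp before powering
    return ka * ia + kd * i_d * diffuse + ks * i_s * specular
```

Gouraud shading would call such a function once per vertex (6,200 times in the example above), whereas Phong shading would call it once per pixel (41,300 times).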

• Existing Approximate Phong Shading Algorithms

– Taylor expansion based approximate algorithms

– Spherical interpolation based approximate algorithms

– Mixed shading

– Subdivision based approximate algorithms

[Figure: mixed shading and subdivision]

Motivation

• Smooth highlights and Phong shading quality with low power consumption are desired.

– Gouraud shading consumes less power but gives poor quality.

– Phong shading gives high quality but consumes more power.

– Until now, no one has explored hardware architectures for subdivision algorithms.

• A low complexity subdivision algorithm is proposed for lower power-area and near-Phong shading quality.

• A power-area efficient VLSI architecture of the geometry engine with scalable quality is proposed to provide satisfactory trade-off between shading quality and power consumption.

Proposed Low-Complexity Subdivision Algorithm

• Proposed subdivision algorithm:

(1) Triangle filtering scheme

(2) Forward difference scheme

(3) Edge function recovery scheme

(4) Dual space subdivision scheme

(5) Triangle setup coefficient sharing scheme

Data Flow of the Proposed Low-Complexity Subdivision Algorithm

[Figure: data flow. Geometry engine (GE): Input triangles → Culling → H test → Subdivision → Light vertices → To triangle setup engine, with "No pass" branches and a "Discarded" outcome; schemes (1) triangle filtering, (2) forward difference, and (4) dual space subdivision are marked on this path. Triangle setup engine: From GE → "Subdivided triangle?" → Setup for normal triangle / Setup for subdivided triangle → Edge function coefficients / vertex attribute parameters → To rasterizer; schemes (3) edge function recovery and (5) setup coefficient sharing are marked on this path.]

Triangle Filtering

• Eliminate the unnecessary subdivision and culling operations for the generated triangles.

– The concept of mixed shading is adopted here.

– Perform culling before subdivision

[Figure: culling of input triangles]
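The filtering order described above (cull first, then decide whether a triangle is worth subdividing) can be sketched as follows. This is only an illustration under loud assumptions: the slides do not define the H test, so it is modeled here as a hypothetical threshold on the per-vertex N·H values, and a screen-space signed-area test stands in for the culling step. The dictionary keys `screen` and `n_dot_h` and the `highlight_threshold` parameter are invented for this sketch.

```python
def signed_area2(p0, p1, p2):
    """Twice the signed area of a 2D triangle (positive when counter-clockwise)."""
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])

def filter_triangles(triangles, highlight_threshold=0.9):
    """Return (to_subdivide, to_light_directly, discarded_count)."""
    to_subdivide, to_light_directly, discarded = [], [], 0
    for tri in triangles:
        # Culling comes first, so culled triangles are never subdivided.
        if signed_area2(*tri["screen"]) <= 0:  # assumption: front faces are CCW
            discarded += 1
        # Hypothetical H test: subdivide only triangles near a highlight.
        elif max(tri["n_dot_h"]) >= highlight_threshold:
            to_subdivide.append(tri)
        else:
            to_light_directly.append(tri)
    return to_subdivide, to_light_directly, discarded
```

Triangles that fail the H test are lit at their original vertices only, which is the mixed-shading idea: the extra work is spent only where a highlight is likely to be visible.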

Subdivision Using Forward Difference

• Subdivision algorithm using forward difference scheme

– Step 1: Compute difference vectors: d1 and d2

– Step 2: Generate vertices using the difference vectors

– Step 3: Pack the vertices into four triangles and output them

L: Subdivision level number.

NS: The number of segments on each edge of the original triangle.
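The three steps can be sketched in Python as follows. This is an illustrative sketch, not the hardware algorithm: it assumes NS = 2^L (which agrees with the level-1 and level-2 figures used later in the deck), takes d1 along edge ab and d2 along edge ac, and uses invented function and variable names.

```python
def subdivide_forward_difference(a, b, c, level):
    """Subdivide triangle abc; returns (rows of vertices, list of triangles)."""
    ns = 2 ** level  # assumption: NS = 2^L segments per edge
    # Step 1: difference vectors along edges ab and ac.
    d1 = tuple((q - p) / ns for p, q in zip(a, b))
    d2 = tuple((q - p) / ns for p, q in zip(a, c))
    # Step 2: generate vertices by repeated addition (forward differencing).
    rows, row_start = [], tuple(a)
    for j in range(ns + 1):
        row = [row_start]
        for _ in range(ns - j):
            row.append(tuple(p + d for p, d in zip(row[-1], d1)))
        rows.append(row)
        row_start = tuple(p + d for p, d in zip(row_start, d2))
    # Step 3: pack neighboring vertices into small triangles.
    triangles = []
    for j in range(ns):
        for i in range(ns - j):
            triangles.append((rows[j][i], rows[j][i + 1], rows[j + 1][i]))
            if i < ns - j - 1:  # inverted triangle between two upright ones
                triangles.append((rows[j][i + 1], rows[j + 1][i + 1], rows[j + 1][i]))
    return rows, triangles
```

For level 1 this produces 6 vertices (3 of them newly generated) and 4 triangles; for level 2 it produces 15 vertices (12 generated) and 16 triangles.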

Rasterization Anomaly (1/2)

• Forward differencing can cause rasterization anomalies.

[Figure: lost pixel]

Rasterization Anomaly (2/2)

• Why does the rasterization anomaly happen?

– Because of accumulated numerical errors, vertices A and A’ have different coordinates.

– The triangles defined by A and A’ are therefore not exactly adjacent to each other.
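A small numeric sketch of how A and A’ can drift apart. It assumes integer fixed-point coordinates and a truncating right shift for the division by NS = 2^L; the slides do not specify the number format, so both choices are assumptions made for illustration.

```python
def walk_edge_fixed_point(start, end, level):
    """Reach the far end of an edge by accumulating a truncated step."""
    ns = 1 << level                  # assumption: NS = 2^L segments
    step = (end - start) >> level    # truncating shift drops the low bits
    pos = start
    for _ in range(ns):
        pos += step                  # forward difference accumulation
    return pos

a_exact = 37                                      # true coordinate of vertex A
a_accumulated = walk_edge_fixed_point(0, 37, 2)   # A' reached by accumulation
```

Here the step is 37 >> 2 = 9, so four accumulated steps land on 36 instead of 37. A neighboring triangle that uses the exact vertex no longer shares the edge, which is how a pixel can fall between the two.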

Edge Function Recovery (1/3)

• Edge function method

– Test if a pixel is inside the triangle

– Line equations of edges (edge function)

– An incorrect vertex coordinate leads to a wrong edge function

  • Rasterization anomalies

• Edge function recovery scheme: Derive the edge functions of the generated triangles using the coordinates of the original vertices.

– Step 1: Compute the edge functions Eab, Ebc, Eca of the original triangle from the line equation of each edge:

$$(y_a - y_b)(x - x_b) - (x_a - x_b)(y - y_b) = 0$$

$$(y_a - y_b)\,x - (x_a - x_b)\,y + (x_a y_b - x_b y_a) = 0$$

$$E_{ab}: A_{ab}\,x + B_{ab}\,y + C_{ab}, \quad A_{ab} = y_a - y_b,\; B_{ab} = x_b - x_a,\; C_{ab} = x_a y_b - x_b y_a$$

– Step 2: Compute the constant difference values: ∆Cab, ∆Cbc, ∆Cca.

$$\Delta C_{ab} = \frac{1}{N_S}\left((x_c - x_b)(y_b - y_a) + (x_a - x_b)(y_c - y_b)\right)$$

– Step 3: Compute edge functions for small triangles: Eai, Eik, Eka, Eib, Ebj, Eji, Ekj, Ejc, Eck using pre-computed original edge functions and the differential values.

  • For example, for the central small triangle, the edge function Ekj is

$$E_{kj}: A_{kj}\,x + B_{kj}\,y + C_{kj}, \quad C_{kj} = C_{ab} + \Delta C_{ab}$$

  where A_kj = A_ab and B_kj = B_ab, because edge kj is parallel to edge ab.

– Step 4: Render these small triangles using the edge functions
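The recovery idea can be checked numerically with a short sketch. It assumes the sign convention A = y_a − y_b, B = x_b − x_a, C = x_a·y_b − x_b·y_a for edge ab, and defines ∆C so that adding it to C shifts the line by one subdivision step toward vertex c. The function names are illustrative, and exact rational arithmetic is used only to make the check exact.

```python
from fractions import Fraction

def edge_function(pa, pb):
    """Coefficients (A, B, C) of the line through pa and pb."""
    (xa, ya), (xb, yb) = pa, pb
    return (ya - yb, xb - xa, xa * yb - xb * ya)

def delta_c(pa, pb, pc, ns):
    """Constant difference between adjacent lines parallel to edge ab."""
    (xa, ya), (xb, yb), (xc, yc) = pa, pb, pc
    return Fraction((xc - xb) * (yb - ya) + (xa - xb) * (yc - yb), ns)

def recovered_edge(pa, pb, pc, ns, steps):
    """Edge parallel to ab, moved `steps` subdivision steps toward pc."""
    A, B, C = edge_function(pa, pb)
    return (A, B, C + steps * delta_c(pa, pb, pc, ns))

def evaluate(edge, p):
    A, B, C = edge
    return A * p[0] + B * p[1] + C
```

For a = (1, 2), b = (9, 4), c = (3, 10) and NS = 2, the recovered edge one step from ab evaluates to zero at the exact midpoints of ca and bc, so it never depends on the accumulated (possibly inexact) coordinates of the generated vertices.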

Rendering Results (1/4)

• Teapot

• Pawn

• Venus

• Couch

Computation of Edge Function (1/2)

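• Recovery scheme can reduce the complexity of evaluating the edge functions.

Edge functions of the original triangle (2 muls + 3 subs each):

$$E_{ab}: A_{ab}\,x + B_{ab}\,y + C_{ab}, \quad C_{ab} = x_a y_b - x_b y_a$$

$$E_{bc}: A_{bc}\,x + B_{bc}\,y + C_{bc}, \quad C_{bc} = x_b y_c - x_c y_b$$

$$E_{ca}: A_{ca}\,x + B_{ca}\,y + C_{ca}, \quad C_{ca} = x_c y_a - x_a y_c$$

Constant difference values for level-1 (N_S = 2), 2 muls + 1 sub each:

$$\Delta C_{ab} = \tfrac{1}{2}\left((x_c - x_b)(y_b - y_a) + (x_a - x_b)(y_c - y_b)\right) = \tfrac{1}{2}\left(A_{bc} B_{ab} - A_{ab} B_{bc}\right)$$

$$\Delta C_{bc} = \tfrac{1}{2}\left((x_a - x_c)(y_c - y_b) + (x_b - x_c)(y_a - y_c)\right) = \tfrac{1}{2}\left(A_{ca} B_{bc} - A_{bc} B_{ca}\right)$$

$$\Delta C_{ca} = \tfrac{1}{2}\left((x_b - x_a)(y_a - y_c) + (x_c - x_a)(y_b - y_a)\right) = \tfrac{1}{2}\left(A_{ab} B_{ca} - A_{ca} B_{ab}\right)$$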

Computation of Edge Function (2/2)

• Evaluating one edge function requires:

– 2 multiplications + 3 subtractions = 2 muls + 3 adds

• For a triangle with NS segments on each edge, there are a total of 3NS edge functions to be computed.

• Evaluating all edge functions for these triangles requires:

3*NS*(2 muls + 3 adds) = 6*NS muls + 9*NS adds

• With the proposed recovery scheme, the computation only requires:

3*(2 muls + 3 adds) + (3*NS-3) * (1 sub) + 3*(2 muls + 1 add)

= 12 muls + (3*NS+9) adds
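The two operation counts above can be tabulated with a few lines of Python. The function names are illustrative; the formulas are the ones stated on this slide.

```python
def conventional_edge_ops(ns):
    """3*NS edge functions, each costing 2 muls + 3 adds."""
    return {"muls": 3 * ns * 2, "adds": 3 * ns * 3}

def recovered_edge_ops(ns):
    """3 original edge functions, 3 delta-C values, then one add per extra edge."""
    muls = 3 * 2 + 3 * 2
    adds = 3 * 3 + (3 * ns - 3) * 1 + 3 * 1
    return {"muls": muls, "adds": adds}

for ns in (2, 4):
    print(ns, conventional_edge_ops(ns), recovered_edge_ops(ns))
```

With NS = 2 the counts are 12 muls and 18 adds against 12 muls and 15 adds; with NS = 4 they are 24 muls and 36 adds against 12 muls and 21 adds. The recovery cost in multiplications stays constant as the subdivision level grows.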

Dual Space Subdivision (1/4)

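• Transforms in GE

– Modelview Transform (Object -> Eye)

– Projection Transform (Eye -> Clip)

– Perspective Division (Clip -> NDC)

– Viewport Transform (NDC -> Window)

$$\begin{bmatrix} x_{eye} \\ y_{eye} \\ z_{eye} \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{object} \\ y_{object} \\ z_{object} \\ 1 \end{bmatrix}$$

$$v_{NDC} = \frac{v_{clip}}{w_{clip}}, \qquad v_{window} = scale \cdot v_{NDC} + offset$$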
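A minimal Python sketch of the four-stage transform chain, written to mirror the per-vertex operation counts used in the complexity tables of this deck (9 muls + 9 adds for the modelview transform, 3 muls + 1 inverse for the perspective division, 3 muls + 3 adds for the viewport transform). The function names and the matrix layout (rows of four coefficients) are assumptions for illustration. The projection step is written as a general 4x4 product; an actual perspective matrix is sparse, which is presumably why the tables charge fewer multiplications for it.

```python
def modelview_transform(m, v_object):
    """m holds the top three rows of the modelview matrix; the last row is 0 0 0 1."""
    x, y, z = v_object
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)

def projection_transform(p, v_eye):
    """General 4x4 projection applied to (x, y, z, 1); returns clip coordinates."""
    x, y, z = v_eye
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in p)

def perspective_division(v_clip):
    """One inverse and three multiplications per vertex."""
    x, y, z, w = v_clip
    inv_w = 1.0 / w
    return (x * inv_w, y * inv_w, z * inv_w)

def viewport_transform(v_ndc, scale, offset):
    """Component-wise scale and offset from NDC to window coordinates."""
    return tuple(s * c + o for s, c, o in zip(scale, v_ndc, offset))
```

Usage: with an identity modelview matrix and a projection whose last row is (0, 0, -1, 0), the object-space point (2, 4, -2) becomes clip (2, 4, -2, 2), NDC (1, 2, -1), and, with scale (100, 100, 0.5) and offset (100, 100, 0.5), window (200, 300, 0).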

• Subdivide triangles in both eye space and window space

– Reduce the computation of transforms

– Perspective-incorrect subdivision can be adopted if the error is acceptable.

Eye-space subdivision data flow:

Dual space subdivision data flow:

• Complexity analysis of the eye-space subdivision for one original triangle.

– NGV: The number of the generated vertices.

Operations | Computational Complexity
Modelview transform for 3 vertices | 3x9 muls + 3x9 adds
Normal transform for 3 vertices | 3x9 muls + 3x6 adds
Subdivision for 6 components: eye coordinate (xeye, yeye, zeye), normal (xN, yN, zN) | 6(4^L - 1) adds
Projection transform for NGV+3 vertices | 5(NGV+3) muls + 3(NGV+3) adds
Perspective division for NGV+3 vertices | 3(NGV+3) muls + (NGV+3) invs
Viewport transform for NGV+3 vertices | 3(NGV+3) muls + 3(NGV+3) adds
Total | (11NGV+87) muls, (6NGV + 6x4^L + 57) adds, (NGV+3) invs

• Complexity analysis of the proposed dual space subdivision for one original triangle.

Operations | Computational Complexity
Modelview transform for 3 vertices | 3x9 muls + 3x9 adds
Normal transform for 3 vertices | 3x9 muls + 3x6 adds
Projection transform for 3 vertices | 3x5 muls + 3x3 adds
Perspective division for 3 vertices | 3x3 muls + 3 invs
Viewport transform for 3 vertices | 3x3 muls + 3x3 adds
Subdivision for 10 components: eye coordinate (xeye, yeye, zeye), normal (xN, yN, zN), window coordinate (xwindow, ywindow, zwindow, wclip) | 10(NGV+2) adds
Total | 87 muls, (10NGV+83) adds, 3 invs
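The totals in the two tables can be reproduced by summing the rows. The sketch below does this in Python with illustrative function names; it only restates the per-row costs listed above.

```python
def eye_space_ops(level, ngv):
    """Eye-space subdivision: project every generated vertex individually."""
    n = ngv + 3
    muls = 3 * 9 + 3 * 9 + 5 * n + 3 * n + 3 * n
    adds = 3 * 9 + 3 * 6 + 6 * (4 ** level - 1) + 3 * n + 3 * n
    return {"muls": muls, "adds": adds, "invs": n}

def dual_space_ops(ngv):
    """Dual space subdivision: transform only the 3 original vertices."""
    muls = 3 * 9 + 3 * 9 + 3 * 5 + 3 * 3 + 3 * 3
    adds = 3 * 9 + 3 * 6 + 3 * 3 + 3 * 3 + 10 * (ngv + 2)
    return {"muls": muls, "adds": adds, "invs": 3}

def reduction_percent(before, after):
    return 100.0 * (before - after) / before
```

For level 1 (NGV = 3) this gives 120 muls, 99 adds, 6 invs against 87 muls, 113 adds, 3 invs; for level 2 (NGV = 12) it gives 219, 225, 15 against 87, 203, 3. Dual space subdivision trades a few extra additions for a fixed number of multiplications and inversions.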

Triangle Setup Coefficient Sharing (1/3)

• Eliminate the unnecessary subdivision and setup operations for vertex attributes

[Figure: vertex attribute data flow. Attributes with shared setup coefficients: screen position, texture coordinate, depth value, fog factor. Subdivider inputs: screen position, eye space coordinate, normal, feeding the lighting unit. Setup needs a 3x3 matrix inverse and matrix multiplication for each attribute of a triangle; re-setup for generated triangles needs a 3x1 matrix multiplication for each attribute.]

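• Vertex attributes interpolation

– Parameter ui

– Perspective interpolation equation

$$u_i = A_i\,x + B_i\,y + C_i$$

$$\begin{bmatrix} u_0 & u_1 & u_2 \end{bmatrix} = \begin{bmatrix} A_i & B_i & C_i \end{bmatrix} \begin{bmatrix} x_0 & x_1 & x_2 \\ y_0 & y_1 & y_2 \\ 1 & 1 & 1 \end{bmatrix}$$

$$\begin{bmatrix} A_i & B_i & C_i \end{bmatrix} = \begin{bmatrix} u_0 & u_1 & u_2 \end{bmatrix} \begin{bmatrix} x_0 & x_1 & x_2 \\ y_0 & y_1 & y_2 \\ 1 & 1 & 1 \end{bmatrix}^{-1}$$

– Setting up the coefficients of a triangle requires one 3x3 matrix inverse.

– Setting up one attribute of a triangle requires one 3x3 matrix multiplication.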

• Level-1 case

– Setting up one attribute for 4 triangles requires four 3x3 matrix inverses and multiplications.

• All subdivided triangles are on the same plane

– Setup coefficients: Ai, Bi, Ci can be shared.

– Re-setup is required to compute the initial point for each triangle.

– Re-setup requires one 3x1 multiplication:

$$u_i = A_i\,x + B_i\,y + C_i = \begin{bmatrix} A_i & B_i & C_i \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
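A short numeric sketch of why the coefficients can be shared. It assumes plain (non-perspective) screen-space interpolation and solves the 3x3 system in closed form, which is equivalent to multiplying by the matrix inverse. Function names are illustrative, and exact fractions are used so that the equality check is exact.

```python
from fractions import Fraction

def setup_coefficients(points, values):
    """Solve A*x + B*y + C = u at the three vertices of a triangle."""
    (x0, y0), (x1, y1), (x2, y2) = points
    u0, u1, u2 = values
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    a = Fraction((u1 - u0) * (y2 - y0) - (u2 - u0) * (y1 - y0), det)
    b = Fraction((x1 - x0) * (u2 - u0) - (x2 - x0) * (u1 - u0), det)
    c = u0 - a * x0 - b * y0
    return a, b, c

def resetup(coeffs, point):
    """Re-setup: one 3x1 product gives the attribute at a starting point."""
    a, b, c = coeffs
    x, y = point
    return a * x + b * y + c

original = setup_coefficients([(0, 0), (8, 0), (0, 8)], [1, 5, 9])
# Central level-1 triangle: edge midpoints with linearly interpolated attributes.
central = setup_coefficients([(4, 0), (4, 4), (0, 4)], [3, 7, 5])
```

Both calls return the same (A, B, C), so a generated triangle only needs the cheap `resetup` evaluation at its own starting point rather than a fresh inverse and multiplication.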

Complexity Analysis (1/4)

• Notation definition:

– NT: The number of original visible triangles

– NOT: The number of original triangles for input models

– NGV: The number of new generated vertices in a subdivided triangle

– NA: The number of vertex attributes

– Example: [Figure]

Item | Conventional subdivision algorithm | Proposed subdivision algorithm | Used schemes
Number of memory accesses | (4^(L+1) - 1)*NT | (2NGV-2L+5)*NT | Forward difference
Edge function evaluation, Muls | 6*NS*NT | 12*NT | Edge function recovery
Edge function evaluation, Adds | 9*NS*NT | (3*NS+9)*NT | Edge function recovery
Computation for transforms, Muls | (11NGV+87)*NT | 87*NT | Dual space subdivision
Computation for transforms, Adds | (6NGV + 6x4^L + 57)*NT | (10NGV+83)*NT | Dual space subdivision
Computation for transforms, Invs | (NGV+3)*NT | 3*NT | Dual space subdivision
Number of culling test operations | 1*NOT | 1*NOT | Triangle filtering
Number of 3x3 matrix multiplications for setup | NA*NS^2*NT | Ceiling{1/3*NA*NS^2 + NA}*NT | Triangle setup coefficient sharing

• Level-1 case with L=1, NGV=3, NA=5

Item | Conventional subdivision algorithm | Proposed subdivision algorithm | Complexity reduction percentage
Number of memory accesses | 15*NT | 9*NT | 40.00%
Edge function evaluation, Muls | 12*NT | 12*NT | 0%
Edge function evaluation, Subs | 18*NT | 15*NT | 16.67%
Computation for transforms, Muls | 120*NT | 87*NT | 27.50%
Computation for transforms, Adds | 99*NT | 113*NT | -14.14%
Computation for transforms, Invs | 6*NT | 3*NT | 50.00%
Number of 3x3 matrix multiplications for setup | 20*NT | 12*NT | 40.00%

• Level-2 case with L=2, NGV=12, NA=5

Item | Conventional subdivision algorithm | Proposed subdivision algorithm | Complexity reduction percentage
Number of memory accesses | 63*NT | 25*NT | 68.88%
Edge function evaluation, Muls | 24*NT | 12*NT | 50.00%
Edge function evaluation, Subs | 36*NT | 21*NT | 41.67%
Computation for transforms, Muls | 219*NT | 87*NT | 60.27%
Computation for transforms, Adds | 225*NT | 203*NT | 9.78%
Computation for transforms, Invs | 15*NT | 3*NT | 80.00%
Number of 3x3 matrix multiplications for setup | 80*NT | 32*NT | 60.00%
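The setup rows of the level-1 and level-2 tables can be reproduced from the two setup formulas, read here as NA*NS^2 for the conventional algorithm and Ceiling{NA*NS^2/3 + NA} for the proposed one, per original triangle. That reading, NS = 2^L, and the function names are assumptions of this sketch; they are chosen because they reproduce the tabulated 20/12 and 80/32 counts.

```python
def conventional_setup_mults(na, ns):
    """One 3x3 multiplication per attribute for each of the NS^2 small triangles."""
    return na * ns * ns

def shared_setup_mults(na, ns):
    """Ceiling(NA*NS^2/3 + NA), computed with integer arithmetic."""
    total_thirds = na * ns * ns + 3 * na   # numerator over a denominator of 3
    return -(-total_thirds // 3)           # ceiling division

def reduction_percent(before, after):
    return 100.0 * (before - after) / before

for level in (1, 2):
    ns = 2 ** level
    before = conventional_setup_mults(5, ns)
    after = shared_setup_mults(5, ns)
    print(level, before, after, reduction_percent(before, after))
```

With NA = 5 this prints 20 and 12 (40% fewer) for level 1, and 80 and 32 (60% fewer) for level 2.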

Proposed Power-Area Efficient Geometry Subsystem

• Proposed GE Architecture

• Proposed Primitive Processing Unit (PPU)

• Proposed Vertex Processing Unit (VPU)

– Reconfigurable Datapath (RDP)

  • light_dp

  • trans_dp

  • vec_norm

  • pd

  • POW

  • vec_sub

Proposed GE Architecture

• Transforms

• Lighting

• Object space culling

• Subdivision

Proposed GE Architecture

• Hardware feature

– Power-area efficient design

  • Achieve power-area efficiency (PAE): 545.1 Kvertices/(s*mW*mm^2)

– Subdivision-based scalable shading quality support

  • Support level-0, level-1 and level-2

– High performance and area efficient vertex processing unit with reconfigurable datapath (RDP)

  • Speed up complicated operations, e.g., vector normalization

  • Hardware reuse

Proposed Primitive Processing Unit

d1=> Reg_Hdiff

d2=> Reg_Vdiff

Proposed Vertex Processing Unit

Proposed Reconfigurable Datapath (RDP)

• Key components :

– Processing elements (PE)

– Special function unit (SFU)

– FIFO

• Configurations:

Configuration Mode | Description
light_dp | Dot product for lighting
trans_dp | Dot product for transform
vec_norm | Vector normalization
pd | Perspective division
POW | Powering
vec_sub | Vector subtraction

Proposed Vertex Processing Unit

• Features

– High performance

• Peak transform performance: 50 Mvertices/s

• Construct an ASIC-like datapath for high performance vertex processing via the reconfigurable datapath.

– Area efficient

• Provide different operations for vertex processing with the same set of PEs.

Proposed Processing Element (PE)

Configuration inside PE

• MUL

• MAC

• ADD/SUB

Configurations between PEs

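• To clearly explain the interconnection between PEs, a simplified PE block diagram is given.

• light_dp

$$[X_1, Y_1, Z_1] \cdot [X_2, Y_2, Z_2] = X_1 X_2 + Y_1 Y_2 + Z_1 Z_2$$

• trans_dp

$$[X_1, Y_1, Z_1, W_1] \cdot [X_2, Y_2, Z_2, 1] = X_1 X_2 + Y_1 Y_2 + Z_1 Z_2 + W_1$$

• vec_sub

$$[X_1, Y_1, Z_1] - [X_2, Y_2, Z_2]$$

• vec_norm

$$\mathrm{norm}([X_1, Y_1, Z_1]) = \left[\frac{X_1}{Length}, \frac{Y_1}{Length}, \frac{Z_1}{Length}\right], \quad Length = \sqrt{X_1^2 + Y_1^2 + Z_1^2}$$

• pd (perspective division)

$$[X_1, Y_1, Z_1] \rightarrow \left[\frac{X_1}{W_1}, \frac{Y_1}{W_1}, \frac{Z_1}{W_1}\right]$$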
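As a functional reference for the datapath configurations above, the following Python sketch spells out what each mode computes. It models only the arithmetic, not the PE interconnection. The function names follow the configuration names; the argument layout (in particular passing w as a separate argument to `pd`) is an assumption.

```python
import math

def light_dp(v1, v2):
    """3-component dot product used for lighting."""
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2]

def trans_dp(row, v):
    """Dot product of a 4-element matrix row with (X2, Y2, Z2, 1)."""
    return row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3]

def vec_sub(v1, v2):
    return tuple(a - b for a, b in zip(v1, v2))

def vec_norm(v):
    """Divide each component by the vector length."""
    length = math.sqrt(light_dp(v, v))
    return tuple(c / length for c in v)

def pd(v, w):
    """Perspective division: one inverse, then three multiplications."""
    inv_w = 1.0 / w
    return tuple(c * inv_w for c in v)
```

Note that `vec_norm` reuses the dot product, which hints at why a single set of PEs can serve several configurations.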

Special Function Unit

• Log Number System and Operations:

– Inverse

– Inverse square root

– Power (configured with 1 PE)
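The appeal of a logarithmic number system for these three operations can be seen in a small floating-point sketch: once a positive value is represented by its base-2 logarithm, an inverse is a negation, an inverse square root is a negation plus a halving, and a power is a single multiplication. The function names are illustrative, and the use of `math.log2` in place of a hardware log/antilog converter is an assumption.

```python
import math

def to_lns(x):
    """Represent a positive value by its base-2 logarithm."""
    return math.log2(x)

def from_lns(e):
    return 2.0 ** e

def lns_inverse(x):
    return from_lns(-to_lns(x))            # 1/x: negate the exponent

def lns_inverse_sqrt(x):
    return from_lns(-0.5 * to_lns(x))      # 1/sqrt(x): negate and halve

def lns_power(x, n):
    return from_lns(n * to_lns(x))         # x^n: one multiplication
```

The power operation is the only one of the three that needs a true multiplier, which is consistent with the note that it is configured together with one PE.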

Chip Implementation Result

Power Supply | 1.8 V
Max. Clock | 100 MHz
Max. Power | 28.3 mW with level-1
Gate Count | 183,748
Core Area | 2.73 mm^2
Process Technology | TSMC 0.18 um CMOS Process

[Die photo labels: VC, RAM1, RAM2, Reg Bank, Constant Mem]

Comparison Results

Level-0 Level-1 Level-2

Comparison Results

Item | JSSC 2006 [2] | JSSC 2007 [3] | ISSCC 2007 [4] | JSSC 2008 [5] | This Work (level-0 / level-1 / level-2)
Process (nm) | 180 | 180 | 180 | 180 | 180
Frequency (MHz) | 200 | 100 | 200 | 50 | 100
Polygon Rate (Mvertices/s) | 50 | 120 | 141 | 25 (*1) / 12.5 (*2) | 50 (*1) / 25 (*2)
Power (mW) | 155 (*3) | 157 | 52.4 | 8.6 | 28.3 / 33.6 / 43.6
Core Area (mm^2) | 23 | 16 | 9.7 | 6.05 (*4) | 2.73
Power-Area Efficiency (Kvertices/(s•mW•mm^2)) | 14 | 47.8 | 227 | 480.5 | 647.2 / 545.1 / 420.1
Feature | Graphics | Graphics | Graphics | Graphics, DSP | Graphics with scalable-quality hardware support

*1: With cache hit rate of 50%. *2: With cache hit rate of 0%.

*3: Include rendering engine. *4: With the core area of 2.164 mm x 2.797 mm and see acknowledgement.

$$\mathrm{PAE} = \frac{\text{Peak Performance of Geometry Transform (Kvertices/s)}}{\text{Power (mW)} \times \text{Core Area (mm}^2\text{)}}$$
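The PAE definition can be applied directly to the table entries. The sketch below recomputes the figures for this work and for [5], using the polygon rates quoted for a 50% cache hit rate; the function and variable names are illustrative.

```python
def power_area_efficiency(kvertices_per_s, power_mw, area_mm2):
    """PAE in Kvertices/(s*mW*mm^2)."""
    return kvertices_per_s / (power_mw * area_mm2)

# This work: 50 Mvertices/s peak, 2.73 mm^2 core, power depends on the level.
power_by_level = {"level-0": 28.3, "level-1": 33.6, "level-2": 43.6}
this_work = {
    level: power_area_efficiency(50_000, power, 2.73)
    for level, power in power_by_level.items()
}

# Reference [5]: 25 Mvertices/s, 8.6 mW, 6.05 mm^2.
ref5 = power_area_efficiency(25_000, 8.6, 6.05)

# Relative change of this work against [5], in percent.
gain_vs_ref5 = {level: 100.0 * (pae / ref5 - 1.0) for level, pae in this_work.items()}
```

This yields 647.2, 545.1, and 420.1 for the three levels and 480.5 for [5], which corresponds to changes of about +34.7%, +13.4%, and -12.6%.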

Conclusions

• Proposed an efficient subdivision algorithm

  • Low complexity

    – The number of memory accesses is reduced by 44.44% and 68.89% for level-1 and level-2, respectively.

    – The number of multiplications for transforms is reduced by 27.50% and 60.27% for level-1 and level-2, respectively.

  • Scalable and near Phong shading quality

• Proposed power-area efficient geometry engine

  – Compared with [2-5], the proposed geometry engine has better power-area efficiency with 545.1 Kvertices/(s•mW•mm^2) for level-1 subdivision.

  – Compared with the work in [5], the proposed geometry engine changes the power-area efficiency by 34.7%, 13.4%, and -12.6% with level-0, level-1, and level-2, respectively.

Reference

• [1] F. Arakawa et al., "An embedded processor core for consumer applications with 2.8 GFLOPS and 36 Mpolygons/s FPU," IEEE ISSCC, Feb. 2004, pp. 334-335.

• [2] J. Sohn et al., "A 155-mW 50-Mvertices/s graphics processor with fixed-point programmable vertex shader for mobile applications," IEEE J. Solid-State Circuits, vol. 41, no. 5, pp. 1081-1091, May 2006.

• [3] C. H. Yu, K. Chung, D. Kim, and L. S. Kim, "An energy-efficient mobile vertex processor with multithread expanded VLIW architecture and vertex caches," IEEE J. Solid-State Circuits, vol. 42, no. 10, Oct. 2007.

• [4] B. G. Nam, J. Lee, K. Kim, S. J. Lee, and H.-J. Yoo, "A 52.4 mW 3-D graphics processor with 141 Mvertices/s vertex shader and 3 power domains of dynamic voltage and frequency scaling," IEEE ISSCC, 2007, pp. 278-603.

• [5] S. Y. Chien, Y. M. Tsao, C. H. Chang, and Y. C. Lin, "An 8.6 mW 25 Mvertices/s 400-MFLOPS 800-MOPS 8.91 mm2 multimedia stream processor core for mobile applications," IEEE J. Solid-State Circuits, vol. 43, no. 9, pp. 2025-2035, Sep. 2008.
