Next generation mobile gp us and rendering techniques - niklas smedberg

44
Game Connection, October 29-31, 2014 Next-gen Mobile GPUs and Rendering Techniques Niklas Smedberg Senior Engine Programmer, Epic Games

description

This session provides a detailed analysis of next-generation mobile graphics hardware and points out special caveats and opportunities that are specific to mobile. This is followed by an in-depth explanation of advanced rendering techniques that were previously only considered for high-end PCs or dedicated gaming consoles, but which have now been brought over to mobile devices in Unreal Engine 4 - including features such as physically-based rendering, HDR and image-based lighting.

Transcript of Next generation mobile gp us and rendering techniques - niklas smedberg

Page 1: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Next-gen Mobile GPUs and Rendering Techniques

Niklas SmedbergSenior Engine Programmer, Epic Games

Page 2: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Introduction

• Niklas Smedberg, a.k.a. “Smedis”– 17 years in the game industry– Graphics programmer since the C64 demo scene

Page 3: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Content

• Next-gen Hardware– Tile-based GPU vs direct-rendering GPU

• Next-gen Rendering Techniques– Unreal Engine 4– Previous goal: Bringing AAA Console graphics to mobile

(DONE!)– New goal: Bringing AAA PC graphics to mobile (DONE!??)

Page 4: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Next Generation Mobile Hardware

• Big leap in features and performance

• Full-featured (e.g. OpenGL ES 3.1+AEP, DirectX 10/11)

• Peak performance now comparable to consoles (Xbox 360/PS3)– About 300+ GFLOPS and 26 GB/s

• New goal: Bring AAA PC graphics to mobile

Page 5: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Performance Trends (FP16 GFLOPS)

2010 2011 2012 2013 20140

50

100

150

200

250

300

350

6.4 12.825.5

154

300+ 2010 SGX 5352011 SGX 543MP22012 SGX 543MP32013 G64302014 Adreno, K1, GX6650

Page 6: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

iOS 8 Metal

• New rendering API for iOS• Better match for the hardware

– Like a “game console API”– Much more efficient on the CPU– Exposes hardware features

• 20x faster on our rendering thread– OpenGL ES API: 30% of renderthread– Metal API: 1.6% of renderthread

• All power and perf responsibility is now in the hands of the developer!

Page 7: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Tech Demo: Zen Garden

Page 8: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Tile-based Mobile GPU

• Mobile GPUs are usually tile-based (next-gen too)– Tile-based: ImgTec, Qualcomm*, ARM– Direct: NVIDIA, Intel, Vivante

* Qualcomm Adreno can render either tile-based or direct to frame buffer– Extension: GL_QCOM_binning_control

Page 9: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Tile-Based Mobile GPU

Summary:• Split the screen into tiles

– E.g. 32x32 pixels (ImgTec) or 300x300 (Qualcomm)

• The whole tile fits within GPU, on chip

• Process all drawcalls for one tile– Write out final tile results to RAM

• Repeat for each tile to fill the image in RAM

Page 10: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

ImgTec Tile-based Rendering Process

Game Vertex Processing

Tile Data (RAM)

Pixel Processing (Top-most only)

Frame Buffer (RAM)

Cmd Buffer (RAM)

Hidden Surface Removal

Tile Memor

y

Per Tile:

Page 11: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

ImgTec Series 6 G6430

• One GPU core• Four shader units (USC)

– 16-way scalar

• FP16: Two SOP per clock– (a*b + c*d)

• FP32: Two MADD per clock– (a*b + c)

• 154 GFLOPS @ 400 MHz– 16-bit floating point

Page 12: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

FP16 Is Faster Than FP32

• ImgTec Series 6: 50% faster– FP16 pipeline: Two SOP per clock– FP32 pipeline: Two MADD per clock

• ImgTec Series 6XT: 100% faster– FP16 pipeline: Four MADD per clock– FP32 pipeline: Two MADD per clock

• Qualcomm Snapdragon: 100% faster

Page 13: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

ImgTec Rendering Tips

• Hidden Surface Removal– For opaque only– Don’t keep alpha-test enabled all the time– Don’t keep “discard” keyword in shader source, even if it’s not

used

• Group opaque drawcalls together

• Sort on state, not distance

Page 14: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Framebuffer Resolve/Restore

• Expensive to switch Frame Buffer Object on Tile-based GPUs– Saves the current FBO to RAM– Reloads the new FBO from RAM

• Best performance:– A single rendertarget for the entire frame– No post-processing passes

• Does not apply to NVIDIA Tegra GPUs!– This made it simpler for us to make our “Rivalry” tech demo for K1

Page 15: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Framebuffer Resolve/Restore

• Clear ALL FBO attachments after new frame/rendertarget– Clear after eglSwapBuffers / glBindFramebuffer– Avoids reloading FBO from RAM– NOTE: Do NOT clear unnecessary on non-tile-based GPUs (e.g.

NVIDIA)

• Discard unused attachments before new frame/rendertarget– Discard before eglSwapBuffers / glBindFramebuffer– Avoids saving unused FBO attachments to RAM– glDiscardFramebufferEXT / glInvalidateFramebuffer

Page 16: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

iOS Performance Profiling

• Screenshot from Xcode, which shows:– How we clear FBO at the beginning of every render pass– Other important performance info

Page 17: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Programmable Blending

• GL_EXT_shader_framebuffer_fetch (gl_LastFragData)• Reads current pixel background “for free”• Potential uses:

– Custom color blending– Blend by background depth value (depth in alpha)

• E.g. Soft intersection against world geometry for particles– Tone-mapping within tile-memory– Deferred shading without resolving G-buffer

• Stay on GPU and avoid expensive round-trip to RAM• See also: GL_EXT_shader_pixel_local_storage

Page 18: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014Game Developer Conference, March 25-29, 2013

Soul: Programmable Blending

Page 19: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Core Optimization: Opaque Draw Ordering

• All platforms,1. Group draws by material (shader) to reduce state changes

• Then for all platforms except ImgTec,2. Skybox last: 5 ms/frame savings (vs drawing skybox first)3. Sort groups nearest first : extra 3 ms/frame savings4. Sort inside groups nearest first : extra 7 ms/frame savings

• (Timings from a UE4 map on a Qualcomm-based device)

Page 20: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Optimization: Resolution

• Resolution does not match GPU performance!• Use custom resolution to match perf/pixel• Examples of same GPU:

– iPhone 5S = 0.7 Mpix (1136x640)

– iPad Air = 3.1 Mpix (2048x1536), more than 4 times slower!

• Other examples:

– Prev-Gen Console = 0.9 Mpix (1280x720)

– Current-Gen Console = 2.1 Mpix (1920x1080)

Page 21: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Single Content, Multiple Platforms

• Core motivating factor in designing UE4– Authoring consistency between PC and Mobile

• Authoring environment for both platforms– Physically-based shading model– High dynamic range linear color space– High quality post processing

Page 22: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014Game Developer Conference, March 25-29, 2013

Comparison: PC

Page 23: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014Game Developer Conference, March 25-29, 2013

Comparison: Mobile

Page 24: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Consistency: Material Editor

• One material for many platforms– Artist authored feature

levels to scale shader perf from PC to mobile

• Using cross compiler tool to retarget HLSL into GLSL

Page 25: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014Game Developer Conference, March 25-29, 2013

Directional Light + SDF Shadows

Page 26: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Consistency: HDR Directional Lightmaps

• Two compressed + One uncompressed textures:1. HDR color with log luma encoding2. World space 1st order spherical harmonic luma directionality3. SDF shadow texture

• Optimized for Mobile and PVRTC compression:– PVRTC (ImgTec) lacks separately compressed encoding for alpha– PC color: RGB/Luma, LogLuma in Alpha– Mobile color: RGB/Luma * LogLuma, no Alpha (LogLuma derived

with ALU)– Mobile: A single dynamic directional light as the “sun”

Page 27: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014Game Developer Conference, March 25-29, 2013

Image-Based Lighting

Page 28: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Image-Based Lighting

• Selects a mip-level in an IBL cubemap based on per-pixel roughness – Same as PC, filtering same as PC

• PC: FP16• Mobile: Decode(RGBM)^2• PC: Blend multiple cubemaps per surface and parallax

correction• Mobile: One infinite-distance cubemap per object

Page 29: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

FP16 RGB=Color A=Depth

Mobile Post Processing Pipeline

Anti-Aliasing

Tonemapping

1/2 DOF Filter

1/2 DOFDownsample

1/4 Final Filter Passes and Merge {Light Shafts, Bloom, Vignette}

Bloom Filter Tree

1/4 Lightshaft Filter, Pass 2

1/4 Lightshaft Filter, Pass 1

1/4 DOF Near Dilation

1/4 Smart Reduction

Light Shaft (Sun) MaskA=Depth to A=CoC+Sun

[conversion done on chip if possible]

1/8

1/16

1/32

1/64

1/32

1/16

1/8

Page 30: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014Game Developer Conference, March 25-29, 2013

Mobile Depth of Field

Page 31: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Packed Circle of Confusion and Sun Intensity

• Optimization done for Depth of Field and Light Shafts– Represent sun intensity and circle of confusion (CoC) in one FP16 value– Saves needing an extra render target

Depth (0 to 65504)

CoC (0 to 1) Sun Intensity (1 to 65504)

0=Max Near Bokeh

0.5=In Focus

1=Max Far Bokeh and No Sun

65504=Max Sun (still at Max Far Bokeh)

Page 32: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014Game Developer Conference, March 25-29, 2013

Vignette + Bloom + Light Shafts

Page 33: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Vignette + Bloom + Light Shafts: Optimized

• Composited at quarter resolution– Using pre-multiplied alpha FP16– Also performs the final filtering passes for bloom and light shafts

• Optimized to minimize rendertarget switches

• 3 effects applied to scene with one full-res TEX fetch– Applied in the tonemap pass

Page 34: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Mobile Light Shafts

• Filtered at quarter resolution• Monochromatic with artist controlled tint on final composite

– Bloom and light shaft down-sample pass shared– RGB = color for bloom, A = light shaft intensity

• Light shaft filter runs in tonemapped space (8-bit/channel)– Applied in linear (reverse tonemap before tint and composite)

• Filtering done by 3 passes of 8 taps

Page 35: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014Game Developer Conference, March 25-29, 2013

Mobile Bloom

Page 36: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Mobile Bloom

• Lower quality version to minimize resolves– Limited effect radius and less passes

• Turns out to be faster and higher-quality– Standard hierarchical algorithm with some optimizations– Down-sample from 1:1/4 res first (shared with light shaft)– Then down-sample in 1:1/2 resolution passes– Single pass circle-based filter (instead of two-pass

Gaussian)• 15 taps on circle during down-sampling• 7 taps for both circles during up-sample+merge pass

Page 37: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014Game Developer Conference, March 25-29, 2013

Film Post Tonemapping

Page 38: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Film Post

Film Post Tonemapping: Mobile Shader

RGBA8 LDR sRGB Output

Quantization Grain

(optional)

Linear to sRGB

Tint, Tonemap Curve

Color Matrix {Saturation,

Channel Mixer} (optional)

Shadow Tint (optional)

Film Grain(optional)

Blend in {Bloom, Vignette, Light Shafts}

(optional)

1/4 Pre-multiplied Alpha

Film Grain Jitter

(optional)

Blend in DOF

(optional) 1/2 DOF

RGBA16F HDR Linear Color

Page 39: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014Game Developer Conference, March 25-29, 2013

Anti-Aliasing

Page 40: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Anti-Aliasing on Mobile

• Using a spatial & temporal anti-aliasing filter– 2x temporal super-sampling (higher quality surface shading)– Blending two jittered frames after tonemapping (32 bpp,

perceptual color space)– Some special logic to remove ghosting/judder (not using

motion re-projection)

• Not currently using MSAA– Needed super-sampled shading for HDR physically based shading

model

Pixel

Page 41: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014Game Developer Conference, March 25-29, 2013All Techniques Combined

Page 42: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Android Extension Pack

• Released with Android “L”• OpenGL ES 3.1 plus a specific set of extensions

– Compute shaders– Tessellation– ASTC– Detailed spec on www.khronos.org: GL_ANDROID_extension_pack_es31a

• Full UE4 desktop rendering pipeline on Android!– Deferred rendering with G-buffer– Physically-based shading– Image-based lighting– Solid 30 FPS on NVIDIA K1 mobile GPU

Page 43: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Tech Demo: Rivalry

Page 44: Next generation mobile gp us and rendering techniques - niklas smedberg

Game Connection, October 29-31, 2014

Unreal Engine 4

Full source code available!

unrealengine.com

Includes all C++, shaders, tools, content$19/mo