Advanced Mobile Optimizations.ppt
artem-gerasimovich -
Transcript of Advanced Mobile Optimizations.ppt
1. Advanced Mobile Optimizations
- How to go to 60 fps after you have removed all Sleep calls ;-)
2. Disclaimer
- The views expressed here are my personal views and do not necessarily reflect the thoughts, opinions, intentions, plans or strategies of Unity
3. Optimization Mindset
- you can't just make your game faster
  - there is no magic bullet
  - very specific stuff
    - not the same as scripting a character
4. Optimization Mindset
- not in specific order
- know
- think
- measure
5. Optimization Mindset
- You can't avoid any of that
  - no, really
6. Optimization Mindset
- know + think = shoot in the dark
  - you just write code hoping for the best
- know + measure = shoot in the dark
  - you are missing the "understand" part
- think + measure = shoot in the dark
  - you solve an abstract problem, not the real one
7. Optimization Mindset: know + think
- hardware is more complex than you think
  - highly parallel
  - deep pipelining
  - even when you write asm, it is already high-level
8. Optimization Mindset: know + measure
- knowledge is static
- knowledge comes from the past
- knowledge is general
9. Optimization Mindset: know + measure
- qsort vs bubble sort
  - sure, qsort is faster
- but you are missing the point
  - maybe radix?
  - maybe no need to sort?
  - maybe insertion?
  - parallel sorting network?
10. Optimization Mindset: think + measure
- solving an abstract problem
  - example: GPU
    - optimizing for RIVA TNT and GTX is different
11. Optimization Mindset
- well, if you are missing two of the three
  - no comment
12. Know
- your hardware
- your data
  - knowing your data is interleaved with "think"
  - we will talk more about it in "think"
13. Know your hardware
- GPU
- CPU
- whatever
  - e.g. disk load speed
14. Know your hardware: GPU
- Pipeline
  - meaning: slow step = slow everything
  - you are as slow as your bottleneck
- Know your pipeline
- Won't go into full pipeline spec
  - Resources section
- Just common/biggest problems
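The "as slow as your bottleneck" point can be illustrated with a toy model. This is only a sketch: the `StageCost` type, the stage names and the timings are hypothetical, and real stage costs would have to come from a profiler.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical per-frame cost of one pipeline stage, in milliseconds.
struct StageCost {
    std::string name;
    double ms;
};

// In a pipeline the stages overlap, so the steady-state frame time is
// roughly the cost of the slowest stage rather than the sum of all stages.
// Assumes the list is not empty.
const StageCost& findBottleneck(const std::vector<StageCost>& stages) {
    return *std::max_element(
        stages.begin(), stages.end(),
        [](const StageCost& a, const StageCost& b) { return a.ms < b.ms; });
}
```

Speeding up any stage other than the one returned here should not change the frame time in this model, which is why measuring comes before optimizing.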
15. Know your hardware: GPU Geometry
- pre/post T&L cache
  - should use indexed geometry or not
- cache hit rate
  - strips vs tri list
- memory throughput
  - vertex size
- fetch cost (memory)
  - pack attributes or not
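To make "vertex size" and "pack attributes or not" concrete, here is a minimal sketch comparing two vertex layouts. Both structs and the `packSnorm8` helper are hypothetical; whether packing pays off depends on the target GPU's fetch and unpack cost, so it still has to be measured.

```cpp
#include <cmath>
#include <cstdint>

// Unpacked layout: every attribute stored as 32-bit floats.
struct VertexFat {
    float position[3];  // 12 bytes
    float normal[3];    // 12 bytes
    float uv[2];        //  8 bytes
    float color[4];     // 16 bytes
};

// Packed layout (illustrative): normal as signed normalized bytes,
// UV as unsigned normalized 16-bit values, color as 8-bit RGBA.
struct VertexPacked {
    float         position[3];  // 12 bytes
    std::int8_t   normal[4];    //  4 bytes (xyz + one byte of padding)
    std::uint16_t uv[2];        //  4 bytes
    std::uint8_t  color[4];     //  4 bytes
};

static_assert(sizeof(VertexFat) == 48, "unpacked vertex is 48 bytes");
static_assert(sizeof(VertexPacked) == 24, "packed vertex is 24 bytes");

// Quantize a float in [-1, 1] to a signed normalized byte.
inline std::int8_t packSnorm8(float v) {
    if (v > 1.0f) v = 1.0f;
    if (v < -1.0f) v = -1.0f;
    return static_cast<std::int8_t>(std::lround(v * 127.0f));
}
```

Halving the vertex size halves the memory fetched per vertex, at the price of reduced precision and possibly extra work to unpack.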
16. Know your hardware: GPU Textures
- Texture Cache
  - swizzle
  - compression
  - mip-maps
- Biggest memory hog
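A rough way to see why textures are the biggest memory hog is to count bytes. The sketch below uses a hypothetical helper, `textureBytes`, and ignores block-size padding and alignment that real compressed formats and drivers add.

```cpp
#include <algorithm>
#include <cstdint>

// Approximate size in bytes of a 2D texture, optionally with a full mip
// chain down to 1x1. bitsPerPixel is e.g. 32 for RGBA8888, 16 for RGB565,
// or 4 for a 4 bpp compressed format.
std::uint64_t textureBytes(std::uint32_t width, std::uint32_t height,
                           std::uint32_t bitsPerPixel, bool mipmaps) {
    std::uint64_t totalBits = 0;
    std::uint32_t w = width;
    std::uint32_t h = height;
    for (;;) {
        totalBits += std::uint64_t(w) * h * bitsPerPixel;
        if (!mipmaps || (w == 1 && h == 1)) break;
        // Each mip level halves both dimensions, clamped to 1.
        w = std::max<std::uint32_t>(1, w / 2);
        h = std::max<std::uint32_t>(1, h / 2);
    }
    return totalBits / 8;
}
```

Under these assumptions a 1024x1024 RGBA8888 texture takes 4 MiB, the full mip chain adds about one third on top, and a 4 bpp compressed version of the base level takes 512 KiB.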
17. Know your hardware: GPU Shaders
- VertexProgram vs FragmentShader
  - balancing
  - attributes
- Unified Shaders
  - load balancing
- Precision
  - gles: highp/mediump/lowp
  - Cg: float/half/fixed (iirc)
18. Know your hardware: GPU Rasterization
- Fillrate (memory speed)
  - alpha
- 2x2 samples (or more)
  - why geometry LOD matters
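The link between fillrate, memory speed and alpha can be sketched with back-of-the-envelope arithmetic. The function below is a hypothetical estimate that models a simple immediate-mode renderer, where blending reads the destination pixel before writing it; tile-based GPUs keep that traffic on-chip, so treat the result as an upper bound, not a prediction.

```cpp
#include <cstdint>

// Rough framebuffer traffic in bytes per second.
// overdraw: average number of times each screen pixel is shaded per frame.
// blended:  if true, every shaded pixel is counted as a read plus a write.
std::uint64_t framebufferBytesPerSecond(std::uint32_t width,
                                        std::uint32_t height,
                                        std::uint32_t bytesPerPixel,
                                        std::uint32_t overdraw,
                                        bool blended,
                                        std::uint32_t fps) {
    const std::uint64_t pixelsPerFrame =
        std::uint64_t(width) * height * overdraw;
    const std::uint64_t accessesPerPixel = blended ? 2 : 1;
    return pixelsPerFrame * accessesPerPixel * bytesPerPixel * fps;
}
```

For a 960x640 screen at 32 bits per pixel, 3x overdraw and 60 fps, this gives roughly 442 MB/s for opaque rendering and twice that when every layer is blended.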
19. Know your hardware: CPU
- Mobile = in-order RISC
  - for stupid code far worse than CISC
- 2 main issues:
  - Memory speed
  - Computation speed
20. Know your hardware: CPU Memory
- This is the single most important factor
  - memory access is far slower than computation
- Latency vs Throughput
- Caches
  - fast memory
  - your best friend
  - L1/L2/whatever
- LHS (load-hit-store)
21. Know your hardware: CPU Computations
- SIMD
  - better memory usage
  - better arithmetic usage (4 vals instead of 1)
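As an illustration of "4 vals instead of 1", here is a minimal sketch of a `y[i] += a * x[i]` loop using NEON intrinsics from `arm_neon.h`. The function name `scaleAdd` is hypothetical, and on non-ARM builds the code falls back to the plain scalar loop so it still compiles and runs.

```cpp
#include <cstddef>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Computes y[i] += a * x[i] for i in [0, n).
void scaleAdd(float* y, const float* x, float a, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    const float32x4_t va = vdupq_n_f32(a);   // broadcast a into 4 lanes
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);   // load 4 floats from x
        float32x4_t vy = vld1q_f32(y + i);   // load 4 floats from y
        vy = vmlaq_f32(vy, vx, va);          // vy + vx * va, 4 lanes at once
        vst1q_f32(y + i, vy);                // store 4 floats back to y
    }
#endif
    // Scalar tail (and the whole loop when NEON is not available).
    for (; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Each NEON iteration loads, computes and stores four floats, which covers both bullets: wider memory accesses and four arithmetic results per instruction.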
22. Know your target hardware
- Those were general rules
- But you are running on that particular piece of sh... hardware
23. Know your target hardware: PowerVR
- TBDR (tile-based deferred rendering)
  - perfect hidden surface removal
  - Alpha-Test/discard
- shader precision
- unified shaders
- Tegra / ATI-AMD / Adreno more common
24. Know your target hardware: ARM
- VFP = FPU on steroids (not real SIMD)
  - scalar instructions at same speed as vectorized
- NEON = SIMD
  - more registers
  - awesome load/store instructions
  - not as cool as AltiVec but cool enough for mobiles
25. Know your target hardware: ARM
- Conditional execution of most instructions
- Fold shifts and rotates into the "data processing" instructions
  - load structure from array by index
- Thumb + float = disaster
  - switch back and forth between Thumb mode and regular 32-bit mode
26. Know your hardware: Resources
- RTR
- lots of whitepapers:
  - PowerVR (ImgTec), Tegra (NVIDIA), Adreno (Qualcomm)
  - AMD/ATI - basically the same as X360, but much smaller tiles
- ARM dev center
27. Think
- Think about your data
- Think about your algorithms
- Think about your constraints
- Think about your hardware
28. Think Basics
- CPU vs GPU
  - e.g. draw calls
    - pure CPU cost
  - CPU:
    - memory vs arithmetic
      - memory slower
  - GPU:
    - vprog vs fshader
    - memory vs arithmetic
29. Think Memory
- fragmentation
- data organization
  - AOS vs SOA
  - hot/cold split
- data structures
  - linear vs random
  - array vs list
  - map vs hashtable
  - allocators
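The "AOS vs SOA" and "hot/cold split" bullets can be sketched as follows. All type and field names here are hypothetical; the point is only the layout, not a recommended actor design.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// AoS: one struct per actor. A position update drags the rarely used
// name and score through the cache along with the data it needs.
struct ActorAoS {
    float x, y, z;
    float vx, vy, vz;
    std::string name;
    int score;
};

// SoA with a hot/cold split: data touched every frame lives in tightly
// packed arrays, rarely touched data lives in separate arrays.
struct ActorsSoA {
    // Hot: read and written every frame.
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    // Cold: touched only occasionally.
    std::vector<std::string> name;
    std::vector<int> score;

    // Appends one actor and returns its index.
    std::size_t add(float px, float py, float pz,
                    float velX, float velY, float velZ,
                    std::string actorName) {
        x.push_back(px);   y.push_back(py);   z.push_back(pz);
        vx.push_back(velX); vy.push_back(velY); vz.push_back(velZ);
        name.push_back(std::move(actorName));
        score.push_back(0);
        return x.size() - 1;
    }
};

// The per-frame update walks only the hot arrays, linearly.
void integrate(ActorsSoA& actors, float dt) {
    for (std::size_t i = 0; i < actors.x.size(); ++i) {
        actors.x[i] += actors.vx[i] * dt;
        actors.y[i] += actors.vy[i] * dt;
        actors.z[i] += actors.vz[i] * dt;
    }
}
```

The SoA loop reads memory linearly and touches nothing it does not need, which also makes it a natural fit for SIMD. Whether it beats the AoS version on a given device is something to measure, not assume.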
30. Think Constraints
- GPU: will you see the difference?
  - really?
  - on mobile screen?
  - on that one small thingy in the corner?
- CPU: will you need that?
  - e.g. physics in casual game?
- Memory: will you need that?
  - will you need more than XXX actors?
31. Measure
- you didn't optimize anything if you didn't measure the difference
- you can't optimize if you don't know what needs to be optimized
  - if you can't measure what takes time
32. Measure Tools
- there are lots of tools
  - Instruments (iOS)
  - PerfHUD (Tegra)
  - Adreno Profiler (Qualcomm)
  - some more probably
- Poor man's profiler
  - timers
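A poor man's profiler based on timers can be as small as a scoped timer. The `ScopedTimer` class below is a hypothetical sketch built on `std::chrono::steady_clock`; it prints the elapsed time to stderr and can optionally hand it back through a pointer.

```cpp
#include <chrono>
#include <cstdio>

// Measures the wall-clock time between construction and destruction.
class ScopedTimer {
    using Clock = std::chrono::steady_clock;

public:
    // label must outlive the timer; outMs, if given, receives the
    // elapsed time in milliseconds when the timer is destroyed.
    explicit ScopedTimer(const char* label, double* outMs = nullptr)
        : label_(label), outMs_(outMs), start_(Clock::now()) {}

    ~ScopedTimer() {
        const double ms = std::chrono::duration<double, std::milli>(
                              Clock::now() - start_).count();
        if (outMs_ != nullptr) {
            *outMs_ = ms;
        }
        std::fprintf(stderr, "%s: %.3f ms\n", label_, ms);
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    const char* label_;
    double* outMs_;
    Clock::time_point start_;
};
```

Wrapping a block in `{ ScopedTimer t("skinning"); ... }` gives a per-frame number for that block. Single readings are noisy, so it usually makes sense to average over many frames before drawing conclusions.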
33. Unity use case: random bits
- Mobile shaders
  - specialized versions of the usual built-ins
- Skinning
  - full NEON/VFP impl
    - usually takes 10-15% of the C-code time
    - and we are not done optimizing it ;-)
- Rej's material baking to texture, and coming soon: BRDF baking to texture
34. Unity use case: random bits
- Remote Profiler
  - run on target hw, data is transferred over wifi
  - collect in Editor and show pretty graphs ;-)
- Sort alpha-test *after* opaque
- check *lots* of extensions
- LODs - almost done
- Vertex Cache optimization - after LODs ;-)
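The "sort alpha-test after opaque" bullet can be sketched as a partition of the draw list. This is not Unity's actual implementation: `DrawItem` and `sortAlphaTestAfterOpaque` are hypothetical names. The usual motivation on tile-based deferred GPUs is that `discard` gets in the way of hidden surface removal, so drawing fully opaque geometry first lets it occlude alpha-tested fragments.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical draw-list entry.
struct DrawItem {
    int  id;
    bool alphaTested;  // true if the shader uses alpha-test/discard
};

// Moves all opaque items in front of all alpha-tested items while keeping
// the original relative order inside each group.
void sortAlphaTestAfterOpaque(std::vector<DrawItem>& items) {
    std::stable_partition(
        items.begin(), items.end(),
        [](const DrawItem& item) { return !item.alphaTested; });
}
```

A stable partition is used here so that any earlier ordering, for example by material, is preserved within the opaque and alpha-tested groups.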
35. Closing Words
- Know hardware
- Know data
- Think data
- Think constraints
- Measure always
  - You'd better know early
- You should always be optimizing
36. Questions