5 . 1. Performance


Page 1: 5 . 1. Performance

5.1. PERFORMANCE

Code performance and optimisation

Page 2: 5 . 1. Performance

OPTIMISATION OVERVIEW

Overview of the optimisation process

Page 3: 5 . 1. Performance

Optimisation Overview

Optimisation is the process of developing your program so that it uses the minimal amount of some resource. Resources can include CPU time, memory, hard-disk space, network traffic, etc.

Optimisation approaches can focus on:
• Design – ensuring the design employs optimal algorithms, minimises redundancy, etc.
• Implementation – ensuring the code optimally maps onto CPU instructions, avoids inefficient data usage, etc.

Important: This section focuses on code (implementation) optimisation.

Page 4: 5 . 1. Performance

Optimisation Overview (Start and Stop)

Design Optimisation - Go! In most applications, good performance comes from getting the architectural design right. Given the cost of refactoring, do consider performance when developing your design.

Code Optimisation – Stop! Code optimisation can be ‘expensive’ in terms of time, added complexity, etc. It is generally recommended not to optimise code until there is an observable need to do so.

Quote popularised by Donald Knuth and often attributed to C.A.R. Hoare: "We should forget about small efficiencies, say about 97% of the time: premature optimisation is the root of all evil."

Page 5: 5 . 1. Performance

Code Optimisation (the process)

Whenever the UPS (updates per second) or FPS (frames per second) targets cannot be obtained, profile the game to determine which areas are consuming the most CPU/GPU time.

Based on the profiling and the design, decide which areas can be optimised and how they will be optimised.

Implement the optimisations, re-profile, and repeat until the desired UPS/FPS targets are obtained.

If the targets cannot be reached, then the wider game design and/or game feature set must be reconsidered.

Profiler Suggestions: For Java use the NetBeans Profiler. For XNA use NProf and PIX.
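Aside: Before reaching for a full profiler, a crude timer can show whether a suspect area fits inside the frame budget. The sketch below is not from the original slides. `FrameTimer`, its method names and the 60 UPS target are assumptions made for illustration, and a real profiler gives far more reliable numbers (the first few runs also include JIT warm-up).

```java
// Hypothetical helper: times a piece of work and compares it with a frame budget.
final class FrameTimer {
    // Time available per update at an assumed target of 60 updates per second.
    static final double BUDGET_MS = 1000.0 / 60.0;

    // Runs the task `runs` times and returns the mean wall-clock time in milliseconds.
    static double averageMillis(Runnable task, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return (elapsedNanos / 1_000_000.0) / runs;
    }

    // True if the measured time fits inside the assumed per-update budget.
    static boolean withinBudget(double millis) {
        return millis <= BUDGET_MS;
    }
}
```

Usage would be along the lines of `FrameTimer.withinBudget(FrameTimer.averageMillis(world::update, 1000))`, where `world::update` stands for whatever update method is under suspicion.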

Page 6: 5 . 1. Performance

Code Optimisation (JIT - a little bit of help)

Both Java and C# use a JIT (just-in-time) compiler to compile each method from an intermediate form (Java bytecode or .NET IL) into native code at runtime.

Most JIT compilers will optimise code, typically by:
• Constant folding / copy propagation
• Method in-lining
• Extraction of loop invariants
• Loop unrolling (for small loops)

Aside: Unlike a traditional (e.g. C++) compiler, a JIT compiler runs while the program is running, so time spent compiling counts against run-time performance. As a result, the JIT compiler does not (normally) have the luxury of performing an exhaustive optimisation pass.
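Aside: To make two of the listed optimisations concrete, the sketch below writes out by hand roughly what constant folding and loop-invariant extraction amount to. It is an illustration only: `JitSketch` and its methods are hypothetical names, and whether a given compiler or JIT actually performs these steps on this code is not guaranteed.

```java
// Hand-written illustration of constant folding and loop-invariant extraction.
final class JitSketch {
    static final int TILE = 32;

    // As the programmer might write it.
    static int scaledSumPlain(int[] widths, int scale) {
        int sum = 0;
        for (int i = 0; i < widths.length; i++) {
            // TILE * 2 is a constant expression, and `factor` never changes inside the loop.
            int factor = scale * (TILE * 2);
            sum += widths[i] * factor;
        }
        return sum;
    }

    // Roughly the shape of the code after both optimisations are applied.
    static int scaledSumHoisted(int[] widths, int scale) {
        int factor = scale * 64;   // constant folded to 64, invariant computed once
        int sum = 0;
        for (int i = 0; i < widths.length; i++) {
            sum += widths[i] * factor;
        }
        return sum;
    }
}
```

Both methods return the same value. The second simply does less work per iteration, which is the kind of rewrite there is usually no need to do by hand.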

Page 7: 5 . 1. Performance

Warning! (The cost of code optimisation)

Optimising code for performance can be risky and normally has an associated cost.

Examples of risks and costs include:
• Breaking code that works
• Limiting the reusability and/or extensibility of the code by introducing additional constraints
• Making the code harder to understand and maintain, thereby increasing the likelihood of future bugs

Remember: Unless there is a clear and explicit performance problem, it is best to err on the side of producing cleaner, simpler code.

Page 8: 5 . 1. Performance

Warning! (Optimisation shelf-life)

Development languages and execution environments change over time (compilers get smarter, garbage collection algorithms change, CPUs run faster, memory grows, etc.)

An optimisation technique that solved a limitation of a previous development environment may not be applicable to current environments.

Before accepting performance advice, first ask whether the advice remains relevant to your current development environment.

Page 9: 5 . 1. Performance

GENERAL ADVICE: CREATING OBJECTS

General advice applicable to both Java and C# (and most other managed languages)

Page 10: 5 . 1. Performance

Creating objects…

Both Java and C# are object-oriented languages. Whenever an object is created there is a creation overhead:
• Memory is allocated for all the instance variables (including those from super-classes).
• All instance variables are initialised to starting values.
• The constructors (including those of any super-classes) are executed.

Given this cost, it is good practice to avoid needless and excessive object creation.

Page 11: 5 . 1. Performance

Creating objects (Hints and tips)

Use primitive data types rather than wrapper classes, e.g. use an int instead of an Integer to store a number:

    Integer val = 4+5;   // wrapper object
    int val = 4+5;       // primitive

Use lazy object creation, i.e. avoid creating conditionally used objects unless the condition is met. This also applies to instance variables if only some instances will use them.

    // Eager: the message is always built, even if never printed
    ...
    string message = "..." + name;
    if( printMessage )
        Console.WriteLine( message );
    ...

    // Lazy: the message is only built when needed
    ...
    if( printMessage ) {
        string message = "..." + name;
        Console.WriteLine( message );
    }
    ...

Use String literals instead of String objects where possible (as String literals are interned and reused):

    String str1 = "Hello";              // Literal
    String str2 = new String("Hello");  // Object
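Aside: The three tips above can be tried out in Java with the sketch below. It is not from the original slides, and `ObjectCreationSketch` and its method names are hypothetical. The boxed sum creates wrapper objects as it goes, the lazy method builds its string only when asked to, and the last two methods show that equal literals share one interned instance while `new String(...)` does not.

```java
// Hypothetical demonstration class for the object-creation tips.
final class ObjectCreationSketch {
    // Primitive accumulation: no wrapper objects are involved.
    static long sumPrimitive(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) total += i;
        return total;
    }

    // Wrapper accumulation: each += unboxes, adds, and boxes the result again.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (int i = 0; i < n; i++) total += i;
        return total;
    }

    // Lazy creation: the message string is only built when it will be used.
    static String describe(String name, boolean printMessage) {
        if (!printMessage) return null;
        return "Hello, " + name;
    }

    // Two literals with the same contents refer to one interned instance.
    static boolean literalsShareInstance() {
        String a = "Hello";
        String b = "Hello";
        return a == b;
    }

    // new String(...) creates a separate object with equal contents.
    static boolean newStringIsDistinct() {
        String a = "Hello";
        String b = new String("Hello");
        return a != b && a.equals(b);
    }
}
```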

Page 12: 5 . 1. Performance

GENERAL ADVICE: GARBAGE COLLECTION

General advice applicable to both Java and C# (and most other managed languages)

Page 13: 5 . 1. Performance

Garbage collection

Garbage collection (GC) is a form of automatic memory management. The GC automatically deletes objects and reclaims memory from objects ‘discarded’ by the program.

GC makes manual memory de-allocation unnecessary, thereby freeing the programmer from having to release objects (which is an error-prone and often onerous process).

However, garbage collection can have performance implications that are difficult to manage.

Page 14: 5 . 1. Performance

Garbage collection (important factors)

Three important factors influence the GC process:
• Allocation rate: the rate at which new objects (including strings, arrays, etc.) are created.
• Retention: the amount of live heap data (i.e. the amount of allocated ‘stuff’), affecting the workload for allocation and de-allocation.
• Fragmentation: the number of unusable fragments (chunks of memory) between allocated objects, affecting space usage and search times. Most GCs try to avoid fragmentation (usually with an associated avoidance cost!)

Aside: There are different types of collector, including: generational, mark-sweep, reference counting, incremental, etc. that influence the above three factors.

Page 15: 5 . 1. Performance

Garbage collection (important factors)

For game developers the key questions to ask are:
• How long does a GC take (in ms)?
• When and how often does a GC occur?

An acceptable balance between ‘how long’ and ‘how often’ is desirable, e.g. a GC once every several seconds that incurs a high cost can introduce a perceivable skip in the frame rate. On the other hand, lots of GCs per second will introduce a very high (and unnecessary) overhead.
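Aside: In Java, a rough answer to ‘how long’ and ‘how often’ can be read from the standard `java.lang.management` beans. The sketch below is an addition to the slides and `GcStats` is a hypothetical name. Sampling both totals once per second and taking the difference gives collections per second and milliseconds spent collecting. The reported times are approximate and depend on the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that totals the figures reported by all collectors in this JVM.
final class GcStats {
    // Number of collections so far, summed over all collectors.
    static long totalCollections() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();   // -1 when the collector does not report it
            if (c > 0) count += c;
        }
        return count;
    }

    // Approximate accumulated collection time in milliseconds, summed over all collectors.
    static long totalCollectionMillis() {
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();    // -1 when the collector does not report it
            if (t > 0) millis += t;
        }
        return millis;
    }
}
```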

Page 16: 5 . 1. Performance

Garbage collection (Hints and tips)

Select and fine-tune an appropriate GC algorithm (see the linked reading material).

Reduce the number of allocated / de-allocated objects (reducing GC allocation costs and heap compacting frequency). Tactics include:
• Use recyclable object pools (i.e. ‘released’ objects are stored and new object requests are taken from this pool).
• XNA: Use structs instead of classes. As structs are value types, they are not allocated as separate objects on the GC-maintained heap (a struct that is boxed, or stored inside a class or array, still lives on the heap).

Aside: The downside of object pools is that they involve writing and maintaining additional code, and they can introduce subtle errors by recycling an object from the pool that is still referred to and modified from another part of the program.
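Aside: A recyclable object pool can be very small. The following is a minimal single-threaded sketch, not a production implementation: `ObjectPool`, `acquire`, `release` and `available` are names chosen for illustration, and nothing here resets an object's state or guards against the "still referred to elsewhere" problem mentioned above.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Minimal, single-threaded object pool (hypothetical API).
final class ObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    ObjectPool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Hands out a previously released object if one exists, otherwise creates a new one.
    T acquire() {
        T obj = free.poll();
        return (obj != null) ? obj : factory.get();
    }

    // Returns an object to the pool; the caller must not use it afterwards.
    void release(T obj) {
        free.push(obj);
    }

    // Number of objects currently waiting to be reused.
    int available() {
        return free.size();
    }
}
```

A pool of bullets might then be created with `new ObjectPool<>(Bullet::new)`, where `Bullet` stands for any game class with a no-argument constructor. The caller is responsible for re-initialising each object it acquires.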

Page 17: 5 . 1. Performance

GENERAL ADVICE: METHODS

General advice applicable to both Java and C# (and most other managed languages)

Page 18: 5 . 1. Performance

Methods and Method calls

Consider the shown code. Where is the potential performance bottleneck?

    private long getAccLength(String stringObj) {
        long len = 0;
        for (int i = 0; i < stringObj.length(); i++)
            len += (i + 1);
        return len;
    }

The length() method will be called for every iteration of the loop (incurring stack push/pop costs for method parameters and returns, unless the JIT compiler in-lines the call). For a large loop it can be a heavy additional expense.

Where the result of the method call is invariant over the duration of the loop, it is good practice to extract the method call and store the result before the loop, e.g.

    private long getAccLength(String stringObj) {
        long len = 0;
        int stringSize = stringObj.length();   // called once, before the loop
        for (int i = 0; i < stringSize; i++)
            len += (i + 1);
        return len;
    }

Aside: Understanding that you can also calculate this quantity as n(n+1)/2, where n is the string length, results in optimal performance (well, maybe short of having a lookup table).
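Aside: The closed form mentioned above replaces the loop entirely, because 1 + 2 + ... + n = n(n+1)/2. A short sketch follows, with `AccLength` as a hypothetical wrapper class holding both versions so they can be compared.

```java
// Hypothetical class comparing the loop with the closed-form calculation.
final class AccLength {
    // Loop version with the length() call extracted, as in the slide.
    static long loop(String stringObj) {
        long len = 0;
        int stringSize = stringObj.length();
        for (int i = 0; i < stringSize; i++) {
            len += (i + 1);
        }
        return len;
    }

    // Closed form: the sum 1 + 2 + ... + n equals n(n+1)/2.
    static long closedForm(String stringObj) {
        long n = stringObj.length();
        return n * (n + 1) / 2;
    }
}
```

This is a design-level (algorithmic) improvement rather than a code-level one, which is why it beats any amount of tuning of the loop itself.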

Page 19: 5 . 1. Performance

Methods and Method calls

Similar to the last suggestion, it is good to avoid excessive re-calculation by computing the expression once and binding the result to a variable which is reused.

    // Before: elementAt(i) is evaluated twice
    if (enemies.elementAt(i).isAlive()) ...
    if (enemies.elementAt(i).isBoss()) ...

    // After: evaluated once and reused
    Enemy enemy = enemies.elementAt(i);
    if (enemy.isAlive()) ...
    if (enemy.isBoss()) ...

Making chunky calls…

A chunky call is a function call that performs several related tasks (e.g. initialising fields).

A chatty call only does one thing (with several chatty calls needed to get things done). Favour chunky calls for processes that run numerous times per second.
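Aside: The difference between chatty and chunky calls can be seen on a small class. The sketch below is an illustration with hypothetical names (`Transform`, `set`), not code from the slides. Three chatty setter calls and one chunky call leave the object in the same state, but the chunky version crosses the call boundary once. The saving is largest when each call is expensive (for example across a native or remote boundary) and may be negligible for small methods that the JIT in-lines.

```java
// Hypothetical class offering both a chatty and a chunky way to update its fields.
final class Transform {
    private float x, y, rotation;

    // Chatty API: one call per field.
    void setX(float x) { this.x = x; }
    void setY(float y) { this.y = y; }
    void setRotation(float rotation) { this.rotation = rotation; }

    // Chunky API: one call performs all the related updates.
    void set(float x, float y, float rotation) {
        this.x = x;
        this.y = y;
        this.rotation = rotation;
    }

    float getX() { return x; }
    float getY() { return y; }
    float getRotation() { return rotation; }
}
```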

Page 20: 5 . 1. Performance

GENERAL ADVICE: BOXING AND UNBOXING

General advice applicable to both Java and C# (and most other managed languages)

Page 21: 5 . 1. Performance

Boxing/Unboxing (Overview)

Boxing is the creation of a reference wrapper for a value type (e.g. storing an int within an Integer). Unboxing is the conversion of the reference wrapper to the value type, e.g.:

    // Boxing
    int i = 123;
    object o = (object)i;

    // Unboxing
    o = 123;
    i = (int)o;

Page 22: 5 . 1. Performance

Boxing/Unboxing (The cost…)

Boxing and unboxing are computationally expensive processes.

A new object must be created every time a value type is boxed.

This can be more than an order of magnitude slower than a simple assignment. Additionally, the casting process when unboxing takes longer than a simple assignment.

Boxing/unboxing also creates objects that must be stored/discarded by the GC.

Page 23: 5 . 1. Performance

Boxing/Unboxing (Hints and tips)

Avoid using non-generic collections (which always box value types). Instead, use generic collections where a defined value type can be specified (removing the need to box/unbox).

    ArrayList list = new ArrayList();
    list.Add(56);      // This will cause boxing

    List<int> listInt = new List<int>();
    listInt.Add(56);   // This does not cause boxing
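Aside: The generic-collection tip is specific to C#. In Java, generic collections cannot hold primitives, so a `List<Integer>` still boxes every `int` added to it. Where boxing matters, a primitive array (or a specialised primitive collection) avoids it. The sketch below illustrates the two options. `BoxingSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of a boxed collection and a primitive array in Java.
final class BoxingSketch {
    // Each add() boxes an int into an Integer; each read unboxes it again.
    static long sumBoxedList(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(i);          // autoboxing: int -> Integer
        }
        long total = 0;
        for (Integer v : values) {
            total += v;             // unboxing: Integer -> int
        }
        return total;
    }

    // A primitive array stores the ints directly, with no wrapper objects.
    static long sumPrimitiveArray(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }
}
```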

Page 24: 5 . 1. Performance

GENERAL ADVICE: MULTI-THREADING

General advice applicable to both Java and C# (and most other managed languages)

Page 25: 5 . 1. Performance

Multi-threading

Sequential programs are collections of functions executed in a defined sequence. Information is passed between functions via parameters, return values and shared data.

Parallel programs are collections of tasks that execute together with other tasks. Tasks communicate using messages.

CPUs with multiple cores are now the norm (e.g. the Xbox 360 has 3 cores (6 hardware threads)). Core counts will continue to increase, meaning that concurrent programming will be needed to get maximum performance.

This has a major impact on program design and opens a wider range of issues (e.g. thread-safe data manipulation, lock performance, etc.).
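Aside: One way to sidestep thread-safe data manipulation is to give each task its own slice of the work and have it hand back a result, rather than letting tasks update shared data. The sketch below is an illustration using the standard `java.util.concurrent` executor classes. `ParallelSum` is a hypothetical name, and a sum this cheap would not really be worth parallelising.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: sums 1..n by splitting the range across a thread pool.
final class ParallelSum {
    static long sum(int n, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            int chunk = (n + tasks - 1) / tasks;   // ceiling division
            for (int t = 0; t < tasks; t++) {
                final int from = t * chunk + 1;
                final int to = Math.min(n, (t + 1) * chunk);
                // Each task only touches its own local variable: no shared state, no locks.
                jobs.add(() -> {
                    long local = 0;
                    for (int i = from; i <= to; i++) local += i;
                    return local;
                });
            }
            long total = 0;
            for (Future<Long> result : pool.invokeAll(jobs)) {
                total += result.get();             // results are combined on the calling thread
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel sum failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```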

Page 26: 5 . 1. Performance

ASIDE: XNA ADVICE

Advice applicable to C#

Page 27: 5 . 1. Performance

Passing by reference and by value

It is faster to pass a large value type (e.g. Matrix) by reference rather than by value.

By value: a new matrix value will be created in the method and populated using the source matrix.

    Matrix matrix;
    processMatrix(matrix);

    void processMatrix( Matrix matrix ){ ... }

By reference: only a pointer to the struct location need be passed.

    Matrix matrix;
    processMatrix(ref matrix);

    void processMatrix( ref Matrix matrix ){ ... }

Warning: As with any reference type, care must be taken when modifying values within the method.

Page 28: 5 . 1. Performance

SpriteBatch performance

SpriteBatch is optimised for batch drawing (it’s in the name!) You should try to:
• Draw lots of sprites inside a single Begin/End call
• If possible, use SpriteSortMode.Immediate and draw in texture order or use a sprite sheet, or...
• ..., if this is not possible, use SpriteSortMode.Texture

SpriteSortMode                                                        Immediate   Deferred   Texture
1000 batches, one sprite in each                                      34 ms       34 ms      34 ms
One batch, 1000 sprites, all using the same texture                   0.6 ms      0.7 ms     1.8 ms
One batch, 1000 sprites, alternating between two different textures   11.5 ms     11.6 ms    1.9 ms

Page 29: 5 . 1. Performance

Summary

To do:

Read section to be completed in Project Development Report

Think about what you hope to submit for the Week 6 hand-in

Continue to develop exploratory code

Today we explored:

Basic principles behind optimisation

Advice on how to avoid common performance issues