Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing...

26
Test and Verification Solutions 1 27 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley, TVS

Transcript of Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing...

Page 1: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

Test and Verification Solutions 127 October 2010

Test and Verification Solutions

The testing challenges associated with distributed

processing

Mike Bartley, TVS

Page 2: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

2Test and Verification Solutions 227 October 2010

Agenda

• The types of distributed computing• The rise of multicore computing• Where is this stuff used?• Why is it important?• The problems• The challenges for

– Test– Debug

Page 3: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

3Test and Verification Solutions 327 October 2010

The types of distributed computing

• Distributed CPU and memory– Client-server– Internet applications

• Multitasking– The execution of multiple concurrent software processes in a

system as opposed to a single process at any one instant • Multi-processing

– Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system

• Multicore– A multi-core processor is composed of two or more independent

cores. – An integrated circuit which has two or more individual processors

(called cores in this sense).• Multi-threading

– threads have to share the resources of a single CPU

Page 4: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

4Test and Verification Solutions 427 October 2010

The rise of multicore & parallel computing

• Moore’s law– Double the number of transistors & speed every 18

months

• Power issues– Continuous frequency scaling through process

minimisation cannot cope with the reduced power demands

• Multicore processing will become the norm

One of the goals of .NET Framework 4 was to make it easier for developers to write parallel programs that target multi-core machines. To achieve that goal, .NET 4 introduces various parallel-programming primitives that abstract away some of the messy details that developers have to deal with when implementing parallel programs from scratch.

Page 5: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

5Test and Verification Solutions 527 October 2010

Example mobile phone chip

Page 6: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

6Test and Verification Solutions 627 October 2010

Example application domains of distributed computing

• Mobile• Automotive• Aerospace• Gaming

– Physics, rendering, AI, sound, IO• Video encode and decode• Server

– Handle multiple users accessing the same content– Files, applications, just sharing raw compute power

• Graphic design– Photoshop

• Banking– Simulations, Monte-carlo

Page 7: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

Driving Innovation

Parallel processing and multi-core

• Moore’s Law: computing speed doubles every 18 months• Performance can be increased further through parallelisation

– Multiple processing units can work simultaneously, on a single chip, on different parts of the same problem to reach a solution quicker than if one unit tried to solve it by itself

– Non-trivial to exploit effectively• Multi-core means multiple, separate processing units

operating in parallel in a single processor• Multicore has become the dominant processor architecture in

a short space of time and number of cores in servers and devices is set to grow

Page 8: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

Driving Innovation

Why is this important to the UK?

• Multicore devices are becoming all-pervasive, deployed in– embedded systems in washing machines and cars, mobile telephones,

communications networks including the Internet– high-performance systems running applications in sectors including

pharmaceutical, energy, aerospace, finance and engineering • System inefficiencies that could arise from failing to exploit parallel

computing effectively could have a substantial economic impact • UK has world-class expertise in this area stemming in part from the

development of the Transputer in the 1980s• Much of this expertise is now dispersed

Page 9: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

Driving Innovation

The challenge

• Our consultations with experts in industry and academia have highlighted the (short-term) need for:1. Awareness creation: Many senior managers do not realise the

importance of parallel computing in the context of their businesses2. Application porting: Those who do realise do not know how to

migrate existing applications to take their businesses forward3. Training: It is estimated that fewer than 1% of programmers are

“parallel-literate” and able to properly and quickly exploit parallelisation

4. Multicore Ecosystems: While the UK has a base of parallel and multicore programme skills, the community is fragmented

Page 10: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

10Test and Verification Solutions 1027 October 2010

The problems with distributed processing

• A sequential program‘s output only depends on the input

• whereas a distributed systems output depends on both the input and the execution order of the interactions between processes

• Non-determinism– We cannot guarantee the order of execution– Can lead to race conditions

Page 11: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

11Test and Verification Solutions 1127 October 2010

Race Condition Examples

Page 12: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

12Test and Verification Solutions 1227 October 2010

The problems with distributed processing

• Non-determinism– We cannot guarantee the order of execution– Can lead to race conditions

• Shared resources

Page 13: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

13Test and Verification Solutions 1327 October 2010

Sharing resources

• Typical distributed programming paradigms for distributed hardware can be divided into two categories: – shared-memory

• The shared-memory paradigm uses shared memory space (e.g. registers) for communication between threads

– message-passing• the implementation of data exchange between threads by

passing messages. • This paradigm is necessary for communication of threads

on distributed-memory applications• Message Passing Interface (MPI) is the most commonly

used library for the message-passing paradigm

Page 14: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

14Test and Verification Solutions 1427 October 2010

The problems with distributed processing

• Non-determinism– We cannot guarantee the order of execution– Can lead to race conditions

• Shared resources– Shared memory

• Use of locks– Errors because we forget to release the locks

– Can lead to deadlocks

Page 15: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

15Test and Verification Solutions 1527 October 2010

The use of locks

class mutex{mutex() {//gettheappropriatemutex}~mutex() {//release it }private: sometype mHandle;

}voidfoo() {

mutex get; //getthemutex…if(a) return; //released here…if(b) throw“oops”; //or here…return; //or here

Page 16: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

16Test and Verification Solutions 1627 October 2010

Deadlock example

Code for Process P Code for Process Q

Lock(M) Lock(N)

Lock(N) Lock(M)

Critical Section Critical Section

Unlock(N) Unlock(M)

Unlock(M) Unlock(N)

Page 17: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

17Test and Verification Solutions 1727 October 2010

Cause of deadlock

• 1) Tasks claim exclusive control of the resources they require ("mutual exclusion“ condition).

• 2) Tasks hold resources already allocated to them while waiting for additional resources ("wait for" condition).

• 3) Resources cannot be forcibly removed from the tasks holding them until the resources are used to completion ("no preemption" condition).

• 4) A circular chain of tasks exists, such that each task holds one or more resources that are being requested by the next task in the chain ("circular wait“ condition).

Page 18: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

18Test and Verification Solutions 1827 October 2010

The use of message passing

There is still scope for race conditions

Page 19: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

19Test and Verification Solutions 1927 October 2010

Avoiding deadlock

• Each task must request all its required resources at once and cannot proceed until all have been granted ("wait-for“ condition denied).

• If a task holding certain resources is denied a further request, that task must release its original resources and, if necessary, request them again together with the additional resources ("no preemption“ condition denied).

• The imposition of a linear ordering of resource types on all tasks;

Page 20: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

20Test and Verification Solutions 2027 October 2010

Categorisation of parallel programming bugs

• From [5] Pedersen J.B. Classification of Programming Errors in Parallel Message Passing Systems. Proceedings of Communicating Process Architectures 2006 (CPA'06). IOS Press, 2006. p363-376.

• The research looked into the bugs introduced when students were asked to parallelise sequential programs:– Equation Solver, – Mandelbrot, – Matrix Multiplication, – Partial Sum, – Pipeline Computation – and Differential Equation Solver

Page 21: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

21Test and Verification Solutions 2127 October 2010

Categorisation of parallel programming bugs

• Data Decomposition—The root of the bug had to do with the decomposition of the data set from the sequential to the parallel version of the program.

• Functional Decomposition — The root of the bug was the decomposition of the functionality when implementing the parallel version of the program.

• API Usage — This type of error is associated with the use of the MPI API calls. Typical errors here include passing data of the wrong type or misunderstanding the way the MPI functions work.

• Sequential Error—This type of error is the type we know from sequential programs. This includes using = instead of == in tests etc.

• Message Problem — This type covers sending/receiving the wrong data, that is, it is concerned with the content of the messages, not the entire protocol of the system.

• Protocol Problem — This error type is concerned with stray/missing messages that violate the overall communication protocol of the parallel system.

• Other—Bugs that do not fit any of the above categories are reported as other‘.

• Let’s look at the distribution of errors!

Page 22: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

22Test and Verification Solutions 2227 October 2010

Results of an online error reporting survey(Source: copy from Pedersen)

Page 23: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

23Test and Verification Solutions 2327 October 2010

The test and debug challenges

• Are our current test methodologies sufficient?– Is our testing thorough enough?– How do we know when to stop?

• Debug– How much time do we spend in debug?– How do we do it?

• Debuggers• Print statements

• Heisenbugs!– These will be more frequent

Page 24: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

24Test and Verification Solutions 2427 October 2010

The test challenges

• What are our current models of test completion?– Black box?

• Requirements coverage• Test matrix• Use-case coverage

– White box?• Code coverage

• These will not be enough!– Can we ensure no races?– Can we ensure no deadlocks?– How much of the message passing have we covered?

• Static analysis will become more important

Page 25: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

25Test and Verification Solutions 2527 October 2010

Consider coverage of process communications

Page 26: Test and Verification Solutions127 October 2010 Test and Verification Solutions The testing challenges associated with distributed processing Mike Bartley,

26Test and Verification Solutions 2627 October 2010

Debug

• If we run the same test twice it is not guaranteed to produce the same result!

• Debuggers show a single CPU only!