Test and Verification Solutions
27 October 2010
The testing challenges associated with distributed processing
Mike Bartley, TVS
Agenda
• The types of distributed computing
• The rise of multicore computing
• Where is this stuff used?
• Why is it important?
• The problems
• The challenges for
  – Test
  – Debug
The types of distributed computing
• Distributed CPU and memory
  – Client-server
  – Internet applications
• Multitasking
  – The execution of multiple concurrent software processes in a system, as opposed to a single process at any one instant
• Multi-processing
  – Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system
• Multicore
  – A multi-core processor is composed of two or more independent cores
  – An integrated circuit which has two or more individual processors (called cores in this sense)
• Multi-threading
  – Threads have to share the resources of a single CPU
The rise of multicore & parallel computing
• Moore's law
  – The number of transistors (and, historically, clock speed) doubles roughly every 18 months
• Power issues
  – Continued frequency scaling through process shrinks can no longer be sustained within acceptable power budgets
• Multicore processing will become the norm
One of the goals of .NET Framework 4 was to make it easier for developers to write parallel programs that target multi-core machines. To achieve that goal, .NET 4 introduces various parallel-programming primitives that abstract away some of the messy details that developers have to deal with when implementing parallel programs from scratch.
Example mobile phone chip
Example application domains of distributed computing
• Mobile
• Automotive
• Aerospace
• Gaming
  – Physics, rendering, AI, sound, IO
• Video encode and decode
• Server
  – Handle multiple users accessing the same content
  – Files, applications, or just sharing raw compute power
• Graphic design
  – Photoshop
• Banking
  – Simulations, Monte Carlo
Parallel processing and multi-core
• Moore's Law: computing speed doubles every 18 months
• Performance can be increased further through parallelisation
  – Multiple processing units can work simultaneously, on a single chip, on different parts of the same problem to reach a solution quicker than if one unit tried to solve it by itself
  – Non-trivial to exploit effectively
• Multi-core means multiple, separate processing units operating in parallel in a single processor
• Multicore has become the dominant processor architecture in a short space of time, and the number of cores in servers and devices is set to grow
Why is this important to the UK?
• Multicore devices are becoming all-pervasive, deployed in
  – embedded systems in washing machines and cars, mobile telephones, and communications networks including the Internet
  – high-performance systems running applications in sectors including pharmaceutical, energy, aerospace, finance and engineering
• System inefficiencies that could arise from failing to exploit parallel computing effectively could have a substantial economic impact
• The UK has world-class expertise in this area, stemming in part from the development of the Transputer in the 1980s
• Much of this expertise is now dispersed
The challenge
• Our consultations with experts in industry and academia have highlighted the (short-term) need for:
  1. Awareness creation: Many senior managers do not realise the importance of parallel computing in the context of their businesses
  2. Application porting: Those who do realise do not know how to migrate existing applications to take their businesses forward
  3. Training: It is estimated that fewer than 1% of programmers are "parallel-literate" and able to properly and quickly exploit parallelisation
  4. Multicore Ecosystems: While the UK has a base of parallel and multicore programming skills, the community is fragmented
The problems with distributed processing
• A sequential program's output depends only on its input
• A distributed system's output depends on both the input and the execution order of the interactions between processes
• Non-determinism
  – We cannot guarantee the order of execution
  – Can lead to race conditions
Race Condition Examples
The problems with distributed processing
• Non-determinism
  – We cannot guarantee the order of execution
  – Can lead to race conditions
• Shared resources
Sharing resources
• Typical programming paradigms for distributed hardware can be divided into two categories:
  – Shared-memory
    • The shared-memory paradigm uses a shared memory space (e.g. registers) for communication between threads
  – Message-passing
    • Data exchange between threads is implemented by passing messages
    • This paradigm is necessary for communication on distributed-memory architectures
    • Message Passing Interface (MPI) is the most commonly used library for the message-passing paradigm
The problems with distributed processing
• Non-determinism
  – We cannot guarantee the order of execution
  – Can lead to race conditions
• Shared resources
  – Shared memory
    • Use of locks
      – Errors because we forget to release the locks
      – Can lead to deadlocks
The use of locks
class mutex {
public:
    mutex()  { // get the appropriate mutex
    }
    ~mutex() { // release it
    }
private:
    sometype mHandle;
};

void foo() {
    mutex get;            // get the mutex
    …
    if (a) return;        // released here
    …
    if (b) throw "oops";  // or here
    …
    return;               // or here
}
Deadlock example
Code for Process P      Code for Process Q
Lock(M)                 Lock(N)
Lock(N)                 Lock(M)
Critical Section        Critical Section
Unlock(N)               Unlock(M)
Unlock(M)               Unlock(N)

If P acquires M and Q acquires N before either takes its second lock, each waits forever for the other: deadlock.
Cause of deadlock
1. Tasks claim exclusive control of the resources they require ("mutual exclusion" condition).
2. Tasks hold resources already allocated to them while waiting for additional resources ("wait for" condition).
3. Resources cannot be forcibly removed from the tasks holding them until they have been used to completion ("no preemption" condition).
4. A circular chain of tasks exists, such that each task holds one or more resources that are being requested by the next task in the chain ("circular wait" condition).
The use of message passing
There is still scope for race conditions
Avoiding deadlock
• Each task must request all its required resources at once and cannot proceed until all have been granted ("wait for" condition denied).
• If a task holding certain resources is denied a further request, it must release its original resources and, if necessary, request them again together with the additional resources ("no preemption" condition denied).
• Impose a linear ordering of resource types on all tasks ("circular wait" condition denied).
Categorisation of parallel programming bugs
• From [5]: Pedersen, J.B., "Classification of Programming Errors in Parallel Message Passing Systems", Proceedings of Communicating Process Architectures 2006 (CPA'06), IOS Press, 2006, pp. 363-376.
• The research looked into the bugs introduced when students were asked to parallelise sequential programs:
  – Equation Solver
  – Mandelbrot
  – Matrix Multiplication
  – Partial Sum
  – Pipeline Computation
  – Differential Equation Solver
Categorisation of parallel programming bugs
• Data Decomposition: the root of the bug had to do with the decomposition of the data set from the sequential to the parallel version of the program.
• Functional Decomposition: the root of the bug was the decomposition of the functionality when implementing the parallel version of the program.
• API Usage: errors associated with the use of the MPI API calls. Typical errors include passing data of the wrong type or misunderstanding the way the MPI functions work.
• Sequential Error: the type of error we know from sequential programs, such as using = instead of == in tests.
• Message Problem: sending or receiving the wrong data; this type concerns the content of the messages, not the overall protocol of the system.
• Protocol Problem: stray or missing messages that violate the overall communication protocol of the parallel system.
• Other: bugs that do not fit any of the above categories.
• Let’s look at the distribution of errors!
Results of an online error reporting survey (Source: Pedersen [5])
The test and debug challenges
• Are our current test methodologies sufficient?
  – Is our testing thorough enough?
  – How do we know when to stop?
• Debug
  – How much time do we spend in debug?
  – How do we do it?
    • Debuggers
    • Print statements
• Heisenbugs!
  – Bugs that change behaviour when you try to observe them, e.g. under a debugger
  – These will become more frequent
The test challenges
• What are our current models of test completion?
  – Black box?
    • Requirements coverage
    • Test matrix
    • Use-case coverage
  – White box?
    • Code coverage
• These will not be enough!
  – Can we ensure no races?
  – Can we ensure no deadlocks?
  – How much of the message passing have we covered?
• Static analysis will become more important
Consider coverage of process communications
Debug
• If we run the same test twice, it is not guaranteed to produce the same result!
• Conventional debuggers show only a single CPU!