Multi-core Software Development with examples in C++

By Jon Nosacek

Why should you care?

Multi-core systems are becoming the standard for all devices
Less heat
1 core = 2 cores at half frequency using ¼ the power! (P = C × V² × F, with voltage lowered along with frequency)
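The ¼ figure can be checked directly from the dynamic-power formula. This is a worked sketch under an idealized assumption that is not stated on the slide: supply voltage can be lowered in proportion to frequency, and static leakage is ignored. Real chips have a voltage floor, so the actual saving is smaller.

```latex
\begin{align*}
P_{\text{1 core}}  &= C\,V^{2}\,F \\
P_{\text{2 cores}} &= 2 \cdot C \left(\tfrac{V}{2}\right)^{2} \tfrac{F}{2}
                    = 2 \cdot \tfrac{1}{8}\,C\,V^{2}\,F
                    = \tfrac{1}{4}\,C\,V^{2}\,F
\end{align*}
```

Two cores at F/2 deliver the same aggregate clock cycles as one core at F only if the work splits cleanly across both cores.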

Designing a new system around multi-core architecture can be quite difficult.

Why should you care? (cont.)

Technology isn’t evolving like it was before
    Performance gains are no longer automatic
We want fast!
    Our users deserve the same

Multi-threaded vs. Multi-core

Same basic principle, but can yield very different results
Multi-threaded assumes no knowledge of the release environment and can make the program slower on a single-core platform
Multi-core means specifically designing your system for a platform that you know has two or more cores. It can yield significant performance boosts if done correctly

Hardware

To understand how the software works, you must first understand how the hardware works

Very much a hardware-oriented evolution (Hardware could not keep up with our increasing demands)

Why transition to multi-core?

Higher processor frequencies necessitated better cooling
    There is a limit based on materials and methods
Computers are replacing us
    The brain is not sequential

Why multi-core (cont)

Traditional:

Multi-core

Intel Core 2 Extreme Quad: http://www.techspot.com/articles-info/23/images/img2.jpg

Intel Core i7 965 quad core (8 threads): http://tinyurl.com/3tgfygn

Terminology

Thread
    Smallest unit of execution that a program can be broken down into
    Contains all the info that is needed for it to run
Atomic Statement
    Single operation by the processor. It cannot be interrupted or time-sliced partway through execution

Terminology (cont)

Hyper-Threading (Intel’s simultaneous multithreading, SMT)
    Intel’s route of having 2 threads per core to simulate more cores and reduce CPU waste
    Virtual processors not necessarily tied to physical ones
    Example of hardware helping software

How to design a multi-core system

Planning
Implementation
Testing
Deployment
Maintenance

Planning

A “code-and-fix” laissez-faire mentality WILL NOT WORK
    Too many things can go wrong, and it is hard to pinpoint a problem after the fact
Single most important step
    Problems here will cascade into other steps and become worse
Clear vision is a must
How deep into threading do you want to go?

Planning (cont.)

Opportunity comes during the decomposition phase
Need to model the state of the threads and what combinations affect each other
Thread interaction
Number of threads
    More threads => more problems
    Balance performance with understandability, maintainability, time
    Fairness and priority
    More threads => more communication

Planning (cont.)

Error handling is more important
    Who handles the errors? Other threads might take a while to respond, and what if everyone responds?
Synchronization and semaphores should be used sparingly
    Threads should be as independent as possible
Need to make rules on memory access
Dataflow diagrams!

Concurrent Vs Parallel Design

Which do you think is better?

http://blog.rednael.com/content/binary/parallel%20vs%20concurrent.jpg

Concurrent / Parallel

Easy to design and implement
Works well for I/O
Minimal interaction to plan and synchronize
Less CPU waste
Even more difficult to track
CPU has to keep track and time-slice more (swap time)

Implementation

Languages are becoming more and more open to multi-core programming
There are libraries for C++ that help ease the workload
    A lot of threading is tied to the OS, and Microsoft knows theirs better than anyone
    Support usually arrives for Linux and Microsoft first, then Macs
Watch for CPU-specific instructions that can improve performance

Implementation (cont.)

Make sure resources are being managed
Update the models as the system changes
The IDE you choose during this phase can be very important and affects what you see your system doing
Using existing libraries usually reduces workload, and they are often more efficient
Make sure all basic/shared initializations are done before the threads are created

Implementation (cont.)

Watch for evolving trends
    If a lot of communication is going on between two threads, see if things can be merged/swapped
    See which threads take up the most resources and what will increase program responsiveness
Keep the future in mind
    More cores will always be added. Think about the simplest case and expand into the complex
    Also realize that more features are being added to C++ to help abstract multithreading

// Basic example (POSIX threads; build with: g++ example.cpp -pthread)

#include <iostream>
#include <pthread.h>

using namespace std;

void *task1(void *X) // define task to be executed by ThreadA
{
    cout << "Thread A complete" << endl;
    return (NULL);
}

void *task2(void *X) // define task to be executed by ThreadB
{
    cout << "Thread B complete" << endl;
    return (NULL);
}

int main(int argc, char *argv[])
{
    pthread_t ThreadA, ThreadB; // declare threads

    pthread_create(&ThreadA, NULL, task1, NULL); // create threads
    pthread_create(&ThreadB, NULL, task2, NULL);

    pthread_join(ThreadA, NULL); // wait for threads to "join up"
    pthread_join(ThreadB, NULL);

    return (0);
}

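// Doing little things can make a big difference too
// (Microsoft PPL, Visual C++ only; fibonacci and time_call were not on the slide and are assumed):
#include <ppl.h>
#include <concurrent_vector.h>
#include <algorithm>
#include <array>
#include <chrono>
#include <iostream>
#include <tuple>
#include <vector>
using namespace concurrency;
using namespace std;

int fibonacci(int n) { return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2); } // naive on purpose

template <class Function>
long long time_call(Function&& f) // elapsed milliseconds for one call of f
{
    auto begin = chrono::steady_clock::now();
    f();
    return chrono::duration_cast<chrono::milliseconds>(chrono::steady_clock::now() - begin).count();
}

int main()
{
    array<int, 4> a = { 24, 26, 41, 42 };
    vector<tuple<int,int>> results1;
    concurrent_vector<tuple<int,int>> results2;

    long long elapsed = time_call([&] { // sequential
        for_each(a.begin(), a.end(), [&](int n) {
            results1.push_back(make_tuple(n, fibonacci(n)));
        });
    });
    cout << elapsed << " ms" << endl;

    elapsed = time_call([&] { // parallel: same body, thread-safe container
        parallel_for_each(a.begin(), a.end(), [&](int n) {
            results2.push_back(make_tuple(n, fibonacci(n)));
        });
    });
    cout << elapsed << " ms" << endl;
}
// a 4 core system outputs: 9250 ms, 5726 ms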

Testing

Race conditions are the most prevalent problem
Identify critical paths
Balance threads and tweak for performance
Non-determinism (for some initial state, the final state is ambiguously determined)

Deployment

Mostly the same
See what platforms are actually using your program and tune as necessary

Maintenance

Need to keep up with the changing tech (still pretty new)
Adding new functionality will be more difficult, especially when it’s very different from the existing functionality
Much more testing needed
Going back to the original plan and seeing how new features fit in and what is affected is much more important

Maintenance (cont.)

What about adding to an existing system?
    Very difficult
    Should focus on largest time consumers (I/O, disk, complex algorithms)
Applications with low coupling are the best candidates for adding parallel aspects

Challenges

Lots of planning needed
    Thorough understanding of the environment
Very hard to debug
Built-in support is hit-and-miss (language & IDE)
Security concerns (from other programs as well as your own)
A lot of life-critical embedded systems are sticking with single-core platforms

What apps can help me out?

Intel’s Threading Building Blocks
OpenMP
Microsoft Visual Studio
MULTI - Green Hills
Total View - Rogue Wave

Intel’s Threading Building Blocks

Template library
    Algorithms, containers, mutexes, atomic statements, timing, scheduling
Implements “Task Stealing”
    If one core is idle, it will take a scheduled task from another to reduce CPU waste
Automatically creates the threads for you to maximize performance
    Much like parallel_for
Tries to be like the STL
    Ease of use, generality, but more aggressive

Intel’s Threading Building Blocks (cont.)

A bit more memory/cache oriented than STL
    Intel knows their own cores and how to schedule on them
Adds a lot more concurrency-oriented data types (concurrent_queue, concurrent_vector, concurrent_hash_map)
    Also geared for easy scalability
More atomic operations (also from knowing their own cores)
Follows a pipeline architecture like graphics

OpenMP

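// Build with OpenMP enabled, e.g. g++ -fopenmp
#include <iostream>
#include <omp.h>

using namespace std;

int main()
{
    int th_id, nthreads;

    #pragma omp parallel private(th_id) shared(nthreads)
    {
        th_id = omp_get_thread_num();

        // only one thread at a time may print
        #pragma omp critical
        {
            cout << "Hello World from thread " << th_id << '\n';
        }

        // wait until every thread has printed
        #pragma omp barrier

        // only the master thread reports the thread count
        #pragma omp master
        {
            nthreads = omp_get_num_threads();
            cout << "There are " << nthreads << " threads" << '\n';
        }
    }
    return 0;
}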

Microsoft Visual Studio

Thread View

Microsoft Visual Studio (cont.)

MULTI IDE – Green Hills

Cool debugging/recording features

http://www.ghs.com/products/MULTI_IDE.html

Total View - Rogue Wave

Thread viewer:

Sources:

Buttari, Alfredo, Jack Dongarra, Jakub Kurzak, et al. The Impact of Multicore on Math Software.

Hughes, Cameron, and Tracey Hughes. Professional Multicore Programming Design and Implementation for C++ Developers. Indianapolis, IN: Wiley Pub., 2008.

http://msdn.microsoft.com/en-us/concurrency/default.aspx

http://channel9.msdn.com/search?term=concurrency

http://www.cs.kent.edu/~farrell/amc09/lectures/





Any Questions?

This all sounds like a lot of work. Why should we bother when something easier might come along?
    It’s very much a game of figuring out how much effort gets the largest returns
    True progress will take both EEs and SEs (and CSs too, if any showed up today)
    Might be a long time before we see change
