Multi-core Software Development with examples in C++
By Jon Nosacek
Why should you care?
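Multi-core systems are becoming the standard for all devices
Less heat
1 core = 2 cores at half frequency using ¼ power! (P = C × V² × F, assuming the supply voltage can be halved along with the frequency and the work splits evenly across both cores)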
Designing a new system around multi-core architecture can be quite difficult.
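The ¼ figure can be checked by evaluating the formula directly. This is a minimal sketch: `dynamic_power` and `total_power` are hypothetical helper names, the units are arbitrary, and it assumes (as the ¼ claim does) that voltage drops in proportion to frequency, which real chips only approximate.

```cpp
// Dynamic power model from the slide: P = C * V^2 * F (arbitrary units).
// Hypothetical helper for illustration only.
double dynamic_power(double c, double v, double f) {
    return c * v * v * f;
}

// Total power of `cores` identical cores, each running at voltage v and frequency f.
double total_power(int cores, double c, double v, double f) {
    return cores * dynamic_power(c, v, f);
}
```

With voltage and frequency both halved, `total_power(2, 1.0, 0.5, 0.5)` is 0.25 against 1.0 for one full-speed core. Halving only the frequency (`total_power(2, 1.0, 1.0, 0.5)`) gives 1.0, no saving at all, which is why voltage scaling is the key assumption behind the ¼ claim.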
Why should you care? (cont.)
Technology isn't evolving like it was before
Performance gains are no longer automatic (existing single-threaded code does not speed up on its own)
We want fast! Our users deserve the same
Multi-threaded vs. multi-core
Same basic principle, but can yield very different results
Multi-threaded design assumes no knowledge of the release environment and can make the program slower on a single-core platform
Multi-core design means specifically targeting a platform that you know has two or more cores; done correctly, it can yield significant performance boosts
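One hedged way to design for the cores that are actually present, rather than assuming them, is to ask the runtime at startup. The sketch below uses the standard `std::thread::hardware_concurrency()`; the helper name `choose_worker_count` is hypothetical.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper: pick a worker count for the machine we are running on.
// std::thread::hardware_concurrency() counts logical processors (an SMT core,
// such as a Hyper-Threading core, counts as 2) and returns 0 if it cannot tell.
unsigned choose_worker_count(unsigned max_useful_workers) {
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1; // unknown: assume a single core rather than oversubscribe
    // Never start more workers than there are independent pieces of work,
    // and always start at least one.
    return std::max(1u, std::min(hw, max_useful_workers));
}
```

On a single-core machine this returns 1, so the program degrades to sequential work instead of paying thread overhead for nothing.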
Hardware
To understand how the software works, you must first understand how the hardware works
Very much a hardware-driven evolution (single-core hardware could not keep up with our increasing demands)
Why transition to multi-core?
Higher processor frequencies necessitated better cooling
There is a limit based on materials and methods
Computers are replacing us
The brain is not sequential
Why multi-core? (cont.)
[Diagrams: traditional single-core layout vs. multi-core layout]
Intel Core 2 Extreme Quad (image: http://www.techspot.com/articles-info/23/images/img2.jpg)
Intel Core i7 965 quad core, 8 threads (image: http://tinyurl.com/3tgfygn)
Terminology
Thread
Smallest unit of execution that a program can be broken down into
Contains all the information it needs to run
Atomic statement
A single, indivisible operation: the processor cannot switch to another thread partway through it, and other threads never see it half-finished
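In C++11 and later, `std::atomic` gives an ordinary statement like an increment this indivisible behavior. As a minimal sketch (the function name `count_with_atomic` is hypothetical), several threads bump one shared counter; with a plain `long` and `++counter` this would be a data race and increments could be lost.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each of `threads` threads increments a shared counter `per_thread` times.
// fetch_add is an atomic read-modify-write, so no increment is lost even
// though the threads run at the same time.
long count_with_atomic(int threads, int per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join(); // join makes every increment visible here
    return counter.load();
}
```

`memory_order_relaxed` is enough because only the final total matters and `join()` already orders the threads' work before the read.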
Terminology (cont)
Hyper-Threading (SMT)
Intel's implementation of simultaneous multithreading: 2 hardware threads per core, which makes each core appear as two and reduces idle CPU time
Logical (virtual) processors are not necessarily tied to physical ones
Example of hardware helping software
How to design a multi-core system
Planning
Implementation
Testing
Deployment
Maintenance
Planning
A “code-and-fix”, laissez-faire mentality WILL NOT WORK
Too many things can go wrong, and problems are hard to pinpoint after the fact
Single most important step: problems here cascade into the other steps and become worse
A clear vision is a must: how deep into threading do you want to go?
Planning (cont.)
Opportunity comes during the decomposition phase
Need to model the state of the threads and which combinations affect each other
Thread interaction
Number of threads
More threads => more problems
Balance performance with understandability, maintainability, and time
Fairness and priority
More threads => more communication
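A simple case of finding that opportunity is data decomposition: split the input into independent pieces so the threads barely interact. The sketch below uses C++11 `std::async` rather than any library the slides name; `parallel_sum` and the `parts` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Data decomposition sketch: split the input into `parts` contiguous,
// non-overlapping chunks, sum each chunk in its own task, then combine
// the partial sums on the calling thread.
long long parallel_sum(const std::vector<int>& data, std::size_t parts) {
    if (parts == 0) parts = 1;
    const std::size_t chunk = (data.size() + parts - 1) / parts; // ceiling division
    std::vector<std::future<long long>> partials;
    for (std::size_t lo = 0; lo < data.size(); lo += chunk) {
        const std::size_t hi = std::min(lo + chunk, data.size());
        // Each task only reads its own [lo, hi) range, so tasks share no writable data.
        partials.push_back(std::async(std::launch::async, [&data, lo, hi] {
            return std::accumulate(data.begin() + lo, data.begin() + hi, 0LL);
        }));
    }
    long long total = 0;
    for (auto& p : partials) total += p.get(); // the only point where tasks interact
    return total;
}
```

The number of parts is the "number of threads" knob from the slide: more parts means more tasks to create and more partial results to combine.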
Planning (cont.)
Error handling is more important: who handles the errors? Other threads might take a while to respond, and what if every thread responds?
Synchronization and semaphores should be used sparingly; threads should be as independent as possible
Need to make rules on memory access
Dataflow diagrams!
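One possible answer to "who handles the errors?" is: exactly the thread that waits for the result. With `std::async`, an exception thrown in the worker is stored in the `std::future` and rethrown by `get()`. This is a sketch; `parse_positive` and `parse_or_default` are hypothetical example functions.

```cpp
#include <future>
#include <stdexcept>
#include <string>

// A worker task that can fail (hypothetical example task).
int parse_positive(const std::string& text) {
    int value = std::stoi(text); // throws std::invalid_argument on non-numbers
    if (value <= 0) throw std::out_of_range("value must be positive");
    return value;
}

// The exception thrown inside the async task is captured in the future and
// rethrown by get() on this thread, so a single place decides what to do.
int parse_or_default(const std::string& text, int fallback) {
    std::future<int> result = std::async(std::launch::async, parse_positive, text);
    try {
        return result.get();
    } catch (const std::exception&) {
        return fallback;
    }
}
```

Because ownership of the error follows ownership of the result, no other thread has to notice or respond to the failure.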
Concurrent Vs Parallel Design
Which do you think is better?
[Diagram: parallel vs. concurrent, http://blog.rednael.com/content/binary/parallel%20vs%20concurrent.jpg]
Concurrent:
Easy to design and implement
Works well for I/O
Minimal interaction to plan and synchronize
Parallel:
Less CPU waste
Even more difficult to track
CPU has to keep track and time slice more (swap time)
Implementation
Languages are becoming more and more open to multi-core programming
There are libraries for C++ that help ease the workload
A lot of threading is OS-tied, and Microsoft knows its own better than anyone
Support usually comes to Linux and Microsoft first, then Macs
Watch for CPU-specific instructions that can improve performance
Implementation (cont.)
Make sure resources are being managed
Update the models as the system changes
The IDE you choose during this phase can be very important and affects what you see your system doing
Using existing libraries usually reduces the workload, and they are often more efficient
Make sure all basic/shared initializations are done before the threads are created
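A sketch of why that rule pays off: if shared data is fully built before any thread starts and never written afterwards, the workers can read it without locks, because starting a `std::thread` synchronizes with the new thread. `squares_table` and `sum_of_lookups` are hypothetical names.

```cpp
#include <thread>
#include <vector>

// Shared, read-only data built before any thread exists.
std::vector<int> squares_table(int n) {
    std::vector<int> table(n);
    for (int i = 0; i < n; ++i) table[i] = i * i;
    return table;
}

// Workers read the table without locks; each writes only its own slot
// in `partial`, so no two threads touch the same writable memory.
long long sum_of_lookups(int n, int workers) {
    const std::vector<int> table = squares_table(n); // initialized up front
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&table, &partial, w, workers, n] {
            for (int i = w; i < n; i += workers) partial[w] += table[i];
        });
    }
    for (auto& t : threads) t.join();
    long long total = 0;
    for (long long p : partial) total += p;
    return total;
}
```

If the table were instead filled lazily by whichever thread needed it first, every access would need synchronization.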
Implementation (cont.)
Watch for evolving trends
If a lot of communication is going on between two threads, see if things can be merged/swapped
See which threads take up the most resources and what will increase program responsiveness
Keep the future in mind
More cores will keep being added: think about the simplest case and expand into the complex
Also realize that more features are being added to C++ to help abstract multithreading
// Basic example (POSIX threads; build with e.g. g++ -pthread):
#include <iostream>
#include <pthread.h>
using std::cout;
using std::endl;
void *task1(void *X) // define task to be executed by ThreadA
{
    cout << "Thread A complete" << endl; // output from A and B may interleave
    return NULL;
}
void *task2(void *X) // define task to be executed by ThreadB
{
    cout << "Thread B complete" << endl;
    return NULL;
}
int main(int argc, char *argv[])
{
    pthread_t ThreadA, ThreadB; // declare threads
    pthread_create(&ThreadA, NULL, task1, NULL); // create threads
    pthread_create(&ThreadB, NULL, task2, NULL);
    pthread_join(ThreadA, NULL); // wait for threads to "join up"
    pthread_join(ThreadB, NULL);
    return 0;
}
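Since C++11 the same pattern can be written without OS-specific calls using `std::thread`. In this sketch (the function name `run_two_threads` is hypothetical) each thread appends its message to a shared log instead of printing, and a mutex guards that log so the two writes cannot corrupt it.

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// The two-thread example with C++11 std::thread. Both threads append to a
// shared log, so a mutex serializes the writes.
std::vector<std::string> run_two_threads() {
    std::vector<std::string> log;
    std::mutex log_mutex;
    auto task = [&log, &log_mutex](const std::string& name) {
        std::lock_guard<std::mutex> lock(log_mutex); // one writer at a time
        log.push_back(name + " complete");
    };
    std::thread threadA(task, "Thread A"); // each thread starts running immediately
    std::thread threadB(task, "Thread B");
    threadA.join(); // wait for both threads to "join up"
    threadB.join();
    return log; // the order of the two entries depends on scheduling
}
```

A caller can print the result with `for (const auto& line : run_two_threads()) std::cout << line << '\n';` after including `<iostream>`.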
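// Doing little things can make a big difference too
// (Microsoft PPL, Visual C++ only; fibonacci and time_call are small helpers)
#include <ppl.h>
#include <concurrent_vector.h>
#include <algorithm>
#include <array>
#include <chrono>
#include <iostream>
#include <tuple>
#include <vector>
using namespace concurrency;
using namespace std;
int fibonacci(int n) { return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2); }
template <class Function> long long time_call(Function&& f) // elapsed ms of f()
{
    auto begin = chrono::steady_clock::now();
    f();
    return chrono::duration_cast<chrono::milliseconds>(chrono::steady_clock::now() - begin).count();
}
int main()
{
    array<int, 4> a = { 24, 26, 41, 42 };
    vector<tuple<int,int>> results1;            // filled by one thread only
    concurrent_vector<tuple<int,int>> results2; // push_back is safe from many threads
    long long elapsed = time_call([&] {
        for_each(a.begin(), a.end(), [&](int n) { results1.push_back(make_tuple(n, fibonacci(n))); }); });
    cout << "serial: " << elapsed << " ms" << endl;
    elapsed = time_call([&] {
        parallel_for_each(a.begin(), a.end(), [&](int n) { results2.push_back(make_tuple(n, fibonacci(n))); }); });
    cout << "parallel: " << elapsed << " ms" << endl;
}
// a 4-core system outputs: 9250 ms, 5726 ms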
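PPL is specific to Visual C++. A portable sketch of the same idea uses one `std::async` task per input; each task returns its result through its own future, so no concurrent container is needed and results stay in input order. `fib_all` is a hypothetical helper, and no timings are claimed for it.

```cpp
#include <future>
#include <vector>

int fibonacci(int n) { return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2); }

// One task per input value; get() waits for each task in input order.
std::vector<int> fib_all(const std::vector<int>& inputs) {
    std::vector<std::future<int>> tasks;
    for (int n : inputs)
        tasks.push_back(std::async(std::launch::async, fibonacci, n));
    std::vector<int> results;
    for (auto& t : tasks) results.push_back(t.get());
    return results;
}
```

As in the PPL version, the total time is bounded by the slowest single input (here the largest n), which is why four inputs on four cores do not give a 4× speedup.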
Testing
Race conditions are the most prevalent problem
Identify critical paths
Balance threads and tweak for performance
Non-determinism (from the same initial state, different runs can reach different final states)
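A common race to test for is check-then-act: two threads both see "enough balance" and both withdraw. This sketch uses a hypothetical `Account` class; holding one `std::mutex` across the check and the update makes the pair behave as a single step, so the final state no longer depends on timing.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared account illustrating a check-then-act race and its fix.
class Account {
public:
    explicit Account(int balance) : balance_(balance) {}
    bool withdraw(int amount) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (balance_ < amount) return false; // check...
        balance_ -= amount;                  // ...and act, under the same lock
        return true;
    }
    int balance() {
        std::lock_guard<std::mutex> lock(mutex_);
        return balance_;
    }
private:
    std::mutex mutex_;
    int balance_;
};

// `threads` threads each try to withdraw `amount` up to `tries` times.
int remaining_after(int start, int amount, int threads, int tries) {
    Account account(start);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&account, amount, tries] {
            for (int i = 0; i < tries; ++i) account.withdraw(amount);
        });
    for (auto& w : workers) w.join();
    return account.balance();
}
```

Without the lock, repeated runs could end with different (even negative) balances; with it, 8 threads withdrawing 7 from 1000 always leave 6.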
Deployment
Mostly the same as usual
See what platforms are actually using your program and tune as necessary
Maintenance
Need to keep up with the changing tech (still pretty new)
Adding new functionality will be more difficult, especially when it is very different from what already exists
Much more testing needed
Going back to the original plan and seeing how new features fit in and what is affected is much more important
Maintenance (cont.)
What about adding to an existing system?
Very difficult
Should focus on the largest time consumers (I/O, disk, complex algorithms)
Applications with low coupling are the best candidates for adding parallel aspects
Challenges
Lots of planning needed
Thorough understanding of the environment
Very hard to debug
Built-in support is hit-and-miss (language & IDE)
Security concerns (from other programs as well as your own)
A lot of life-critical embedded systems are sticking with single-core platforms
What apps can help me out?
Intel’s Threading Building Blocks
OpenMP
Microsoft Visual Studio
MULTI (Green Hills)
Total View (Rogue Wave)
Intel’s Threading Building Blocks
Template library
Algorithms, containers, mutexes, atomic operations, timing, scheduling
Implements “task stealing” (work stealing): if one core is idle, it takes a scheduled task from another core to reduce CPU waste
Automatically creates the threads for you to maximize performance, much like parallel_for
Tries to be like the STL: ease of use, generality, but more aggressive
Intel’s Threading Building Blocks (cont.)
A bit more memory/cache oriented than the STL
Intel knows its own cores and how to schedule on them
Adds a lot more concurrency-oriented data types (concurrent_queue, concurrent_vector, concurrent_hash_map)
Also geared for easy scalability
More atomic operations (also from knowing its own cores)
Follows a pipeline architecture, like graphics
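To show what a pipeline architecture means without depending on TBB, here is a hedged standard-library sketch: each stage runs on its own thread and hands items to the next through a blocking queue. `BlockingQueue` and `square_pipeline` are hypothetical names; TBB's own `concurrent_queue` and pipeline interfaces are different and are not used here.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue (hypothetical; not TBB's concurrent_queue API).
template <class T>
class BlockingQueue {
public:
    void push(T value) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(value)); }
        cv_.notify_one();
    }
    void close() { // no more items will be pushed
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until an item is available; returns false once closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// Three stages on three threads: produce -> square -> collect.
std::vector<int> square_pipeline(int count) {
    BlockingQueue<int> raw, squared;
    std::vector<int> out; // touched only by the collector until it is joined
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) raw.push(i);
        raw.close();
    });
    std::thread worker([&] {
        int v;
        while (raw.pop(v)) squared.push(v * v);
        squared.close();
    });
    std::thread collector([&] {
        int v;
        while (squared.pop(v)) out.push_back(v);
    });
    producer.join();
    worker.join();
    collector.join();
    return out;
}
```

With a single thread per stage and FIFO queues, items leave the pipeline in the order they entered, while all three stages can be busy at once on different items.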
OpenMP
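// Build with OpenMP enabled, e.g. g++ -fopenmp
#include <omp.h>
#include <iostream>
using std::cout;
int main()
{
    int th_id, nthreads;
    #pragma omp parallel private(th_id) shared(nthreads)
    {
        th_id = omp_get_thread_num(); // this thread's id within the team
        #pragma omp critical
        { // one thread at a time writes to cout
            cout << "Hello World from thread " << th_id << '\n';
        }
        #pragma omp barrier // wait until every thread has printed
        #pragma omp master
        { // only the master thread (thread 0) runs this block
            nthreads = omp_get_num_threads();
            cout << "There are " << nthreads << " threads" << '\n';
        }
    }
    return 0;
}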
Microsoft Visual Studio
[Screenshots: Visual Studio thread view]
MULTI IDE – Green Hills
Cool debugging/recording features
http://www.ghs.com/products/MULTI_IDE.html
Total View - Rogue Wave
[Screenshot: thread viewer]
Sources:
Buttari, Alfredo, Jack Dongarra, Jakub Kurzak, et al. The Impact of Multicore on Math Software.
Hughes, Cameron, and Tracey Hughes. Professional Multicore Programming: Design and Implementation for C++ Developers. Indianapolis, IN: Wiley Pub., 2008.
http://msdn.microsoft.com/en-us/concurrency/default.aspx
http://channel9.msdn.com/search?term=concurrency
http://www.cs.kent.edu/~farrell/amc09/lectures/
Any Questions?
This all sounds like a lot of work. Why should we bother when something easier might come along?
It's very much a game of figuring out how much effort gets the largest returns.
True progress will take both EEs and SEs (and CSs too, if any showed up today)
It might be a long time before we see change