Cristiana Amza
University of Toronto
Once Upon a Time …
… locks were painful for dynamic and complex applications …
e.g., Massively Multiplayer Games
Massively Multiplayer Games
• Support many concurrent players, and
• Low update interval to players
So, game developers said …
“Forget locks!
We’ll use our secret sauce!”
State-of-the-art in Game Code
Ad-hoc parallelization: segments/shards
e.g., World of Warcraft / Ultima Online
Sequential code, admission control
e.g., Quake III
Ad-hoc Partitioning (segments)
Countries, rooms
Artificial Admission Control
Admission control
Gateways
E.g., airports, doors
But, gamers said …
“We want to interact, and we hate lag!”
Problem with State-of-the-art
Flocking
Players move to one area, e.g., quests
Overload the server hosting the hotspot
So I said …
Forget painful locks!
Transactional Memory will make game developers and players happy!
Story endorsed by Intel (fall of 2006).
Our Goals
Parallelize server code into transactions
Easy to thread any game
Dynamic load balance of tx
on any platform
e.g., clusters, multi-cores, mobile devices …
Beats locks any day!
Ideal solution: Contiguous world
Seamless partition
Players can “see” across partition boundaries
Players can smoothly transfer
Regardless of game map
Challenge: On Multi-core
Inter-thread conflicts
Mostly at the boundary
Roadmap
The game
Parallelization Using TM
Compiler code transformations for TM
Runtime TM design choices
Dynamic load balancing of tx in game
Game Benchmark (SimMud)
Interactions: player - object, player - player
[Figure: game map. Players can move and interact; food objects; terrain is fixed and restricts movement]
Game Benchmark (SimMud)
Actions: move, eat, fight
Quest: flocking of players to a spot on the game map
Flocking in SimMud
[Figure: game map with areas labeled S1, S2, S3, S4 and a quest location]
Parallelization of Server Code
[Figure: server loop with stages Select, Rx, Process Requests, Form & Send Replies, Tx (numbered 1 to 3), divided into a read-only phase and a read-write phase]
Example: “Move” Request
Move() {
    region1->removePlayer( p );
    region2->addPlayer( p );
}
Parallelize Move Request
Insert “atomic” keyword in code
Compiler makes it a transaction
Ex: #pragma omp critical / __tm_atomic
{
    region1->removePlayer( p );
    region2->addPlayer( p );
}
Ex: SimMud Data Structure
struct Region
{
    int x, y;
    int width, height;
    set_t* players;
    set_t* objects;
};
Example Code for Action Move
void movePlayer( Player* p, int new_x, int new_y )
{
    Region* r_old = getRegion( p->x, p->y );
    Region* r_new = getRegion( new_x, new_y );
    if( isVacant_position( r_new, new_x, new_y ) )
    {
        set_remove( r_old->players, p );
        set_insert( r_new->players, p );
        p->x = new_x; p->y = new_y;
    }
}
Manual Transformations (Locks)
void movePlayer( Player* p, int new_x, int new_y )
{
    lock_player( p );
    Region* r_old = getRegion( p->x, p->y );
    Region* r_new = getRegion( new_x, new_y );
    lock_regions( r_old, r_new );
    if( isVacant_position( r_new, new_x, new_y ) )
    {
        set_remove( r_old->players, p );
        set_insert( r_new->players, p );
        p->x = new_x; p->y = new_y;
    }
    /* release in reverse order of acquisition */
    unlock_regions( r_old, r_new );
    unlock_player( p );
}
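The slide does not show how lock_regions takes two region locks safely. If two players cross the same boundary in opposite directions, a naive implementation can deadlock. The following is a minimal C++17 sketch of one way to do it, not SimMud's code: it assumes each region carries its own std::mutex, and the Region and Player layouts and the name movePlayerLocked are hypothetical. The player lock and the vacancy check from the slide are left out for brevity.

```cpp
#include <mutex>
#include <set>

struct Player { int x = 0, y = 0; };

// Hypothetical region layout: a per-region mutex guards the player set.
struct Region {
    std::mutex lock;
    std::set<Player*> players;
};

// Move p from r_old to r_new while holding both region locks.
// std::scoped_lock acquires several mutexes with a deadlock-avoidance
// algorithm, so two threads moving players in opposite directions
// across the same boundary cannot block each other forever.
void movePlayerLocked(Player* p, Region& r_old, Region& r_new,
                      int new_x, int new_y) {
    if (&r_old == &r_new) {
        // Same region: take one lock only (locking a std::mutex twice
        // from the same thread is undefined behavior).
        std::scoped_lock guard(r_old.lock);
        p->x = new_x; p->y = new_y;
        return;
    }
    std::scoped_lock guard(r_old.lock, r_new.lock);
    r_old.players.erase(p);
    r_new.players.insert(p);
    p->x = new_x; p->y = new_y;
}
```

Even in this reduced form the programmer has to think about lock identity and acquisition order, which is the burden the transactional version removes.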
Manual Transformations (TM)
void movePlayer( Player* p, int new_x, int new_y )
{
    #pragma omp critical
    {
        Region* r_old = getRegion( p->x, p->y );
        Region* r_new = getRegion( new_x, new_y );
        if( isVacant_position( r_new, new_x, new_y ) )
        {
            set_remove( r_old->players, p );
            set_insert( r_new->players, p );
            p->x = new_x; p->y = new_y;
        }
    }
}
My Story
TM will make game developers and players happy!
So far, the developers should be!
It Gets Worse for Locks
Move
May impact objects within bounding box
Short-range or long-range
Lock all impacted objects
need to search objects
[Figure: top view of world showing objects and the short-range and long-range bounding boxes]
e.g., Quake III Area Node Tree
Each region corresponds to a leaf
[Figure: top view of world]
e.g., Quake III Area Node Tree
Each region corresponds to a leaf
Lock all leaf nodes in bounding box atomically
[Figure: top view of world with overlapping regions]
Area Node Tree: Even Worse!
• Objects linked to leaf nodes
• If cross leaf boundary, link to parent node
[Figure: top view of world with non-overlapping regions; object lists attached to region leafs; objects that cross a boundary]
Area Node Tree: Even Worse!
• Need to lock parent nodes
• False sharing
• The whole tree may be locked
[Figure: top view of world, same layout as on the previous slide]
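To make the "whole tree may be locked" point concrete, here is a small C++ sketch of how a lock set could be computed for a bounding box. It is a simplified stand-in, not Quake III's area node code: the tree alternates x and y splits, and the names Box, AreaNode, build, and collectLockSet are hypothetical. Interior nodes can hold objects that straddle a split, so every node whose area overlaps the box has to be locked, ancestors included.

```cpp
#include <memory>
#include <vector>

struct Box { int x0, y0, x1, y1; };  // half-open: [x0,x1) x [y0,y1)

inline bool intersects(const Box& a, const Box& b) {
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

struct AreaNode {
    int id;
    Box area;
    std::unique_ptr<AreaNode> left, right;  // both null for a leaf
};

// Build a tree of the given depth, halving along x and y alternately.
// Ids are assigned in pre-order (node, left subtree, right subtree).
std::unique_ptr<AreaNode> build(Box area, int depth, int& next_id,
                                bool split_x = true) {
    auto n = std::make_unique<AreaNode>();
    n->id = next_id++;
    n->area = area;
    if (depth > 0) {
        Box a = area, b = area;
        if (split_x) { int m = (area.x0 + area.x1) / 2; a.x1 = m; b.x0 = m; }
        else         { int m = (area.y0 + area.y1) / 2; a.y1 = m; b.y0 = m; }
        n->left  = build(a, depth - 1, next_id, !split_x);
        n->right = build(b, depth - 1, next_id, !split_x);
    }
    return n;
}

// Collect the ids of all nodes that must be locked for `box`.
void collectLockSet(const AreaNode* n, const Box& box, std::vector<int>& ids) {
    if (!n || !intersects(n->area, box)) return;
    ids.push_back(n->id);
    collectLockSet(n->left.get(), box, ids);
    collectLockSet(n->right.get(), box, ids);
}
```

In an 8x8 world split to depth 2, a 1x1 box deep inside one leaf still locks the root and one interior node, and a box placed over the center of the map locks all seven nodes.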
My Story
TM will make game developers and players happy!
Lock down a whole box/tree, vs. just read/write what you need in TM.
Players should be happy too!
Compiler/Runtime TM Support
Compiler
Automatic source transformations to tx
Runtime
track accesses
resolve conflicts between transactions
adapt to application pattern
Manual Transformations (TM)
void movePlayer( Player* p, int new_x, int new_y )
{
    #pragma omp critical
    {
        Region* r_old = getRegion( p->x, p->y );
        Region* r_new = getRegion( new_x, new_y );
        if( isVacant_position( r_new, new_x, new_y ) )
        {
            set_remove( r_old->players, p );
            set_insert( r_new->players, p );
            p->x = new_x; p->y = new_y;
        }
    }
}
Automatic Transformations (TM)
void tm_movePlayer( tm_Player* p, int new_x, int new_y )
{
    Begin_transaction;
    tm_Region* r_old = tm_getRegion( p->x, p->y );
    tm_Region* r_new = tm_getRegion( new_x, new_y );
    if( tm_isVacant_position( r_new, new_x, new_y ) )
    {
        tm_set_remove( r_old->players, p );
        tm_set_insert( r_new->players, p );
        p->x = new_x; p->y = new_y;
    }
    Commit_transaction;
}
Automatic Transformations (TM)
struct tm_Region
{
    tm_int x, y;
    tm_int width, height;
    tm_set_t* players;  // recursively re-type
    tm_set_t* objects;  // nested structures
};
Compiler TM code translation
#pragma begin/end
Re-type variables: tm_shared<> or tm_private<>
[Figure: variables classified as shared variables used in a transaction, private variables used in a transaction, and all other variables]
TM Runtime (libTM)
Access Tracking: tm_type<>
Operator overloading for intercepting reads and writes
Access Granularity: basic-type level
Conflict detection and resolution
Several design choices
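A short C++ sketch can show how operator overloading intercepts reads and writes at basic-type granularity. This is an illustration under assumptions, not libTM's actual tm_type<> implementation: the wrapper name tm_shared_sketch, the AccessLog counters, and g_log are hypothetical stand-ins for the runtime's tracking hooks.

```cpp
#include <cstddef>

// Hypothetical per-thread counters standing in for the runtime's
// read/write tracking hooks.
struct AccessLog {
    std::size_t reads = 0;
    std::size_t writes = 0;
};
thread_local AccessLog g_log;

template <typename T>
class tm_shared_sketch {
public:
    tm_shared_sketch(T v = T{}) : value_(v) {}

    // Read interception: every implicit conversion to T is a tracked read.
    operator T() const { ++g_log.reads; return value_; }

    // Write interception: every assignment from a T is a tracked write.
    tm_shared_sketch& operator=(const T& v) {
        ++g_log.writes;
        value_ = v;
        return *this;
    }

    // Assignment from another wrapper is one read plus one write.
    tm_shared_sketch& operator=(const tm_shared_sketch& other) {
        T v = other;        // tracked read of the source
        return *this = v;   // tracked write of the destination
    }

private:
    T value_;
};
```

With a wrapper like this, code such as `p->x = new_x;` compiles unchanged once the field is re-typed, which is why the automatic transformation only needs to re-type variables and bracket the transaction.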
TM Conflict Resolution Choices
Pessimistic
Reader/Writer Locks
Read Optimistic
Only writer locks
Fully Optimistic
~No locks
Adaptive
Pessimistic
A transaction (tx) locks an object before use
Waits for locks held by other tx
Releases all locks at the end
[Figure: transaction timeline from BEGIN to END]
Reader-writer locks
Reader lock excludes writers
Writer lock excludes readers/writers
Read Optimistic
Writers take locks, readers do not
A write invalidates (aborts) all readers
a) Encounter-time: at the write
T1: BEGIN_TRANSACTION ... ... ... WRITE A ... ... COMMIT_TRANSACTION
T2: BEGIN_TRANSACTION READ A ... ... INVALID
T3: BEGIN_TRANSACTION ... READ A ... INVALID
Read Optimistic
Writers take locks, readers do not
A write invalidates (aborts) all readers
b) Commit-time: at commit
T1: BEGIN_TRANSACTION ... ... ... WRITE A ... ... COMMIT_TRANSACTION
T2: BEGIN_TRANSACTION READ A ... ... ... COMMIT_TRANSACTION
T3: BEGIN_TRANSACTION ... READ A ... ... ... ... INVALID
Fully Optimistic
A write invalidates (aborts) all active readers, but supports multiple writers
Commit-time: at commit
T1: BEGIN_TRANSACTION ... ... ... WRITE A ... ... COMMIT_TRANSACTION
T2: BEGIN_TRANSACTION WRITE A ... ... ... COMMIT_TRANSACTION
T3: BEGIN_TRANSACTION ... READ A ... ... ... ... INVALID
Implementation Details
Meta-data kept with tm_shared<> var
Lock, visible-readers set
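The meta-data listed above (a lock plus a visible-readers set) is enough to express read-optimistic, encounter-time invalidation. The sketch below is a single-threaded C++ model of that protocol, meant only to show the bookkeeping. The names Tx, SharedInt, tx_read, tx_write, and tx_commit are hypothetical, and undo logging and thread synchronization are deliberately left out.

```cpp
#include <set>

struct Tx {
    int id;
    bool valid = true;   // cleared when a conflicting writer invalidates us
};

// Hypothetical meta-data kept next to one shared variable.
struct SharedInt {
    int value = 0;
    Tx* writer = nullptr;     // transaction holding the write lock, if any
    std::set<Tx*> readers;    // visible readers
};

// Readers take no lock; they only register themselves as visible.
int tx_read(Tx& tx, SharedInt& v) {
    v.readers.insert(&tx);
    return v.value;
}

// Encounter-time invalidation: the writer takes the lock and aborts
// every other registered reader at the moment of the write.
// Returns false if another transaction already holds the write lock.
bool tx_write(Tx& tx, SharedInt& v, int new_value) {
    if (v.writer && v.writer != &tx) return false;
    v.writer = &tx;
    for (Tx* r : v.readers)
        if (r != &tx) r->valid = false;
    v.readers.clear();
    v.value = new_value;
    return true;
}

// Commit releases the write lock and removes the reader registration.
void tx_commit(Tx& tx, SharedInt& v) {
    if (v.writer == &tx) v.writer = nullptr;
    v.readers.erase(&tx);
}
```

Replaying the encounter-time slide: T2 and T3 read A, then T1 writes A, and both readers are marked invalid at the write. A commit-time variant would defer the invalidation loop to tx_commit.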
Implementation Details
Validation of each read
Recoverability:
Undo-logging
Write-buffering
Private thread data (needs to be searchable)
Necessary for fully optimistic
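The two recoverability schemes can be contrasted in a few lines of C++. This is a minimal sketch for int-sized locations only, with hypothetical class names UndoLogTx and WriteBufferTx; a real runtime would also handle conflict detection, which is omitted here.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Undo-logging: write in place, remember old values, restore on abort.
class UndoLogTx {
public:
    void write(int* addr, int v) {
        log_.emplace_back(addr, *addr);  // save the old value first
        *addr = v;
    }
    int read(const int* addr) const { return *addr; }
    void commit() { log_.clear(); }
    void abort() {
        // Roll back in reverse order so the oldest value wins.
        for (auto it = log_.rbegin(); it != log_.rend(); ++it)
            *it->first = it->second;
        log_.clear();
    }
private:
    std::vector<std::pair<int*, int>> log_;
};

// Write-buffering: keep writes in private data, publish them at commit.
class WriteBufferTx {
public:
    void write(int* addr, int v) { buf_[addr] = v; }
    int read(const int* addr) const {
        // The private buffer must be searched on every read.
        auto it = buf_.find(const_cast<int*>(addr));
        return it != buf_.end() ? it->second : *addr;
    }
    void commit() {
        for (auto& kv : buf_) *kv.first = kv.second;
        buf_.clear();
    }
    void abort() { buf_.clear(); }  // nothing to restore
private:
    std::unordered_map<int*, int> buf_;
};
```

The buffered version never touches shared memory before commit, which is what lets several writers to the same location run concurrently in the fully optimistic scheme.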
Factors Determining Trade-offs
Conflict type
w-w conflicts favor fully optimistic
Conflict span
long: domino effect (no progress) for read optimistic
Evaluation of Design Trade-offs
[Figure: commits (0 to 60000) vs. writes quota (0 to 100%) for Pessimistic, ReadOptimistic_Commit-time_Inv, FullyOptimistic, and ReadOptimistic_Encounter-time_Inv]
No. of threads: 4
Roadmap
The game
Parallelization Using TM
Compiler code transformations for TM
Runtime TM design choices
Dynamic load balancing of tx in game
Parallel Server Phase Types
[Figure: server loop with stages Select, Rx, Process Requests, Form & Send Replies, Tx (numbered 1 to 3), divided into a read-only phase and a read-write phase, plus load balancing]
Dynamic Load Management
Region: grid unit
Dynamic load balancing
Reassign regions from one server/thread to another
Conflicts vs Load Management
Locality, fewer conflicts
Keep adjacent regions on same thread
Global reshuffle
Block partition
Overload due to Quest
Reassign Load & Minimize Conflicts
Locality-Aware Load Balancing
[Figure: SimMud game map with quest in upper left]
[Figure: recorded dynamic load balancing]
Dynamic Load-balancing Algorithms
Lightest
Shed regions to lightest loaded thread
Spread
Best load spread across all threads
Locality aware
Keep nearby regions on same thread
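As an illustration of the "Lightest" policy, the C++ sketch below sheds regions from an overloaded thread to whichever thread currently has the lowest load. The slides do not say which region is shed first, so the rule used here is an assumption: pick the largest region that still leaves the target lighter than the source was. The Region layout and function names are hypothetical.

```cpp
#include <algorithm>
#include <vector>

struct Region { int clients; int owner; };  // owner = thread index

// Sum the clients of all regions owned by each thread.
std::vector<int> threadLoads(const std::vector<Region>& regions, int n_threads) {
    std::vector<int> load(n_threads, 0);
    for (const Region& r : regions) load[r.owner] += r.clients;
    return load;
}

// Shed regions from `overloaded` to the currently lightest thread until
// the overloaded thread is at or below `threshold` or no move helps.
void shedToLightest(std::vector<Region>& regions, int n_threads,
                    int overloaded, int threshold) {
    std::vector<int> load = threadLoads(regions, n_threads);
    while (load[overloaded] > threshold) {
        int lightest = static_cast<int>(
            std::min_element(load.begin(), load.end()) - load.begin());
        if (lightest == overloaded) break;
        // Assumed selection rule: largest non-empty region whose move
        // keeps the target below the source's current load.
        Region* pick = nullptr;
        for (Region& r : regions)
            if (r.owner == overloaded && r.clients > 0 &&
                load[lightest] + r.clients < load[overloaded] &&
                (!pick || r.clients > pick->clients))
                pick = &r;
        if (!pick) break;
        load[overloaded] -= pick->clients;
        load[lightest] += pick->clients;
        pick->owner = lightest;
    }
}
```

This policy ignores which regions are adjacent, so it can scatter neighboring regions across threads and raise border conflicts. The locality-aware algorithm is meant to avoid that.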
Locality-aware (Quad-tree)
Split task when: Load > thresh
Reassign tasks: reduce conflicts
Can approximate!
Task Splitting
[Figure: game map split into tasks labeled A through J, shown on the map and as nodes of the quad-tree]
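The splitting rule (split a task when its load exceeds the threshold) can be sketched as a recursive quad-tree subdivision. This C++ example is an assumption-laden illustration rather than the talk's implementation: it takes a square grid whose side is a power of two, measures load as the number of clients per region, and uses the hypothetical names Task, blockLoad, and splitTasks.

```cpp
#include <vector>

struct Task { int x, y, size, load; };   // square block of regions

// grid[y][x] = number of clients in region (x, y).
int blockLoad(const std::vector<std::vector<int>>& grid,
              int x, int y, int size) {
    int sum = 0;
    for (int j = y; j < y + size; ++j)
        for (int i = x; i < x + size; ++i) sum += grid[j][i];
    return sum;
}

// Split a block into four quadrants while its load exceeds `thresh`
// and it still spans more than one region; emit the leaves as tasks.
void splitTasks(const std::vector<std::vector<int>>& grid,
                int x, int y, int size, int thresh,
                std::vector<Task>& out) {
    int load = blockLoad(grid, x, y, size);
    if (load <= thresh || size == 1) {
        out.push_back({x, y, size, load});
        return;
    }
    int h = size / 2;
    splitTasks(grid, x,     y,     h, thresh, out);
    splitTasks(grid, x + h, y,     h, thresh, out);
    splitTasks(grid, x,     y + h, h, thresh, out);
    splitTasks(grid, x + h, y + h, h, thresh, out);
}
```

On a map with a quest in the upper left, only the hot quadrant is subdivided down to single regions, while the quiet quadrants stay as large tasks that keep adjacent regions together.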
Task Re-assignment
Assign tasks to reduce conflicts
Keep Load < threshold
[Figure: tasks assigned to threads T0, T1, T2]
Dynamic Load-balancing Algorithms
All algorithms implemented on
A cluster (single thread on each node)
A multi-core (with multiple threads)
Results on Multi-core
Load balancing algorithms:
Static
Lightest
Spread
Locality (Quad-tree)
Metrics
Number of clients per thread
Border conflicts
Client update latency
Thread Load on Multi-core
[Figure: four panels (Lightest, Static, Spread, Quadtree) plotting number of clients (0 to 200) per thread T0 to T3 over time (1 to 901)]
Border Conflicts on Multi-core
[Figure: four panels (Lightest, Static, Spread, Quadtree) plotting number of conflicts (0 to 1000) over time (1 to 1401)]
Client update latency on Multi-core
[Figure: four panels (Static, Spread, Quadtree, Lightest) plotting client update time (0 to 10 ms) per thread T0 to T3 over time (1 to 901)]
Conclusion
Support for seamless world partitioning
Compiler & Runtime parallelization support
Tx much simpler than locks
Locality-aware dynamic load balancing
Can apply in server clusters, P2P mobile environments and multi-cores
I need your help.
“When TM first beat locks” is a good story
I need a more sophisticated game to make the story happen!
Backup Slides
Client Update Latency on Cluster
[Figure: two panels, STATIC and LOCALITY, each with an axis from 0 to 16000, showing the most loaded and least loaded servers]
• All dynamic load balancing algorithms: similar
Number of Player Migrations
• Locality aware has fewest migrations
[Figure: bar chart of player migrations (axis 0 to 1600000) for lightest, spread, and locality]
Average Execution Time / Request (when Application Changes Access Pattern)
[Figure: time (us, 0 to 300) vs. application access pattern switch period (60, 30, 15 s) for Locks, ReadOptimistic_Encounter-time_Inv, ReadOptimistic_Commit-time_Inv, Fully Optimistic, and Adaptive]
Trade-offs
Private thread data
Per-thread data copy overhead (-)
Search private data on read (-)
No need to restore data on abort (+)
Allows multiple concurrent writers (+)
Trade-offs (contd)
Locks
Aborts due to deadlock (-)
No other aborts (+)
A WAN distributed server system
[Figure: periodical update interval over the slowest server; average update interval (0 to 14 seconds) vs. time (50 to 2000 seconds) for Locality, Lightest, Spread, and Static]
Quest lasts during 0-1000 sec
TM code translation (cont.)
Based on Omni OpenMP compiler
[Figure: compiler pipeline with a front-end (Fortran / C), an analysis stage (OpenMP pragma analysis, identify variables), and a generation stage (OpenMP pragma translation: TM function call, instrumentation; optimization)]
Average Execution Time / Request
[Figure: time (us, log scale from 1 to 1000) vs. percentage of writes (15, 30, 50, 70, 85%) for Locks, Pessimistic, ReadOptimistic_Encounter-time_Inv, ReadOptimistic_Commit-time_Inv, Fully Optimistic, and Adaptive]