ConCORD: Easily Exploiting Memory Content Redundancy Through the Content-aware Service Command
Lei Xia, Kyle Hale, Peter Dinda
HPDC’14, Vancouver, Canada, June 23-27
Hobbes: http://xstack.sandia.gov/hobbes/
Overview
• Claim: Memory content-sharing detection and tracking should be built as a separate service
– Exploiting memory content sharing in parallel systems through content-aware services
• Feasibility: Implementation of ConCORD, a distributed system that tracks memory contents across collections of entities (VMs/processes)
• The content-aware service command minimizes the effort needed to build various content-aware services
• Collective checkpoint service: only ~200 lines of code
Outline
• Content-sharing in scientific workloads
– Content-aware services in HPC
– Content-sharing tracking as a service
• Architecture of ConCORD
– Implementation in brief
• Content-aware service command
• Collective checkpoint on service command
• Performance evaluation
• Conclusion
Content-based Memory Sharing
• Eliminate identical pages of memory across multiple VMs/processes
• Reduce memory footprint size in one physical machine
• Intra-node deduplication
[Barker-USENIX’12]
Memory Content Sharing is Common in Scientific Workloads in Parallel Systems [previous work published at VTDC’12]
[Figure: # of memory pages (log scale, 2,048 to 2,048,000) for Moldy.water.{2,4,8,12}, Moldy.quartz.{2,4,8,12}, Lammps.{2,4,8,12}, HPCC.12, NPB.bt.B.4, NPB.cg.B.4, NPB.cg.C.4, NPB.ep.C.4, NPB.ep.D.4, NPB.lu.C.4, NPB.sp.B.4 and NPB.sp.C.4; series: Total Memory, Intra-node Distinct, Inter/Intra Distinct]
Memory Content Sharing in Parallel Workloads [previous work published at VTDC’12]
• Both intra-node and inter-node sharing are common in scientific workloads
• Many workloads have a significant amount of inter-node content sharing beyond intra-node sharing
[A Case for Tracking and Exploiting Inter-node and Intra-node Memory Content Sharing in Virtualized Large-Scale Parallel Systems, VTDC’12]
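To make the three series in the figure concrete, here is a minimal Python sketch of how they could be counted from raw page contents. It assumes 4 KB pages, SHA-1 content hashes, and the hypothetical function name `sharing_counts`; these are illustrative choices, not the method used in the VTDC’12 study.

```python
import hashlib
from typing import Dict, List


def content_hash(page: bytes) -> str:
    # Equal contents give equal hashes; collisions are ignored in this sketch.
    return hashlib.sha1(page).hexdigest()


def sharing_counts(nodes: Dict[str, List[bytes]]) -> Dict[str, int]:
    """Count pages as in the figure: total, intra-node distinct,
    and distinct across all nodes (inter/intra distinct)."""
    total = sum(len(pages) for pages in nodes.values())
    # Deduplicate within each node only, then add up the per-node counts.
    intra_distinct = sum(len({content_hash(p) for p in pages})
                         for pages in nodes.values())
    # Deduplicate across every node at once.
    inter_intra_distinct = len({content_hash(p)
                                for pages in nodes.values() for p in pages})
    return {"total": total,
            "intra_distinct": intra_distinct,
            "inter_intra_distinct": inter_intra_distinct}


if __name__ == "__main__":
    zero, lib = bytes(4096), b"L" * 4096
    nodes = {
        "node0": [zero, zero, lib, b"x" * 4096],
        "node1": [zero, lib, b"y" * 4096],
    }
    print(sharing_counts(nodes))
```

With the sample input it reports 7 total pages, 6 intra-node distinct pages and 4 inter/intra distinct pages; the drop from 6 to 4 is the inter-node sharing that intra-node deduplication alone cannot remove.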
Content-aware Services in HPC
• Many services in HPC systems can be simplified and improved by leveraging intra-node and inter-node content sharing
• Content-aware service: a service that can utilize memory content sharing to improve or simplify itself
– Content-aware checkpointing
  • Collectively checkpoint a set of related VMs/processes
– Collective virtual machine co-migration
  • Collectively move a set of related VMs
– Collective virtual machine reconstruction
  • Reconstruct/migrate a VM from multiple source VMs
– Many other services …
Content-aware Collective Checkpointing
[Figure: processes P1 (A B C D E), P2 (F C A B E) and P3 (G A B C D); a conventional "Checkpoint" stores every block of every process]
Content-aware Collective Checkpointing
Reduce checkpoint size by saving only one copy of each distinct content block across all processes
[Figure: P1 (A B C D E), P2 (F C A B E), P3 (G A B C D). The per-process "Checkpoint" holds all 15 blocks; the "Collective-checkpoint" holds only the 7 distinct blocks A, B, C, D, E, F, G]
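The idea can be sketched in a few lines of Python. This is a hypothetical illustration, not ConCORD's implementation: each distinct block (keyed by a SHA-1 hash, an assumption) is stored once, and each process keeps an ordered list of hashes, a "recipe", from which its memory can be rebuilt.

```python
import hashlib
from typing import Dict, List, Tuple


def block_hash(block: bytes) -> str:
    return hashlib.sha1(block).hexdigest()


def collective_checkpoint(
        procs: Dict[str, List[bytes]]
) -> Tuple[Dict[str, bytes], Dict[str, List[str]]]:
    """Save one copy per distinct block plus a per-process recipe."""
    store: Dict[str, bytes] = {}         # content hash -> the single saved copy
    recipes: Dict[str, List[str]] = {}  # process -> its blocks' hashes, in order
    for name, blocks in procs.items():
        recipe = []
        for block in blocks:
            key = block_hash(block)
            store.setdefault(key, block)  # keep only the first copy seen
            recipe.append(key)
        recipes[name] = recipe
    return store, recipes


def restore(store: Dict[str, bytes], recipe: List[str]) -> List[bytes]:
    """Rebuild one process's memory from the shared store."""
    return [store[key] for key in recipe]


if __name__ == "__main__":
    procs = {
        "P1": [b"A", b"B", b"C", b"D", b"E"],
        "P2": [b"F", b"C", b"A", b"B", b"E"],
        "P3": [b"G", b"A", b"B", b"C", b"D"],
    }
    store, recipes = collective_checkpoint(procs)
    total = sum(len(blocks) for blocks in procs.values())
    print(f"{len(store)} distinct blocks saved instead of {total}")
    assert all(restore(store, recipes[p]) == procs[p] for p in procs)
```

Run on the three processes from the figure, it saves 7 distinct blocks instead of 15, and every process can still be restored exactly.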
Collective VM Reconstruction
[Figure, panel "Single VM Migration": VM-1 (A B C D E) on Host-1, VM-2 (F C A B E) on Host-2, VM-3 (A B C D G) on Host-3; VM-3 is migrated to Host-4 from its own host only]
Collective VM Reconstruction
[Figure, panel "Collective VM Reconstruction": same VMs and hosts; VM-3's memory (A, B, C, D, G) is assembled on Host-4 from blocks held on multiple hosts]
Collective VM Reconstruction
[Figure, panel "Collective VM Migration": VM-3 (A B C D G) now runs on Host-4, its memory reconstructed from blocks on multiple hosts]
Speed up VM migration by reconstructing the VM's memory from multiple sources
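How the sources are chosen is not specified here. One simple policy, sketched below as an assumption, assigns each block of the migrating VM to whichever host holding that content has been given the fewest blocks so far, so the transfer is spread over several hosts. Host names and block labels follow the figure; `plan_reconstruction` is a hypothetical name.

```python
from typing import Dict, List, Set


def plan_reconstruction(target: List[str],
                        holders: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Assign each block of the migrating VM to one source host.

    target:  contents of the VM being moved, in page order
    holders: source host -> contents available on that host
    """
    plan: Dict[str, List[str]] = {}
    for block in target:
        candidates = sorted(h for h, contents in holders.items()
                            if block in contents)
        if not candidates:
            raise ValueError(f"no source holds block {block!r}")
        # Least-loaded candidate; ties go to the first host in sorted order.
        source = min(candidates, key=lambda h: len(plan.get(h, [])))
        plan.setdefault(source, []).append(block)
    return plan


if __name__ == "__main__":
    holders = {"Host-1": set("ABCDE"),   # VM-1
               "Host-2": set("FCABE"),   # VM-2
               "Host-3": set("ABCDG")}   # VM-3 itself, before migration
    print(plan_reconstruction(list("ABCDG"), holders))
```

For VM-3 (A B C D G) and the three hosts in the figure, this plan fetches A and D from Host-1, B from Host-2, and C and G from Host-3. A more careful policy might also weigh network bandwidth between hosts.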
Content-sharing Detection and Tracking
• We need to detect and track memory content sharing
– Continuously, while the system is running
– Both intra-node and inter-node sharing
– Scalably in large-scale parallel systems, with minimal overhead
Content-sharing Tracking As a Service
• Content-sharing tracking should be factored into a separate service
– Maintain and enhance a single implementation of memory content tracking
  • Allows us to focus on developing an efficient and effective tracking service itself
– Avoid redundant content-tracking overheads when multiple services exist
– Much easier to build content-aware services on top of an existing tracking service
ConCORD: Overview
• A distributed inter-node and intra-node memory content redundancy detection and tracking system
– Continuously tracks all memory content sharing in a distributed-memory parallel system
– Hash-based content detection
  • Each memory block is represented by a hash value (content hash)
  • Two blocks with the same content hash are considered to have the same content
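A minimal sketch of hash-based content detection, assuming 4 KB blocks and SHA-1 as the content hash (the slides specify neither). It treats blocks with equal hashes as equal content, exactly as the slide states; a service that cannot tolerate a hash collision could compare the bytes before acting.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # assumed block size (one 4 KB page)


def block_hashes(memory: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Split a memory image into fixed-size blocks and hash each block."""
    return [hashlib.sha1(memory[off:off + block_size]).hexdigest()
            for off in range(0, len(memory), block_size)]


def find_shared(entities: Dict[str, bytes]) -> Dict[str, List[Tuple[str, int]]]:
    """Map each content hash held more than once to its (entity, block) list."""
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for name, memory in entities.items():
        for i, digest in enumerate(block_hashes(memory)):
            index[digest].append((name, i))
    # Same content hash => treated as the same content.
    return {d: locs for d, locs in index.items() if len(locs) > 1}


if __name__ == "__main__":
    vm1 = bytes(4096) + b"a" * 4096   # zero page, then an "a" page
    vm2 = b"b" * 4096 + bytes(4096)   # a "b" page, then a zero page
    for digest, locs in find_shared({"vm1": vm1, "vm2": vm2}).items():
        print(digest[:12], locs)
```

Here only the zero page is shared, found at block 0 of `vm1` and block 1 of `vm2`.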
ConCORD: System Architecture
[Figure: ConCORD, spanning nodes, comprises a Memory Content Update Interface, a Content-sharing Query Interface, a Content-aware Service Command Controller, the Distributed Memory Content Tracer and the Service Command Execution Engine. On each node, a Memory Update Monitor in the hypervisor (VMM) covers VMs, and a Process Memory Update Monitor in the OS covers processes (via ptrace/inspect). Content-aware services use ConCORD's interfaces]
Distributed Memory Content Tracer
• Uses a customized lightweight distributed hash table (DHT)
– To track memory content sharing and the locations of contents system-wide

| | Conventional DHT | DHT in ConCORD |
|---|---|---|
| Target system | Distributed systems | Large-scale parallel systems |
| Type of key | Variable length | Fixed length |
| Type of object | Variable size, variable format | Small size |
| Fault tolerance | Strict | Loose |
| Persistency | Yes | No |
DHT in ConCORD
• DHT entry: <content-hash, Entity-List>
• DHT content is split into partitions, which are distributed (stored and maintained) over the ConCORD instances
• Given a content hash, computing its partition and its responsible instance is fast and straightforward
– Zero-hop: no peer information is needed
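Because keys are fixed-length hashes and the partitioning is static, any node can compute the responsible instance locally. The sketch below assumes a simple modulo scheme with a hypothetical partition count; ConCORD's actual partition function is not given on the slide.

```python
import hashlib

NUM_PARTITIONS = 1024  # assumed fixed partition count, same on every node


def partition_of(content_hash: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a fixed-length content hash to a DHT partition."""
    return int.from_bytes(content_hash[:8], "big") % num_partitions


def responsible_instance(content_hash: bytes, num_instances: int) -> int:
    """Zero-hop lookup: computed locally, no peer information or routing."""
    # Static assignment of partitions to ConCORD instances (assumed scheme).
    return partition_of(content_hash) % num_instances


if __name__ == "__main__":
    digest = hashlib.sha1(bytes(4096)).digest()  # hash of a zero page
    print("partition", partition_of(digest),
          "-> instance", responsible_instance(digest, 16))
```

Every node evaluates the same pure function over the same static parameters, so an update or query can be sent straight to the owning instance with no routing hops.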
Content-sharing Queries
• Examine memory content sharing and shared locations in the system
• Node-wise queries
– Given a content hash:
  • Find the number of copies existing in a set of entities
  • Find the exact locations of these blocks
• Collective queries
– Degree of sharing: overall level of content redundancy among a set of entities
– Hot memory content: contents with more than k copies in a set of entities
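The node-wise queries and the hot-content query can be illustrated against a toy in-memory index. `ContentIndex` and its method names are hypothetical stand-ins, not ConCORD's query API, and the degree-of-sharing query is omitted because its exact definition is not given here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Location = Tuple[str, int]  # (entity, block index)


class ContentIndex:
    """Toy stand-in for ConCORD's tracked view (hypothetical API)."""

    def __init__(self) -> None:
        self.locations: Dict[str, List[Location]] = defaultdict(list)

    def add(self, content_hash: str, entity: str, block: int) -> None:
        self.locations[content_hash].append((entity, block))

    # Node-wise queries: given a content hash...
    def num_copies(self, content_hash: str, entities: Set[str]) -> int:
        """...how many copies exist in a set of entities."""
        return len(self.find_locations(content_hash, entities))

    def find_locations(self, content_hash: str,
                       entities: Set[str]) -> List[Location]:
        """...and exactly where those copies are."""
        return [(e, b) for e, b in self.locations.get(content_hash, [])
                if e in entities]

    # Collective query: contents with more than k copies.
    def hot_contents(self, entities: Set[str], k: int) -> List[str]:
        return sorted(h for h in self.locations
                      if self.num_copies(h, entities) > k)


if __name__ == "__main__":
    idx = ContentIndex()
    for entity, blocks in {"p1": "ABCDE", "p2": "ABCEF", "p3": "ABCDH"}.items():
        for i, label in enumerate(blocks):
            idx.add(label, entity, i)
    everyone = {"p1", "p2", "p3"}
    print(idx.num_copies("D", everyone), idx.hot_contents(everyone, k=2))
```

With p1 = A B C D E, p2 = A B C E F and p3 = A B C D H indexed (letters standing in for hashes), D has 2 copies and the contents with more than 2 copies are A, B and C.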
How can we build a content-aware service?
• Run content-sharing queries inside a service
– Use sharing information to exploit content sharing and improve the service
– Requires much effort from service developers
  • How to efficiently and effectively utilize the content sharing
How can we build a content-aware service?
• Run a service inside a collective query
– ConCORD provides a query template
– The service developer defines a service by parameterizing the query template
– ConCORD executes the parameterized query over all shared memory content
  • While running the query, ConCORD completes the service, utilizing memory content sharing in the system
– Minimizes service developers' effort
Content-aware Service Command
• A service command is a parameterizable query template
• Services built on top of it are automatically parallelized and executed by ConCORD
– Partitioning of the task
– Scheduling the subtasks to execute across nodes
– Managing all inter-node communication
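A rough Python sketch of the idea: the developer subclasses a template and fills in one hook, while the framework partitions the work (one holder node per content hash) and runs each node's subtasks concurrently. The class and function names, the least-loaded assignment policy, and the use of threads in place of remote nodes are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set


class ServiceCommand:
    """Hypothetical template: a service only overrides this hook."""

    def on_content(self, node: str, content_hash: str) -> None:
        """Handle one distinct content on one node that holds it."""


class NullServiceCommand(ServiceCommand):
    """Empty hook, so only the framework's own cost is exercised."""


def run_service(cmd: ServiceCommand,
                dht: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """dht: content hash -> nodes believed to hold that content."""
    # 1. Partition: each hash becomes one subtask on one holder node,
    #    preferring the holder with the fewest subtasks so far.
    tasks: Dict[str, List[str]] = {}
    for content_hash in sorted(dht):
        node = min(sorted(dht[content_hash]),
                   key=lambda n: len(tasks.get(n, [])))
        tasks.setdefault(node, []).append(content_hash)

    # 2. Schedule: run every node's subtasks concurrently. Threads stand in
    #    for remote nodes; real inter-node messaging is not modelled.
    def work(node: str) -> None:
        for content_hash in tasks[node]:
            cmd.on_content(node, content_hash)

    with ThreadPoolExecutor() as pool:
        list(pool.map(work, tasks))  # list() re-raises any hook exception
    return tasks


if __name__ == "__main__":
    dht = {"A": {"n1", "n2", "n3"}, "B": {"n1", "n2"}, "C": {"n3"}}
    print(run_service(NullServiceCommand(), dht))
```

A `NullServiceCommand` (all hooks empty, in the spirit of the evaluation's Null Service Command) exercises only the partitioning and dispatch; a real service would override `on_content`.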
Challenge: Correctness vs. Best-effort
• ConCORD provides a best-effort service
– The ConCORD DHT's view of memory content may be outdated
• Application services require correctness
– Ensure correctness while using best-effort sharing information
Service Command: Two-Phase Execution
• Collective phase
– Each node performs subtasks in parallel on locally available memory blocks
– Best effort, using content tracked by ConCORD
– Stale blocks are ignored
– Driven by the DHT for performance and efficiency
• Local phase
– Each node performs subtasks in parallel on memory blocks ConCORD does not know of
– All fresh blocks are covered
– Driven by local content for correctness
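The two phases can be sketched as follows. This is a simplified, single-process illustration under assumed names: `plan` stands for the per-node assignments derived from the (possibly stale) DHT, and `memory` for each node's current content. Nodes are processed sequentially here, whereas ConCORD runs them in parallel.

```python
from typing import Callable, Dict, List, Set

Handler = Callable[[str, str], None]  # (node, content hash) -> None


def execute(memory: Dict[str, List[str]],
            plan: Dict[str, List[str]],
            collective: Handler,
            local: Handler) -> Set[str]:
    """Two-phase execution of a service command (simplified sketch).

    memory: node -> content hashes it holds right now (always fresh)
    plan:   node -> hashes assigned to it from the DHT view (may be stale)
    Returns the hashes completed during the collective phase.
    """
    completed: Set[str] = set()
    # Collective phase: best effort, driven by the DHT. An assigned hash the
    # node no longer holds is stale and is simply ignored.
    for node, hashes in plan.items():
        for h in hashes:
            if h in memory.get(node, []):
                collective(node, h)
                completed.add(h)
    # The completed set must reach every node before the local phase;
    # in this single-process sketch it is simply shared.
    # Local phase: driven by local content, so every fresh block is covered.
    for node, hashes in memory.items():
        for h in hashes:
            if h not in completed:
                local(node, h)
    return completed
```

The collective phase may skip work, but it never acts on a block that is no longer present; the local phase then guarantees that every fresh block is handled. If two nodes hold the same block unknown to the DHT, both handle it in the local phase, trading some redundancy for correctness.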
Collective Checkpoint: Initial State
Memory content in processes: P-1 = A B C D E; P-2 = A B C E F; P-3 = A B C D G
Collective Checkpoint: Initial State
Memory content in processes: P-1 = A B C D E; P-2 = A B C E F; P-3 = A B C D G
Memory content in ConCORD’s view: P-1 = A B C D E; P-2 = A B C E F; P-3 = A B C D H
Collective Checkpoint: Initial State
Memory content in processes: P-1 = A B C D E; P-2 = A B C E F; P-3 = A B C D G
Memory content in ConCORD’s view: P-1 = A B C D E; P-2 = A B C E F; P-3 = A B C D H (stale: P-3 now holds G, not H)
ConCORD’s DHT:

| Content Hash | Process Map |
|---|---|
| A | {p1, p2, p3} |
| B | {p1, p2, p3} |
| C | {p1, p2, p3} |
| D | {p1, p3} |
| E | {p1, p2} |
| F | {p2} |
| H | {p3} |
Collective Checkpoint: Collective Phase
Processes: P-1 = A B C D E; P-2 = A B C E F; P-3 = A B C D G
ConCORD Service Execution Engine
Collective Checkpoint: Collective Phase
Processes: P-1 = A B C D E; P-2 = A B C E F; P-3 = A B C D G
ConCORD Service Execution Engine
Requests from the engine: P-1: Save {A, D}; P-2: Save {B, E, F}; P-3: Save {C, H}
Collective Checkpoint: Collective Phase
Processes: P-1 = A B C D E; P-2 = A B C E F; P-3 = A B C D G
ConCORD Service Execution Engine
Collective Checkpoint: Collective Phase
Processes: P-1 = A B C D E; P-2 = A B C E F; P-3 = A B C D G
ConCORD Service Execution Engine
Replies to the engine: P-1: {A, D} saved; P-2: {B, E, F} saved; P-3: {C} saved
Collective Checkpoint: Collective Phase
Processes: P-1 = A B C D E; P-2 = A B C E F; P-3 = A B C D G
ConCORD Service Execution Engine
Replies to the engine: P-1: {A, D} saved; P-2: {B, E, F} saved; P-3: {C} saved
Completed: {A, B, C, D, E, F}
Collective Checkpoint: Local Phase
Processes: P-1 = A B C D E; P-2 = A B C E F; P-3 = A B C D G
ConCORD Service Execution Engine
Completed set at P-1, P-2 and P-3: {A, B, C, D, E, F}
Local phase: P-1: do nothing; P-2: do nothing; P-3: save G
Example Service: Collective Checkpoint
• User-defined service-specific functions:
– Function executed during the collective phase:
  For each requested content hash: if a block with that hash exists locally, save the block into the user-defined file
– Function executed during the local phase:
  For each local memory block: if the block has not been saved, save it to the user-defined file
• Implementation: 220 lines of C code (code in Lei Xia’s Ph.D. thesis).
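The pseudocode above can be made runnable as a small Python sketch. It is not the 220-line C implementation: class and function names are hypothetical, letters stand in for content hashes, `pickle` records stand in for the checkpoint file format, and a single shared `saved` set replaces the exchange of the completed set between nodes.

```python
import io
import pickle
from typing import BinaryIO, Dict, List, Set


class CollectiveCheckpoint:
    """Service-specific functions of a collective checkpoint (sketch)."""

    def __init__(self, out: BinaryIO) -> None:
        self.out = out                # user-defined checkpoint file
        self.saved: Set[str] = set()  # hashes written so far

    def _save(self, digest: str, block: bytes) -> None:
        pickle.dump((digest, block), self.out)
        self.saved.add(digest)

    def collective(self, digest: str, blocks: Dict[str, bytes]) -> None:
        # For the requested hash: save the block if it exists locally.
        if digest in blocks:          # a stale hash (e.g. "H") is skipped
            self._save(digest, blocks[digest])

    def local(self, blocks: Dict[str, bytes]) -> None:
        # For each local block: save it if it has not been saved yet.
        for digest, block in blocks.items():
            if digest not in self.saved:
                self._save(digest, block)


def run_checkpoint(procs: Dict[str, Dict[str, bytes]],
                   plan: Dict[str, List[str]],
                   out: BinaryIO) -> CollectiveCheckpoint:
    svc = CollectiveCheckpoint(out)
    for proc, digests in plan.items():   # collective phase
        for digest in digests:
            svc.collective(digest, procs[proc])
    for blocks in procs.values():        # local phase
        svc.local(blocks)
    return svc


def read_checkpoint(f: BinaryIO) -> Dict[str, bytes]:
    """Read back every (hash, block) record from the checkpoint file."""
    f.seek(0)
    records: Dict[str, bytes] = {}
    while True:
        try:
            digest, block = pickle.load(f)
        except EOFError:
            return records
        records[digest] = block


if __name__ == "__main__":
    layout = {"P-1": "ABCDE", "P-2": "ABCEF", "P-3": "ABCDG"}
    procs = {p: {c: c.encode() * 4096 for c in s} for p, s in layout.items()}
    plan = {"P-1": ["A", "D"], "P-2": ["B", "E", "F"], "P-3": ["C", "H"]}
    buf = io.BytesIO()
    run_checkpoint(procs, plan, buf)
    print(sorted(read_checkpoint(buf)))
```

With the plan from the walkthrough (P-1 saves A, D; P-2 saves B, E, F; P-3 saves C and the stale H), H is skipped, G is saved during the local phase, and the checkpoint ends up with exactly the seven blocks A to G, each written once.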
Performance Evaluation
• Service command framework
– Use a service class with all empty methods (Null Service Command)
• Content-aware collective checkpoint
• Testbed
– IBM x335 Cluster (20 nodes)
  • Intel Xeon 2.0 GHz / 1.5 GB RAM
  • 1 Gbps Ethernet NIC (1000BASE-T)
– HPC Cluster (500 nodes)
  • Two quad-core 2.4 GHz Intel Nehalem / 48 GB RAM
  • InfiniBand DDR network
Null Service Command: Execution Time Linearly Increases with Total Memory Size
[Figure: service time (ms, 0 to 4500) vs. memory size per process (256, 512, 1024, 2048, 4096, 8192; 6 processes, 6 nodes)]
Execution time is linear in the total memory size of the processes.
Null Service Command: Execution Time Scales with Increasing Nodes
[Figure: service time (ms, 0 to 800) vs. number of nodes (1, 2, 4, 8, 12; 1 process/node, 1 GB/process)]
Null Service Command: Execution Time Scales in Large Testbed
Checkpoint Size
[Figure: compression ratio (%, 0 to 100) vs. number of nodes (1, 2, 4, 6, 8, 12, 16; Moldy, 1 process/node); series: Raw-gzip, ConCORD]
When running an application with plenty of inter-node content sharing, ConCORD achieves a better compression ratio.
Checkpoint Size: ConCORD achieves a better compression ratio
[Figure: compression ratio (%, 0 to 100) vs. number of nodes (1, 2, 4, 6, 8, 12, 16; Moldy, 1 process/node); series: Raw-gzip, ConCORD, ConCORD-gzip]
• Content-aware checkpointing achieves better compression than GZIP for applications with a lot of inter-node content sharing
Collective Checkpoint: Checkpoint Time Scales with Increasing Nodes
[Figure: checkpoint time (ms, log scale, 1024 to 102400) vs. number of nodes (1, 2, 4, 8, 12, 16, 20; 1 process/node, 1 GB/process, Moldy); series: Raw-Gzip, ConCORD-Checkpoint]
Collective Checkpoint: Checkpoint Time Scales with Increasing Nodes
[Figure: checkpoint time (ms, log scale, 1024 to 102400) vs. number of nodes (1, 2, 4, 8, 12, 16, 20; 1 process/node, 1 GB/process, Moldy); series: Raw-Gzip, ConCORD-Checkpoint, Raw-Chkpt]
• Content-aware checkpointing scales well with an increasing number of nodes.
• Content-aware checkpointing uses significantly less checkpoint time than memory dump + GZIP while achieving the same or a better compression ratio.
Checkpoint Time Scales Well in Large Testbed
Conclusion
• Claim: Content-sharing tracking should be factored out as a separate service
• Feasibility: Implementation and evaluation of ConCORD
– A distributed system that tracks memory contents in large-scale parallel systems
• The content-aware service command minimizes the effort to build content-aware services
• Collective checkpoint service
– Performs well
– Only ~200 lines of code
• Lei Xia
• http://www.lxia.net
• http://www.v3vee.org
• http://xstack.sandia.gov/hobbes/
Backup Slides
Memory Update Monitor
• Collects and monitors memory content updates in each process/VM periodically
– Tracks updated memory pages
– Collects updated memory content
– Populates memory content in ConCORD
• Maintains a map table from each content hash to all local memory pages with that content
– Allows ConCORD to locate a memory content block given a content hash
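A minimal sketch of the monitor's bookkeeping. It assumes the caller already supplies the pages written since the last scan (how dirty pages are detected is not modelled), SHA-1 content hashes, and that the DHT only needs to hear when an entity gains its first copy or loses its last copy of a content. All names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Update = Tuple[str, str]  # ("add" | "remove", content hash)


class MemoryUpdateMonitor:
    """Per-entity sketch: local hash -> pages map plus DHT update messages."""

    def __init__(self) -> None:
        self.page_hash: Dict[int, str] = {}                    # page -> hash
        self.pages_of: Dict[str, Set[int]] = defaultdict(set)  # hash -> pages

    def scan(self, dirty: Dict[int, bytes]) -> List[Update]:
        """Process one period's updated pages; return messages for the DHT."""
        updates: List[Update] = []
        for page, data in dirty.items():
            new = hashlib.sha1(data).hexdigest()
            old = self.page_hash.get(page)
            if old == new:
                continue                         # content did not change
            if old is not None:
                self.pages_of[old].discard(page)
                if not self.pages_of[old]:       # last local copy is gone
                    del self.pages_of[old]
                    updates.append(("remove", old))
            if new not in self.pages_of:         # first local copy
                updates.append(("add", new))
            self.pages_of[new].add(page)
            self.page_hash[page] = new
        return updates

    def locate(self, digest: str) -> Set[int]:
        """Find the local pages holding a content, given its hash."""
        return set(self.pages_of.get(digest, ()))


if __name__ == "__main__":
    monitor = MemoryUpdateMonitor()
    print(monitor.scan({0: bytes(4096), 1: bytes(4096)}))  # one "add"
    print(monitor.scan({1: b"a" * 4096}))                   # another "add"
    print(monitor.scan({0: b"a" * 4096}))                   # zero page gone
```

Sending only first-copy and last-copy transitions keeps DHT traffic proportional to changes in an entity's set of distinct contents rather than to the number of pages written.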
Code Size of ConCORD
• ConCORD total: 11326
– Distributed content tracer: 5254
– Service command execution engine: 3022
– Memory update monitor: 780
– Service command execution agent: 1946
• Content-sharing query library: 978
• Service command library: 1325
• Service command terminal: 1826
• Management panel: 1430
ConCORD Service Daemon (xDaemon)
[Figure: each ConCORD running instance (xDaemon) contains an Update Interface, a Query Interface, a Control Interface, the xCommand Execution Engine, the Content Tracer (DHT) and the xCommand Controller. Palacios kernel modules host the Memory Update Monitor and the xCommand VMM Execution Agent, connected to the xDaemon through the ConCORD-VMM interface. Labelled flows: hash updates (sent by the Memory Update Monitor), content queries, xCommand synchronization, system control and ConCORD control]
Service Command: Run-time Parameters
• Run-time parameters
– Service Virtual Machines (SVMs): VMs this service is applied to
– Participating Virtual Machines (PVMs): VMs that can contribute to speeding up the service
– Service mode: interactive vs. batch mode
– Timeout, service data, Pause-VM
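These parameters could be bundled into a single configuration object. The field names, defaults, validation rules, the seconds unit for the timeout, and the reading of Pause-VM as "pause the service VMs while the command runs" are all assumptions; the slide lists only the parameter names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ServiceMode(Enum):
    INTERACTIVE = "interactive"
    BATCH = "batch"


@dataclass
class ServiceCommandParams:
    """Hypothetical bundle of a service command's run-time parameters."""
    service_vms: List[str]                  # SVMs: VMs the service applies to
    participating_vms: List[str] = field(default_factory=list)  # PVMs: may help
    mode: ServiceMode = ServiceMode.BATCH
    timeout_s: float = 60.0                 # assumed default, in seconds
    service_data: Dict[str, str] = field(default_factory=dict)
    pause_vm: bool = False                  # Pause-VM flag (semantics assumed)

    def __post_init__(self) -> None:
        if not self.service_vms:
            raise ValueError("at least one service VM is required")
        if self.timeout_s <= 0:
            raise ValueError("timeout must be positive")


if __name__ == "__main__":
    params = ServiceCommandParams(
        service_vms=["vm1", "vm2"],
        participating_vms=["vm3"],
        mode=ServiceMode.INTERACTIVE,
        service_data={"checkpoint_file": "/tmp/ckpt.bin"},
    )
    print(params)
```

Keeping SVMs and PVMs as separate lists lets a command apply to a few VMs while drawing on content held by a larger set.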
Checkpoint: Significant Inter-node Sharing
[Figure: compression ratio (%, 0 to 100) vs. number of nodes (1, 2, 4, 6, 8, 12, 16; Moldy VM, 1 VM/node); series: Raw, Raw-gzip, ConCORD, ConCORD-gzip, DoS]
Checkpoint: Significant Intra-node Sharing
[Figure: compression ratio (%, 0 to 100) vs. number of nodes (1, 2, 4, 8, 12, 16; HPCCG, 1 VM/node); series: Raw, Raw-gzip, ConCORD, ConCORD-gzip, DoS]
Checkpoint: Zero Sharing
[Figure: compression ratio (%, 50 to 110) vs. number of nodes (1, 2, 4, 8, 12, 16; 1 VM/node); series: Raw, ConCORD]
In the worst case, with zero sharing, the checkpoint generated by content-aware checkpointing is 3% larger than the raw memory size.
Service Command: Fault Tolerance
• Communication failure
– Message loss between
  • xDaemon and VM: UDP
  • xDaemon and library: reliable UDP
• xDaemon instance failure
– Loss of the work done during the collective phase
• Client library failure
– The command is aborted
Zero-hop DHT: Fault Tolerance
• Communication failure
– Loss of update messages between the memory update monitor and a DHT instance is tolerable
– Causes inaccuracy, which is acceptable
• xDaemon instance failure
– Let it fail; there is no replication in the current implementation
– Hash partitions on the failed instance are lost
– Assume the failed instance comes back soon (on the same or a different physical node)
– Lost content hashes on that instance will eventually be added again
– Causes inaccuracy, which is acceptable