SecureMR: A Service Integrity Assurance Framework for MapReduce
Author: Wei Wei, Juan Du, Ting Yu, Xiaohui Gu
Source: Annual Computer Security Applications Conference, 2009, pp.73-82.
Presenter: Tsuei-Hung Sun (孫翠鴻)
Date: 2010/9/17
Outline
• Introduction
• Motivation
• Contribution
• Scheme
• Security analysis
• Performance evaluation
• Comment
Introduction
• MapReduce
– A programming model that simplifies parallel data processing on large clusters.
– Proposed by Google.
– Mainly runs on clusters that belong to a single administrative domain.
– Yahoo’s Hadoop
– Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (Amazon S3).
Introduction
Fig. The MapReduce data processing reference model (mappers M1, M2, M3 and reducers R1, R2, R3 over a distributed file system).
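The reference model can be illustrated with a minimal, single-process sketch. The word-count job, the function names, and the CRC32-based partitioner are illustrative assumptions, not part of SecureMR or of any particular MapReduce framework. The point is the data flow: each mapper splits its output into r partitions, and reducer i pulls partition i from every mapper.

```python
import zlib
from collections import defaultdict


def map_fn(block):
    """Word-count map: emit (word, 1) for every word in one input block."""
    for word in block.split():
        yield word, 1


def reduce_fn(key, values):
    """Word-count reduce: sum all counts emitted for one key."""
    return key, sum(values)


def partition(key, r):
    """Deterministically assign a key to one of r reduce partitions."""
    return zlib.crc32(key.encode("utf-8")) % r


def run_map(block, r):
    """One mapper: process a block and split its output into r partitions."""
    partitions = [defaultdict(list) for _ in range(r)]
    for key, value in map_fn(block):
        partitions[partition(key, r)][key].append(value)
    return partitions


def run_job(blocks, r):
    """Run one mapper per block, then let reducer i pull partition i from every mapper."""
    map_outputs = [run_map(block, r) for block in blocks]
    results = {}
    for i in range(r):
        grouped = defaultdict(list)
        for parts in map_outputs:
            for key, values in parts[i].items():
                grouped[key].extend(values)
        for key, values in grouped.items():
            out_key, total = reduce_fn(key, values)
            results[out_key] = total
    return results


blocks = ["a b a", "b c", "a c c"]  # stand-ins for input blocks read from the DFS
print(sorted(run_job(blocks, r=3).items()))  # [('a', 3), ('b', 2), ('c', 3)]
```

Because every key maps to exactly one partition, each reducer owns a disjoint slice of the keys; this per-partition split is also what makes per-partition hashes (HP1, …, HPr) natural later on.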
Introduction
Fig. Combining multiple map and reduce phases.
Introduction
• Data processing service integrity Replication-based techniques
– Sampling techniques
– Checkpoint-based verification
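To make replication and sampling concrete, here is a minimal sketch of sampled replication: a task runs on one worker and, with probability equal to a duplication rate, also on a second worker so that the two result digests can be compared. Workers are modeled as plain Python callables, and `replicated_check` and its parameters are hypothetical names; this is a generic illustration, not SecureMR's protocol.

```python
import hashlib
import random


def digest(result: bytes) -> str:
    """Compare results by digest, as a verifier would for large outputs."""
    return hashlib.sha256(result).hexdigest()


def replicated_check(task_input: bytes, workers, rng: random.Random, duplication_rate: float):
    """Run a task on one worker and, with probability duplication_rate, on a
    second distinct worker; report whether the two results are consistent.

    Returns (result, consistent). Tasks that are not duplicated are accepted
    unchecked, which is the efficiency/coverage trade-off of sampling.
    """
    primary, secondary = rng.sample(workers, 2)
    result = primary(task_input)
    if rng.random() < duplication_rate:
        if digest(secondary(task_input)) != digest(result):
            return result, False
    return result, True


def honest(data: bytes) -> bytes:
    return data.upper()


def cheater(data: bytes) -> bytes:
    return b"forged"


rng = random.Random(42)
print(replicated_check(b"abc", [honest, cheater], rng, duplication_rate=1.0))
```

Replicating every task (duplication rate 1.0) catches every inconsistency but doubles the work; lower rates trade detection probability for cost, which is the tension the motivation below points to.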
Motivation
• Existing work addresses service integrity, but not the integrity of data processing services.
• Drawbacks of replication-based techniques:
– Replicating all distributed computing tasks for consistency verification is inefficient.
– Centralized consistency verification over massive result data does not scale.
Contribution
• Decentralized replication-based integrity verification for MapReduce in open systems.
• Provides non-repudiation and resilience to DoS and replay attacks.
• Security components can be easily integrated into existing MapReduce implementations.
• Low performance overhead.
• The first attempt to address data processing service integrity.
Scheme
• SecureMR - Architecture Design
Scheme
• SecureMR - Communication Design
– Commitment protocol
– Verification protocol
Scheme
• Commitment Protocol
– IDMap: a monotonically increasing identifier of a map task.
– DataLoc: location of the input data block.
– sig: the Master’s signature.
– KpubM: the Mapper’s public key.
– sigM: the Mapper’s signature.
– HP1, …, HPr: hash values of the r partitions of the mapper’s intermediate result.
Fig. Commitment protocol (components: Scheduler, Task Executor, Commit Manager).
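The commitment step can be sketched as follows, under stated assumptions: SHA-256 is used for the partition hashes HP1, …, HPr, and HMAC-SHA256 with a shared key stands in for the mapper's public-key signature sigM, only to keep the example within the Python standard library. HMAC does not provide non-repudiation, so a real deployment would use a public-key signature verified with KpubM. Which fields are signed together (here IDMap and the hashes) and the JSON encoding are also assumptions.

```python
import hashlib
import hmac
import json


def hash_partition(partition: bytes) -> str:
    """HPi: SHA-256 digest of one intermediate-result partition (assumed hash)."""
    return hashlib.sha256(partition).hexdigest()


def _payload(id_map: int, hashes: list) -> bytes:
    """Canonical byte encoding of the committed fields (hypothetical format)."""
    return json.dumps({"IDMap": id_map, "HP": hashes}, sort_keys=True).encode("utf-8")


def make_commitment(id_map: int, partitions: list, mapper_key: bytes) -> dict:
    """Mapper side: hash every partition and tag the commitment.

    HMAC-SHA256 is only a stand-in for sigM; it gives integrity but not
    non-repudiation, which needs a real public-key signature.
    """
    hashes = [hash_partition(p) for p in partitions]
    tag = hmac.new(mapper_key, _payload(id_map, hashes), hashlib.sha256).hexdigest()
    return {"IDMap": id_map, "HP": hashes, "sigM": tag}


def check_commitment(commitment: dict, mapper_key: bytes) -> bool:
    """Commit Manager side: recompute the stand-in tag and compare in constant time."""
    expected = hmac.new(
        mapper_key, _payload(commitment["IDMap"], commitment["HP"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, commitment["sigM"])


key = b"hypothetical-mapper-key"
commitment = make_commitment(7, [b"partition-1", b"partition-2", b"partition-3"], key)
print(check_commitment(commitment, key))  # True
commitment["HP"][0] = hash_partition(b"tampered")
print(check_commitment(commitment, key))  # False
```

Committing only r short hashes, rather than the partitions themselves, keeps the commitment small regardless of how large the intermediate results are.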
Scheme
• Verification Protocol
– Pi: the partition of intermediate results that the reducer will process.
– ADM: the Mapper’s address.
– HPi: the hash of partition Pi committed by the Committer.
– ReqSeq: request sequence number.
Fig. Verification protocol (components: Scheduler, Task Executor, Manager, Committer, Verifier; sigR: the Reducer’s signature).
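The reducer-side check can be sketched like this, assuming 0-based partition indices, SHA-256 partition hashes taken from the mapper's commitment, and a per-reducer counter for ReqSeq. Signatures (sigR and the mapper's signature on its response) are omitted, and the class and function names are hypothetical. The sketch shows the two checks: the returned partition must hash to the committed HPi, and the response must echo the current ReqSeq, so a replayed old response is rejected. Without signatures an attacker could simply rewrite ReqSeq, so in practice this check is only meaningful on signed messages.

```python
import hashlib


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Reducer:
    """Reducer-side verifier (hypothetical); signatures are omitted."""

    def __init__(self, committed_hashes: dict):
        # committed_hashes maps IDMap -> [HP_0, ..., HP_{r-1}] from the commitment
        self.committed = committed_hashes
        self.req_seq = 0

    def next_request(self, id_map: int, i: int) -> dict:
        """Build a request for partition i; ReqSeq increases with every request."""
        self.req_seq += 1
        return {"IDMap": id_map, "partition": i, "ReqSeq": self.req_seq}

    def verify_response(self, request: dict, response: dict) -> bool:
        """Accept only a fresh response whose data matches the committed hash."""
        if response["ReqSeq"] != request["ReqSeq"]:
            return False  # stale or replayed response
        expected = self.committed[request["IDMap"]][request["partition"]]
        return h(response["data"]) == expected


def serve(partitions: list, request: dict) -> dict:
    """Mapper side: return the requested partition and echo ReqSeq."""
    return {"ReqSeq": request["ReqSeq"], "data": partitions[request["partition"]]}


partitions = [b"p0", b"p1", b"p2"]
reducer = Reducer({7: [h(p) for p in partitions]})

req = reducer.next_request(7, 1)
old_response = serve(partitions, req)
print(reducer.verify_response(req, old_response))  # True

req2 = reducer.next_request(7, 1)
print(reducer.verify_response(req2, old_response))  # False: replayed response
```

Each reducer verifies only the partitions it consumes, so verification work is spread across reducers instead of being concentrated in a central checker.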
Scheme
• Extension for Reducers and MapReduce Chain
Fig. Extension to reducers and MapReduce chains (Map, Reduce, and Verify phases; adds a Committer component and a Verifier component).
Security analysis
• Collusive Attack - Attacker behavior analysis
– Periodical Attacker
• Naive attacker
• Attacker without collusion
• Attacker with collusion
– Strategic Attacker
Security analysis
Fig. Detection rate for a non-collusive naive attacker (b = 20; Pm = 1).
Fig. Detection rate for a non-collusive periodical attacker (b = 20; Pm = 0.5).
b: number of blocks in one input job. Pm: misbehaving probability. l: the number of jobs a mapper has done when its misbehavior is detected.
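A simplified model, not the paper's analysis, helps show why the detection rate grows with the number of tasks a misbehaving worker executes. Assume each map task is duplicated independently with probability d (the duplication rate), the attacker cheats on each task with probability Pm, and, with no collusion, every cheated task that is duplicated produces an inconsistency. The probability of being caught within t tasks is then 1 - (1 - d·Pm)^t. The symbols `d`, `pm`, `t` and both functions are assumptions for illustration; the Monte Carlo function cross-checks the closed form.

```python
import random


def detection_rate(d: float, pm: float, t: int) -> float:
    """Closed form: probability that at least one of t tasks is both
    cheated on (probability pm) and duplicated (probability d)."""
    return 1 - (1 - d * pm) ** t


def simulate_detection(d: float, pm: float, t: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity under the same assumptions."""
    rng = random.Random(seed)
    caught = 0
    for _ in range(trials):
        for _ in range(t):
            cheated = rng.random() < pm
            duplicated = rng.random() < d
            if cheated and duplicated:
                caught += 1
                break
    return caught / trials


# Naive attacker (pm = 1) vs. periodical attacker (pm = 0.5), duplication rate 0.5
print(detection_rate(0.5, 1.0, 15), detection_rate(0.5, 0.5, 15))
print(simulate_detection(0.5, 0.5, 15, trials=10_000))
```

Under this model, cheating less often (smaller Pm) only delays detection: for any d·Pm > 0 the detection rate still approaches 1 as t grows.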
Security analysis
Fig. Detection rate for a collusive periodical attacker (n = 50; Pm = 0.5; b = 20; l = 15).
Fig. Misbehaving probability vs. duplication rate (n = 50; b = 20; l = 15).
n: total number of workers. m: number of malicious workers.
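Collusion can be added to a simplified detection model; again, this is an illustrative assumption, not the formula from the paper. Assume each task is duplicated with probability d, the attacker cheats with probability Pm, the duplicate goes to a worker chosen uniformly from the other n - 1 workers, and colluding malicious workers always return the same forged result. A cheated task is then exposed only when its duplicate lands on one of the n - m honest workers, so the per-task detection probability is d·Pm·(n - m)/(n - 1). The function name and parameter `t` (tasks executed) are hypothetical.

```python
def collusive_detection_rate(d: float, pm: float, n: int, m: int, t: int) -> float:
    """Probability that a colluding attacker is caught within t tasks.

    d: duplication rate, pm: misbehaving probability,
    n: total workers, m: malicious (colluding) workers, t: tasks executed.
    Assumes duplicates are assigned uniformly among the other n - 1 workers
    and that colluders always agree on the same forged result.
    """
    if n < 2 or not 1 <= m <= n:
        raise ValueError("need n >= 2 and 1 <= m <= n")
    p_honest_duplicate = (n - m) / (n - 1)  # duplicate runs on an honest worker
    per_task = d * pm * p_honest_duplicate
    return 1 - (1 - per_task) ** t


# More colluders -> fewer duplicates checked by honest workers -> lower detection
for m in (1, 5, 10, 25):
    print(m, round(collusive_detection_rate(0.5, 0.5, n=50, m=m, t=15), 3))
```

With m = 1 (no accomplices) the expression reduces to the non-collusive case, and it drops to 0 when every worker is malicious, which matches the intuition that replication alone cannot catch unanimous collusion.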
Performance evaluation
T: time cost. D: data transmission cost. r: number of reducers.
Performance evaluation
Fig. Response time vs. number of reduce tasks (60 map tasks; data size = 1 GB).
Fig. Response time vs. data size (60 map tasks; 25 reduce tasks).
Performance evaluation
Fig. Response time vs. duplication rate.
Fig. Response time vs. number of reduce tasks.
Parameters: 60 map tasks; data size = 1 GB.
Comment
• The Assign and Notify steps could be combined into one step.
• TicketM contains some of the same parameters as the part the reducer signs in the request message.
• If the first request fails, what should the reducer do? (How are TicketM and ReqSeq renewed?)
• In the response message, the mapper could sign the data together with the other fields; this would avoid one hash computation, and the reducer would no longer need to check that hash.