Accumulo Summit 2014: Using Accumulo to Implement Confidentiality Protection in Message Queuing

Post on 22-Nov-2014

Accumulo is primarily used as a Big Data storage facility in a clustered environment. Accumulo’s columnar arrangement of rows, key-value pair indices and cell-level security make it attractive for non-Big Data applications as well. In this talk, we describe how to use Accumulo to implement message queuing that provides confidentiality protection. One feature of message queuing is broadcasting messages from a producer to multiple consumers. The messages could be part of a stream that the producer is providing to multiple consumers. In some cases, not all consumers should see every message in the stream. In a traditional queuing system, separate queues would be created for different levels of access, so the messages would be duplicated for each level of access. In this talk, we show how to use Accumulo to create a queuing system that does not require duplication. We also present results from experiments testing the performance of such a system under different loads, and results comparing the performance of streaming messages through a queuing system based on Accumulo with traditional queuing systems, such as Apache Qpid.

Transcript of Accumulo Summit 2014: Using Accumulo to Implement Confidentiality Protection in Message Queuing

Using Accumulo to Implement Confidentiality Protection in Message Queuing

Dr. Rod Moten
Chief Scientist
PROARC, Inc.

6/6/2014
PROARC, Inc. | 300 E. Lombard Suite 640 Baltimore MD 21202 | info@proarc-inc.com | 410-665-2230

Confidentiality Protection

Ensure confidential information is only accessible by those with the correct privileges
Example
  ◦ Ensure only people with Secret clearances can read Secret documents

PROARC, INC. PROPRIETARY INFORMATION: The information contained herein may not be used in whole or in part except for the limited purpose for which it was furnished. Do not distribute, duplicate, or reproduce in whole or in part without the prior written consent of an authorized official of PROARC, Inc.

A Policy for Confidentiality Protection

Artifacts are tagged with attributes that specify their confidentiality level
Portions of a single artifact can have different confidentiality levels
Entire artifact will be protected at the highest level of its parts
Reduce confidentiality level by stripping out portions with higher levels

Example
Protection level of this document is Trade Secret
(Public) Sweeping fingers in shapes across the screen of a smartphone or tablet can be used to unlock devices.
(Confidential) The CEO of Acme uses the same shape for all his devices.
(Trade Secret) When near a CEO exploit the Bluetooth bleed bug to send a fake notification to his device and study his gesture.
(Public) The free-form gestures have an inherent appeal as passwords.
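
The policy on this slide can be sketched in a few lines of Python. This is only an illustration: the ordering Public < Confidential < Trade Secret is an assumption read off the example, the `Portion` type and function names are hypothetical, and the portion texts are shortened paraphrases of the example document.

```python
from dataclasses import dataclass

# Assumed ordering, lowest to highest. The slide names these levels but
# does not state their order explicitly.
LEVELS = ["Public", "Confidential", "Trade Secret"]


@dataclass(frozen=True)
class Portion:
    level: str
    text: str


def rank(level: str) -> int:
    """Position of a level in the assumed ordering."""
    return LEVELS.index(level)


def artifact_level(portions: list[Portion]) -> str:
    """The entire artifact is protected at the highest level of its parts."""
    return max((p.level for p in portions), key=rank)


def reduce_to(portions: list[Portion], ceiling: str) -> list[Portion]:
    """Lower the artifact's level by stripping portions above `ceiling`."""
    return [p for p in portions if rank(p.level) <= rank(ceiling)]


document = [
    Portion("Public", "Gestures can be used to unlock devices."),
    Portion("Confidential", "The CEO of Acme uses the same shape for all his devices."),
    Portion("Trade Secret", "Send a fake notification and study his gesture."),
    Portion("Public", "Free-form gestures have an inherent appeal as passwords."),
]

print(artifact_level(document))                             # level of the full document
print(artifact_level(reduce_to(document, "Confidential")))  # level after stripping
```

Stripping the single Trade Secret portion is enough to lower the whole document to Confidential, which is the "reduce by stripping" rule in action.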

Implementing Confidentiality Protection for Data Streams

Mark each frame or collection of frames with a confidentiality level
  ◦ Consumers can only receive frames for which they are privileged to read
Consumers cannot directly transfer frames to producers.
  ◦ A broker is required
Use traditional message queuing system with access control, such as Qpid.
  ◦ Queue per Confidentiality Level

Queue per Confidentiality Level

[Figure: a stream of Frame 1 (A,B), Frame 2 (A), Frame 3 (A,B), and Frame 4 (A,B) is copied into separate queues. Queue labels: "Queue for Confidentiality Level A", "Queue for Confidentiality Level B", and "Queue for A, but Not B" (holding Frame 2).]

A separate queue for each protection level
Consumers read all frames from queue for which they have access
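
A minimal sketch of the queue-per-level approach, to make the cost of duplication concrete. The frame markings come from the figure; the rule that a frame marked "A,B" is copied into both the A queue and the B queue is an assumption, and `publish_per_level` is a hypothetical name, not part of Qpid.

```python
from collections import defaultdict

# Frame number -> confidentiality levels it is marked with (from the figure).
FRAMES = {1: {"A", "B"}, 2: {"A"}, 3: {"A", "B"}, 4: {"A", "B"}}


def publish_per_level(frames):
    """Copy each frame into one queue per level it is marked with (assumed rule)."""
    queues = defaultdict(list)
    for number, levels in frames.items():
        for level in sorted(levels):
            queues[level].append(number)
    return dict(queues)


queues = publish_per_level(FRAMES)
stored_copies = sum(len(q) for q in queues.values())

print(queues)
print(f"{len(FRAMES)} frames produced, {stored_copies} copies stored")
```

Every additional level a frame is marked with costs one more stored copy and one more write by the producer.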

Single Queue for All Confidentiality Levels

A single queue contains all frames for all confidentiality levels
Consumers only read frames for which they have access.

Single Queue for All Confidentiality Levels

[Figure: one queue holds Frame 1 (A,B), Frame 2 (A), Frame 3 (A,B), and Frame 4 (A,B). Consumers with Access to A receive Frames 1, 2, 3, and 4; Consumers with Access to B receive Frames 1, 3, and 4.]
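
The single-queue approach can be sketched as one stored copy per frame plus a filter at read time. In Accumulo the filtering would be done by the server, using each entry's security label and the reader's authorizations; the plain-Python filter below only mimics that outcome, and `readable_frames` is a hypothetical name.

```python
# One queue, one copy of each frame: (frame number, levels it is marked with).
SINGLE_QUEUE = [
    (1, {"A", "B"}),
    (2, {"A"}),
    (3, {"A", "B"}),
    (4, {"A", "B"}),
]


def readable_frames(queue, access):
    """Return the frame numbers a consumer with `access` is allowed to read."""
    return [number for number, levels in queue if access in levels]


print(readable_frames(SINGLE_QUEUE, "A"))  # consumers with access to A
print(readable_frames(SINGLE_QUEUE, "B"))  # consumers with access to B
```

The producer writes each frame once regardless of how many levels exist, which is why the number of levels should not affect write time in this design.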

Using Accumulo for Single Queue Approach

Treat queue as an unbounded buffer
  ◦ Single writer – multiple readers
Buffer implemented as an Accumulo table
  ◦ Technically it is a very large bounded buffer
  ◦ Theoretically it can hold 26^32 ≈ 1.9 x 10^45 entries
Each row contains a frame
Row ID
  ◦ String of 32 characters from the set [a-z]
  ◦ 26^32 frames ≈ 1.9 x 10^45 frames
  ◦ 1st frame: aaa…aaa
  ◦ 2nd frame: aaa…aab
  ◦ 27th frame: aaa…aba
Security label
  ◦ Confidentiality level
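
The row ID scheme amounts to writing the frame's position as a fixed-width base-26 number with the digits a to z. The sketch below assumes zero-based positions (the 1st frame has index 0); `row_id` and `row_index` are hypothetical helper names. Because the width is fixed, sorting the IDs as strings gives the same order as sorting the positions, which suits a store that keeps rows sorted by key.

```python
import string

ALPHABET = string.ascii_lowercase  # the set [a-z]
WIDTH = 32                         # characters per row ID
CAPACITY = len(ALPHABET) ** WIDTH  # 26^32 possible row IDs


def row_id(index: int) -> str:
    """Encode a zero-based frame position as a 32-character base-26 string."""
    if not 0 <= index < CAPACITY:
        raise ValueError("index outside the buffer's capacity")
    chars = []
    for _ in range(WIDTH):
        index, digit = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def row_index(rid: str) -> int:
    """Decode a row ID back into its zero-based frame position."""
    value = 0
    for ch in rid:
        value = value * len(ALPHABET) + ALPHABET.index(ch)
    return value


print(row_id(0)[-3:], row_id(1)[-3:], row_id(26)[-3:])  # last 3 characters only
print(f"capacity = {CAPACITY:.1e}")
```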

Organization of Columns

The frame is stored as the values of one or more columns.
  ◦ A frame will be partitioned into multiple values if it is large.
Column Family
  ◦ Contains the column index number
Column Qualifier
  ◦ First column – total size of frame
  ◦ Subsequent columns – size of value

Example – 1KB Frame divided into two columns

Row ID     Column Family   Column Qualifier   Value
aaa…aaa    0               1024
aaa…aaa    1               512                <512 bytes>
aaa…aaa    2               512                <512 bytes>
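
The column layout above can be sketched as a pair of pure functions that split a frame into (column family, column qualifier, value) triples and put it back together. This shows the data layout only and makes no Accumulo calls; the function names and the `chunk_size` parameter are assumptions, as is the empty value stored in column 0.

```python
def frame_to_entries(frame: bytes, chunk_size: int):
    """Split one frame into (column family, column qualifier, value) triples."""
    # Column 0: the qualifier carries the total frame size; value assumed empty.
    entries = [("0", str(len(frame)), b"")]
    for offset in range(0, len(frame), chunk_size):
        chunk = frame[offset:offset + chunk_size]
        index = offset // chunk_size + 1
        # Columns 1..n: the qualifier carries the size of this column's value.
        entries.append((str(index), str(len(chunk)), chunk))
    return entries


def entries_to_frame(entries) -> bytes:
    """Reassemble a frame, checking it against the size recorded in column 0."""
    ordered = sorted(entries, key=lambda e: int(e[0]))  # numeric column order
    total_size = int(ordered[0][1])
    frame = b"".join(value for _, _, value in ordered[1:])
    if len(frame) != total_size:
        raise ValueError("frame is incomplete")
    return frame


entries = frame_to_entries(bytes(1024), chunk_size=512)
for family, qualifier, value in entries:
    print(family, qualifier, f"<{len(value)} bytes>")
```

Sorting by the numeric value of the column family matters once a frame has ten or more columns, since "10" sorts before "2" as a string.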

Design of Proof-of-Concept

[Architecture diagram. Components and their annotations:]
Producer: one producer per buffer
Unbounded Buffer Writer
Writer’s State: local persistent storage of last row written, etc.
Authorization Service: contains security labels for each Producer and Consumer
Accumulo: single node instance of Accumulo
Consumer: multiple consumers per buffer
Unbounded Buffer Reader
Reader’s State: local persistent storage of last row read, etc.
Expired Row Deleter: deletes rows older than N seconds
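
As an illustration of the Expired Row Deleter's rule ("deletes rows older than N seconds"), here is a minimal sketch against an in-memory stand-in for the table. `InMemoryBuffer` and `delete_expired` are hypothetical; the slides do not describe how the proof-of-concept actually tracks row age or issues deletes.

```python
import time


class InMemoryBuffer:
    """Hypothetical stand-in for the Accumulo table: row ID -> (written_at, frame)."""

    def __init__(self):
        self.rows = {}

    def write(self, row_id, frame, now=None):
        written_at = time.time() if now is None else now
        self.rows[row_id] = (written_at, frame)


def delete_expired(buffer, max_age_seconds, now=None):
    """Remove rows older than `max_age_seconds`; return the deleted row IDs."""
    now = time.time() if now is None else now
    expired = [
        row_id
        for row_id, (written_at, _) in buffer.rows.items()
        if now - written_at > max_age_seconds
    ]
    for row_id in expired:
        del buffer.rows[row_id]
    return expired


buf = InMemoryBuffer()
buf.write("aaa", b"frame 1", now=0.0)
buf.write("aab", b"frame 2", now=8.0)
print(delete_expired(buf, max_age_seconds=5, now=10.0))  # rows past the age limit
```

A reader that falls more than N seconds behind would miss the deleted rows, so N bounds how far a consumer may lag.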

Possible Improvements

Batch writing of rows
  ◦ Currently, Writers flush after writing one row.
Reduce polling
  ◦ Currently, a Reader polls for a new row when it has reached the end of the buffer
  ◦ Writers can notify Readers via multicast when a row is written
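
The batch-writing improvement can be sketched as a writer that holds rows in memory and flushes them in groups. This is an illustration only: `BatchingWriter` is a hypothetical class, a plain list stands in for the table, and `batch_size=1` reproduces the flush-after-every-row behavior described as the current state.

```python
class BatchingWriter:
    """Hypothetical writer that flushes to `sink` once `batch_size` rows are pending."""

    def __init__(self, sink, batch_size):
        self.sink = sink              # a list standing in for the table
        self.batch_size = batch_size
        self.pending = []
        self.flushes = 0

    def write(self, row_id, frame):
        self.pending.append((row_id, frame))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.sink.extend(self.pending)
            self.pending.clear()
            self.flushes += 1


table = []
writer = BatchingWriter(table, batch_size=3)
for i in range(7):
    writer.write(f"row-{i}", b"frame")
writer.flush()  # push out the final partial batch
print(writer.flushes, len(table))
```

The trade-off is latency: a row held in a pending batch is invisible to readers until the flush, so the batch size has to be weighed against the frame interval.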

Multiple Queues vs Single Queue

Comparison between Qpid and our POC messaging system
  ◦ Compare the average time to read and write a frame at a specific rate
Frame sizes: 2MB and 8KB
Frame rate: one frame every 50 ms
Number of Consumers: 1, 10, 100, 1000
Number of confidentiality levels: 1 and 5
We didn’t make any special configurations to Qpid or Accumulo.

1 Consumer – 50ms Frame Rate

Accumulo
# of Levels   Frame Size   Avg. Write Time   Avg. Read Time
1             8KB          0.18ms            4.3ms
1             2MB          111ms             196ms
5             8KB          0.18ms            4.3ms
5             2MB          111ms             196ms
The number of access levels had no impact on the read and write times.

Qpid
# of Levels   Frame Size   Avg. Write Time   Avg. Read Time
1             8KB          0.93ms            47ms
1             2MB          129ms             3.98s
5             8KB          2.21ms            47ms
5             2MB          3.58s             3.98s
As expected, duplicating the frame for each confidentiality level slows down writes.

100 Consumers – 50ms Frame Rate

Accumulo
# of Levels   Frame Size   Avg. Write Time   Avg. Read Time
1             8KB          0.21ms            28.3ms
1             2MB          236ms             2.23s
5             8KB          0.21ms            28.3ms
5             2MB          236ms             2.23s
Impacted by the number of consumers.

Qpid
# of Levels   Frame Size   Avg. Write Time   Avg. Read Time
1             8KB          0.93ms            47ms
1             2MB          129ms             3.98s
5             8KB          2.21ms            47ms
5             2MB          3.58s             3.98s
The read and write times for 1 and 100 consumers were so close we only show the results from 1 consumer.

1000 Consumers - Accumulo

# of Levels   Frame Size   Avg. Write Time   Avg. Read Time   Frame Rate
1 & 5         8KB          2.43ms            209ms            50 ms
1 & 5         2MB          12.9s             11.4s            50 ms
1 & 5         2MB          512ms             18.6s            Write-50ms, Read-30s

Read times impacted by multiple consumers on the same VM and disk contention.
We didn’t test Qpid with 1000 Consumers because the queues are kept in RAM and we didn’t have enough RAM for 1000 consumers.

Evidence that sharing NIC may be impacting performance

[Chart: 8KB Frames. Read and write times in milliseconds (axis 0 to 250) against # of consumers (1, 10, 100, 1000). Series: Read, Write. Data labels on the chart: 4.3, 5.38, 28.3, 209.]

Read times are almost the same when there is only 1 consumer per VM.
Write times remain flat while read times increase as the number of consumers on the same VM increases.

Conclusion

Accumulo may be suitable as the backbone for a message queuing system
  ◦ Accumulo outperforms Qpid for complex attribute policies.
  ◦ A messaging system based on Accumulo isn’t restricted by RAM like Qpid.
  ◦ Drawback: May require a lot of polling.
Large frames
  ◦ Small number of consumers and no more than 5 frames per second.
Small frames
  ◦ 100’s of consumers per buffer and no more than 40 frames per second.
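
The slides do not say how the frame-rate limits were derived. One back-of-the-envelope reading, assuming a consumer must finish reading one frame before it starts the next, is that the sustainable rate is bounded by 1000 divided by the average read time in milliseconds. The sketch below applies that assumed bound to two Accumulo read times from the result tables; `max_frames_per_second` is a hypothetical helper.

```python
def max_frames_per_second(avg_read_ms: float) -> float:
    """Upper bound on frame rate if frames are read strictly one at a time."""
    return 1000.0 / avg_read_ms


# Average Accumulo read times taken from the result tables.
large_frames = max_frames_per_second(196)   # 2MB frames, 1 consumer
small_frames = max_frames_per_second(28.3)  # 8KB frames, 100 consumers

print(f"2MB frames: about {large_frames:.1f} frames per second")
print(f"8KB frames: about {small_frames:.1f} frames per second")
```

Under this assumption the large-frame figure lands close to the stated limit of 5 frames per second, and the small-frame figure is of the same order as the stated 40.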