accumulo summit 2015
Transcript of accumulo summit 2015
![Page 1: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/1.jpg)
Accumulo @ Bloomberg
Accumulo Summit 2015
Skand Gupta
Bloomberg LP
![Page 2: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/2.jpg)
Bloomberg
• Bloomberg technology helps drive the world’s financial markets
– We build our own software, digital platforms, mobile applications and state-of-the-art hardware
– We run one of the world’s largest private networks, with over 20,000 routers
– We have the largest server-side JavaScript deployment in the world: 22 million lines of JavaScript code
– We developed “cloud computing” and deployed “software as a service” well ahead of the general marketplace
– Our technology has brought transparency to the global financial markets
• Bloomberg technologists
– More than 3,000 software developers and designers located around the world (London, NYC, SF “tech hubs”)
– BloombergLabs.com (@BloombergLabs) is our platform for dialogue between our experts and the broader tech community
• Our clients
– Over 320,000 subscribers
– Primarily financial professionals, including investment bankers, CFOs, investor relations, hedge fund managers, foreign exchange, etc.
![Page 3: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/3.jpg)
Source: Wall Street Journal, CFTC , New York Times, Marketplace.org
![Page 4: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/4.jpg)
Source: Wall Street Journal, CFTC , New York Times
Importance of Compliance
![Page 5: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/5.jpg)
Source: Commodity Futures Trading Commission
Hiding in Plain Sight
![Page 6: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/6.jpg)
Compliance Platform and Processing Pipeline
Chat
Reference Data
Trade Data
Customer Data
Product Data
Market Data
Counterparty
Social Media Voice
Human- and Machine-generated Data
Surveillance Pipeline
Communication Data
Transactional Data
User Data
Case Management
Compliance Platform
Compliance Storage
Compliance Officers
Search, Review, Analyze
![Page 7: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/7.jpg)
HDFS
Spark
Kafka Storm
Mesos (Cluster Resource Manager)
Elastic data-processing and analytics stack
Open REST API (Play)
WORM
Pre-fabricated Hardware
Applications
![Page 8: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/8.jpg)
Need for a robust, scalable, high-performance, geo-distributed data storage and retrieval system
❑ More than 3 petabytes of archived data
❑ 80+ billion indexed objects
❑ Real-time scanning of 35 million objects per day
[Charts: Communication Data Growth (100’s of gigabytes/year); Cumulative Data Growth (over 3 petabytes today); time axis 2000 to 2012]
[Chart: Storing 1GB of Data (Storage Cost), scale $0.00 to $3.00; categories: List Price, Replication, DR, Isolation; value labels: $2.31, $1.15, $0.58, $0.19]
![Page 9: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/9.jpg)
Need for Low Level Security Primitives
Document Level Security
[Sample document body: placeholder text]
Company Level Security
Data Store, Data Pipe, Application
User Level Security
Data Store
![Page 10: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/10.jpg)
Security Solutions
• Post-process the queries
– Too slow
– Nasty bugs
• Generate unique document for each view
– Exponential growth in number of documents
• Use application specific features
– Solr dynamic fields, Mangled Fields
• Accumulo Visibility
– Fast, Clean, Generic
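To make the "Accumulo Visibility" option concrete, here is a minimal sketch of how a cell-level visibility label can be checked against a user's authorizations. It is a simplified stand-in written in plain Python, not Accumulo's own implementation: the function names and the labels `CompanyA`, `compliance`, and `admin` are assumptions made up for illustration. It follows the basic label syntax (terms combined with `&` and `|`, with parentheses required when the two operators are mixed).

```python
import re

# One token is either a term (hypothetical label name) or an operator/paren.
TOKEN = re.compile(r"\s*([A-Za-z0-9_.:/-]+|[&|()])")


def tokenize(expr):
    """Split a visibility expression into terms, operators and parentheses."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad character at {pos}: {expr[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if the set `auths` satisfies the visibility expression."""
    if not expr.strip():
        return True  # An empty label is visible to every user.
    tokens = tokenize(expr)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    def parse_expr():
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    visible = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return visible


# A record labeled for CompanyA staff who are compliance officers or admins.
label = "CompanyA&(compliance|admin)"
officer_can_read = is_visible(label, {"CompanyA", "compliance"})
outsider_can_read = is_visible(label, {"CompanyB", "compliance"})
```

Because the check runs per cell inside the data store, the application does not need to post-process query results or keep one copy of a document per view.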
![Page 11: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/11.jpg)
Data Model
Row ID Value
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150427 <bytes>
CompanyA_userX_20150428 <bytes>
CompanyA_userY_20150427 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
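The row IDs in the table appear to follow a `company_user_date` pattern. A short sketch of why that layout is useful, assuming the date is written as `YYYYMMDD` and that company and user names contain no underscore (both assumptions, as is the helper name `make_row_id`): because rows are kept sorted by key, all records for one user of one company sit next to each other in date order.

```python
from datetime import date


def make_row_id(company, user, day):
    """Build a row ID such as CompanyA_userX_20150426 (layout assumed from the slide)."""
    return f"{company}_{user}_{day:%Y%m%d}"


row_ids = sorted(
    [
        make_row_id("CompanyB", "userX", date(2015, 4, 28)),
        make_row_id("CompanyA", "userY", date(2015, 4, 27)),
        make_row_id("CompanyA", "userX", date(2015, 4, 28)),
        make_row_id("CompanyA", "userX", date(2015, 4, 26)),
    ]
)
# Lexicographic order groups by company, then user, then day.
for row_id in row_ids:
    print(row_id)
```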
![Page 12: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/12.jpg)
Find all Communications for a Set of Users for a Date Range
Row ID Value
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150427 <bytes>
CompanyA_userX_20150428 <bytes>
CompanyA_userY_20150427 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
Batch Scanner
Application
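The query on this slide maps naturally onto one key range per user. The sketch below simulates that idea over an in-memory sorted list; it is not the Accumulo BatchScanner API, and `build_ranges`, `batch_scan`, and the sample `TABLE` are hypothetical names introduced here. Note that a real BatchScanner fetches ranges in parallel and makes no ordering guarantee, whereas this simulation walks the ranges one by one.

```python
from bisect import bisect_left, bisect_right


def build_ranges(company, users, start_day, end_day):
    """One inclusive (start_row, end_row) pair per user; days are YYYYMMDD strings."""
    return [
        (f"{company}_{user}_{start_day}", f"{company}_{user}_{end_day}")
        for user in sorted(users)
    ]


def batch_scan(sorted_rows, ranges):
    """Yield every (row_id, value) whose row_id falls inside any of the ranges.

    `sorted_rows` must already be sorted by row_id, as a sorted key-value store keeps it.
    """
    keys = [row_id for row_id, _ in sorted_rows]
    for start, end in ranges:
        lo = bisect_left(keys, start)
        hi = bisect_right(keys, end)
        yield from sorted_rows[lo:hi]


# The rows from the slide, with a placeholder value.
TABLE = [
    ("CompanyA_userX_20150426", b"<bytes>"),
    ("CompanyA_userX_20150426", b"<bytes>"),
    ("CompanyA_userX_20150427", b"<bytes>"),
    ("CompanyA_userX_20150428", b"<bytes>"),
    ("CompanyA_userY_20150427", b"<bytes>"),
    ("CompanyB_userX_20150428", b"<bytes>"),
    ("CompanyB_userX_20150428", b"<bytes>"),
    ("CompanyB_userX_20150428", b"<bytes>"),
]

ranges = build_ranges("CompanyA", {"userX", "userY"}, "20150427", "20150428")
hits = [row_id for row_id, _ in batch_scan(TABLE, ranges)]
```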
![Page 13: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/13.jpg)
Find all Records with “Libor”
Filter
Row ID Value
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150427 <bytes>
CompanyA_userX_20150428 <bytes>
CompanyA_userY_20150427 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
Batch Scanner
Application
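The "Filter" box suggests that matching happens next to the data, so only relevant records reach the application. A minimal sketch of that behavior, assuming values are UTF-8 text stored as bytes and that a case-insensitive substring match is what is wanted (the slide does not say); `term_filter` is a hypothetical name and this is not an Accumulo iterator.

```python
def term_filter(entries, term="Libor"):
    """Yield only the (row_id, value) pairs whose value mentions `term`."""
    needle = term.lower().encode("utf-8")
    for row_id, value in entries:
        if needle in value.lower():
            yield row_id, value


entries = [
    ("CompanyA_userX_20150426", b"can you move the LIBOR fixing today"),
    ("CompanyA_userX_20150427", b"lunch at noon?"),
    ("CompanyB_userX_20150428", b"Libor looks high"),
]
matching_rows = [row_id for row_id, _ in term_filter(entries)]
```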
![Page 14: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/14.jpg)
Count Number of Objects that Match a Filter
Counting Iterator Filter
Row ID Value
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150427 <bytes>
CompanyA_userX_20150428 <bytes>
CompanyA_userY_20150427 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
Batch Scanner
Application
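Stacking a counting step on top of the filter means the application receives a single number instead of every matching record. The sketch below shows that composition in plain Python; `count_matching` and the predicate are illustrative assumptions, not the iterator classes used in the talk.

```python
def count_matching(entries, predicate):
    """Return how many values satisfy `predicate`, without returning the values."""
    return sum(1 for _row_id, value in entries if predicate(value))


def mentions_libor(value):
    """Hypothetical filter: case-insensitive match on the word 'libor'."""
    return b"libor" in value.lower()


entries = [
    ("CompanyA_userX_20150426", b"can you move the LIBOR fixing today"),
    ("CompanyA_userX_20150427", b"lunch at noon?"),
    ("CompanyB_userX_20150428", b"Libor looks high"),
]
libor_count = count_matching(entries, mentions_libor)
```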
![Page 15: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/15.jpg)
Scaling Out
Application
Row ID Value
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150427 <bytes>
CompanyA_userX_20150428 <bytes>
CompanyA_userY_20150427 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
Counting Iterator Filter Batch Scanner
Counting Iterator Filter Batch Scanner
Counting Iterator Filter Batch Scanner
Spark Processing
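The diagram shows three parallel filter-and-count pipelines feeding a Spark job. As a rough sketch of the same pattern, the code below counts matches per partition in parallel and sums the partial counts. A `ThreadPoolExecutor` is used here purely as a stand-in for Spark, and the partitioning, function names, and sample data are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def count_partition(entries, term):
    """Filter and count within one partition (one scanner in the diagram)."""
    needle = term.lower().encode("utf-8")
    return sum(1 for _row_id, value in entries if needle in value.lower())


def parallel_count(partitions, term, workers=3):
    """Count matches in each partition concurrently, then add up the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda part: count_partition(part, term), partitions)
        return sum(partials)


partitions = [
    [("CompanyA_userX_20150426", b"LIBOR fixing"), ("CompanyA_userX_20150427", b"lunch")],
    [("CompanyA_userY_20150427", b"libor rate is up")],
    [("CompanyB_userX_20150428", b"see attached")],
]
total = parallel_count(partitions, "Libor")
```

Only the small per-partition counts cross the network, so adding partitions and workers scales the scan without growing the result size.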
![Page 16: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/16.jpg)
Low Latency Writes using Accumulo ‘File System’
RowID Family Qualifier Value
attach.pdf chunk “00001” <bytes>
attach.pdf chunk “00002” <bytes>
… … … …
attach.pdf metadata file_size <file size>
attach.pdf metadata chunk_size <chunk size>
attach.pdf metadata sha256 <checksum>
[Chart: Write Times (ms), axis 0 to 20, comparing HDFS and Accumulo File System]
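The table layout above can be sketched as a pair of helpers that split a file into chunk rows plus metadata rows and put it back together. This is an illustration of the row layout only: the tuple representation, the function names, and the tiny chunk size are assumptions, and the real system would write these entries through the Accumulo client rather than build a list.

```python
import hashlib


def file_to_rows(name, data, chunk_size):
    """Return (row_id, family, qualifier, value) entries for one file."""
    rows = []
    for offset in range(0, len(data), chunk_size):
        # Zero-padded sequence numbers keep chunks in order under lexicographic sort.
        qualifier = f"{offset // chunk_size + 1:05d}"
        rows.append((name, "chunk", qualifier, data[offset:offset + chunk_size]))
    rows.append((name, "metadata", "file_size", str(len(data)).encode()))
    rows.append((name, "metadata", "chunk_size", str(chunk_size).encode()))
    rows.append((name, "metadata", "sha256", hashlib.sha256(data).hexdigest().encode()))
    return rows


def rows_to_file(rows):
    """Reassemble the file from its chunk entries and verify the stored checksum."""
    chunks = sorted((qual, value) for _row, fam, qual, value in rows if fam == "chunk")
    data = b"".join(value for _qual, value in chunks)
    meta = {qual: value for _row, fam, qual, value in rows if fam == "metadata"}
    if hashlib.sha256(data).hexdigest().encode() != meta["sha256"]:
        raise ValueError("checksum mismatch")
    return data


payload = b"hello accumulo"
rows = file_to_rows("attach.pdf", payload, chunk_size=4)
restored = rows_to_file(rows)
```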
![Page 17: accumulo summit 2015](https://reader035.fdocuments.net/reader035/viewer/2022081502/55ab92f61a28abc7158b469c/html5/thumbnails/17.jpg)
Conclusion
• Understand the data
• Free your data… but enforce access control
• Need sensible systems that help achieve these goals
Thank You!