Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Transcript of Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Neo4j on IBM POWER8
Philip Rathle, VP of Products, Neo Technology
Keshav Ranganathan, Senior Offering Manager, Data & Analytics Solutions, IBM POWER Systems
Neo4j on IBM POWER Systems
Key Takeaways:
• Why Graphs & Why Now?
• Unique Characteristics of Graph Data & Architecture Implications
• IBM Power Systems Overview
• Why deploy Neo4j on IBM Power Systems
• Q&A
Neo4j on IBM Power Systems Solves Massive-Scale, Previously Unsolvable Problems
A paradigm shift accelerating time to insight and real-time decision making… Bringing big data insights into action
Data Management in 2016
Dynamic Real-World Systems
Abundant RAM
Flash & CAPI
(High-Capacity Storage & Ultra-Fast Random I/O)
Popular Movement
[Chart: popularity changes by DBMS category, Jan 2013 to Sep 2015 (y-axis 100 to 700). © DB-Engines.com 2015. Categories shown: Graph Database, Wide column stores, RDF stores, Document stores, Search engines, Native XML DBMS, Key-value stores, Object oriented DBMS, Multivalue DBMS, Time Series DBMS, Relational database]
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
By the end of 2018, 70% of leading organizations will have one or more pilot or proof-of-concept efforts underway utilizing graph databases.
Analyst Perspective
“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”
IT Market Clock for Database Management Systems, 2014: https://www.gartner.com/doc/2852717/it-market-clock-database-management
TechRadar™: Enterprise DBMS, Q1 2014: http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801
Making Big Data Normal with Graph Analysis for the Masses, 2015: http://www.gartner.com/document/3100219
100 Best in Show 2015
Magic Quadrant for Operational DBMS 2015
Neo4j: World’s Leading Graph Database
Technology of the Year 2015, 2014
100 Companies that Matter the Most in Data 2015
Neo4j named most popular Graph Database, 2015
Neo4j declared “Champion”, 2015 & 2016
“Most Popular and Widely Deployed Database”
Winner of NoSQL: Graph Database Technologies
DB-Engines Rankings
Source: http://db-engines.com/en/ranking/graph+dbms
Database Technology Architectures: A Portfolio View
Connected Data: Graph DB
Discrete Data: Relational DBMS, Other NoSQL
Relationship Queries Strain Traditional Architectures
• Queries can take non-sequential, arbitrary paths through data
• Real-time queries need speed and consistent response times
• Queries must run reliably with consistent results
• A single query can touch a lot of data
Data Relationship Queries
Neo4j on IBM POWER8: unified, in-memory map. Lightning-fast queries due to replicated in-memory architecture + index-free adjacency.
Using Other NoSQL to Join Data (Machine 1, Machine 2, Machine 3): slow queries due to index lookups + network hops.
Traversal Speeds
• Realistic retail dataset from Amazon
• Social recommendation (Java procedure) equivalent to:
MATCH (you)-[:BOUGHT]->(something)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco)
WHERE id(you)={id}
RETURN reco

Threads   Hops/second
1         3-4M
10        17-29M
20        34-50M
30        36-60M
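The recommendation pattern above is a three-hop walk: from a buyer to a product, to other buyers of that product, to what they bought. As a rough illustration only (this is not Neo4j's implementation, nor the Java procedure used for the measurements), here is a minimal Python sketch over an in-memory adjacency map. The names `PURCHASES`, `build_adjacency`, and `recommend`, and the sample data, are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical sample data: (person, product) purchase pairs.
PURCHASES = [
    ("you", "book"),
    ("ann", "book"),
    ("ann", "lamp"),
    ("bob", "book"),
    ("bob", "desk"),
    ("bob", "lamp"),
]


def build_adjacency(purchases):
    """Store each BOUGHT relationship in both directions so every hop is a
    direct dictionary access instead of a scan over all purchases."""
    bought = defaultdict(set)     # person  -> products
    bought_by = defaultdict(set)  # product -> people
    for person, product in purchases:
        bought[person].add(product)
        bought_by[product].add(person)
    return bought, bought_by


def recommend(person, bought, bought_by):
    """Walk person -> product -> other buyer -> their products and count
    how many paths reach each candidate product."""
    counts = defaultdict(int)
    for something in bought.get(person, ()):
        for other in bought_by.get(something, ()):
            if other == person:
                continue  # a path may not reuse the same BOUGHT relationship
            for reco in bought.get(other, ()):
                if reco == something:
                    continue  # same rule: do not walk back over the same edge
                counts[reco] += 1
    return dict(counts)


bought, bought_by = build_adjacency(PURCHASES)
print(recommend("you", bought, bought_by))
```

Each hop in this sketch follows a stored reference from one record to its neighbours, which is the general idea behind index-free adjacency; a real graph database adds storage, caching, and concurrency control that the sketch ignores.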
Write Scale
• Import highly connected Friendster dataset
• 1.8 billion relationships takes around 20 minutes
• That is more than 1M writes/second!
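As a quick sanity check on the import figures above, the arithmetic can be sketched in a few lines of Python. The function name `writes_per_second` is made up for this example, and the calculation assumes exactly 20 minutes, whereas the slide only says "around 20 minutes".

```python
def writes_per_second(relationships, minutes):
    """Average write rate for an import of the given size and duration."""
    return relationships / (minutes * 60)


# 1.8 billion relationships in an assumed 20 minutes
rate = writes_per_second(1_800_000_000, 20)
print(f"{rate / 1e6:.1f}M relationships/second")
```

Under that assumption the average works out to about 1.5M relationships per second, so the quoted 1M writes/second is a conservative round figure.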
Good News for Real-Time, In-Memory Graph Queries: Big RAM is Eating Big Data
Jure Leskovec (Stanford), GRADES 2016
Szilard Pafka (Datascience.LA)
Value from Data Relationships: Top Use Cases
Internal Applications
• Fraud Detection
• Master Data Management
• Network and IT Operations
Customer-Facing Applications
• Real-Time Recommendations
• Graph-Based Search
• Identity and Access Management
Solving Massive-Scale Challenges: Recommendations
People, Places, Things + Interests + Transactions + Activity
Each requires a new & higher level of scaling
Solving Massive-Scale Challenges: Fraud Detection
• Estimated cost in 2014: $16.31B [1]
• Fraud and the costs to prevent fraud are up 94% year over year [2]
• 62% of companies subject to payment fraud [3]
• Nearly 1 out of 4 declined transactions are false positives [4]
[1] The Nilson Report, 2015; [2], [4] 2015 LexisNexis True Cost of Fraud Survey; [3] 2015 AFP Payments Fraud and Control Survey
Traditional Fraud Detection: Discrete Data Analysis
[Chart: Revolving Debt vs. Number of accounts, with outlying regions marked INVESTIGATE]
Pros: Simple; Stops rookies
Cons: False positives; False negatives
What’s Needed: Connected Analysis
[Chart: Revolving Debt vs. Number of accounts]
ADVANTAGES: Detect fraud rings; Fewer false negatives
Stop fraud attempts in flight: Carry out graph “walks” in real time for transactions & key events, revealing suspicious “loops”
Speed manual fraud analysis: Empower fraud analysts with graph-based analysis, to more quickly and accurately identify complex fraud
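To make the idea of a suspicious "loop" concrete, here is a minimal Python sketch. It is not how Neo4j or any fraud product detects rings; it swaps in a simple union-find cycle check over an undirected graph in which account holders are linked to the identifiers they use. The function `find_loops`, the `EDGES` data, and every name and identifier in it are hypothetical.

```python
def find_loops(edges):
    """Return the edges that close a loop in an undirected graph.

    Uses union-find: an edge whose two endpoints are already connected
    through earlier edges completes a cycle.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    closing = []
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            closing.append((a, b))  # endpoints already linked: loop found
        else:
            parent[root_a] = root_b
    return closing


# Hypothetical account holders linked to the identifiers they registered.
EDGES = [
    ("alice", "phone:555-0100"),
    ("bob", "phone:555-0100"),
    ("bob", "addr:12 Elm St"),
    ("carol", "addr:12 Elm St"),
    ("carol", "ssn:000-00-0001"),
    ("alice", "ssn:000-00-0001"),  # closes alice -> bob -> carol -> alice
    ("dave", "phone:555-0199"),
]

print(find_loops(EDGES))
```

A discrete, per-account check would see three unremarkable applicants here; the loop only appears once the shared phone, address, and SSN are treated as connections. In this sketch a duplicated edge would also be reported as a loop, so real data would need de-duplication first.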
Gartner’s Layered Fraud Prevention Approach (4)
Traditional Fraud Prevention: Batch Analysis
Layer 1, Endpoint-Centric: Analysis of users and their endpoints
Layer 2, Navigation-Centric: Analysis of navigation behavior and suspect patterns
Layer 3, Account-Centric: Analysis of anomaly behavior by channel
Layer 4, Cross-Channel: Analysis of anomaly behavior correlated across channels
Layer 5, Entity Linking: Analysis of relationships to detect organized crime and collusion
DISCRETE DATA ANALYSIS / CONNECTED ANALYSIS
(4) http://www.gartner.com/newsroom/id/1695014
POWER8: Designed for data to deliver breakthrough performance
• 4X Threads per core*
• 4X Mem. Bandwidth*
• 4X More cache* @ Lower Latency
[Diagram labels: Parallel Processing (POWER8 SMT8, x86 Hyperthread); Data flow (POWER8 pipe, x86 pipe); POWER8 + OpenPOWER, x86]
These design decisions result in best performance for data-centric workloads like: Database, NoSQL, Big Data Analytics, OLTP
SMT = Simultaneous Multi-Threading; OLTP = On-Line Transaction Processing
The POWER of an open ecosystem
• 250 Worldwide members of
• 30 Hardware and technology providers
• 100+ Collaborative innovations under way
• 2,500+ Linux ISVs developing on POWER
• 100,000+ Open source packages
Chip / SOC
Boards / Systems
I/O / Storage / Acceleration
System / Integration
Software
Implementation / HPC / Research
This is what a revolution looks like
Power Systems Portfolio – Enterprise and Scale-out offerings
Offering | OS capability | Positioning in the Linux portfolio
Scale-up (E880, E870, E850) | Equally run AIX, IBM i and Linux with IFLs | Enterprise systems; Leadership Performance and Reliability; Utilization Guarantee (PowerVM – 70%/80%); Flexible, dynamic Capacity on Demand & Enterprise Pools
Scale-Out (S824, S822, S814) | Equally run AIX, IBM i and Linux | Scale-out Systems; Utilization Guarantee (PowerVM – 65); High performance, availability and resiliency
L line (S824L, S822L, S812L) | Linux Only | Scale-out Linux Systems; Price/Performance Leadership vs. x86; PowerVM, KVM
LC line (S812LC, S822LC BD, S822LC HPC, S821LC) | Linux Only | Cluster-optimized Linux Systems; Lowest cost Power System; KVM
New
- Design and cost optimized for deployments of multiples (cloud and cluster)
- Broad number of optimal solutions
- Co-Designed with the OpenPOWER Ecosystem
IBM Support; Community / 3rd Party Support
running
The LC Line
The L Line
PurePower
Enterprise & IFLs
IaaS
Scale-Out, Linux-Only
Converged Infrastructure
Scale-Up
- Enterprise level RAS for single system deployments
- Solutions for Big Data & Analytics
- Converged infrastructure offering
- Rapid time to value and simplicity of management
- Enterprise level robustness and IFL capability
- Solution editions for in-memory databases (HANA, DB2 BLU)
- Hosted cloud and hybrid cloud solutions
- Rapid deployments and POCs
The IBM Power Systems Linux Portfolio
• Broad Linux portfolio delivers on all your Linux deployment needs
• Expanding LC portfolio with two servers for data-centric applications and a 2nd-generation HPC server
POWER8 is designed for the Big Data era and delivers price-performance leadership to the Linux Market!
POWER8 CAPI: Coherent Accelerator Processor Interface
[Diagram labels: Custom Hardware Application; POWER8; CAPP; Coherence Bus; PSL; FPGA or ASIC; PCIe Gen3 (transport for encapsulated messages)]
Customizable Hardware Application Accelerator
• Specific system SW, middleware, or user application
• Written to durable interface provided by PSL
Processor Service Layer (PSL)
• Present robust, durable interfaces to applications
• Offload complexity / content from CAPP
Virtual Addressing
• Accelerator can work with same memory addresses that the processors use
• Pointers de-referenced same as the host application
• Removes OS & device driver overhead
Hardware Managed Cache Coherence
• Enables the accelerator to participate in “Locks” as a normal thread
• Lowers latency over IO communication model
http://opencapi.org/
Why CAPI is Better than Traditional PCIe
[Diagram labels: Power Processor; CAPP; PCIe; FPGA; AFU; IBM Supplied POWER Service Layer]
Typical I/O Model Flow (total ~13µs for data prep): DD Call -> Copy or Pin Source Data -> MMIO Notify Accelerator -> Acceleration -> Poll / Int Completion -> Copy or Unpin Result Data -> Ret. From DD Completion
Flow with a Coherent Model (total 0.36µs): Shared Mem. Notify Accelerator -> Acceleration -> Shared Memory Completion
Advantages of Coherent Attachment Over I/O Attachment
• Virtual Addressing & Data Caching
  – Shared Memory
  – Lower latency for highly referenced data
• Easier, More Natural Programming Model
  – Traditional thread level programming
  – Long latency of I/O typically requires restructuring of application
• Enables Applications Not Possible on I/O
  – Pointer chasing, etc…
IBM Data Engine for NoSQL: Cost Savings for In-Memory NoSQL Data Stores
IBM Data Engine for NoSQL is an integrated platform for large and fast-growing NoSQL data stores. It builds on the CAPI capability of POWER8 systems and provides super-fast access to large flash storage capacity. It delivers high-speed access to both RAM and flash storage, which can result in significantly lower cost and higher workload density for NoSQL deployments than a standard RAM-based system. The solution offers superior performance and price-performance compared with scale-out x86 server deployments that are either limited in available memory per server or have flash memory with limited data access latency.
External Flash Configuration (Power S822L / S812L + FlashSystem 900): up to 57TB of extended memory with one POWER8 server + CAPI-attached flash
Integrated Flash Configuration, NEW (Power S822L / S812L / S822LC): up to 8TB of super-fast storage tier on one POWER8 server
CAPI Unlocks the Next Level of Performance for Flash
Identical hardware with 3 different paths to data: Conventional I/O (FC), CAPI - I, CAPI - E
[Charts: IOPS per Hardware Thread (axis 0 to 450,000) and Latency in microseconds (axis 0 to 200), each comparing Conventional, CAPI - I, and CAPI - E on FlashSystem]
IBM POWER S822L: >3x better IOPS per HW thread; lower latency
CAPI - I: Integrated CAPI Flash Card
CAPI - E: CAPI-attached External Flash
POWER8 with CAPI enabled acceleration running Neo4j delivers 1.82X the performance versus Intel Broadwell servers with NVMe
[Bar chart: representative mixed workload throughput. IBM Power S822LC (20c/160t): 711; x86 Broadwell Server (24c/48t): 390; 82% more throughput]
• Accelerate Graph Databases with CAPI on POWER8
• Real-world mixed graph transaction workload running Neo4j on IBM Power S822LC server delivers 1.82X the throughput versus Intel Xeon E5-2650 v4 server
  – POWER8 (20 cores / 128 GB): 711 Ops/sec
  – Intel Xeon E5-2650 v4 processor (24 cores / 128 GB): 390 Ops/sec
• Based on IBM internal testing of a single system and OS image running mixed graph transactions, using an internal IBM and Neo4j workload with a 200 GB data model. Conducted under laboratory conditions; individual results can vary based on workload size, use of storage subsystems & other conditions. Data as of October 19, 2016.
• IBM Power System S822LC: 20 cores (2 x 10c chips) / 160 threads, POWER8; 128 GB memory (16 x 8GB); 1.6 TB CAPI NVMe adapter; Neo4j 3.0.4; Ubuntu 16.04. Competitive stack: HP ProLiant DL380 Gen9: 24 cores (2 x 12c chips) / 48 threads; Intel E5-2650 v4; 128 GB memory (16 x 8GB); 1.6 TB NVMe adapter; Neo4j 3.0.4; Ubuntu 15.10.
POWER8 with CAPI enabled acceleration running Neo4j delivers 1.61X the price-performance versus Intel Xeon E5-2650 v4 with NVMe
Metric | IBM Power S822LC (20-core, 128GB) | HP DL380 Gen9 (24-core, 128GB)
Server price* (3-year warranty) | $19,123 | $16,911
Mixed graph transaction workload (total operations per second) | 711 | 390
Result: 1.82X performance per server; 1.61X price-performance
• Based on IBM internal testing of a single system and OS image running mixed graph transactions, using an internal IBM and Neo4j workload with a 200 GB data model. Conducted under laboratory conditions; individual results can vary based on workload size, use of storage subsystems & other conditions. Data as of October 19, 2016.
• IBM Power System S822LC: 20 cores (2 x 10c chips) / 160 threads, POWER8; 128 GB memory (16 x 8GB); 1.6 TB CAPI NVMe adapter; Neo4j 3.0.4; Ubuntu 16.04. Competitive stack: HP ProLiant DL380 Gen9: 24 cores (2 x 12c chips) / 48 threads; Intel E5-2650 v4; 128 GB memory (16 x 8GB); 1.6 TB NVMe adapter; Neo4j 3.0.4; Ubuntu 15.10.
* Pricing is based on bundled pricing for S822LC with Integrated CAPI Flash card (IBM ordering system) and HP web price: https://h22174.www2.hp.com/SimplifiedConfig/Index
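The 1.82X and 1.61X figures follow directly from the throughput and price numbers on this slide. A minimal Python sketch of that arithmetic, using only the slide's published figures (the variable and function names are made up for this example):

```python
# Figures quoted on the slide: operations per second and server price (USD).
POWER8 = {"ops_per_sec": 711, "price_usd": 19_123}
X86 = {"ops_per_sec": 390, "price_usd": 16_911}


def ops_per_dollar(system):
    """Throughput delivered per dollar of server price."""
    return system["ops_per_sec"] / system["price_usd"]


performance_ratio = POWER8["ops_per_sec"] / X86["ops_per_sec"]
price_performance_ratio = ops_per_dollar(POWER8) / ops_per_dollar(X86)

print(f"Performance per server: {performance_ratio:.2f}X")
print(f"Price-performance:      {price_performance_ratio:.2f}X")
```

The price-performance advantage is smaller than the raw throughput advantage because the POWER8 configuration costs about 13% more than the x86 configuration.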
Where Do I Go Next?
If you think that you have a graph problem
Let’s qualify your use case
• neo4j.com/contact-us
• Your local IBM representative
Learn more…
• About graphs & Neo4j @ http://neo4j.com
• Use cases / Case studies / Webinars / Training / Boot camp for your organization or team
• About IBM Power Systems @ http://www-03.ibm.com/systems/power
• About IBM Data Engine for NoSQL @ http://www-03.ibm.com/systems/power/solutions/bigdata-analytics/data-engine-nosql/