Big Data Appliance for Graph Analytics
description
Transcript of Big Data Appliance for Graph Analytics
2
Finance
Many Big Data Business Problems are based on Graphs
Intelligence/Security
Social Networking
Telecom/Mobile
Life Sciences/Biology
Supply Chain
Healthcare/Medicine
Internet/www
Finance
4
Current Big Data approaches are based on SEARCH…
Needle-in-a-Haystack Paradigm
When we know what we’re looking for We can break up the problem
…
Search:
…Partition and Scale out
Current Big Data Approaches:
5
YarcData focuses on DISCOVERY — not search
Needle-in-a-Needlestack When we don’t know what we’re looking for…
We can’t break upthe problem.
When the relationships among data are mysterious, and the nature of the inquiries unknown, no meaningful scheme for partitioning the data is possible.
…when the purpose of the system is discovery of relationships, not extracting information from already known interrelations, achieving satisfactory performance is difficult.
“
“
“
“
Current Big Data Approaches do NOT enable Discovery
(Source: Gartner Report, YarcData's uRiKA Shows Big Data Is More Than Hadoop and Data Warehouses, September 2012)
7
Discovery through fast hypothesis validation
“
In the amount of time it takes to validate one hypothesis, we can now validate 1000 hypotheses– massively improving our success rate and systematizing serendipity.
“ “
YarcData Government Customer
8
Government Organization: Discovering New Plots
Goal: Proactively identify patterns of activity and threat candidates by aggregating intelligence and analysis
Data sets: reference data, People, Places, Things, Organizations, Communications…
Technical Challenges: Volume, Variety and Velocity of data; Inaccurate, incomplete and falsified data
Users: Intelligence Analysts
Usage model: Search for patterns of activity and graphically explore relationships between entities for candidate behavior and activities
Augmenting: Existing Hadoop cluster and multiple data appliances
9
Healthcare Provider: Discovering New Treatments
Goal: Proactively identify optimal treatments for patients based on treatment results for “similar” patients
Data sets: Longitudinal patient data, Family history, Genetics, Reference data
Technical Challenges: Ad-hoc constantly changing definition of “similarity” across thousands of constantly changing attributes
Users: Doctors
Usage model: Compare results of candidate treatment options for “similar” patients based on ad-hoc physician specified patterns
Augmenting: Existing data warehouse
10
Government Organization: Discovering new cyber threats
Goal: Proactively identify unknown cyber threats by examining all relationships
Data sets: IP, MAC, BGP, Firewall, DNS, Netflow, Whois, NVD, CIDR…
Technical Challenges: Volume and Velocity of data; Temporal dependencies; Real-time response
Users: Cyber Analysts
Usage model: Iterative analysis of all patterns across all traffic to explore deviations in frequency of occurrence, derivative patterns of known threats and linking patterns through relationships in offline data
Augmenting: Existing data appliances
11
Healthcare Payer: Discovery New Fraud Patterns
Goal: Proactively identify new patterns of healthcare fraud (perpetrator/provider/patient nexus) by examining all healthcare relationships
Data sets: Provider, Beneficiary, Policy, Claims, Billing, Services, Outcomes, Social Network, Organization…
Technical Challenges: Volume and Velocity of data; Entity Resolution; Complex Inter-relationships; Temporal dependencies
Users: Fraud Inspectors/Analysts
Usage model: Analyze outcome and cost for various disease trajectories and identify outliers for treatment optimization and fraud investigation; Separate fraud patterns from benign treatment or legitimate errors
Augmenting: Existing data warehouse, Predictive Analytics
12
Employing a single system-wide memory means data does not need to be partitioned, as it must be on MapReduce-based systems like Hadoop. Any thread can dart to any location, following its path through the graph, since all threads can see all data elements. This greatly reduces the imbalances in time that plague graph-oriented processing on Hadoop clusters.
A unique approach to processor design, YarcData's Threadstorm chip, shows no slowdown under the characteristic zigs and zags of graph-oriented processing… the data is held in-memory in very large system memory configurations, slashing the rate of file accesses.
YarcData technology enables Big Data Graph Analytics
“
““
“
13
YarcData…uses Linux-ized processors to blow through graph analyses at speeds beyond classic RDBMS’ imagination – jet plane versus burro.
…while making it easy for Enterprises to adopt
Open Standard Interface – RDF/SPARQL from W3C
Admins see just another Linux server – provisioning, security, management…
Packaged as an appliance to hide complexity
Subscription-based pricing for on-premise deployment
“
“
We are also impressed with… its product strategy, which combines the advantages of a pre-integrated hardware appliance with the flexibility of a subscription model.
“
“
14
YarcData: Big Data Appliance for Graph Analytics
Ease adoption with subscription pricing and industry standards support
Purpose built to solve big data graph problems with large shared-memory and massive multi-threading
Graph analytics warehouse supports ad hoc queries, pattern-based searches, inferencing and deduction on dynamic data sets
Data center ready appliance with open interfaces enables re-use of in-house skill sets, no lock-in and simplified integration
Gain business insight by discovering unknown relationships in big data
Achieve competitive Advantage with scalable real-time graph analytics