BigDataCloud Sept 8 2011 meetup - Big Data Analytics for Health by Charles Kaminski of LexisNexis
HPCC Platform: Big Data Analytics and Delivery
http://hpccsystems.com
LexisNexis’ massively parallel-processing, open-source computing platform
Big Data Cloud Meetup, September 8th, 2011
Who’s been using the HPCC Platform and why?
• Very large businesses
• Federal agencies
• National research labs
• It’s 4 to 10 times faster
• Products and solutions are built much faster
• Very complex problems can be modeled and solved
• It’s proven
What’s changed?
We just Open-Sourced!
The HPCC Platform is now available to you.
Big Data…It’s our business.
Big Data
Open Source Components
Industry solutions: Insurance, Financial Services, Cyber Security, Government, Health Care, Retail, Telecommunications, Transportation & Logistics, Weblog Analysis, Online Reservations
Customer Data Integration, Data Fusion, Fraud Detection and Prevention, Know Your Customer, Master Data Management, Weblog Analysis
The Platform’s Major Parts
• Thor – Data ingestion, hygiene, refining, transformation, linking, fusion
• Roxie – Data Delivery Engine
   • Supports complex queries and distributed indexes
   • Low latency – latencies grow logarithmically
• ECL – One language
   • Highly expressive and efficient declarative language
   • Solve complex problems
   • Encourage code reuse
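The slide does not explain why Roxie latencies grow logarithmically. A common reason index-backed lookups scale this way is that a sorted or B-tree-style index needs only O(log n) probes per key. The sketch below assumes that model; it does not show how Roxie actually works. It counts binary-search probes over a sorted key range, and the function name is hypothetical.

```python
from collections.abc import Sequence


def probes_to_find(sorted_keys: Sequence[int], target: int) -> int:
    """Binary search over sorted keys; returns how many probes were made
    before the key was found or the search range became empty."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        probes += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


if __name__ == "__main__":
    # range() is a sorted, indexable sequence, so no large list is built.
    # Searching for a key larger than every entry is the worst case here.
    for n in (1_000, 1_000_000, 1_000_000_000):
        print(n, probes_to_find(range(n), n))  # 10, 20 and 30 probes respectively
```

A millionfold increase in keys adds only about 20 probes per lookup. That is the kind of growth the "latencies grow logarithmically" bullet describes.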
How we’re different
It’s not a group of disparate technologies or competing visions bolted together.
It’s one platform with a clear proven vision.
This by itself is powerful.
How we’re different
• You can transcend MapReduce
• Build transformative data graphs and applications using ECL
• Solve very complex Big Data problems
• Don’t struggle to fit your Big Data problem into groups of MapReduce jobs
How we’re different
• No need to munge the data before ingestion
• No complex block file system
• No need to tune the number of tasks for different jobs
• Data Delivery Engine is included
• Use a single language for data cleansing, transformation, linking, fusion, and delivery
• ECL promotes language extension and code reuse
• Data graphs are built and optimized by the system
• The system-generated C++ is highly optimized
• Code execution is optimized
• Low and predictable latencies
• Modeling data problems as data problems leads to richer solutions
Challenges Facing Health Care Enterprises
Challenges facing the health insurance industry:
Disparate data spread across separate physical locations
Scale of data. BIG Data is getting BIGGER.
Adding relationships exponentially expands the size of the BIG Data analytics challenge.
LexisNexis has leveraged parallel-processing computing platforms and large-scale graph analytics for over a decade.
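Here is a back-of-the-envelope sketch of why adding relationships enlarges the problem so quickly. This arithmetic is an illustration, not a figure from the deck. The number of possible pairwise links grows quadratically with the number of entities. The number of entities reachable within k hops can grow exponentially in k. The population size and per-entity degree below are hypothetical.

```python
def pairwise_links(n: int) -> int:
    """Possible undirected relationships among n entities (n choose 2)."""
    return n * (n - 1) // 2


def k_hop_upper_bound(d: int, k: int) -> int:
    """Upper bound on distinct entities reachable within k hops of one entity
    when every entity has at most d relationships: d + d**2 + ... + d**k."""
    return sum(d ** hop for hop in range(1, k + 1))


if __name__ == "__main__":
    # Hypothetical population and degree, chosen only for illustration.
    print(pairwise_links(1_000_000))  # 499999500000
    for hops in (1, 2, 3):
        print(hops, k_hop_upper_bound(50, hops))  # bound: 50, then 2550, then 127550
```

The bound is loose because real networks share neighbors. Even so, each extra hop multiplies the search space by roughly the average degree.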
Potential Fraud – a Proof of Concept (POC) for the State of New York
Applied social network analytics to information provided by the State of New York and to public data supplied by LexisNexis. The goal was to identify relationships among a group of New York Medicaid recipients living in high-end condominiums within the same complex, and any links those individuals might have to medical facilities or other providers of care to New York Medicaid recipients.
What’s entailed (high level)
Addition of External Data
Mix first-party data with public and third-party data sources. This:
• Adds fidelity to existing entities
• Adds new linkages into the analysis
• Adds new entities into the analysis
• Exposes ringleaders and brokers that don’t directly participate
How we did it
• Graph/network: 3 billion derived public-data relationships between people, merged with risk indicators.
• Graph analytics examine up to 20 billion data points to create variables that allow predictive analysis incorporating relationship context and associated risk.
• Targets fraud across all sectors, including Health Care, Financial Services and Government.
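The deck does not show how the clusters of linked people were built. As a rough illustration only, the sketch below uses a union-find (disjoint-set) structure. It groups entities into connected clusters from a list of relationship edges. It is a generic connected-components technique, not a description of the LexisNexis implementation. The entity IDs and edges are hypothetical.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find with path halving; nodes are created on first use."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to flatten the tree.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def clusters(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Return the connected clusters implied by undirected relationship edges."""
    ds = DisjointSet()
    for a, b in edges:
        ds.union(a, b)
    groups: dict[str, set[str]] = defaultdict(set)
    for node in list(ds.parent):
        groups[ds.find(node)].add(node)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical edges: broker_Y links recipient_C to clinic_X without
    # any direct recipient-to-clinic edge.
    edges = [
        ("recipient_A", "recipient_B"),
        ("recipient_B", "clinic_X"),
        ("recipient_C", "broker_Y"),
        ("broker_Y", "clinic_X"),
        ("recipient_D", "recipient_E"),
    ]
    for group in clusters(edges):
        print(sorted(group))
```

In the example, recipient_C ends up in the same cluster as clinic_X only through broker_Y. This mirrors the "brokers that don’t directly participate" point above.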
Cluster Visualization Introduction
How many of them live in expensive residences, own expensive property, or drive expensive cars?
How many recipients are contacts of medical businesses?
How many medical businesses are associated with any of the people in the cluster?
How many are currently receiving benefits?
Medicaid Recipient
Expensive Residence
Owns Expensive Property
Owns Expensive Vehicles
Business Contact of Medical Business Entity
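The questions above reduce to simple counts once each person in a cluster is tagged with the legend's attributes. The sketch below is a hypothetical illustration. The `Person` fields mirror the legend, but the field names, the `cluster_summary` function and the choice to count among recipients are assumptions, not the POC's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical fields mirroring the cluster visualization's legend.
    name: str
    medicaid_recipient: bool = False
    expensive_residence: bool = False
    owns_expensive_property: bool = False
    owns_expensive_vehicle: bool = False
    currently_receiving_benefits: bool = False
    # IDs of medical businesses this person is a business contact of.
    medical_businesses: frozenset[str] = frozenset()


def cluster_summary(people: list[Person]) -> dict[str, int]:
    """Answer the slide's questions for one cluster of linked people."""
    recipients = [p for p in people if p.medicaid_recipient]
    linked_businesses: set[str] = set()
    for p in people:
        linked_businesses |= p.medical_businesses
    return {
        "recipients": len(recipients),
        "recipients_with_expensive_assets": sum(
            p.expensive_residence or p.owns_expensive_property or p.owns_expensive_vehicle
            for p in recipients
        ),
        "recipients_who_are_medical_contacts": sum(
            bool(p.medical_businesses) for p in recipients
        ),
        "medical_businesses_linked": len(linked_businesses),
        "currently_receiving_benefits": sum(
            p.currently_receiving_benefits for p in recipients
        ),
    }
```

Counting distinct linked businesses across everyone in the cluster, not just recipients, captures businesses reached through non-recipient contacts.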
Cluster Visualization
City Walk Sample: Vehicle Statistics
What is the list of preferred expensive vehicles?
Make Description    # Owned    Make Description    # Owned
Mercedes-Benz       46         Chevrolet           2
Lexus               41         Hummer              2
BMW                 27         Jeep                2
Infiniti            13         Nissan              2
Acura               9          Toyota              2
Lincoln             8          Aston Martin        1
Audi                7          Bentley             1
Land Rover          7          Cadillac            1
Porsche             6          GMC                 1
Jaguar              5          Honda               1
Mercedes Benz       3          Volkswagen          1
Saab                3          Volvo               1
Dominant buyers and sellers at City Walk
Property deed reference counts
Name                 Deeds Held  Name             Deeds Held
Hudson Eight         78          Mike Greem       21
Hudson Five          74          Scott Hill       21
Hudson First         73          Betty Donaway    21
Hudson Nine          65          Al Clark         19
Harry Anderson       45          Dave Miller      17
Hudson Ten           41          Mark Walker      16
Hudson Seven         39          Mike Smith       16
Home Nationwide      33          Val Edwards      15
Hudson Three         33          Eric Garcia      14
Brian Smith          28          Dane Young       14
Alan Stevens         25          Bill Moore       14
Chris Doe            24          Karen Carter     14
Sophie Davis         23          Casey Baker      14
Washington Mutual    23          Art Nelson       14
Fleet Mortgage Co.   21          Cathy Parker     13
The engineering story
One guy (Joe Prichard). Three weeks. Less than part time.
The platform lets him focus on the data.
Joe’s a lot of fun to work with.
Do you build other POCs?
Yes
What next?
Try us out!
• Virtual Machine
• Binaries
• EC2 Data Script
• Ensemble Recipe…Juan from Canonical
Contact Information
Charles Kaminski
Senior Architect
Academic Development Lead
HPCC Systems
402-619-9413