© 2013 IBM Corporation
Biometrics, Identity and Big Data Analytics
Dr. Charles Li, Analytics Solution, [email protected]
© 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Topics
■ Biometrics, Identity & ID Management
■ Views on Biometrics Technology and System
■ Big Data Analytics and Challenges
■ Identity Establishment from All Sources
■ Identity and Biometrics in the Cloud
■ Identity and Biometrics Analytics in Motion
■ Summary
Biometrics, Identity and ID Management
[Diagram: Identity Management, relating Identity Establishment, Players, Entitlement(s), Actions, and Identity with Trust (Rules), Status (Environment) and Reputation (History).]
Views on biometrics technology and system
What is missing?
Big Data Concept
Extract insight from a high volume, variety and velocity of data in a timely and cost-effective manner.
• Variety: data in many forms (structured, unstructured, text and multimedia)
• Velocity: data in motion; analysis of streaming data to enable decisions within fractions of a second
• Volume: data at scale, from terabytes to zettabytes
Analytics Concept
Structured data and unstructured content, made consumable and accessible to everyone:
• Descriptive Analytics: What is happening? How many, how often, where? What exactly is the problem? What actions are needed?
• Predictive Analytics: What could happen? (Simulation) What if these trends continue? (Forecasting) What will happen next if…? (Predictive Modelling)
• Prescriptive Analytics: How can we achieve the best outcome? (Optimisation) How can we achieve the best outcome and address variability? (Stochastic Optimisation)
• Content Analytics: extracting insight, concepts and relationships
• Visual Analytics: deep insights to improve visualization and marketing interactions
• Biometrics uses: Biometrics Quality Monitoring, Biometrics Reports
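The "Forecasting" question (What if these trends continue?) can be made concrete with a minimal Python sketch that fits a straight-line trend by ordinary least squares and extrapolates it. The function name and the daily enrollment counts are hypothetical, chosen only for illustration; a production forecast would also need to model seasonality and uncertainty.

```python
def linear_trend_forecast(history: list[float], steps_ahead: int = 1) -> float:
    """Fit y = a + b*t to equally spaced observations by least squares, then extrapolate."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2  # mean of t = 0, 1, ..., n-1
    y_mean = sum(history) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    # The last observation sits at t = n - 1.
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical daily enrollment counts at one site.
daily_enrollments = [100, 110, 120, 130]
print(linear_trend_forecast(daily_enrollments))  # 140.0
```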
Biometrics Data at Scale – Static & Single Instance
• 1 billion arrivals worldwide in 2012 (United States: 100-200 million international arrivals in 2012): ~1 exabyte of travel data
• The Unique Identification Authority of India (UIDAI) plans to enroll 1.2 billion citizens (UID program), enrolling a million per day and reaching half a billion by 2014: 3-4 exabytes of biometric and biographic data
• Prolific usage of mobile phones, with 6 billion phones in use: 6 exabytes of behavior data
• ID cards / border crossings / benefits / multiple instances: 7,000,000,000 × (10-print 0.5-1 MB + face 200 KB + iris KB) ≈ 7 exabytes
• EU VIS Biometric Matching System (BMS), at 70 million individuals and 100K enrollments daily: ~100 terabytes
• US Department of State (DoS), in the range of 100 million faces and others: at least 10-50 terabytes
• DHS IDENT, over 150 million identities and 125,000 transactions daily: ~100-300 terabytes
• FBI NGI, over 100 million fingerprints with more coming, plus faces and iris: ~100-200 terabytes
1 gigabyte (GB) = 1,000 MB
1 terabyte (TB) = 1,000 GB
1 petabyte (PB) = 1,000 TB
1 exabyte (EB) = 1,000 PB
1 zettabyte (ZB) = 1,000 EB
1 yottabyte (YB) = 1,000 ZB
In reality, the data includes many instances, history, transactions, logs…
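A quick back-of-the-envelope check shows what the 7,000,000,000-person estimate implies. The Python sketch below uses decimal units, consistent with the unit table on this slide, and leaves out iris because the slide gives no size for it (an assumption that slightly understates the total). It shows that one stored instance per person comes to a few petabytes. The exabyte-scale figure therefore only holds once many instances per person are counted, as the "multiple instances" label suggests.

```python
# Decimal units, matching the slide's unit table (1 KB = 1,000 bytes).
KB = 1_000
MB = 1_000 * KB
PB = 1_000 ** 5
EB = 1_000 ** 6


def population_storage_bytes(population: int, bytes_per_person: int) -> int:
    """Total bytes when exactly one record is stored per person."""
    return population * bytes_per_person


POPULATION = 7_000_000_000
FACE = 200 * KB
# Iris is omitted on purpose: the slide does not give its size.
low = population_storage_bytes(POPULATION, int(0.5 * MB) + FACE)  # 10-print at 0.5 MB
high = population_storage_bytes(POPULATION, 1 * MB + FACE)        # 10-print at 1 MB

print(f"single instance: {low / PB:.1f}-{high / PB:.1f} PB")  # single instance: 4.9-8.4 PB
# How many stored instances per person it takes to reach 7 EB.
print(f"instances per person to reach 7 EB: ~{7 * EB / high:.0f}-{7 * EB / low:.0f}")  # ~833-1429
```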
Big Data Sources
System Transaction, Log and Transition Data – Several Times More!
Other Big Data Examples
• 150 exabytes: global size of "Big Data" in healthcare, growing by 1.2 to 2.4 EB per year
• The NY Stock Exchange captures 1 terabyte of trade information every session
• AT&T transfers about 30 petabytes of data through its network daily
• The Large Hadron Collider at CERN generates 40 terabytes of usable data per day
• Facebook processes 500+ terabytes of data daily
• Google processes more than 24 petabytes of data in a single day
• Twitter processes 12 terabytes of data daily
• By 2016, annual Internet traffic will reach 1.3 zettabytes
We don’t have the most challenging problem!
Biometric Performance at Giga Scale*
■ "Brute Force" De-Duplication
• Cumulative de-duplication requires N(N-1)/2 checks in total (a "combination problem")
• De-duplicating an enrolled population of 100 million requires 4,999,999,950,000,000 checks
• At 10 million matches per second, this takes nearly 16 years to complete
■ Biometric Accuracy Challenge
• FMR at 1 identification false match per million
• 500 false matches when de-duplicating a 1 million enrollment population
• 5 million false matches with a 100 million enrollment population
Prohibitive! We have some unique challenges!
* Courtesy of Bojan Cukic
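The arithmetic on this slide can be reproduced with a short Python sketch. It counts the N(N-1)/2 pairwise checks, converts them into years at a fixed match rate, and estimates false matches under the simplifying assumption that every comparison has the same independent false match rate. Under that model, the slide's 500 and 5 million figures correspond to a per-comparison rate of 1 in a billion (1e-9). At 1 in a million per comparison, the same formula gives figures 1,000 times larger. The function names are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def pairwise_checks(n: int) -> int:
    """Brute-force de-duplication compares every pair once: N(N-1)/2."""
    return n * (n - 1) // 2


def years_to_complete(n: int, matches_per_second: float) -> float:
    return pairwise_checks(n) / matches_per_second / SECONDS_PER_YEAR


def expected_false_matches(n: int, fmr: float) -> float:
    """Expected false matches if each comparison independently has rate `fmr`."""
    return pairwise_checks(n) * fmr


N = 100_000_000
print(f"{pairwise_checks(N):,} checks")                 # 4,999,999,950,000,000 checks
print(f"{years_to_complete(N, 10_000_000):.1f} years")  # 15.9 years
for population in (1_000_000, 100_000_000):
    print(population, round(expected_false_matches(population, 1e-9)))
```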
Face the Challenges
■ Identity Establishment with All Data Sources
- Leverage Entity Resolution Technologies
- Leverage ‘Context Accumulation’
■ Biometrics Services in the Cloud
- Leverage Big Data Infrastructure, Platforms
- Leverage Software Services
■ Biometrics and Identity Analytics in Motion
- Monitor quality
- Monitor performance
Establishing Identity with All Sources
■ Biometrics (physical and behavioral)
■ Biographic information
■ Behavior data (social media usage)
■ Travel data (API: Advance Passenger Information; PNR: Passenger Name Record)
■ Credit card / banking information
■ Web or mobile app usage behavior
• Emails
• Multimedia
■ Spatial and temporal information
Entity / Identity Resolution with All Sources: a complex process that applies sophisticated algorithms across multiple heterogeneous data sources to resolve multiple records into a single fused view of an individual.
• Reduce search space and computing resources
• Complement low-quality images
• Cost and benefit trade-off
• Systematic research necessary
• Successful programs
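The slide describes entity resolution only at a high level. To illustrate how it can reduce the search space, the Python sketch below compares records only within a block (here, the same date of birth) and applies a placeholder matching rule. It then merges matched records into clusters with union-find. The record layout, blocking key and matching rule are all hypothetical. This is a generic textbook pattern, not the algorithm of any particular entity resolution product.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical record layout: (record_id, source, name, date_of_birth).
Record = tuple[str, str, str, str]


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(name.lower().split())


def blocking_key(rec: Record) -> str:
    # Only records sharing a key are compared, which shrinks the search space
    # relative to checking every pair across all sources.
    return rec[3]


def is_match(a: Record, b: Record) -> bool:
    # Placeholder decision rule; a real resolver would weigh many features.
    return normalize(a[2]) == normalize(b[2])


def resolve(records: list[Record]) -> list[set[str]]:
    """Group record ids that refer to the same individual (union-find over matched pairs)."""
    parent = {rec[0]: rec[0] for rec in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    blocks: dict[str, list[Record]] = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)

    for block in blocks.values():
        for a, b in combinations(block, 2):
            if is_match(a, b):
                parent[find(a[0])] = find(b[0])

    clusters: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        clusters[find(rec[0])].add(rec[0])
    return list(clusters.values())


records = [
    ("v1", "visa", "Ana  Silva", "1980-01-02"),
    ("p7", "pnr", "ana silva", "1980-01-02"),
    ("b3", "bank", "Ana Silva", "1975-05-05"),
]
print(resolve(records))
```

Here v1 and p7 end up in one cluster and b3 stays alone. Blocking trades recall for speed. Two records for the same person with different keys are never compared, so real systems typically combine several blocking keys.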
Biometrics Services in the Cloud - Leverage Big Data Infrastructure, Platform and Software Services
[Diagram: cloud stack. Cloud Services (Infrastructure and Platform as a Service): Infrastructure (IaaS), Platform (PaaS). Cloud Solutions (Software and Business Process as a Service): Software (SaaS), Business Process (BPaaS), e.g. Smarter Commerce, Smarter Cities, Social Business, Business Analytics and Optimization. Deployment: Private, Public and Hybrid Models. Enterprise Application Services: Application Lifecycle, Application Resources, Application Environments, Application Management, Integration. Infrastructure Platform: Management and Administration, Availability and Performance, Security and Compliance, Usage and Accounting.]
[Diagram: a Standard Interface exposing an Enrolment Service, a 1:1 Identification Service, …, in front of a grid of "Process Data" nodes holding fingerprint, iris and face biometric data.]
Note: Cloud and Big Data are not the same.
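To make the "Standard Interface" idea concrete, here is a minimal Python sketch of a hypothetical service contract (enroll, 1:1 verify, 1:N identify). It includes an in-memory implementation that spreads the gallery across shards and searches them in parallel. Every name is illustrative. Cosine similarity stands in for a real fingerprint, iris or face matcher, and threads stand in for remote "Process Data" nodes.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol

Template = list[float]  # hypothetical fixed-length feature vector


class BiometricService(Protocol):
    """Hypothetical standard interface; any vendor matcher could sit behind it."""
    def enroll(self, subject_id: str, template: Template) -> None: ...
    def verify(self, subject_id: str, probe: Template, threshold: float) -> bool: ...
    def identify(self, probe: Template, top_k: int = 1) -> list[tuple[str, float]]: ...


def similarity(a: Template, b: Template) -> float:
    """Cosine similarity, used here as a stand-in for a real matcher."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ShardedService:
    """In-memory stand-in for a cloud service whose gallery is split across shards."""

    def __init__(self, num_shards: int = 4) -> None:
        self.shards: list[dict[str, Template]] = [{} for _ in range(num_shards)]

    def _shard_for(self, subject_id: str) -> dict[str, Template]:
        # Route each identity to one shard by hashing its id.
        return self.shards[hash(subject_id) % len(self.shards)]

    def enroll(self, subject_id: str, template: Template) -> None:
        self._shard_for(subject_id)[subject_id] = template

    def verify(self, subject_id: str, probe: Template, threshold: float) -> bool:
        # 1:1 comparison against the claimed identity only.
        enrolled = self._shard_for(subject_id).get(subject_id)
        return enrolled is not None and similarity(probe, enrolled) >= threshold

    def identify(self, probe: Template, top_k: int = 1) -> list[tuple[str, float]]:
        # 1:N search: scatter the probe to every shard, then gather and rank.
        def search(shard: dict[str, Template]) -> list[tuple[str, float]]:
            return [(sid, similarity(probe, t)) for sid, t in shard.items()]

        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(search, self.shards))
        hits = [hit for part in partials for hit in part]
        return sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]


svc: BiometricService = ShardedService(num_shards=3)
svc.enroll("alice", [1.0, 0.0, 0.2])
svc.enroll("bob", [0.0, 1.0, 0.1])
print(svc.verify("alice", [0.9, 0.1, 0.2], threshold=0.9))  # True
print(svc.identify([0.1, 0.9, 0.1])[0][0])                  # bob
```

Callers depend only on the interface, so the matcher and the sharding scheme behind it could change without touching client code.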
Exemplary Progress
■ A Prototype: Leveraging the Cloud for Big Data Biometrics
• E. Kohlwey et al., "Leveraging the Cloud for Big Data Biometrics," 2011
• A prototype system for generalized searching of cloud-scale biometric data, and an application of this system to matching a collection of synthetic human iris images
• Implemented with Hadoop (MapReduce framework)
■ Successful deployment of identification algorithms for the India UID program
• Non-traditional matching vendor technologies
■ Biometrics as a Service
• Business process as a service
• Software as a service
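The Kohlwey et al. prototype is summarized as built on Hadoop's MapReduce framework. The pure-Python sketch below only mimics that pattern for batch 1:N search. Each map task scores all probes against one gallery partition, a shuffle groups the scores by probe, and a reduce step keeps the best candidate per probe. It does not use Hadoop. The representation of iris templates as short bit strings compared by normalized Hamming distance is an illustrative assumption, not a detail taken from the paper.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator, Optional

Code = tuple[str, int]  # (id, iris-code-like bit string packed into an int)


def hamming(a: int, b: int, bits: int) -> float:
    """Fraction of disagreeing bits between two binary codes."""
    return bin(a ^ b).count("1") / bits


def map_partition(partition: list[Code], probes: list[Code],
                  bits: int) -> Iterator[tuple[str, tuple[str, float]]]:
    # Map: one task per gallery partition, emitting (probe_id, (gallery_id, distance)).
    for gid, gcode in partition:
        for pid, pcode in probes:
            yield pid, (gid, hamming(pcode, gcode, bits))


def reduce_best(pairs: Iterable[tuple[str, tuple[str, float]]],
                threshold: float) -> dict[str, Optional[str]]:
    # Shuffle: group intermediate pairs by probe_id.
    grouped: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for pid, value in pairs:
        grouped[pid].append(value)
    # Reduce: keep the closest gallery entry, or None if nothing is close enough.
    results: dict[str, Optional[str]] = {}
    for pid, candidates in grouped.items():
        gid, dist = min(candidates, key=lambda c: c[1])
        results[pid] = gid if dist <= threshold else None
    return results


BITS = 8  # toy code length; real iris codes are far longer
partitions = [[("g1", 0b10110010), ("g2", 0b01001101)], [("g3", 0b11110000)]]
probes = [("p1", 0b10110011), ("p2", 0b00001111)]
intermediate = chain.from_iterable(map_partition(p, probes, BITS) for p in partitions)
print(reduce_best(intermediate, threshold=0.2))  # {'p1': 'g1', 'p2': None}
```

In a real Hadoop job the map tasks would run on separate nodes next to their gallery partitions. Here they simply run one after another.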
Challenges
■ Focus on parallelism and scalability
• Excellent research and testing areas
• Bring algorithms into the operational environment
■ Explore defining a biometrics-as-a-service program: a new way of thinking about acquisition
• Business process as a service
• Software as a service
■ Encourage partnership between Big Data & Analytics developers and traditional biometrics solution providers
• Big Data and Analytics players
Big Data Appliance Examples
■ IBM Netezza
■ Oracle Exadata
■ Teradata
■ EMC Greenplum
■ SAP HANA
■ Schooner Appliance MySQL
Example (CBP): 40 TB of data hosted by a little more than a dozen appliances (a few hundred cores per appliance) supports 30-40% of DHS's operations.
Biometrics and Identity Analytics in Motion
■ ROC curve calibration along the security vs. convenience trade-off
• Allow systems to dynamically change operating criteria based on the live situation
• This is a real challenge because it requires ground truth…
■ Quality feedback to the collection
• Avoid collecting 'bad' data that degrades the system
■ Operating metrics monitoring
• Enrollment rates, rejection rates, etc.
• Geo-location and temporal information
■ Fuse all data sources based on real-time feedback
• Dynamically allocate fusion algorithms and configurations
■ Provide controlled parallelism
• At the system and algorithm levels
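One way to picture ROC calibration is to pick an operating threshold from a labeled sample of impostor scores so that the empirical false match rate stays at or below a target. A system could then switch targets as the live situation moves between security and convenience. The Python sketch below assumes higher scores mean "more similar". The function names, targets and synthetic integer scores are hypothetical. The labeled impostor and genuine scores are exactly the ground truth the slide identifies as the hard part.

```python
import math


def threshold_for_fmr(impostor_scores: list[float], target_fmr: float) -> float:
    """Smallest representable threshold whose empirical FMR (impostor score >= threshold) is <= target_fmr."""
    if not impostor_scores:
        raise ValueError("need impostor scores")
    ranked = sorted(impostor_scores, reverse=True)
    # Impostor acceptances we can tolerate; int() rounds down, i.e. conservatively.
    allowed = int(target_fmr * len(ranked))
    if allowed >= len(ranked):
        return ranked[-1]  # target permits accepting every impostor
    # Any threshold <= ranked[allowed] accepts allowed + 1 impostors,
    # so take the next representable value above it.
    return math.nextafter(ranked[allowed], math.inf)


def error_rates(genuine: list[float], impostor: list[float], threshold: float) -> tuple[float, float]:
    """Return (FMR, FNMR) at a threshold."""
    fmr = sum(s >= threshold for s in impostor) / len(impostor)
    fnmr = sum(s < threshold for s in genuine) / len(genuine)
    return fmr, fnmr


# Hypothetical labeled scores.
impostor = [float(s) for s in range(100)]     # 0..99
genuine = [float(s) for s in range(50, 150)]  # 50..149

# Switch operating points as the live situation changes.
for mode, target in {"convenience": 0.05, "high_security": 0.01}.items():
    t = threshold_for_fmr(impostor, target)
    print(mode, error_rates(genuine, impostor, t))
# convenience (0.05, 0.45)
# high_security (0.01, 0.49)
```

Lowering the target FMR raises the threshold, which turns away more genuine users (higher FNMR). That is the security vs. convenience trade-off in miniature.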
One Approach: Streams Technology at Work, Near Real Time on a Big Data Platform
■ Continuous ingestion ■ Continuous analysis
Achieve scale:
• By partitioning applications into software components
• By distributing across stream-connected hardware hosts
Infrastructure provides services for:
• Scheduling analytics across hardware hosts
• Establishing streaming connectivity
Processing elements: Transform, Filter / Sample, Classify, Correlate, Annotate
Where appropriate, elements can be fused together for lower communication latency.
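Here is a minimal sketch of the stream-operator idea. It uses plain Python generators as stand-ins for operators; this is not IBM InfoSphere Streams or its SPL language. One operator annotates each capture with a quality-gate decision. A windowed operator raises an alert when the recent rejection rate at a site is too high. The Capture record, the 0-1 quality scale and the thresholds are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Capture:
    site: str
    quality: float  # hypothetical 0-1 quality score reported by the capture device


def annotate(stream: Iterable[Capture], min_quality: float) -> Iterator[tuple[Capture, bool]]:
    # Classify/annotate operator: tag each capture as accepted (True) or rejected.
    for cap in stream:
        yield cap, cap.quality >= min_quality


def rejection_monitor(stream: Iterable[tuple[Capture, bool]], window: int,
                      alert_rate: float) -> Iterator[tuple[str, float]]:
    # Windowed operator: track the rejection rate over the last `window` captures
    # and emit (site, rate) whenever a full window exceeds alert_rate.
    recent: deque[bool] = deque(maxlen=window)
    for cap, accepted in stream:
        recent.append(not accepted)
        if len(recent) == window:
            rate = sum(recent) / window
            if rate > alert_rate:
                yield cap.site, rate


captures = (Capture("gate-3", q) for q in (0.9, 0.8, 0.3, 0.2, 0.1, 0.95))
alerts = rejection_monitor(annotate(captures, min_quality=0.5), window=4, alert_rate=0.5)
print(list(alerts))  # [('gate-3', 0.75), ('gate-3', 0.75)]
```

Each stage consumes and yields one item at a time, so ingestion and analysis proceed continuously rather than in batches. A multi-site deployment would keep a separate window per site.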
Summary
■ Re-focus on identity
• Biometrics as an enabling technology
■ Re-think
• Open architecture
• Vendor-agnostic solution via biometrics middleware
■ Big impact from Big Data and cloud technologies
• Biometrics as a Service to leverage cloud computing
■ Big Data real-time platform
• Near-real-time analytics requirements
A New Look - Identity and Biometrics Analytics
[Diagram: data sources stream in parallel into a Big Data platform.
• High volume: travel data, banking data, spatial data, temporal data
• Real-time feeds: biometrics capture data, biographic data
• Unstructured data: social media, info on the web, behavioral data
The Big Data platform (massively parallel processing, real time) performs entity / identity resolution and runs pipeline identification services including many models. The resulting Big Data solution provides: report (descriptive analytics), predictive models, business workflow resolution, visualization analytics, content analytics.]