UID Pramod Varma - Indian Institute of Technology Bombaycomad/2010/pdf/Industry...
Aadhaar: Scalability & Data Management Challenges
Dr. Pramod K. Varma
Chief Architect, UIDAI
twitter.com/pramodkvarma pramodkvarma.com
Understanding Aadhaar System
Establishing ID is a Challenge
• A resident typically accesses multiple service providers, at different times
– Mobile phone, bank A/C, issue of rations, NREGA jobs, passport
• Needs to repeatedly re-establish ID = problem for the poor
– Birth records
– Address proof
– Money to ‘beat’ the system
• = No or limited access to entitlements and opportunities
Why Aadhaar?
• Difficulty in establishing ID → exclusion
– Financial, entitlements, social security net
– 60% unbanked (~700mn)
– 45% BPL do not have a ration card
• Weak authentication → inefficient delivery
– Ghost entries, duplication, multiple layers
– Food, fuel, fertilizer subsidy = ~Rs. 1 lac crore
“…biometric-based unique identity has the potential
to address both these dimensions simultaneously.”
- Thirteenth Finance Commission
Enroll Once …
• Aadhaar Number - Unique, lifetime, biometric
based identity
• Demographic Data
– Compulsory data: Name, Age/Date of Birth, Gender and Address of the resident
– Conditional data: Parents/Guardian details
– Optional data: Phone no., email address
• Biometric Data
– Resident’s Photograph
– Resident’s Finger Prints
– Resident’s Iris
… authenticate many times
• Online service to verify the claim – “are you who
you claim to be?”
• 1:1 check – only a “yes/no” answer
• Authenticate online
– Anytime, anywhere, multi-factor
– Always responds with “yes” or “no”
• Open identity platform
– Can be used in any service, any domain
– using any protocol, any device, any network
Application Modules
• Enrolment
– Geographically Distributed Client (mostly offline)
– Enrolment Server with Multi-modal, Multi-vendor ABIS
• Authentication
– Geographically Distributed Servers
– Geographically Distributed Devices (several millions)
– Multi-factor support
• Supporting Systems
– Business Intelligence
– Fraud Detection
Enrolment Process
[Figure: enrolment process flow, steps 1 to 5, between the Enrolment Agency, Registrar, CIDR, Logistics Partner (India Post), and Customer Contact Center]
• Enrolment Agency: data capture, with automatic synch (software/data)
• Enrolment data to CIDR (Option A or Option B)
• CIDR Enrolment Service: enrolment processing, biometric de-duplication, UID assignment
• Aadhaar number and rejection data
• Logistics Partner (India Post): letter delivery & verification (Aadhaar letter or rejection letter)
• Customer Contact Center: information / issue resolution
Authentication Process
Enrolment Server
• Manages complete Aadhaar enrolment and
lifecycle process
• Features
– Data validation
– Operator, supervisor verification
– Biometric de-duplication (1:N matching)
– Manual inspection
– Aadhaar number allocation / rejection
– Letter generation and delivery tracking
– Registrar integration
Biometric De-duplication
• Multi-modal matching
– 1:N matching (Every resident is matched using his/her
biometrics against every entry in the ABIS system)
• Multi-vendor interface through ABIS API
– Dynamic allocation to ABIS vendors based on their
accuracy and performance
– Multi-DC architecture adds complexity
• Exception handling
– Mostly automated and manual
– Volumes require highly automated and learning systems
to handle exceptions in an effective manner
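The slide says de-duplication requests are allocated dynamically to ABIS vendors based on their accuracy and performance, but it does not give the rule. A minimal sketch of one way this could work, assuming a hypothetical score (accuracy multiplied by throughput) and weighted random routing. The vendor names, metrics, and scoring formula are illustrative assumptions, not the system's actual method:

```python
import random
from dataclasses import dataclass


@dataclass
class VendorStats:
    """Hypothetical running metrics for one ABIS vendor."""
    name: str
    accuracy: float      # fraction of correct de-duplication decisions, 0..1
    throughput: float    # matches completed per second


def allocation_weights(vendors):
    """Traffic share per vendor, proportional to accuracy * throughput (assumed score)."""
    scores = {v.name: v.accuracy * v.throughput for v in vendors}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


def pick_vendor(vendors, rng=random):
    """Route one de-duplication request using the weights above."""
    weights = allocation_weights(vendors)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Illustrative vendors; the numbers are made up for the example.
vendors = [
    VendorStats("vendor_a", accuracy=0.999, throughput=120.0),
    VendorStats("vendor_b", accuracy=0.995, throughput=100.0),
    VendorStats("vendor_c", accuracy=0.990, throughput=60.0),
]
shares = allocation_weights(vendors)
```

Recomputing the weights as fresh accuracy and throughput measurements arrive is what would make the allocation dynamic: a vendor whose metrics degrade automatically receives a smaller share.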
Authentication
• Supports answering the question “is a resident the person he/she
claims to be”
– Verifies resident information (demographics, biometrics) for a given
Aadhaar number against the stored data
– Online service that is lightweight, ubiquitous, and secure
– Only responds with a “yes/no” and no personal identity information is
returned as part of the response
• Supports multi-factor authentication using biometrics, PIN, OTP and
combinations thereof
• Supports multiple protocols and devices
– Personal computer, mobile, PoS terminals, etc.
– Many protocols (USSD, SMS, HTTPS) over data and mobile connections
– Works with assisted and self-service applications
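The 1:1 check described above can be sketched as a function that compares whatever factors the caller supplies against the stored record and returns only a boolean. This is a minimal illustration under stated assumptions: the record layout, the salted SHA-256 PIN hash, the case-insensitive name comparison, and the equality-based biometric matcher are all hypothetical placeholders (a real matcher would use a similarity threshold, and the real service also supports fuzzy and Indian-language matching).

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StoredRecord:
    """Hypothetical stored data for one Aadhaar number."""
    name: str
    pin_salt: bytes
    pin_hash: bytes              # SHA-256 of salt + PIN (illustrative only)
    fingerprint_template: bytes


def hash_pin(pin: str, salt: bytes) -> bytes:
    return hashlib.sha256(salt + pin.encode("utf-8")).digest()


def authenticate(
    store: dict,
    aadhaar_number: str,
    name: Optional[str] = None,
    pin: Optional[str] = None,
    fingerprint: Optional[bytes] = None,
    match_biometric: Callable[[bytes, bytes], bool] = lambda a, b: a == b,
) -> bool:
    """1:1 check: every supplied factor must match. Returns only yes/no."""
    record = store.get(aadhaar_number)
    if record is None:
        return False
    supplied_any = False
    if name is not None:
        supplied_any = True
        if name.strip().lower() != record.name.strip().lower():
            return False
    if pin is not None:
        supplied_any = True
        if not hmac.compare_digest(hash_pin(pin, record.pin_salt), record.pin_hash):
            return False
    if fingerprint is not None:
        supplied_any = True
        if not match_biometric(fingerprint, record.fingerprint_template):
            return False
    # A request with no factors at all is rejected rather than trivially accepted.
    return supplied_any


# Made-up demo data.
salt = b"demo-salt"
store = {
    "123412341234": StoredRecord("Asha Rao", salt, hash_pin("4321", salt), b"template-1"),
}
ok = authenticate(store, "123412341234", name="asha rao", pin="4321")
wrong_pin = authenticate(store, "123412341234", pin="0000")
```

Because the function returns a bare boolean, a failed check does not reveal which factor was wrong, and no stored personal data leaves the service.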
Scalability and Data Management Challenges
Architecture Highlights
• Support large scaling of enrolments and authentications
• No vendor lock-in across the system
• Use of open-source technologies wherever available and
prudent
• Use of open standards to ensure interoperability
• Ensure wide device driver support for biometric devices
through standardization
• Use of widely adopted technology platforms and tools
• Make all performance metrics (no PII) public through
business intelligence portal for transparency
• Build strong end-to-end security upfront
Enrolment Server Architecture
• Throughput is the key
• Fully distributed compute platform
• Data sharded across multiple RDBMS instances
and DFS
• Highly asynchronous using a high speed
messaging layer
• SEDA (Staged Event-Driven Architecture) allows
smarter failure handling
• Multi-DC architecture for near-zero RTO and zero
RPO (adds complexity in biometric de-duplication)
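To illustrate why a staged, event-driven design helps with failure handling, here is a minimal in-process sketch: each stage owns an inbox queue and a worker thread, and a failing event is diverted to a failure queue without stalling the rest of the flow. The stage names, handlers, and packet fields are hypothetical, and Python's `queue.Queue` stands in for the high-speed messaging layer (the deck names RabbitMQ in its technology stack).

```python
import queue
import threading

STOP = object()  # sentinel that shuts a stage down


class Stage:
    """One SEDA stage: an inbox queue plus a worker thread running a handler."""

    def __init__(self, name, handler, outbox, failures):
        self.name = name
        self.handler = handler
        self.inbox = queue.Queue()
        self.outbox = outbox        # next stage's inbox, or a results queue
        self.failures = failures    # shared queue of (stage, event, error)
        self.thread = threading.Thread(target=self._run)

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is STOP:
                self.outbox.put(STOP)   # propagate shutdown downstream
                return
            try:
                self.outbox.put(self.handler(event))
            except Exception as exc:
                # The failure stays local to this stage; other events keep flowing.
                self.failures.put((self.name, event, str(exc)))


def validate(packet):
    if not packet.get("name"):
        raise ValueError("missing name")
    return packet


def deduplicate(packet):
    # Placeholder for the 1:N biometric check.
    return {**packet, "duplicate": False}


def run_pipeline(packets):
    results, failures = queue.Queue(), queue.Queue()
    dedup = Stage("deduplicate", deduplicate, results, failures)
    valid = Stage("validate", validate, dedup.inbox, failures)
    for stage in (valid, dedup):
        stage.thread.start()
    for packet in packets:
        valid.inbox.put(packet)
    valid.inbox.put(STOP)
    for stage in (valid, dedup):
        stage.thread.join()
    done = list(iter(results.get, STOP))   # drain until the sentinel
    failed = []
    while not failures.empty():
        failed.append(failures.get())
    return done, failed


done, failed = run_pipeline([{"name": "A"}, {"name": ""}, {"name": "B"}])
```

Failed events end up in one place with the stage name attached, so they can be retried or sent for manual inspection per stage, which is one plausible reading of "smarter failure handling".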
Enrolment Volume
• 600 to 800 million UIDs in 4 years
• 1 to 4 million enrolments a day
• When we cover half the country, we will end up doing
– 4 m * 12 * 500 m * 12 biometric matches a day!!!
• Data updates and new enrolments will continue forever
• Enrolment data moves from very hot to cold needing multi-layered storage architecture
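The daily match count quoted above can be worked through directly. The slide does not explain the factor of 12; the sketch below assumes it is the number of biometric templates per resident (10 fingerprints plus 2 irises), and treats the figure as the naive product of every new template against every stored template:

```python
DAILY_ENROLMENTS = 4_000_000        # upper end of "1 to 4 million enrolments a day"
ENROLLED_RESIDENTS = 500_000_000    # "half the country" on the slide
TEMPLATES_PER_RESIDENT = 12         # assumption: 10 fingerprints + 2 irises

# Templates arriving per day, and templates already stored.
probe_templates = DAILY_ENROLMENTS * TEMPLATES_PER_RESIDENT
gallery_templates = ENROLLED_RESIDENTS * TEMPLATES_PER_RESIDENT

# Naive 1:N cost: every new template compared against every stored one.
matches_per_day = probe_templates * gallery_templates
matches_per_second = matches_per_day / 86_400

print(f"{matches_per_day:.2e} comparisons per day")
print(f"{matches_per_second:.2e} comparisons per second")
```

Under these assumptions the product is 2.88 × 10^17 comparisons a day, which is why brute-force matching is not an option at this scale.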
Enrolment Data Management
• Enrolment requires handling of large binary data for all residents
– ~5 MB per resident biometrics
– ~3 MB for supporting docs
– Maps to about 8 PB of raw data!
– With replication, it means managing about 25 PB of source data
– Replication and backup across DCs of 4+ TB of incremental data every day for near-zero RTO
• Additional workflow/process/event data
– 15+ million events on an average moving through async channels
– Needing complete update and insert guarantees across data stores
• Lifetime updates add several more petabytes
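The storage figures above can be reproduced with a short calculation. Two inputs are assumptions inferred from the slide rather than stated on it: a population of about 1 billion residents (which is what 8 MB per resident and 8 PB imply) and a replication factor of 3 (roughly what 25 PB over 8 PB suggests). Decimal units are used throughout.

```python
MB = 10**6
PB = 10**15

BIOMETRICS_PER_RESIDENT = 5 * MB
DOCUMENTS_PER_RESIDENT = 3 * MB
RESIDENTS = 1_000_000_000     # assumption: implied by "about 8 PB of raw data"
REPLICATION_FACTOR = 3        # assumption: ~25 PB / 8 PB is roughly three copies

raw_bytes = (BIOMETRICS_PER_RESIDENT + DOCUMENTS_PER_RESIDENT) * RESIDENTS
replicated_bytes = raw_bytes * REPLICATION_FACTOR

raw_pb = raw_bytes / PB
replicated_pb = replicated_bytes / PB

print(f"raw: {raw_pb:.0f} PB, replicated: {replicated_pb:.0f} PB")
```

Three full copies give 24 PB, close to the slide's "about 25 PB"; the difference could be metadata or backup overhead, which the slide does not break down.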
Authentication Server
• Authentication poses a response-time issue
– Match demographics (partial, fuzzy, Indian language
matching)
– Match biometrics (balancing FPIR)
• Needs to scale to handle hundreds of millions of requests
every day with sub-second response
• Edge cached, in-memory operation
• Async data updates to the cache
• Stateless service
• Audits maintained asynchronously on HDFS
Authentication Volume
• A few hundred million authentications per day
– Mostly during a 10-hour period
– High variance between peak and average
– Requires async request handling on HTTP server
– Sub-second response with support for OTP, guaranteed audits
• Multi-DC architecture
– Fully load balanced
– Mostly reads with some updates (OTP, Audit)
• All changes need to be propagated from enrolment data
stores to all authentication sites
– PIN updates, OTP requests, and less frequent demographic data
updates
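As a rough sizing exercise, the daily volume can be converted into a request rate. The sketch takes 100 million requests as a lower bound, assumes all of them fall inside the 10-hour window, and uses a peak-to-average ratio of 3 purely as an illustrative assumption, since the slide only says the variance is high:

```python
DAILY_AUTHENTICATIONS = 100_000_000   # lower bound for the daily volume
BUSY_HOURS = 10                       # requests arrive mostly within this window
PEAK_TO_AVERAGE = 3                   # assumption: the slide gives no ratio

average_rps = DAILY_AUTHENTICATIONS / (BUSY_HOURS * 3600)
peak_rps = average_rps * PEAK_TO_AVERAGE

print(f"average: {average_rps:,.0f} requests/s")
print(f"assumed peak: {peak_rps:,.0f} requests/s")
```

Even the lower bound works out to a few thousand requests per second on average, which is consistent with the call for async request handling and load balancing across data centers.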
Authentication Data Management
• Minutiae based authentication request is about 1 K
– Image based ones are about 10 K on an average
• 100 million authentications / day means
– 1 billion audit records in 10 days
– 1 TB encrypted audit logs in 10 days
– Need to keep recent audits online, accessible at any time, and older ones in archive until deleted
– Audit write must be guaranteed
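The audit figures above follow from simple arithmetic. The sketch assumes one audit record of about 1 KB per request, matching the size quoted for a minutiae-based request; the slide does not state the audit record size explicitly. Decimal units are used.

```python
KB = 10**3
TB = 10**12

AUTHENTICATIONS_PER_DAY = 100_000_000
DAYS = 10
AUDIT_RECORD_BYTES = 1 * KB   # assumption: ~1 K per record, as for a minutiae request

audit_records = AUTHENTICATIONS_PER_DAY * DAYS
audit_bytes = audit_records * AUDIT_RECORD_BYTES
audit_tb = audit_bytes / TB

print(f"{audit_records:,} audit records, {audit_tb:.0f} TB in {DAYS} days")
```

Image-based requests at about 10 K each would push the log volume up by as much as an order of magnitude if the full request were retained, which is why the split between online and archived audits matters.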
Analytics/Mining Architecture
• Analyzing terabytes of data generated out of
billion+ events every day
– Constantly aggregating data across billions of records
on a distributed compute grid to analyze and create
patterns for operational and strategic decision making
• Fraud detection
– Detecting fraud during enrolment
– Detecting identity fraud scenarios near real-time during
authentication
– Building mining, clustering, learning tools to work on
top of billions of events
Technology Stack
• Java application deployed on Linux stack with
virtualization
• Multiple MySQL instances as RDBMS
• Apache Hadoop (HDFS, Hive, HBase, Pig) stack for large
scale compute and distributed storage
• RabbitMQ (AMQP standard) as messaging framework
• Drools for rules engine
• Several other open source libraries
• All 3rd party interfaces abstracted through standard API
layer (VDM, ABIS, Language Support, etc)
Final Thoughts
• Largest biometric identity system is about 120
million. Scaling needs are unprecedented.
• Completely built on open standards and open
source platforms
• Scalability, security, interoperability, and vendor
neutrality a must
• Next generation e-governance applications require
cloud based, large data-driven, open platforms
• Research community support required
Thank You!
Dr. Pramod K. Varma
Chief Architect, UIDAI
twitter.com/pramodkvarma pramodkvarma.com