Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to MongoDB
Michael Poremba
Director, Data Architecture
Practice Fusion
May 2015
Michael Poremba @ Practice Fusion
+ 20 years software engineering
+ Data architect / application architect
+ High-volume OLTP relational databases
+ Application performance and scalability
+ Domain experience: health care; financial services; IT management; content management and distribution; targeted advertising; telecom billing; manufacturing; insurance
Project Background: Getting started
Practice Fusion
+ Cloud-based electronic health records (EHR) service
+ Over 100,000 health care providers in the US
+ Over 100,000,000 patient medical records
+ SQL Server OLTP database
    Weekday peak ~ 60,000 transactions per second
+ Primary database = 8 TB
+ 50% of primary database is security audit records + indexes
HIPAA Security Audit Log
+ HIPAA: Health Insurance Portability and Accountability Act of 1996
+ Who did what to which patient’s medical record when?
+ Regulatory requirement: audit log must be kept and reviewed
+ Law enforcement and evidence in legal discovery
+ Save the audit log forever
+ Primary use cases:
    Audit report in EHR: Security audit log viewer
    Physician data analytics: Clinical quality measures (CQM)
HIPAA Security Auditing on MongoDB
Project anatomy & lessons learned
The Log Jam
+ Latency on SAN increased
+ Database writes slowed down
+ Database connections held longer
+ Connection pool expanded
+ User interface locked up, waiting
+ Users tried to log in again
+ Login is heaviest user operation
+ [Repeat]
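Security Auditing – Legacy Architecture
[Diagram: public traffic passes through a load balancer to application servers App 1, App 2, ... App n, which use the EHR OLTP database. Audit data lives in that database in the ActivityFeed and ActivityFeedParameter tables (2..10 ActivityFeedParameter rows per ActivityFeed row). Also shown: Audit Report, ETL, CQM Reporting.]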
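Audit Service – New Architecture
[Diagram: public traffic passes through a load balancer to application servers App 1, App 2, ... App n. Components shown on the audit path: Audit Service, AMQ Queue, Listener, MongoDB Audit Log, Audit Report, ETL, CQM Reporting.]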
Technical Benefits of New Architecture
+ Isolate auditing system from EHR OLTP database
+ Move audit IO off of EHR SAN to AWS
+ New service interface for audit events using .NET
+ Scale out audit service interface on IIS farm
+ Scale out audit data store using MongoDB
Security Auditing – Application Requirements
+ Transaction volume: Sustain 1,000 new documents per second
+ Data volume: Scale to tens of billions of audit event records
+ High availability and disaster recovery: higher SLA than EHR
+ Quick UI response time for interactive audit report
+ Tamper prevention and detection
    No updates or deletes permitted on audit log
    Security alerts when audit log is altered
+ Leverage industry standards for health care security audit logging
    ~300 distinct auditable user actions
    Required and varying data elements
Project: Audit 2.0
Project Objectives
+ New infrastructure for MongoDB and AMQ
+ Modernize audit service API
+ Convert ~200 audit events to new audit service interface
+ Data warehouse ETL from MongoDB
+ Modernize audit report UI
+ Migrate 4 billion existing audit records
Team
+ Colette: program management
+ Ernest: services expert
+ Bhavik: test engineering
+ Jay: MongoDB expert
+ Jeff: cluster architecture
+ Michael: data architecture
+ Brett: AMQ expert
+ Bryan: infrastructure coordination
+ Rajani: data warehouse ETL
Health Care Industry Standards for Audit Logging
+ ISO 27789:2013: Health Informatics – Audit trails for electronic health records
+ ASTM E2147-01(2013): Standard Specification for Audit and Disclosure Logs for Use in Health Information Systems
+ FHIR SecurityEvent – resource definition for auditing
[Diagram: AuditEvent related to ParticipantObject, AuditSystem, and User; cardinality labels 0..n, 1..1, 1..2]
JSON Document Schema for Audit Events
{
    "_id" : <BinaryData(4)>, // The audit event GUID
    "docHash" : <String; Required>, // Tamper detection
    "audOrgGuid" : <BinaryData(4); Required>, // Shard key
    "crtdDttmUtc" : <Date; Required>, // Datetime record was inserted
    "evnt" : { // Required subdocument
        "dttmUtc" : <Date; Required>, // Date/time that event occurred
        "typ" : <String; Required>, // Event record type; ~ 300 types
        "ptDataTyp" : <String; Required>, // Standard set of patient data types
        "actn" : <String; Required>, // Standard set of actions
        "sys" : <String; Required> // Source system for audit event
    },
    "usr" : { // Required subdocument
        "usrId" : <String; Required>, // Human-readable ID
        "usrGuid" : <BinaryData(4); Required>, // Machine-readable ID
        "dispNm" : <String; Required>, // Display name for user
        "orgId" : <String; Required>,
        "orgNm" : <String; Required>
    },
    "altUsr" : { // Optional subdocument for second user
        ... // Subdocument contains same properties as "usr"
    },
    "pt" : { // Optional subdocument
        "ptId" : <String; Required>, // Human-readable ID for patient
        "ptPracGuid" : <BinaryData(4); Required>, // Machine-readable ID for patient
        "dispNm" : <String; Required>, // Display name for patient
        "orgId" : <String; Required>,
        "orgNm" : <String; Required>
    },
    "body" : { // Optional subdocument
        ... // Flattened list of attributes, specific to audit event subtype
    }
}
[Diagram: AuditEvent related to ParticipantObject, AuditSystem, and User; cardinality labels 0..n, 1..1, 1..2]
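MongoDB did not enforce this schema at the time, so the required and optional markers have to be checked by the application before insert. Below is a minimal sketch of such a check in plain Node.js. The function names, the sample values, and the choice to report missing properties as a list of paths are assumptions for illustration, not the audit service's actual code. "_id" is left out of the check because the slide does not mark it as required.

```javascript
// Required properties as marked on the schema slide. "altUsr", "pt", and
// "body" are optional; when "altUsr" or "pt" is present, its own required
// properties are checked as well.
const REQUIRED_TOP = ["docHash", "audOrgGuid", "crtdDttmUtc", "evnt", "usr"];
const REQUIRED_EVNT = ["dttmUtc", "typ", "ptDataTyp", "actn", "sys"];
const REQUIRED_USR = ["usrId", "usrGuid", "dispNm", "orgId", "orgNm"];
const REQUIRED_PT = ["ptId", "ptPracGuid", "dispNm", "orgId", "orgNm"];

// Return the dotted paths of the names that are missing from obj.
function missingProps(obj, names, prefix) {
  return names
    .filter((name) => obj[name] === undefined || obj[name] === null)
    .map((name) => prefix + name);
}

// Return a list of missing required properties; an empty list means valid.
function validateAuditEvent(doc) {
  const errors = missingProps(doc, REQUIRED_TOP, "");
  if (doc.evnt) errors.push(...missingProps(doc.evnt, REQUIRED_EVNT, "evnt."));
  if (doc.usr) errors.push(...missingProps(doc.usr, REQUIRED_USR, "usr."));
  if (doc.altUsr) errors.push(...missingProps(doc.altUsr, REQUIRED_USR, "altUsr."));
  if (doc.pt) errors.push(...missingProps(doc.pt, REQUIRED_PT, "pt."));
  return errors;
}

// Hypothetical sample event; every value here is made up.
const sampleEvent = {
  docHash: "placeholder-hash",
  audOrgGuid: Buffer.alloc(16),
  crtdDttmUtc: new Date(),
  evnt: {
    dttmUtc: new Date(),
    typ: "ExampleEventType",
    ptDataTyp: "ExampleDataType",
    actn: "View",
    sys: "EHR",
  },
  usr: {
    usrId: "jdoe",
    usrGuid: Buffer.alloc(16),
    dispNm: "J. Doe",
    orgId: "ORG-1",
    orgNm: "Example Practice",
  },
};

console.log(validateAuditEvent(sampleEvent));
console.log(validateAuditEvent({ ...sampleEvent, pt: { ptId: "P-1" } }));
```

The second call reports the four patient properties that are required once the optional "pt" subdocument is present.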
Schema Design – Lessons Learned
+ Prop nms strd per doc (property names are stored in every document)
    Long names add up for large collections (ours: 1 TB)
    Consider using abbreviated property names
    Up-vote this feature request: https://jira.mongodb.org/browse/SERVER-863
+ Know your application read/write patterns
+ Application responsible for data integrity
+ Be aware of data type behaviors
    Indexed string search is case sensitive. Up-vote: https://jira.mongodb.org/browse/SERVER-90
    Several binary data types for UUID: use type 4 (default type is specific to database driver)
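The cost of long property names can be estimated with simple arithmetic. In BSON, each element stores its field name as a NUL-terminated string, so a name costs its UTF-8 byte length plus one byte in every document that contains it. The sketch below compares three abbreviated names from the schema with spelled-out alternatives. The spelled-out names are hypothetical, and the figure is for uncompressed document data only.

```javascript
// Bytes spent on field names per document: UTF-8 length + 1 NUL byte each.
function nameBytes(names) {
  return names.reduce((sum, name) => sum + Buffer.byteLength(name, "utf8") + 1, 0);
}

// Abbreviated names come from the schema slide; the long forms are invented
// here purely for comparison.
const abbreviated = ["crtdDttmUtc", "audOrgGuid", "docHash"];
const spelledOut = ["createdDateTimeUtc", "auditOrganizationGuid", "documentHash"];

const savedPerDoc = nameBytes(spelledOut) - nameBytes(abbreviated);
const docCount = 4e9; // ~4 billion audit records, per the project objectives
const savedGB = (savedPerDoc * docCount) / 1e9;

console.log(`${savedPerDoc} bytes saved per document`);
console.log(`about ${savedGB} GB saved across ${docCount} documents`);
```

Three shortened names already account for tens of gigabytes at this document count, which is why the effect matters for a collection of this size.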
Schema Design – Lessons Learned
Leverage native data types:
+ Date
+ Boolean
+ Numeric
    "1" + "1" → "11"
    "11" + "1" → "111"
+ UUID
    "8c290139-f4e3-49c1-9ba2-a883defc6a15"
    "8C290139-F4E3-49C1-9BA2-A883DEFC6A15"
    "8c29-0139-f4e3-49c1-9ba2-a883-defc-6a15"
    "8c290139f4e349c19ba2a883defc6a15"
    "{8c290139-f4e3-49c1-9ba2-a883defc6a15}"
    "{8C290139-F4E3-49C1-9BA2-A883DEFC6A15}"
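The six UUID spellings above are six different strings but one identifier, and the quoted numbers concatenate instead of adding. A small Node.js sketch of both points follows. The helper name uuidToBytes is an assumption for illustration, and wrapping the resulting bytes in a driver's binary subtype 4 value is left out because that call is driver specific.

```javascript
// Reduce any of the string spellings to the same 16 raw bytes by dropping
// braces and hyphens and ignoring letter case.
function uuidToBytes(text) {
  const hex = text.replace(/[{}-]/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) {
    throw new Error(`not a UUID: ${text}`);
  }
  return Buffer.from(hex, "hex");
}

const spellings = [
  "8c290139-f4e3-49c1-9ba2-a883defc6a15",
  "8C290139-F4E3-49C1-9BA2-A883DEFC6A15",
  "8c29-0139-f4e3-49c1-9ba2-a883-defc-6a15",
  "8c290139f4e349c19ba2a883defc6a15",
  "{8c290139-f4e3-49c1-9ba2-a883defc6a15}",
  "{8C290139-F4E3-49C1-9BA2-A883DEFC6A15}",
];

// As strings, an exact-match query would treat these as six different values.
const distinctStrings = new Set(spellings).size;
// As bytes, they collapse to a single value.
const distinctBytes = new Set(spellings.map((s) => uuidToBytes(s).toString("hex"))).size;
console.log(distinctStrings, distinctBytes);

// The same trap for numbers stored as strings.
console.log("1" + "1"); // string concatenation
console.log(1 + 1); // numeric addition
```

Storing the 16 bytes also takes less space than any of the 32 to 38 character strings.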
Legacy Auditing System – Relational Schema
[Diagram: relational tables with approximate row counts]
    ActivityFeed (~4 billion)
    ActivityFeedParameter (~30 billion)
    Audit EventType (~300)
    Action Type (10)
    Patient Data Type (18)
    User (~100,000)
    Patient (~100 million)
    Practice (~50,000)
Issues around data normalization
+ New requirements introduced
+ Filter criteria and sort criteria stored in five different tables
+ Audit events must be read into memory for filtering and sorting
    Join and expand data set by practice
    Sort and filter expanded data set
+ Response time suffers for large practices with many audit events
Schema Design – Lessons Learned
[Diagram: the legacy tables ActivityFeed, ActivityFeedParameter, Audit EventType, Action Type, Patient Data Type, User, Patient, and Practice, shown beside the audit event document]
Denormalize with care:
{
    "_id" : <BinaryData(4)>,
    "docHash" : <String; Required>,
    "audOrgGuid" : <BinaryData(4); Required>,
    "crtdDttmUtc" : <Date; Required>,
    "evnt" : {
        "dttmUtc" : <Date; Required>,
        "typ" : <String; Required>,
        "ptDataTyp" : <String; Required>,
        "actn" : <String; Required>,
        "sys" : <String; Required>
    },
    "usr" : {
        "usrId" : <String; Required>,
        "usrGuid" : <BinaryData(4); Required>,
        "dispNm" : <String; Required>,
        "orgId" : <String; Required>,
        "orgNm" : <String; Required>
    },
    "pt" : {
        "ptId" : <String; Required>,
        "ptPracGuid" : <BinaryData(4); Required>,
        "dispNm" : <String; Required>,
        "orgId" : <String; Required>,
        "orgNm" : <String; Required>
    },
    "body" : { ... }
}
Index Design
+ Millions of audit events per medical practice
+ Require fast response time for interactive audit report UI
+ Audit report UI allows events to be sorted/filtered five different ways
+ UI allows paging through audit events
+ Create a secondary index for each sort method
Index Definitions
+ Organization, event date DESC
    db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.dttmUtc": -1} );
+ Organization, patient, event date DESC
    db.auditEvent.ensureIndex( {"audOrgGuid": 1, "pt.ptId": 1, "evnt.dttmUtc": -1} );
+ Organization, user, event date DESC
    db.auditEvent.ensureIndex( {"audOrgGuid": 1, "usr.usrId": 1, "evnt.dttmUtc": -1} );
+ Organization, patient data type, event date DESC
    db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.ptDataTyp": 1, "evnt.dttmUtc": -1} );
+ Organization, user action type, event date DESC
    db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.actn": 1, "evnt.dttmUtc": -1} );
+ Document created date DESC
    db.auditEvent.ensureIndex( {"crtdDttmUtc": -1} );
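Each of the five report views maps onto one of the organization-prefixed indexes above: filter by organization, optionally by one more indexed field, and always sort by event date descending. The sketch below builds such a query description in plain Node.js. The function name, the parameter names, and the skip-based paging are assumptions for illustration; the talk does not say how the report UI pages.

```javascript
// Second-level filter fields that have a matching compound index in the
// index definitions above.
const FILTERABLE = ["pt.ptId", "usr.usrId", "evnt.ptDataTyp", "evnt.actn"];

// Build the filter, sort, skip, and limit for one page of the audit report.
function buildAuditReportQuery({ orgGuid, filterField, filterValue, page = 0, pageSize = 20 }) {
  const filter = { audOrgGuid: orgGuid };
  if (filterField !== undefined) {
    if (!FILTERABLE.includes(filterField)) {
      throw new Error(`no index supports a filter on ${filterField}`);
    }
    filter[filterField] = filterValue;
  }
  return {
    filter,
    sort: { "evnt.dttmUtc": -1 },
    skip: page * pageSize,
    limit: pageSize,
  };
}

// Third page of one patient's events; the organization value is a placeholder.
const q = buildAuditReportQuery({
  orgGuid: "ORG-GUID-PLACEHOLDER",
  filterField: "pt.ptId",
  filterValue: "P-1",
  page: 2,
});
console.log(JSON.stringify(q));

// In the mongo shell the result would be applied roughly as:
//   db.auditEvent.find(q.filter).sort(q.sort).skip(q.skip).limit(q.limit)
```

Rejecting filter fields that have no index keeps the report from issuing a query that would scan a practice's entire event history.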
Query Plan
+ Filter by practice GUID
+ Sort by event date/time (evnt.dttmUtc), descending order
+ Limit to 20 documents
db.auditEvent.find( {"audOrgGuid": BinData(4,"ABrlAG57Rx6gY3zyHzFK3Q==")} )
    .sort( {"evnt.dttmUtc" : -1} ).limit(20).explain();
{
    "clusteredType" : "ParallelSort",
    "shards" : {
        "RepSet02/MNGODDB03-SHRD02:27018, MNGODDB04-SHRD02:27018" : [
            {
                "cursor" : "BtreeCursor auditEvent_audOrgGuid_dttmUtc",
                ...
            } ] }
    ...
    "numshards" : 1,
    ...
Indexing Strategy – Lessons Learned
+ As with relational databases, indexes are essential for efficient queries
+ Learn how to use .explain() to read query plans
+ Avoid collection scans
    "cursor" : "BasicCursor"
+ For compound indexes, query sort order must match index sort order (or its exact reverse)
+ Enable mongod --notablescan option in test / staging environments
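The sort-order rule for compound indexes can be expressed as a small check. The sketch below is a simplified model, not MongoDB's query planner. It assumes that leading index fields matched by equality can be skipped, and that the remaining index fields must then line up with the requested sort either exactly or with every direction reversed. The function and argument names are assumptions for illustration.

```javascript
// indexKeys: ordered [field, direction] pairs of the compound index.
// equalityFields: fields the query matches with an exact value.
// sort: ordered [field, direction] pairs requested by the query.
function indexCanProvideSort(indexKeys, equalityFields, sort) {
  // Leading index fields pinned by equality do not affect ordering.
  let start = 0;
  while (start < indexKeys.length && equalityFields.includes(indexKeys[start][0])) {
    start++;
  }
  const rest = indexKeys.slice(start);
  if (sort.length === 0) return true;
  if (sort.length > rest.length) return false;

  // The sort fields must be the next index fields, in the same order.
  const sameFields = sort.every(([field], i) => rest[i][0] === field);
  if (!sameFields) return false;

  // Directions must all match, or all be reversed (a backward index walk).
  const forward = sort.every(([, dir], i) => rest[i][1] === dir);
  const backward = sort.every(([, dir], i) => rest[i][1] === -dir);
  return forward || backward;
}

const orgDate = [["audOrgGuid", 1], ["evnt.dttmUtc", -1]];
const orgPatientDate = [["audOrgGuid", 1], ["pt.ptId", 1], ["evnt.dttmUtc", -1]];

// Newest-first within one organization: served by the org + date index.
console.log(indexCanProvideSort(orgDate, ["audOrgGuid"], [["evnt.dttmUtc", -1]]));
// Same query against the org + patient + date index: the patient field is
// not pinned, so this index cannot return events in date order.
console.log(indexCanProvideSort(orgPatientDate, ["audOrgGuid"], [["evnt.dttmUtc", -1]]));
// Pin the patient as well and the same index works.
console.log(indexCanProvideSort(orgPatientDate, ["audOrgGuid", "pt.ptId"], [["evnt.dttmUtc", -1]]));
```

The second case shows why each filter in the audit report gets its own index rather than sharing one wide index.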
Tamper Prevention and Detection
Principle of least privilege
+ MongoDB cluster not accessible from public Internet
+ Security enabled on cluster
+ Application users granted minimum permissions required
Signed audit events
+ Audit events signed with hash of audit event contents
+ Recompute hash on reads and test the data against the stored hash value
+ Send security alert when hash does not match
Oplog monitoring
+ Use mongo-connector Python scripts to monitor oplog
+ Watch for .update() and .delete() operations on collection
+ Send security alert when data changes are detected
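The two detection mechanisms can be sketched in a few lines of Node.js. The talk does not name the hash algorithm or the serialization, so SHA-256 over a key-sorted JSON form is an assumption here, as are all function names. A plain hash only catches changes made by someone who does not also recompute it; a keyed HMAC would be the stronger variant. The oplog check relies on the "op" and "ns" fields that oplog entries carry, where "u" marks an update and "d" a delete.

```javascript
const crypto = require("crypto");

// Serialize with sorted keys so equal content always yields the same string.
function canonicalize(value) {
  if (value instanceof Date) return JSON.stringify(value.toISOString());
  if (Buffer.isBuffer(value)) return JSON.stringify(value.toString("hex"));
  if (Array.isArray(value)) return "[" + value.map(canonicalize).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const parts = Object.keys(value)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonicalize(value[key]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Hash everything except the stored hash itself.
function computeDocHash(doc) {
  const { docHash, ...content } = doc;
  return crypto.createHash("sha256").update(canonicalize(content)).digest("hex");
}

// Attach the hash before insert.
function signAuditEvent(doc) {
  return { ...doc, docHash: computeDocHash(doc) };
}

// Recompute on read; false means the content no longer matches its hash.
function isUntampered(doc) {
  return doc.docHash === computeDocHash(doc);
}

// True for oplog entries that update or delete documents in the namespace.
function isTamperingOp(entry, namespace) {
  return entry.ns === namespace && (entry.op === "u" || entry.op === "d");
}

// Hypothetical event content.
const signed = signAuditEvent({ evnt: { typ: "ExampleEventType", actn: "View" }, usr: { usrId: "jdoe" } });
console.log(isUntampered(signed));

const altered = { ...signed, usr: { usrId: "someone-else" } };
console.log(isUntampered(altered));

console.log(isTamperingOp({ op: "u", ns: "AuditLog.auditEvent" }, "AuditLog.auditEvent"));
console.log(isTamperingOp({ op: "i", ns: "AuditLog.auditEvent" }, "AuditLog.auditEvent"));
```

Sorting keys before hashing matters because a document read back from the database is not guaranteed to be serialized by the application in the same key order it was written in.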
Security – Lessons Learned
+ Minimize network access to MongoDB cluster
+ Enable authentication
+ Leverage role-based authorization
+ Use SSL (MongoDB Enterprise)
+ Disable REST interface and HTTP status interface
Transaction Volume: 1,000 New Documents per Second
+ Shard the database to scale out
+ Begin with small number of shards (2 or 3)
+ Group all audit events from the same medical practice
    Every audit event is “owned” by some practice
    Audit report UI always queries events by medical practice
+ Composite shard key on { PracticeGuid, _id }
    db.runCommand({
        shardCollection: "AuditLog.auditEvent",
        key: { audOrgGuid: 1, _id: 1 }
    });
Sharding the Database – Lessons Learned
+ At the onset of development determine whether to shard
+ Specify shard key in queries
    Allows mongos to route query
    Minimize distributed “scatter/gather” queries
    Queries spanning chunks likely span shards
+ Choose a key that allows even balancing
    Balancing is performed in 32 MB chunks
    Design shard key to ensure chunks will not exceed 32 MB
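Why the shard key belongs in every query can be seen in a toy model of range-based routing. The chunk table, shard names, and key values below are invented, and real mongos routing works on the full compound key and its own chunk metadata. The sketch only illustrates the principle: a query that includes the shard key prefix maps to the shard holding that key range, and a query without it must go to every shard.

```javascript
// Toy chunk table: each chunk covers [min, max) of the shard key prefix
// (audOrgGuid, shown as short strings) and lives on one shard.
const chunks = [
  { min: "", max: "4", shard: "shard1" },
  { min: "4", max: "8", shard: "shard2" },
  { min: "8", max: "c", shard: "shard3" },
  { min: "c", max: "\uffff", shard: "shard1" },
];

// Return the sorted list of shards a query must be sent to.
function targetShards(query) {
  if (query.audOrgGuid === undefined) {
    // No shard key in the query: scatter/gather across every shard.
    return [...new Set(chunks.map((chunk) => chunk.shard))].sort();
  }
  const key = query.audOrgGuid;
  const hits = chunks
    .filter((chunk) => chunk.min <= key && key < chunk.max)
    .map((chunk) => chunk.shard);
  return [...new Set(hits)].sort();
}

// Query by practice: routed to a single shard.
console.log(targetShards({ audOrgGuid: "5f2a" }));
// Query by user only: every shard has to answer.
console.log(targetShards({ "usr.usrId": "jdoe" }));
```

This matches the "numshards" : 1 result in the query plan shown earlier in the talk, where the audit report query filtered on audOrgGuid.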
High Availability and Disaster Recovery – Replica Sets
+ If audit log is down, then 100,000 health care providers are idle
+ Audit logging subsystem must be more reliable than customer EHR
+ Node failover must be automatic
+ Protect against network and data center failure scenarios
Disaster Recovery
Sharded Cluster Replicated Across Multiple Data Centers
[Diagram: three shards (shard 1, shard 2, shard 3), each shown three times across data centers and availability zones (labels include Primary DC, DC2 AZ1, DC2 AZ2, DC3 AZ1), together with arbiters, mongos routers, three config servers, and AMQ nodes]
Performance and Stress Testing – Lessons Learned
+ Acquire or build load testing tools
+ Test using a realistic, unbiased data set
+ Test database cluster to ensure write throughput
+ Ensure read & write performance meets load requirements
+ Find the performance ceiling
+ Find and resolve bottlenecks
+ Tune IO and memory
Data Migration – Lessons Learned
+ Parallelize data migration process
+ Identify and remove bottlenecks
+ Scale out MongoDB cluster to handle heavy write load
+ Determine whether best to add indexes before or after migration
+ It takes a while to extract, transform, and load billions of documents
Data Repair – Lessons Learned
Bulk update on collections
+ Use Bulk() operation builder
    bulk.find(...).update(...)
    Simple, unordered, parallelized
    > 200,000 updates/minute
+ Regular update operation
    ~ 2,000 updates/minute
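The gap between the two rates is easier to appreciate as elapsed time. The sketch below assumes, purely for illustration, that a repair has to touch all ~4 billion migrated records and that the quoted rates hold for the whole run; the talk does not say how many documents were actually repaired.

```javascript
const MINUTES_PER_DAY = 60 * 24;

// Days needed to update docCount documents at a steady per-minute rate.
function daysToUpdate(docCount, updatesPerMinute) {
  return docCount / updatesPerMinute / MINUTES_PER_DAY;
}

const docCount = 4e9; // assumed: every migrated audit record needs a repair
const bulkDays = daysToUpdate(docCount, 200000); // Bulk() builder rate
const regularDays = daysToUpdate(docCount, 2000); // regular update rate

console.log(`bulk: about ${bulkDays.toFixed(1)} days`);
console.log(`regular: about ${regularDays.toFixed(1)} days (${(regularDays / 365).toFixed(1)} years)`);
```

Under these assumptions the bulk path finishes in roughly two weeks, while one-at-a-time updates would run for years.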
Choosing the Appropriate Data Store
MongoDB over relational?
+ Scale out for transaction volume and data volume
+ Developer productivity
    Easy map between application and data store
+ Highly varying document structure
+ Offload read activity in optimized format different from data writes (a.k.a. CQRS pattern)
Choosing the Appropriate Data Store
Relational over MongoDB?
+ Complex normalized data model
+ Diverse read patterns requiring joins
+ Ad hoc reporting and analysis
+ Data integrity difficult to manage in application layer
MongoDB @ Practice Fusion
Upcoming MongoDB projects
+ Observations data store
    Scale-out data store for patient vital signs, etc.
+ Clinical data repository
    Read cache for patient medical records (CQRS pattern)
+ Upgrades for Audit 2.0
    WiredTiger + compression
Q&A
Michael Poremba
linkedin.com/in/michaelporemba
@mporemba