Health Db Primer

healthDB: A Primer

Parag Patel, Shahid Shah

Overview

healthDB is an incrementally scalable, fault-tolerant, ACID-compliant, key/value document-based database designed to hold very large amounts of data while providing high-throughput reads and writes and high availability. It is based on the open-source project couchDB. It is designed to be a data warehouse for the disparate systems that may be part of a healthcare practice or hospital. Because it stores data in a semi-structured form, it can hold data of any type. The end user need not worry about structuring the data up front; the data is stored in the warehouse for later extraction and structuring as the user sees fit. Future versions of healthDB will help the end user impose structure on this semi-structured data. Conceptually, this is similar to lazy evaluation in Scheme, Lisp, or Haskell: the work of structuring is deferred until it is actually needed. Once users know the structure they want, implementing it in healthDB will be straightforward.
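The "structure it later" idea can be illustrated with a minimal sketch of schema-on-read: raw documents are kept exactly as they arrive, and a structure is applied only when the data is read. The document fields and the `structure_lazily` helper are hypothetical illustrations, not part of healthDB.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

# Hypothetical raw documents from different source systems;
# no common schema is enforced when they are written.
RAW_DOCS = [
    {"source": "lab", "mrn": "123", "test": "HbA1c", "value": "6.1"},
    {"source": "adt", "patient_id": "123", "event": "admit"},
    {"source": "lab", "mrn": "456", "test": "LDL", "value": "130"},
]


def structure_lazily(
    docs: Iterable[Dict[str, Any]],
    keep: Callable[[Dict[str, Any]], bool],
    project: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Apply a structure only as the caller iterates (schema on read)."""
    for doc in docs:
        if keep(doc):
            yield project(doc)


# Nothing is computed until lab_results is iterated.
lab_results = structure_lazily(
    RAW_DOCS,
    keep=lambda d: d.get("source") == "lab",
    project=lambda d: {"patient": d["mrn"], "test": d["test"], "value": float(d["value"])},
)
```

Because `structure_lazily` is a generator, a different structure can be applied to the same raw documents later without rewriting what was stored.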

The design of the database follows a “just works” philosophy: the database should work as advertised. End users should only have to worry about building their application or service, not about data storage and performance. Most of the work traditionally done by a DBA will be handled by healthDB; the end user only has to start it up initially and add servers when healthDB indicates they are needed to scale. healthDB will have a connector engine that connects to common interfaces such as HL7, JMS, ODBC, and various delimited file formats, and that supports custom connectors for unusual interfaces. In the future, healthDB will support a health query language (HQL), as an external or internal component (TBD), that will let users search all their structured and semi-structured data for the knowledge they seek in the health domain. healthDB will ship with sample applications that demonstrate its capabilities.

Architecture

healthDB relies primarily on couchDB for low-level storage. It communicates with couchDB over encrypted REST (couchDB might need to be modified to support encryption). A diagram shows the basic outline of healthDB.
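couchDB exposes documents over HTTP, for example `PUT /{db}/{docid}` with a JSON body. The sketch below only builds such a request and refuses non-HTTPS base URLs, as one way to enforce "encrypted REST". The host, port, database name, and the `build_put_document` helper are placeholders assumed for illustration; authentication is omitted.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def build_put_document(
    base_url: str, db: str, doc_id: str, doc: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a couchDB-style PUT /{db}/{doc_id} request."""
    # Refuse anything that is not TLS-encrypted.
    if urllib.parse.urlsplit(base_url).scheme != "https":
        raise ValueError("only encrypted (https) connections are allowed")
    url = "{}/{}/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(db, safe=""),
        urllib.parse.quote(doc_id, safe=""),
    )
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT", headers={"Content-Type": "application/json"}
    )


req = build_put_document("https://couch.example.org:6984", "healthdb", "lab_42", {"value": 6.1})
```

Sending the request (for example with `urllib.request.urlopen(req)`) is left out so the sketch runs without a server.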

The healthDB engine is the main control unit of healthDB. Its job is to let users store data seamlessly. It handles tasks such as automatic partitioning, replication, data encryption, automatic load balancing, automatic system backup, and error logging.

healthDB engine

The healthDB engine is made up of several components: the partitioner, replicator, connector engine, healthCPU, security, and healthDB API. (healthSearch will be an additional component; it is undetermined whether it should sit in the healthDB engine or in couchDB.) Each component of the engine is described briefly below. Note: components may be added, merged, or removed.

healthDB API

Provides the healthDB interface to the outside world and will be the only way to communicate with the database. APIs should be developed for several languages and protocols, such as Python, Ruby, Java, C#, and REST.
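One way to read "the only way to communicate with the database" is a single interface that every language binding implements. The sketch below assumes a hypothetical `HealthDBClient` with just `put` and `get`, plus an in-memory stand-in; the real healthDB API operations are not specified in this primer.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class HealthDBClient(ABC):
    """Hypothetical single entry point; each binding would expose these operations."""

    @abstractmethod
    def put(self, doc_id: str, doc: Dict[str, Any]) -> None:
        """Store (or overwrite) a document."""

    @abstractmethod
    def get(self, doc_id: str) -> Optional[Dict[str, Any]]:
        """Return a document, or None if it does not exist."""


class InMemoryClient(HealthDBClient):
    """Stand-in backend for testing code written against the interface."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, doc_id: str, doc: Dict[str, Any]) -> None:
        # Store a copy so callers cannot mutate stored data afterwards.
        self._docs[doc_id] = dict(doc)

    def get(self, doc_id: str) -> Optional[Dict[str, Any]]:
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None
```

Application code that depends only on `HealthDBClient` can be tested against `InMemoryClient` and later pointed at a real backend.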

connector engine

The connector engine converts data from a variety of formats into a format that healthDB can understand, while preserving the data's integrity.
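A minimal sketch of a delimited-file connector, assuming a hypothetical document layout: each record becomes a document holding the parsed fields, the raw input, and a SHA-256 digest of that input so the conversion can be checked later. HL7 v2 messages are also pipe-delimited, but real HL7 parsing (segments, components, escape sequences) is considerably more involved than this.

```python
import hashlib
from typing import Dict, List


def delimited_to_document(
    line: str, field_names: List[str], source: str, delimiter: str = "|"
) -> Dict[str, object]:
    """Convert one delimited record into a document (hypothetical layout)."""
    values = line.rstrip("\r\n").split(delimiter)
    if len(values) != len(field_names):
        raise ValueError(f"expected {len(field_names)} fields, got {len(values)}")
    return {
        "source": source,
        "fields": dict(zip(field_names, values)),
        # Keep the raw input and its digest to support integrity checks.
        "raw": line,
        "sha256": hashlib.sha256(line.encode("utf-8")).hexdigest(),
    }


def verify(doc: Dict[str, object]) -> bool:
    """Return True if the stored raw input still matches its digest."""
    raw = str(doc["raw"])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest() == doc["sha256"]
```

For example, `delimited_to_document("123|Doe|John|1970-01-01", ["mrn", "last", "first", "dob"], "registration")` yields a document whose `fields` map each name to its value.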

[Figure: basic outline of healthDB (labels: healthDB, healthDB engine, couchDB)]

healthCPU

This is the brain of the healthDB database. It decides when healthDB should replicate data and when it should partition data. It looks up data in the datastore (couchDB) and formats, structures, or semi-structures data before it is stored there. It helps keep data HIPAA compliant by having the security component encrypt it. The healthCPU also tracks which nodes are alive and what their status is, performs load balancing, and filters out data based on the user's permissions.
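Node tracking and load balancing could look roughly like the sketch below. It assumes nodes send periodic heartbeats carrying a load figure (for example, in-flight requests), treats a node as dead once its heartbeat is older than a timeout, and routes work to the least-loaded live node. The heartbeat mechanism, the timeout, and the load metric are all assumptions; the primer does not specify them.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NodeStatus:
    last_heartbeat: float
    load: int = 0  # assumed metric, e.g. number of in-flight requests


@dataclass
class NodeTracker:
    timeout_s: float = 10.0
    nodes: Dict[str, NodeStatus] = field(default_factory=dict)

    def heartbeat(self, node: str, load: int, now: Optional[float] = None) -> None:
        """Record that a node is alive and report its current load."""
        ts = time.monotonic() if now is None else now
        self.nodes[node] = NodeStatus(ts, load)

    def alive(self, now: Optional[float] = None) -> Dict[str, NodeStatus]:
        """Nodes whose last heartbeat is within the timeout."""
        t = time.monotonic() if now is None else now
        return {n: s for n, s in self.nodes.items() if t - s.last_heartbeat <= self.timeout_s}

    def pick_node(self, now: Optional[float] = None) -> Optional[str]:
        """Least-loaded live node, or None if no node is alive."""
        live = self.alive(now)
        if not live:
            return None
        return min(live, key=lambda n: live[n].load)
```

The optional `now` argument exists only to make the timing deterministic in tests; in normal use it defaults to `time.monotonic()`.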

security

Performs encryption and authentication, and tells the healthCPU whether a user has permission to access given data.
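The permission check and the healthCPU's filtering step can be sketched together. This assumes a deliberately simple, hypothetical permission model in which a user may read documents from an allowed set of source systems; a real deployment would need a richer model, and encryption is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class User:
    name: str
    # Hypothetical permission model: source systems this user may read.
    allowed_sources: FrozenSet[str] = frozenset()


def has_permission(user: User, doc: Dict[str, object]) -> bool:
    """Security component: may this user see this document?"""
    return doc.get("source") in user.allowed_sources


def filter_for_user(user: User, docs: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """healthCPU step: drop documents the user is not permitted to see."""
    return [d for d in docs if has_permission(user, d)]
```

Keeping the decision in `has_permission` and the filtering in `filter_for_user` mirrors the split described here: security decides, healthCPU filters.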

replicator

Creates a new database replica when the healthCPU instructs it to.
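couchDB can replicate a database through its `POST /_replicate` endpoint, whose JSON body names a `source` and `target` and can set `create_target` and `continuous`. The sketch below builds such a request as the replicator might when told to do so by the healthCPU. Server URLs and the helper name are placeholders, and authentication is omitted.

```python
import json
import urllib.request
from typing import Any, Dict


def build_replication_request(
    server: str, source: str, target: str, continuous: bool = False
) -> urllib.request.Request:
    """Build (but do not send) a couchDB POST /_replicate request."""
    body: Dict[str, Any] = {
        "source": source,
        "target": target,
        "create_target": True,  # ask couchDB to create the target database if missing
        "continuous": continuous,
    }
    return urllib.request.Request(
        server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_replication_request(
    "https://node1.example.org:6984/",
    "healthdb",
    "https://node2.example.org:6984/healthdb",
)
```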

partitioner

Creates new partitions of the data and places them on the server(s) the healthCPU specifies.
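The primer does not say how data is assigned to partitions. A common baseline is hash partitioning, sketched below: each document ID is hashed and mapped onto one of the servers the healthCPU supplies. The scheme and helper names are assumptions for illustration.

```python
import hashlib
from typing import Dict, List, Sequence


def partition_for(doc_id: str, servers: Sequence[str]) -> str:
    """Map a document ID to one server via a stable hash."""
    if not servers:
        raise ValueError("no servers available")
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def place(doc_ids: Sequence[str], servers: Sequence[str]) -> Dict[str, List[str]]:
    """Group document IDs by the server each one is assigned to."""
    placement: Dict[str, List[str]] = {s: [] for s in servers}
    for doc_id in doc_ids:
        placement[partition_for(doc_id, servers)].append(doc_id)
    return placement
```

With plain modulo hashing, adding or removing a server reassigns most documents; consistent hashing is the usual alternative when servers are added incrementally.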

The diagram below shows the healthDB engine.

[Figure: healthDB engine (components: healthDB API, connector engine, healthCPU, security, replicator, partitioner)]

Storage Structure

The healthCPU will store unstructured data as follows. A series of documents keeps track of data from the various sources, and each source has its own document(s). These documents contain key/value pairs of the form (hash(document_sourcesystem_objectID), document_sourcesystem_objectID). Each record from a source system is stored in its own separate document, which holds system values such as the last-modified date along with the actual data. This record is called a DBobject, and the document name identifies it.
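The layout above can be sketched as follows. The `document_sourcesystem_objectID` naming comes from the text; the choice of SHA-256 as the hash, the field names `last_modified` and `data`, and the helper functions are assumptions. `_id` is couchDB's document ID field.

```python
import hashlib
from datetime import datetime, timezone
from typing import Any, Dict, Tuple


def dbobject_id(source_system: str, object_id: str) -> str:
    # Naming pattern from the text: document_sourcesystem_objectID
    return f"document_{source_system}_{object_id}"


def make_dbobject(source_system: str, object_id: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """One source record stored in its own document (a DBobject)."""
    return {
        "_id": dbobject_id(source_system, object_id),
        "last_modified": datetime.now(timezone.utc).isoformat(),
        "data": data,
    }


def add_to_source_index(index_doc: Dict[str, str], source_system: str, object_id: str) -> Tuple[str, str]:
    """Add a hash(name) -> name entry to a per-source index document."""
    name = dbobject_id(source_system, object_id)
    key = hashlib.sha256(name.encode("utf-8")).hexdigest()  # hash function is an assumption
    index_doc[key] = name
    return key, name
```

Looking up a record then means computing the key from the document name, reading the name from the source's index document, and fetching the DBobject by that name.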

Other entities like DBobject can be created. For example, a person entity could be identified by document_person_personID. This is very similar to the DBobject concept, in which a series of documents contain references or indexes to the actual records.
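A person entity can be sketched as a document that holds references to DBobjects rather than copies of their data. The `records` field, the helper names, and the use of a plain dict in place of couchDB are assumptions for illustration; only the `document_person_personID` naming comes from the text.

```python
from typing import Any, Dict, List


def person_doc_id(person_id: str) -> str:
    # Naming pattern from the text: document_person_personID
    return f"document_person_{person_id}"


def make_person_entity(person_id: str, dbobject_ids: List[str]) -> Dict[str, Any]:
    """A person entity stores references to DBobjects, not their data."""
    return {"_id": person_doc_id(person_id), "records": sorted(set(dbobject_ids))}


def resolve(entity: Dict[str, Any], store: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Follow the entity's references into a document store (a dict stands in for couchDB)."""
    return [store[ref] for ref in entity["records"] if ref in store]
```

Because the entity only stores document names, updating a DBobject does not require touching the person entity that references it.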