C DAC Winter Project Report-7


Project Work at Centre for Development of Advanced Computing, Pune in December 2014


  • Building a DICOM/HL7 compliant

    Anonymizer Service conforming to

    HIPAA De-identification Regulations

    Winter Project Work December-January 2014-15

    by

    Sanchit Alekh

    under the guidance of

    Mr. Gaur Sunder

    Advanced Computing Training School

    Centre for Development of Advanced Computing

    Westend Center III, Aundh

    Pune - 411007 , Maharashtra

  • Contents

    1 Introduction
    1.1 An Introduction to the Standards
    1.1.1 DICOM
    1.1.2 Health Level 7
    1.1.3 HIPAA
    1.2 An Introduction to the SDKs
    1.2.1 C-DAC Medical Informatics Standards Software Development Kit for DICOM
    1.2.2 C-DAC Medical Informatics Standards Software Development Kit for HL7
    2 Why De-identify DICOM and HL7 Files?
    3 Implementation
    3.1 Analysis after Preliminary Study of DICOM Standard
    3.2 Clinical Trial De-identification Profiles - Supplement 142
    3.3 Determining the Attributes to be De-identified
    3.4 How the determined attributes are De-identified
    3.5 PERT Chart
    3.6 Code
    4 Results and Conclusion
    5 Scope for Future Work

  • Acknowledgements

    The author would like to thank Mr Gaur Sunder for building the foundation

    of the project, as well as providing necessary technical guidance at the outset,

    which made the direction of the project absolutely lucid and clear. He would also

    like to acknowledge Mr Aditya Kumar Sinha for giving him a chance to work

    on a Winter Project at C-DAC Aundh, Pune, and for the constant support and

    motivation through the course of the project. A special mention to Ms Vineeta

    Tiwari and Ms Avanti Joshi who were always present for any kind of assistance

    related to the project, and who monitored the progress on a daily basis to make it

    reach its conclusion.

    Date: January 2nd 2015

  • Certificate

    This is to certify that Mr. Sanchit Alekh has successfully completed the project
    titled "Building a DICOM/HL7 compliant Anonymizer Service conforming to HIPAA
    De-identification Regulations" under the guidance of Mr. Gaur Sunder as a Project
    Trainee at the Centre for Development of Advanced Computing, Pune.

    This Project Report is the proof and record of the bona fide research work carried
    out by him from December 2014 to January 2015.

    Mr. Aditya Kumar Sinha

    Principal Technical Officer

    Centre for Development of Advanced Computing, Pune

    Date: January 2nd 2015

  • Abstract

    DICOM and HL7 have emerged as the most widely used standards for the representation
    and communication of textual and imaging medical data. However, as the number of
    transactions between PACS systems increases, so does the probability of data
    pilferage, which puts a large amount of sensitive patient health information at risk.
    Ensuring HIPAA conformance is now a legal requirement in the United States. We have
    used C-DAC's DICOM and HL7 Software Development Kits to create an Anonymizer Service
    that takes in DICOM/HL7 compliant data, de-identifies it to comply with HIPAA, and
    serializes the modified file, which can then be used for transmission to an external
    PACS or for research. The software identifies and extracts the DICOM and HL7 tags for
    the 18 identifiers of Protected Health Information (PHI) prescribed by HIPAA, and
    feeds in dummy anonymized data in place of the original data.

    Keywords: DICOM, HL7, HIPAA, PACS, De-identification, Anonymization

  • Chapter 1

    Introduction

    In the present age of an unprecedented explosion of data, not just in the medical
    sciences but in many other domains, privacy has become an important concern. In
    Medical Informatics, if databases containing sensitive archived information are made
    available to the public, there can be a serious breach of privacy. Legal frameworks
    such as HIPAA are in place to ensure that medical service providers cannot
    unscrupulously reveal Protected Health Information (PHI). At the same time, we are
    presented with a trade-off: the information removed while de-identifying structured
    or unstructured data must not affect the dataset to the extent that it becomes
    useless for research.

    As medical science expanded, a need was felt to adopt standards for data exchange,
    data security and storage, queries and communication. Additionally, the reliance on
    automated information systems has made it imperative to ensure that data
    architectures across different applications are not ad hoc and idiosyncratic. With
    sophisticated machinery operating in the health domain, a need was also felt to
    minimize incompatibility and maximize interoperability [14]. With these points in
    mind, two standards, namely HL7 (Health Level Seven) and DICOM (Digital Imaging and
    Communications in Medicine), rose to prevalence.

  • 1.1 An Introduction to the Standards

    1.1.1 DICOM

    DICOM is an ISO standard (ISO 12052) that defines the formats for medical images that
    can be exchanged with the data and quality necessary for clinical use. It defines the
    data structures (formats) for medical images and related data, as well as
    network-oriented services such as image transmission, query of an image archive
    (PACS), print, and RIS-PACS-modality integration. DICOM is implemented in almost
    every radiology, cardiology imaging, and radiotherapy device (X-ray, CT, MRI,
    ultrasound, etc.), and increasingly in devices in other medical domains such as
    ophthalmology and dentistry. With tens of thousands of imaging devices in use, DICOM
    is one of the most widely deployed healthcare messaging standards in the world, and
    billions of DICOM images are currently in use for clinical care. Since its first
    publication in 1993, DICOM has revolutionized the practice of radiology, allowing the
    replacement of X-ray film with a fully digital workflow. Much as the Internet has
    become the platform for new consumer information applications, DICOM has enabled
    advanced medical imaging applications that have changed the face of clinical
    medicine. From the emergency department, to cardiac stress testing, to breast cancer
    detection, DICOM is the standard that makes medical imaging work for doctors and for
    patients [10]. In 1995, DICOM was accepted as a formal standard in Europe (MEDICOM,
    ENV 12052). The latest official documentation of DICOM can be found at
    http://medical.nema.org/standard.html

    1.1.2 Health Level 7

    Health Level Seven refers to a set of ANSI-certified international standards for the
    exchange of clinical and administrative information between Hospital Information
    Systems. The standard aims to reduce custom interface programming, to support
    exchanges among the widest variety of technical environments, programming languages
    and operating systems, and to build upon the experience of existing protocols and
    accepted industry-wide standard protocols. The term "Level 7" refers to the highest
    level of the Open Systems Interconnection (OSI) model of the International
    Organization for Standardization (ISO). HL7 corresponds to the conceptual definition
    of an application-to-application interface operating in the seventh layer of the OSI
    model, hence the name. The standard currently addresses the interfaces among various
    systems that send or receive patient admission/registration, discharge or transfer
    (ADT) data, queries, resource and patient scheduling, orders, results, clinical
    observations, billing, master file update information, medical records, patient
    referral, and patient care. It does not assume a particular architecture with respect
    to the placement of data within applications, but is designed to support a central
    patient care system as well as a more distributed environment where data resides in
    departmental systems. HL7 thus serves as a way for inherently disparate applications
    and data architectures operating in a heterogeneous system environment to communicate
    with each other [14]. In the current project, we used version 2.5 of Health Level 7.
    The full documentation of the various versions of Health Level 7 can be found at
    http://www.hl7.org/implement/standards/index.cfm?ref=nav
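
    To make the HL7 version 2.x message layout concrete, the following is a minimal
    sketch of how identifying fields of a PID (Patient Identification) segment might be
    replaced. It is not the C-DAC HL7 SDK and does not reflect its API: it splits the
    pipe-delimited segment with plain Java string handling. The class name, the method
    name, the dummy values and the sample segment are assumptions made for illustration,
    and the sketch ignores component separators, repetitions and the rule for ages over
    89.

```java
final class Hl7PidSketch {

    /** De-identifies one pipe-delimited PID segment; other segments pass through. */
    static String deidentifyPid(String segment) {
        // A limit of -1 keeps trailing empty fields, so the field count is preserved.
        String[] fields = segment.split("\\|", -1);
        if (!"PID".equals(fields[0])) {
            return segment;
        }
        set(fields, 3, "ANON-ID");                 // PID-3  patient identifier list
        set(fields, 5, "ANONYMIZED^PATIENT");      // PID-5  patient name
        if (fields.length > 7 && fields[7].length() >= 4) {
            fields[7] = fields[7].substring(0, 4); // PID-7  date of birth: keep year only
        }
        set(fields, 11, "");                       // PID-11 patient address
        set(fields, 13, "");                       // PID-13 home phone number
        set(fields, 19, "");                       // PID-19 social security number
        return String.join("|", fields);
    }

    /** Overwrites a field only if the segment is long enough to contain it. */
    private static void set(String[] fields, int index, String value) {
        if (index < fields.length) {
            fields[index] = value;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample segment, not taken from real patient data.
        String pid = "PID|1||12345^^^HOSP^MR||DOE^JOHN||19800101|M|||12 MAIN ST^^PUNE||5551234";
        System.out.println(deidentifyPid(pid));
    }
}
```

    In a PID segment the field number equals the index after splitting on the field
    separator, which is why PID-5 is read from index 5. The same is not true of the MSH
    segment, where the separator itself counts as the first field.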

    1.1.3 HIPAA

    HIPAA stands for the Health Insurance Portability and Accountability Act of 1996. A
    major goal of its Privacy Rule is to assure that individuals' health information is
    properly protected while allowing the flow of health information needed to provide
    and promote high quality health care and to protect the public's health and
    well-being. The Rule strikes a balance that permits important uses of information
    while protecting the privacy of people who seek care and healing. Given that the
    health care marketplace is diverse, the Rule is designed to be flexible and
    comprehensive to cover the variety of uses and disclosures that need to be
    addressed [9]. HIPAA lists 18 identifiers that must be de-identified before medical
    data can be made public for research and analysis [2]. They are:

    1. Names

    2. All geographical subdivisions smaller than a State, including street address,
    city, county, precinct, zip code, and their equivalent geocodes, except for the
    initial three digits of a zip code, if according to the current publicly available
    data from the Bureau of the Census: (1) the geographic unit formed by combining all
    zip codes with the same three initial digits contains more than 20,000 people; and
    (2) the initial three digits of a zip code for all such geographic units containing
    20,000 or fewer people is changed to 000.

    3. All elements of dates (except year) for dates directly related to an individual,
    including birth date, admission date, discharge date, date of death; and all ages
    over 89 and all elements of dates (including year) indicative of such age, except
    that such ages and elements may be aggregated into a single category of age 90 or
    older

    4. Phone numbers

    5. Fax numbers

    6. Electronic mail addresses

    7. Social Security numbers

    8. Medical record numbers

    9. Health plan beneficiary numbers

    10. Account numbers

    11. Certificate/license numbers

    12. Vehicle identifiers and serial numbers, including license plate numbers

    13. Device identifiers and serial numbers

    14. Web Universal Resource Locators (URLs)

    15. Internet Protocol (IP) address numbers

    16. Biometric identifiers, including finger and voice prints

    17. Full face photographic images and any comparable images

    18. Any other unique identifying number, characteristic, or code
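
    Items 2 and 3 of the list above are the only ones that call for generalization
    rather than outright removal. The following sketch shows one way those two rules
    could be expressed. The class and method names are hypothetical, and the set of
    low-population three-digit zip prefixes is a placeholder: a real implementation
    would have to derive it from current Census Bureau data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SafeHarborSketch {

    // Placeholder values only; the real set depends on current census data.
    static final Set<String> LOW_POPULATION_PREFIXES =
            new HashSet<>(Arrays.asList("036", "059"));

    /** Keeps the first three zip digits, or returns 000 for low-population units. */
    static String generalizeZip(String zip, Set<String> lowPopulationPrefixes) {
        if (zip == null || zip.length() < 3) {
            return "000";
        }
        String prefix = zip.substring(0, 3);
        return lowPopulationPrefixes.contains(prefix) ? "000" : prefix;
    }

    /**
     * Reduces a date in YYYYMMDD form to its year. If the date would reveal an
     * age over 89, the year is dropped as well and an empty string is returned.
     */
    static String generalizeDate(String yyyymmdd, boolean indicatesAgeOver89) {
        if (indicatesAgeOver89 || yyyymmdd == null || yyyymmdd.length() < 4) {
            return "";
        }
        return yyyymmdd.substring(0, 4);
    }

    /** Aggregates all ages over 89 into the single category "90+". */
    static String generalizeAge(int ageInYears) {
        return ageInYears > 89 ? "90+" : Integer.toString(ageInYears);
    }

    public static void main(String[] args) {
        System.out.println(generalizeZip("94720", LOW_POPULATION_PREFIXES));
        System.out.println(generalizeZip("03601", LOW_POPULATION_PREFIXES));
        System.out.println(generalizeDate("19800101", false));
        System.out.println(generalizeAge(92));
    }
}
```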

    1.2 An Introduction to the SDKs

    The code for this project was written in Java SE 1.7 on the Eclipse IDE using the
    Centre for Development of Advanced Computing's Medical Informatics Standards SDKs
    for DICOM and HL7.

    1.2.1 C-DAC Medical Informatics Standards Software Development Kit for DICOM

    C-DAC's Medical Informatics Standards Software Development Kit for DICOM is a toolkit
    that provides APIs for applications and medical devices to comply with NEMA's DICOM
    Standard. The toolkit is designed to be easily programmable, allowing DICOM
    developers to build sophisticated and complex applications on top of it. Its layered
    approach makes it possible to build applications that exploit different levels of
    the capabilities defined by the DICOM standard. The toolkit provides a comprehensive
    implementation of all standard-defined IODs with supported transfer syntaxes. It also
    has complete support for all Composite and Normalized services defined in the
    standard. This enables developers to build DICOM-defined SCU/SCP components for any
    required set of IODs and transfer syntaxes. The toolkit is provided as a pre-built
    set of SKUs, each a combination of DIMSE Services and IODs. The object-oriented
    approach of the toolkit allows developers to build applications in a short time. The
    toolkit provides default implementations for almost all the functionality defined by
    the standard, which enables rapid development of applications through default
    mechanisms [3]. Some of the most notable features of the SDK are as follows:

    Complete set of supported IODs and DIMSE Services

    Common API Framework

    Rapid Application Development Tool

    High Return on Investment

    Dual View Approach

    Highly configurable and extensible

    Debugging, Validation and logging support

    Memory and speed efficient

    1.2.2 C-DAC Medical Informatics Standards Software Development Kit for HL7

    C-DAC's Medical Informatics Standards Software Development Kit for HL7 is a toolkit
    that provides APIs for applications and medical devices to comply with the HL7
    Standard. It is a rapid application development tool which provides a high return on
    investment through cost-effective implementation of the standard. The toolkit
    provides a comprehensive implementation of all standard-defined systems such as the
    Patient Administration System, the Financial Management System, etc. All HL7-defined
    messages, events and queries are supported by the toolkit. This enables programmers
    to build comprehensive Health Information Systems using the toolkit. Since the
    toolkit helps in building applications and is itself not an application, it is
    designed to be highly configurable and extensible. It can be configured for logging
    and for internationalization support. HL7-defined value tables are available by
    default with the library, and the programmer can add custom value tables to it. The
    toolkit performs strict validation as per the HL7 standard specifications. Logging
    support is customizable and offers different logging levels, enabling developers to
    debug applications and view the data generated during a protocol violation [4]. Some
    of the notable features of the SDK are:

    Complete set of HL7-defined systems

    Common API Framework

    Rapid Application Development Tool

    High Return on Investment

    Object-Oriented Approach

    Highly configurable and extensible

    Debugging, Validation and logging support


  • Chapter 2

    Why De-identify DICOM and

    HL7 Files?

    Electronic Health Records (EHRs) are changing the landscape of the medical sciences.
    They make patient data not only more accessible and organized, but also more
    cost-effective to handle. A survey by the eHealth Initiative [5] indicated the
    increasing impact of e-health systems on Health Information Exchange. About 69
    percent of the 130 participants reported a positive effect on cost reduction
    resulting from reduced staff time, fewer redundant tests and a reduction in patient
    admissions. Over two-thirds of US hospitals (68 percent) had either fully or
    partially implemented EHRs in 2006. EHRs promise long-term cost reduction (about
    $175 billion a year in the US) and faster processing in the evaluation of patient
    data [6].

    However, such widespread use of EHRs has led to compromises of patient privacy, and
    the introduction of electronic health systems has been accompanied by discussion of
    security and privacy issues. For example, HIPAA privacy complaints grew by 6% in
    2008 [7][13], and surveys show that the majority of organizations are unprepared for
    many security threats. DICOM offers few built-in security mechanisms, and it is up
    to the implementer to implement the security features that form a part of DICOM.

    Both the DICOM and HL7 standards were made for easy interoperability of medical
    data, and privacy concerns were not the top priority. The security modules were
    added later, but not without inherent deficiencies. The security of DICOM is heavily
    dependent on the encryption of the communication channels. However, for highly
    sensitive medical data, this is often not sufficient. Therefore, security measures
    need to be coupled with health archive systems such as the PACS [1]. Abouakil et al.
    suggested a policy-based disclosure system in which the patient has the right to
    decide who should be able to access his or her data [1]. Insider threats have become
    a more serious problem than external attackers, and therefore sharing data with all
    doctors without policy-based access is problematic [8].

    Moreover, research conducted by Lakhani et al. has shown that, out of the existing
    DICOM anonymizers tested, only two anonymized 100% of the tags directly containing
    PHI named by HIPAA. Using their default settings, DICOMWORKS anonymized the least
    PHI (18%), followed by MIRC (45%), DVTK (55%), RUBO (64%), OSIRIX (64%), FP IMAGE
    (100%), and DICOM ANON LIGHT (100%). All of the programs failed to anonymize some
    degree of patient- or study-specific information that could potentially represent or
    lead to discovery of PHI or patient identity. MIRC anonymized the least of this
    information (16%), followed by DICOMWORKS (26%), OSIRIX (39%), FP IMAGE (50%), RUBO
    (53%), DVTK (66%), and DICOM ANON LIGHT (84%). Many freely available DICOM
    anonymizers still leave PHI and other specific information that could represent or
    lead to the identity of a patient, and therefore vigilance should be used even after
    anonymization [12]. However, if HIPAA conformance is not ensured, DICOM data cannot
    be published for research purposes and cannot be freely shared. Therefore, the need
    for a fresh approach to DICOM anonymization was felt, which led to the birth of this
    project.

  • Chapter 3

    Implementation

    3.1 Analysis after Preliminary Study of DICOM

    Standard

    After a comprehensive study of the DICOM 2014c standard as well as its security
    profiles, it was found that DICOM specifies all its attributes, across all
    Information Object Definitions, in the form of tags, each of which is a compound of
    a group number and an element number. The first step of any de-identification
    process is therefore to extract the tags of potentially identifying patient
    information, as prescribed by HIPAA in its set of 18 PHI identifiers. On the basis
    of a preliminary analysis, a broad project outline was constructed, which is
    illustrated in Fig 3.1.

    The Data Dictionary in the DICOM standard contains the registry of all DICOM Data
    Elements and all DICOM Unique Identifiers that are defined within the standard, so
    the software has to look up the pre-identified tags in the Data Dictionary. Fig 3.2
    shows the DICOM Data Dictionary format (taken from Table 6-1 of the DICOM Standard
    documentation). C-DAC's DICOM SDK defines the DictionaryReader class, which gives a
    populated instance of the DICOM Dictionary through which all DICOM-specified data
    elements can be accessed. The following code snippet, for example, creates an
    instance of the DictionaryReader class called newDictionary, and retrieves the name
    of the attribute given by the tag 0x0020, 0x000E, which is then printed to the
    console.

    DictionaryReader newDictionary = DictionaryReader.createInstance();
    String strName = newDictionary.getName(0x0020,0x000E);
    System.out.println(strName);

    Figure 3.1: Proposed Workflow after Preliminary Analysis
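
    Since a tag is a compound of a 16-bit group number and a 16-bit element number, the
    pair can be packed into a single 32-bit value and printed in the conventional
    (gggg,eeee) notation. The sketch below illustrates this with plain Java arithmetic.
    It does not use the C-DAC SDK, and the class and method names are assumptions made
    for illustration.

```java
final class DicomTagSketch {

    /** Packs a group number and an element number into one 32-bit tag value. */
    static int toTag(int group, int element) {
        return ((group & 0xFFFF) << 16) | (element & 0xFFFF);
    }

    /** Extracts the group number (upper 16 bits) from a packed tag. */
    static int groupOf(int tag) {
        return (tag >>> 16) & 0xFFFF;
    }

    /** Extracts the element number (lower 16 bits) from a packed tag. */
    static int elementOf(int tag) {
        return tag & 0xFFFF;
    }

    /** Private attributes are those whose group number is odd. */
    static boolean isPrivate(int tag) {
        return (groupOf(tag) & 1) == 1;
    }

    /** Formats a packed tag as (gggg,eeee) with upper-case hexadecimal digits. */
    static String format(int tag) {
        return String.format("(%04X,%04X)", groupOf(tag), elementOf(tag));
    }

    public static void main(String[] args) {
        int tag = toTag(0x0020, 0x000E);
        System.out.println(format(tag) + " private=" + isPrivate(tag));
    }
}
```

    A packed tag is convenient as a map key, which avoids keeping group and element
    numbers in two separate collections that must stay aligned.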

    3.2 Clinical Trial De-identification Profiles - Supplement 142

    Supplement 142 was introduced as an addition to the DICOM security profiles, and was
    later incorporated into the DICOM documentation as part of Part 15, i.e. the DICOM
    security profiles.

    Figure 3.2: Typical Structure of DICOM Data Dictionary

    DICOM Supplement 142 states the following:

    In clinical trials, images are often acquired during the course of clinical care, in
    which case the patient's individually identifiable information needs to be removed
    to protect the patient's privacy. In addition, there is often a need to remove other
    information not directly related to the patient's identity per se, but which might
    assist in recovering their identity or bias the image interpretation in some way.
    Conversely, it is important to preserve certain specific information for quality
    control and analysis that is essential to the conduct of the clinical trial, which
    might otherwise be removed. Since many clinical trials are conducted globally, both
    nationally and locally specific privacy concerns (such as espoused by the EU
    Directive and HIPAA rule and individual IRBs and ethics committees) need to be
    addressed. Data and images acquired for clinical trials are also often released for
    secondary re-use, in which case addressing privacy concerns requires great
    vigilance. In general, it is impractical to leave the decisions as to what to retain
    or remove to the individual sites or trials. There are also other scenarios in which
    de-identification may be required, such as creation of teaching files, other types
    of publication, as well as submission of images and associated information to
    registries, such as oncology or radiation dose registries. [11]

    Supplement 142 contains an exhaustive list of attributes, their tags, and the
    actions to be taken on each for de-identification of a DICOM file. The fields and
    content of a typical Supplement 142 table are illustrated in Fig 3.3. On the basis
    of what data is retained, DICOM defines four kinds of clinical trial
    de-identification profiles. They are:

    1. Retain Patient Characteristics

    2. Retain Device Information

    3. Retain UIDs

    4. Retain Safe Private Option
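
    The actions in the Supplement 142 table are given as single-letter codes, including
    D (replace with a dummy value), Z (replace with a zero-length value), X (remove) and
    K (keep). The sketch below shows how such codes might drive the treatment of a
    dataset. It models the dataset as a plain Java map from packed tags to string values
    rather than using the C-DAC SDK, covers only these four codes, and assigns actions
    to attributes purely for illustration: the table in the Supplement itself remains
    the authority on which action applies to which attribute under a given profile.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ActionCodeSketch {

    /** Subset of the Supplement 142 action codes. */
    enum Action {
        D, // replace with a non-empty dummy value
        Z, // replace with a zero-length value (a dummy value is also permitted)
        X, // remove the attribute
        K  // keep the attribute unchanged
    }

    /** Packs a group number and an element number into one 32-bit key. */
    static int tag(int group, int element) {
        return ((group & 0xFFFF) << 16) | (element & 0xFFFF);
    }

    /** Applies one action to one attribute, if the attribute is present. */
    static void apply(Map<Integer, String> dataset, int tag, Action action, String dummy) {
        if (!dataset.containsKey(tag)) {
            return;
        }
        switch (action) {
            case D: dataset.put(tag, dummy); break;
            case Z: dataset.put(tag, ""); break;
            case X: dataset.remove(tag); break;
            case K: default: break;
        }
    }

    public static void main(String[] args) {
        // Hypothetical dataset contents; the action assignments are illustrative only.
        Map<Integer, String> dataset = new LinkedHashMap<>();
        dataset.put(tag(0x0010, 0x0010), "DOE^JOHN");
        dataset.put(tag(0x0008, 0x0080), "Some Hospital");
        dataset.put(tag(0x0008, 0x0060), "CT");

        apply(dataset, tag(0x0010, 0x0010), Action.Z, null);
        apply(dataset, tag(0x0008, 0x0080), Action.X, null);
        apply(dataset, tag(0x0008, 0x0060), Action.K, null);
        System.out.println(dataset);
    }
}
```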

    3.3 Determining the Attributes to be De-identified

    The attributes to be de-identified can be drawn from three reference sources: the
    DICOM Data Dictionary in its entirety, the HIPAA Safe Harbor list [2], and the
    exhaustive list given by DICOM Supplement 142 [11]. Since the aim of the project is
    to build a HIPAA-conforming anonymizer, the best approach is to pick from the
    exhaustive list in Supplement 142 those attributes that also fall under one of the
    HIPAA identifiers.

    For example, when we take the first of the 18 PHI identifiers given by HIPAA, i.e.
    names, and look for all the name attributes and their tags in the Supplement 142
    document, we come across 22 such attributes, which are given in Fig 3.4. Therefore,
    in the de-identification process, for PHI of type Name, these 22 attributes will be
    de-identified by the software.

    Figure 3.3: De-identification Table as given in the Supplement 142

    Figure 3.4: Name Attributes to be De-identified

  • 3.4 How the determined attributes are De-identified

    Once the required attributes have been determined, the SDK can be used to parse the
    DICOM file, de-identify the relevant attributes, and serialize the file back in the
    DICOM format. A Deidentifier class has been created to execute the de-identification
    process. In the implementation, two arrays are created for each PHI type, storing
    the group numbers and the element numbers of the attributes to be de-identified. An
    instance of the Data Dictionary is created, and for each group-element pair formed
    by taking the same index in both arrays, the code looks up the name of the attribute
    in the Data Dictionary. The code then retrieves the DataElement for that
    group-element pair and replaces its value with the de-identified string, given by
    "Anonymized " followed by the name of the attribute. For example, if the
    group-element pair is 0x0010, 0x0010, then the de-identified string will be
    "Anonymized Patient's Name".
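
    The procedure described above can be sketched without the SDK as follows. This is
    not the project's Deidentifier class and does not use the C-DAC API: the dataset and
    the Data Dictionary are both modelled as plain Java maps keyed by a packed tag, only
    three of the 22 name attributes are listed, and all class, method and variable names
    are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class NameDeidentifierSketch {

    // Parallel arrays: NAME_GROUP[i] and NAME_ELEMENT[i] together form one tag.
    static final int[] NAME_GROUP   = {0x0010, 0x0008, 0x0008};
    static final int[] NAME_ELEMENT = {0x0010, 0x0090, 0x0080};

    // Stand-in for the Data Dictionary: packed tag to attribute name.
    static final Map<Integer, String> DICTIONARY = new HashMap<>();
    static {
        DICTIONARY.put(tag(0x0010, 0x0010), "Patient's Name");
        DICTIONARY.put(tag(0x0008, 0x0090), "Referring Physician's Name");
        DICTIONARY.put(tag(0x0008, 0x0080), "Institution Name");
    }

    /** Packs a group number and an element number into one 32-bit key. */
    static int tag(int group, int element) {
        return ((group & 0xFFFF) << 16) | (element & 0xFFFF);
    }

    /** Returns a copy of the dataset in which every listed name attribute is replaced. */
    static Map<Integer, String> deidentifyNames(Map<Integer, String> dataset) {
        Map<Integer, String> result = new LinkedHashMap<>(dataset);
        for (int i = 0; i < NAME_GROUP.length; i++) {
            int t = tag(NAME_GROUP[i], NAME_ELEMENT[i]);
            // Only attributes actually present in the file are overwritten.
            if (result.containsKey(t)) {
                result.put(t, "Anonymized " + DICTIONARY.get(t));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input values, not taken from a real file.
        Map<Integer, String> dataset = new LinkedHashMap<>();
        dataset.put(tag(0x0010, 0x0010), "DOE^JOHN");
        dataset.put(tag(0x0008, 0x0060), "CT");
        System.out.println(deidentifyNames(dataset));
    }
}
```

    Deriving the loop bound from the array length, rather than from a separately
    maintained count, keeps the loop correct if attributes are added to or removed from
    the list.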

    3.5 PERT Chart

    For a month-long project, time management and planning are crucial and play a major
    role in determining the efficacy of the project. Fig 3.5 illustrates the project
    timeline in the form of a PERT Chart.

    Figure 3.5: PERT Chart for the Project

    19

  • 3.6 Code

    import java.io.BufferedReader;

    import java.io.IOException;

    import java.io.InputStreamReader;

    import cdac.medinfo.sdk.dcm30_04.baselibs.DCMInit;

    import cdac.medinfo.sdk.dcm30_04.baselibs.ImplicitDE;

    import cdac.medinfo.sdk.dcm30_04.utils.DicomConstants;

    import cdac.medinfo.sdk.dcm30_04.commoninterface.IDataElement;

    import cdac.medinfo.sdk.dcm30_04.commoninterface.IDataSet;

    import cdac.medinfo.sdk.dcm30_04.commoninterface.IDicomFileHeader;

    import cdac.medinfo.sdk.dcm30_04.commoninterface.IDictionaryReader;

    import cdac.medinfo.sdk.dcm30_04.commoninterface.ITransferSyntax;

    import cdac.medinfo.sdk.dcm30_04.dicomparserserializers.DicomParser;

    import cdac.medinfo.sdk.dcm30_04.dicomparserserializers.DicomSerializer;

    import cdac.medinfo.sdk.dcm30_04.dictionaryreaderwriters.DictionaryReader;

    import cdac.medinfo.sdk.dcm30_04.enums.EnumLoggingLevel;

    import cdac.medinfo.sdk.dcm30_04.enums.EnumValidationMode;

    import cdac.medinfo.sdk.dcm30_04.utils.DicomConfig;

    import cdac.medinfo.sdk.dcm30_04.utils.ICollectionIterator;

    public class Deidentifier {

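    // Group numbers of the 22 name attributes; name_group[i] pairs with name_element[i]
    int name_group[] = {
    0x0070,0x4008,0x0040,0x0008,0x0008,0x0008,0x0040,0x0008,0x0010,0x0010,0x0010,
    0x0010,0x0040,0x0008,0x0040,0x0018,0x0008,0x300E,0x0040,0x0040,0x0008,0x0040
    };

    // Element numbers of the same attributes, in the same order;
    // the ninth pair is (0010,1001), Other Patient Names
    int name_element[] = {
    0x0084,0x0119,0x4037,0x0080,0x1040,0x1060,0x1010,0x1070,0x1001,0x1005,0x1060,
    0x0010,0x0242,0x1050,0xA123,0x1030,0x0090,0x0008,0x0006,0x0010,0x1010,0xA075
    };

    int num_name = 22;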

    DictionaryReader dicreader;

    IDataSet objIDataSet = null;

    IDataElement objIDataElement = null;

    ITransferSyntax objITransferSyntax = null;

    IDicomFileHeader objDicomFileHeader = null;

    DicomParser objDicomParser = null;

    public IDataSet deidentifyName(IDataSet dataset, boolean pass){

    dicreader = DictionaryReader.createInstance();

    int iterator;

    String value;

    objITransferSyntax = dataset.getTransferSyntax();

    //I'm saving the dictionary reference in a variable called as strName
    for(iterator=0 ; iterator < num_name ; iterator++) {
    // ...
    value = (String) dataset.getDataForTag(name_group[iterator],
    name_element[iterator]).getValue();
    System.out.println("Anonymized value is " + value);

    }

    }

    return dataset;

    }

    }


  • Chapter 4

    Results and Conclusion

    Although it adopts a straightforward approach to DICOM de-identification, the
    application was able to identify the attribute tags to be anonymized, parse and
    extract the tag data from the supplied DICOM file, replace it with the dummy data,
    and serialize the result into a new DICOM file. Because the Value Representations of
    the DICOM file structure and the Information Object Definitions are kept intact, the
    output file remains DICOM Part 10 compliant while also conforming to HIPAA. A PACS
    server can now be used to facilitate DICOM communication over the network, as well
    as to act as an authenticator of image origin and a hospital-wide guarantor of HIPAA
    compliance.

    De-identification should, on the one hand, remove all the risky elements, but at the
    same time should not remove important notes whose loss would be detrimental to
    biomedical research. Free-text notes in DICOM headers can be passed through an
    open-source anonymizer service such as MIST, which performs automatic
    de-identification to comply with HIPAA. MIST is a machine-learning tool which needs
    to be trained manually on a corpus of data; the resulting model is then used for the
    de-identification. By conforming to HIPAA regulations, the application creates a
    data set that is not only legal to use for research and analysis, but also retains
    important information in its original form, which may be very valuable for
    biomedical research.

  • Chapter 5

    Scope for Future Work

    De-identification of DICOM and HL7 data is only one step towards ensuring data
    privacy; violations are still possible. Combining de-identification with security
    measures such as the Digital Signature Profile, the Secure Transport Connection
    Profile and the Media Storage Security Profiles results in a more complete approach
    in which data sniffing over the communication network is also taken into account.
    The application currently does not implement DIMSE services for communication of the
    DICOM files, which will be the next priority in terms of research. Moreover, as
    hospitals shift to Web-based services, WADO (Web Access to DICOM Objects) will also
    come into play, and Part 10 DICOM compliance alone will no longer suffice. An issue
    with any irreversible data removal is the possible loss of important information
    that might be needed later. This might well be one of the major disadvantages of
    de-identification. Alternative approaches, such as an efficient method of selective
    anonymization or pseudonymization used along with other DICOM Security Profiles, can
    ensure better data security. These issues will be addressed in our future work.

  • References

    [1] Daniel Abouakil, Johannes Heurix, and Thomas Neubauer. Data models for
    the pseudonymization of DICOM data. In System Sciences (HICSS), 2011 44th
    Hawaii International Conference on, pages 1-11. IEEE, 2011.

    [2] UC Berkeley Research Administration and Compliance. HIPAA PHI: List

    of 18 Identifiers and Definition of PHI, accessed December 31, 2014. http:

    //cphs.berkeley.edu/hipaa/hipaa18.html.

    [3] Medical Informatics Division Centre for Development of Advanced Comput-

    ing. DICOM SDK Overview, accessed December 31, 2014. http://cdac.in/

    index.aspx?id=hi_hs_medinfo_dicom_home.

    [4] Medical Informatics Division Centre for Development of Advanced Computing.

    HL7 SDK Overview, accessed December 31, 2014. http://cdac.in/index.

    aspx?id=hi_hs_medinfo_hl7_home.

    [5] eHealth Initiative et al. eHealth Initiative's Fifth Annual Survey of Health
    Information Exchange at the State and Local Levels: Overview of 2008 Findings.
    eHealth Initiative, 2008.

    [6] Frank R Ernst and Amy J Grizzle. Drug-related morbidity and mortality:
    updating the cost-of-illness model. Journal of the American Pharmaceutical
    Association (Washington, DC: 1996), 41(2):192-199, 2000.

    [7] Ponemon Institute and Accenture. How global organizations approach the

    challenge of protecting personal data.

    [8] Michelle Keeney. Insider threat study: Computer system sabotage in critical

    infrastructure sectors. US Secret Service and CERT Coordination Center, 2005.

    [9] United States Department of Health and Human Services. Summary of the

    HIPAA Privacy Rule, accessed December 31, 2014. http://www.hhs.gov/

    ocr/privacy/.

    [10] American College of Radiology (ACR) and the National Electrical Manufac-

    turers Association (NEMA). DICOM : About Dicom, accessed December 30,

    2014. http://dicom.nema.org.

    [11] American College of Radiology (ACR) and the National Electrical Manufac-

    turers Association (NEMA). Clinical Trial De-identification Profiles, accessed

    January 2, 2014. http://webcache.googleusercontent.com/search?q=

    cache:Yu3TBMMrUmoJ:ftp://medical.nema.org/medical/dicom/Final/

    sup142_ft.pdf+&cd=1&hl=en&ct=clnk&gl=in&client=safari.

    [12] Paul G. Nagy Nabile M. Safdar Paras Lakhani, James Y. Chen. Protecting
    your patient's privacy: Is your DICOM anonymizer working for you? In
    Radiological Society of North America 2009 Scientific Assembly and Annual Meeting,
    November 29 - December 4, Chicago IL. Radiological Society of North America,
    2009.

    [13] Forrester Research. The value of corporate secrets: How compliance and col-

    laboration affect enterprise perceptions of risk.

    [14] Health Level Seven. Hl7 messaging standard version 2.5, 2003.
