Dataguise hortonworks insurance_feb25

Page 1: Dataguise hortonworks insurance_feb25

© Hortonworks Inc. 2011 – 2016. All Rights Reserved

Leveraging Big Data for Insurance Insights Without Putting PII/PHI at Risk

February 25, 2016

Page 2: Dataguise hortonworks insurance_feb25

Today’s Speakers

Syed Mahmood, Sr. Product Marketing Manager – Hortonworks

Cindy Maike, GM, Insurance – Hortonworks

Venkat Subramanian, CTO and VP of Engineering – Dataguise

Page 3: Dataguise hortonworks insurance_feb25

Is Sensitive Data (PII/PHI) a challenge for your company’s analytics & big data programs? A. Yes B. No

Page 4: Dataguise hortonworks insurance_feb25

If Yes, do you have capabilities in place to manage sensitive data discovery, protection and audit? A. Yes B. No

Page 5: Dataguise hortonworks insurance_feb25

Big Data Business Insights: Insurance Opportunities

Data Privacy Protection Requirements
•  Regulatory
•  Customer Expectations

Page 6: Dataguise hortonworks insurance_feb25

The Insurance Data Landscape has Changed Dramatically

•  Customer-centric, needs-based insurance offerings

•  500 GB of data per vehicle per year in usage-based insurance (UBI) programs

•  Drones will make workflows more efficient by 2020

•  Digital is becoming the preferred interaction channel for consumers and insureds

•  Growing availability and usage of geospatial data

•  Changes in claim frequency and severity; fraud and anomaly analytics

Page 7: Dataguise hortonworks insurance_feb25

Industry Opportunity

High-performance analytics, or a combination of structured and unstructured data, is changing the ways of the insurance industry after decades of conservatism.

Page 8: Dataguise hortonworks insurance_feb25

View of Insurance Industry Data Landscape

[Figure: insurance data sources plotted by data velocity (Batch to Real-time) against data variety (Structured, Semi-structured, Unstructured). Sources shown:]

•  Weather-event
•  Drone image feeds
•  Social media
•  Sensor (IoT)
•  Geo-location
•  Deposition recording
•  Notes and diary
•  Medical records & bills
•  Transcriptions
•  Photos
•  Investigation
•  TPA invoices
•  FNOL intake
•  Claims triage
•  Vendor invoices
•  Forms and letters
•  Claim system
•  Policy verification
•  Applications/Submissions
•  3rd party risk models
•  Prior loss runs

Page 9: Dataguise hortonworks insurance_feb25

New Opportunities – Security Challenges

Use Case / Opportunity: Know Your Customer
Data Sources (examples): Application documents, clickstream and web logs, marketing research, CRM records, and social media
New Security Challenges:
•  Coverage for multiple file types and sources
•  Critical detection to find and measure sensitivity risk

Use Case / Opportunity: Claims Optimization & Fraud Detection
Data Sources (examples): Policy records, claims databases, receipts, accident reports, emails, and transcriptions
New Security Challenges:
•  Reduce or eliminate PCI scope for Hadoop
•  Detect new sensitivity risks in hard-to-reach unstructured data

Use Case / Opportunity: Evaluate Risk / New Products
Data Sources (examples): Mobile telematics, sensor data, social media, and voice-to-text files
New Security Challenges:
•  High scale
•  Large sets of small files
•  Detection and protection of unstructured data

Use Case / Opportunity: Traditional Documents & Attachments
Data Sources (examples): Claims data, insured prior loss data, and claims adjuster notes
New Security Challenges:
•  Masking of sensitive data for data sharing
•  Sensitive data auditing

Use Case / Opportunity: Third-party Data Sharing
Data Sources (examples): Reporting bureaus, third-party claims administrators (TPAs), telematics service providers (TSPs)
New Security Challenges:
•  Tiered access: highly granular roles with differing needs/views for sensitive data

Page 10: Dataguise hortonworks insurance_feb25

Hortonworks + Dataguise = SECURE BUSINESS EXECUTION

CTO, DATAGUISE

VENKAT SUBRAMANIAN

Page 11: Dataguise hortonworks insurance_feb25

Dataguise enables Secure Business Execution for data-driven enterprises by delivering data-centric security solutions that Detect, Audit, Protect and Monitor sensitive data assets where they are and wherever they move across repositories.

©2015 Contains confidential and proprietary information and may not be disclosed by the recipient to any third party.

Page 12: Dataguise hortonworks insurance_feb25

©2015 Dataguise, Inc. Confidential and Proprietary

Secure Business Execution

The ability of an Enterprise to safely and responsibly leverage the value of all of their data assets for the purpose of gaining new business insights, maximizing competitive advantage, and driving revenue growth

Page 13: Dataguise hortonworks insurance_feb25

Business Intelligence Trend for 2016

Shift from IT-led, system-of-record reporting to pervasive, business-led, self-service analytics

•  Easy-to-use, fast, agile BI & Analytics
•  Deeper insights into diverse data sources

** Rita Sallam, Gartner

Page 14: Dataguise hortonworks insurance_feb25

Data is your biggest Asset

It is also your biggest Vulnerability

Page 15: Dataguise hortonworks insurance_feb25

©2016 Dataguise, Inc. Confidential and Proprietary

DgSecure

•  DETECT: Where sensitive content is present in structured, unstructured, and semi-structured data

•  AUDIT: Who has access to which sensitive data; identify misalignments and risk factors

•  PROTECT: Sensitive data at the element level: encrypt/decrypt with RBAC, mask

•  MONITOR: Based on metadata, track how and where sensitive data is being accessed through a 360° dashboard

Across Hadoop, RDBMS, Files, NoSQL DB

On Premise, in the Cloud, or Hybrid

Page 16: Dataguise hortonworks insurance_feb25

PHI: Guidance for Data De-Identification Sensitive/Privacy Data

•  Name
•  Address
•  Dates – Birth, Death, ..
•  Telephone Numbers
•  Device Identifiers and serial numbers
•  Email addresses
•  SSN
•  Medical record numbers
•  Account Numbers …..

Page 17: Dataguise hortonworks insurance_feb25

Secure Environment Perimeter Security, Volume/File encryption

•  I have strong perimeter security: Physical Security, Firewall, IDS/IPS… Isn’t that enough?

•  I have turned on volume/file-level encryption: control data access, meeting regulatory compliance. Isn’t this enough?

Need BOTH and *more!

Page 18: Dataguise hortonworks insurance_feb25

What Should We Do?  

1.  Precisely locate sensitive content across ALL repositories
2.  Protect those assets appropriately – masking, encryption
3.  Open up ‘controlled’ access to data now that sensitive elements are protected
4.  Enable employees, trusted partners and customers to make data-driven decisions

RISKS: BREACH, SECURITY, COMPLIANCE

VALUE: REVENUE, DATA DRIVEN DECISIONS, BUSINESS INTELLIGENCE

At the cell-level…

Page 19: Dataguise hortonworks insurance_feb25

How do we do it in DgSecure

Page 20: Dataguise hortonworks insurance_feb25

Complex Sensitive Data Discovery

Sensitive Data Type: Sample Data

Address: 50920 April Blvd. Apt. 181, Lalana ME 83271; 1000 Coney Island Ave. Brooklyn NY 11230

Name: George Smith; Smith, A. George

Credit Card Number: 3710 664089 10315; 345039502030507; 3780-331072-30547

Telephone Number: (510) 824-1036; 510-824-1036; 510.814.1036; 5108141036
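
The sample values above show why discovery has to tolerate several layouts of the same element. A minimal sketch of that idea, assuming plain regular expressions: this is not how DgSecure detects data, and the pattern names and the `find_sensitive` helper are hypothetical. It covers only the telephone and 15-digit card layouts listed on this slide.

```python
import re

# Hypothetical patterns covering only the layouts shown in the sample data.
# US telephone numbers: (510) 824-1036, 510-824-1036, 510.814.1036, 5108141036
PHONE_RE = re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.]?)\d{3}[-.]?\d{4}(?!\d)")
# 15-digit card numbers grouped 4-6-5, with optional space or hyphen separators
CARD_RE = re.compile(r"(?<!\d)\d{4}[ -]?\d{6}[ -]?\d{5}(?!\d)")


def find_sensitive(text):
    """Return (element_type, matched_text) pairs found in free text."""
    hits = [("credit_card", m.group()) for m in CARD_RE.finditer(text)]
    hits += [("telephone", m.group()) for m in PHONE_RE.finditer(text)]
    return hits


if __name__ == "__main__":
    note = "Insured called from 510.814.1036; card 3780-331072-30547 on file."
    for element_type, value in find_sensitive(note):
        print(element_type, value)
```

Pattern matching alone produces false positives on arbitrary digit runs, so a real detector would likely add validation and context checks on top of it.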

Page 21: Dataguise hortonworks insurance_feb25

Sensitive Data Protection Masking & Encryption in Hadoop

•  MASKING
–  Obfuscation, one-way operation
–  Multiple options in DgSecure – fictitious but realistic values, X’ing out part of the content….
–  Consistent masking to retain statistical distribution of data

•  ENCRYPTION
–  Encrypted cell/row
–  Accessible by authorized users only – Hive, bulk, via App
–  Granular protection

•  REDACTION
–  X’ing out entire sensitive data cell
–  Nullifying
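
The masking and redaction options above can be illustrated with a short sketch. It assumes a keyed hash (HMAC-SHA256) as the source of replacement digits, so that equal inputs always yield equal masked outputs; this is an illustrative stand-in, not DgSecure's masking algorithm, and the key and function names are hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key for illustration only


def mask_consistent(value, key=SECRET_KEY):
    """Replace each digit with a digit derived from an HMAC of the whole value.

    The same input always produces the same output, so joins and frequency
    counts on the masked column still line up. Non-digits are kept as-is.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    used = 0
    for ch in value:
        if ch.isdigit():
            out.append(str(digest[used % len(digest)] % 10))
            used += 1
        else:
            out.append(ch)
    return "".join(out)


def mask_partial(value, keep_last=4):
    """X out every digit except the last `keep_last` digits."""
    total = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total - keep_last else "X")
        else:
            out.append(ch)
    return "".join(out)


def redact(value):
    """X out the entire cell."""
    return "X" * len(value)
```

Because the HMAC is keyed and not reversible, the output cannot be turned back into the original value, which matches the one-way nature of masking described above; encryption would be the choice when authorized users need the original back.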

Page 22: Dataguise hortonworks insurance_feb25

Masking Data in Hadoop (Cell Level)


Page 23: Dataguise hortonworks insurance_feb25

Masking Data in Hadoop (Cell Level)

Page 24: Dataguise hortonworks insurance_feb25

Masking Data in Hadoop (Cell Level)

Page 25: Dataguise hortonworks insurance_feb25

Encrypting Data in Hadoop (Cell Level)

Page 26: Dataguise hortonworks insurance_feb25

Encrypting Data in Hadoop (Cell Level)

Page 27: Dataguise hortonworks insurance_feb25

Decryption through Hive queries

User WITHOUT access privileges on Names & SSN

Page 28: Dataguise hortonworks insurance_feb25

Decryption through Hive queries

User WITH access privileges on Names & SSN
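
The contrast between the two query results (a user without privileges sees protected values, a user with privileges sees cleartext) can be sketched as follows. This is a toy model under stated assumptions: the `ProtectedTable` class, its token format, and the user names are hypothetical, and an in-memory lookup stands in for real cell-level decryption. It does not use Hive or DgSecure APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectedTable:
    """Toy table whose protected columns hold opaque tokens instead of cleartext."""
    protected_columns: set
    rows: list = field(default_factory=list)
    _vault: dict = field(default_factory=dict)  # token -> cleartext; stand-in for decryption
    grants: dict = field(default_factory=dict)  # user -> columns the user may see in clear

    def insert(self, record):
        """Store a record, replacing protected cells with tokens."""
        stored = {}
        for column, value in record.items():
            if column in self.protected_columns:
                token = f"ENC[{len(self._vault):04d}]"
                self._vault[token] = value
                stored[column] = token
            else:
                stored[column] = value
        self.rows.append(stored)

    def select(self, user, columns):
        """Return the requested columns, revealing a protected cell only if granted."""
        allowed = self.grants.get(user, set())
        result = []
        for row in self.rows:
            out = {}
            for column in columns:
                value = row[column]
                if column in self.protected_columns and column in allowed:
                    value = self._vault[value]
                out[column] = value
            result.append(out)
        return result
```

The same stored rows serve both users; only the per-column grant decides whether a cell comes back as a token or as its original value.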

Page 29: Dataguise hortonworks insurance_feb25

Encryption or Masking in Hadoop

[Figure: use cases and the sensitive data elements they involve, labeled Analytic and Transactional.]

Use cases: Trading System Perf., Customer retention, Payments Risk Mgmt., IT Security Intelligence, Dynamic pricing, Process efficiency, Log analysis, Insurance Premiums, Clinical trial analysis, Smart metering, Risk Modeling, Supply chain optimization, Brand sentiment, Real-time upsell, Monitoring Sensors, Person of Interest Discovery, Session Optimization

Data elements: IP Addresses, Name, Personal Health Info, Credit Card Number, Social Security Number, Date of Birth (DOB), IP Address, URL, Email Address, Telephone Number, Credit limit, Purchase amount, Customer lifetime value, Address, Device ID, Transaction Date, VIN

Page 30: Dataguise hortonworks insurance_feb25

Encryption or Masking in Hadoop

[Figure: use cases and the sensitive data elements they involve, labeled Analytic and Transactional, with data elements marked Mask or Encrypt.]

Use cases: Trading System Perf., Customer retention, Payments Risk Mgmt., IT Security Intelligence, Dynamic pricing, Process efficiency, Log analysis, Insurance Premiums, Clinical trial analysis, Smart metering, Risk Modeling, Supply chain optimization, Brand sentiment, Real-time upsell, Monitoring Sensors, Person of Interest Discovery, Session Optimization

Data elements: Medical test results, Name, Personal Health Info, Credit Card Number, Social Security Number, Date of Birth (DOB), IP Address, URL, Email Address, Telephone Number, Credit limit, Purchase amount, Customer lifetime value, Address, Device ID, Transaction Date, VIN

Page 31: Dataguise hortonworks insurance_feb25

Encryption or Masking in Hadoop

[Figure: use cases and the sensitive data elements they involve, labeled Analytic and Transactional, with data elements marked Mask or Encrypt.]

Use cases: Trading System Perf., Customer retention, Payments Risk Mgmt., IT Security Intelligence, Dynamic pricing, Process efficiency, Log analysis, Insurance Premiums, Clinical trial analysis, Smart metering, Risk Modeling, Supply chain optimization, Brand sentiment, Real-time upsell, Monitoring Sensors, Person of Interest Discovery, Session Optimization

Data elements: Biometric IDs, Name, Personal Health Info, Credit Card Number, Social Security Number, Date of Birth (DOB), IP Address, URL, Email Address, Telephone Number, Credit limit, Purchase amount, Customer lifetime value, Address, Device ID, Transaction Date, VIN

Page 32: Dataguise hortonworks insurance_feb25

Encryption or Masking in Hadoop

[Figure: use cases and the sensitive data elements they involve, labeled Analytic and Transactional, with data elements marked Mask or Encrypt.]

Use cases: Trading System Perf., Customer retention, Payments Risk Mgmt., IT Security Intelligence, Dynamic pricing, Process efficiency, Log analysis, Insurance Premiums, Clinical trial analysis, Smart metering, Risk Modeling, Supply chain optimization, Brand sentiment, Real-time upsell, Monitoring Sensors, Person of Interest Discovery, Session Optimization

Data elements: Medical test results, Name, Personal Health Info, Credit Card Number, Social Security Number, Date of Birth (DOB), IP Address, URL, Email Address, Telephone Number, Credit limit, Purchase amount, Customer lifetime value, Address, Device ID, Transaction Date, VIN Number

Page 33: Dataguise hortonworks insurance_feb25

Encryption or Masking in Hadoop

[Figure: use cases and the sensitive data elements they involve, labeled Analytic and Transactional, with data elements marked Mask or Encrypt.]

Use cases: Trading System Perf., Customer retention, Payments Risk Mgmt., IT Security Intelligence, Dynamic pricing, Process efficiency, Log analysis, Insurance Premiums, Clinical trial analysis, Smart metering, Risk Modeling, Supply chain optimization, Brand sentiment, Real-time upsell, Monitoring Sensors, Person of Interest Discovery, Session Optimization

Data elements: Medical test results, Name, Personal Health Info, Credit Card Number, Social Security Number, Date of Birth (DOB), IP Address, URL, Email Address, Telephone Number, Credit limit, Purchase amount, Customer lifetime value, Address, Device ID, Transaction Date, VIN

Page 34: Dataguise hortonworks insurance_feb25

How does this work in DgSecure

Page 35: Dataguise hortonworks insurance_feb25

HIGH-LEVEL DgSECURE FOR HADOOP FUNCTIONALITY

Policy Management
•  Domain Definition
•  Custom Elements
–  Composite
–  Dependent
•  Policy
–  Per Data Feed?
•  Protection Options

Detection
•  In-Flight
•  Within HDFS
•  Full vs. Incremental
•  Structured vs. Semi/Unstructured
•  Quick scan
•  Element Count

Auditing
•  Files/Dirs
–  Sensitive elements
–  Protected?
–  Who has access
•  Users
–  What can they see

Protection
•  Domain based
•  Masking
•  Redaction
•  Encryption
–  Field or Record
–  AES or FPE

Reporting
•  Job Level
–  Sensitive elements
–  Directories & Files
–  Remediation applied
•  Dashboard
–  Directory or by policy
–  Drill-down
•  Audit report
–  User actions
•  Notifications

Page 36: Dataguise hortonworks insurance_feb25

Set Policy

Page 37: Dataguise hortonworks insurance_feb25

Data Elements

Page 38: Dataguise hortonworks insurance_feb25

Define/Execute Detection/Protection Task

Page 39: Dataguise hortonworks insurance_feb25

Discovery Task Result

Page 40: Dataguise hortonworks insurance_feb25

Masking Task Result

Page 41: Dataguise hortonworks insurance_feb25

Masking Task Result

Page 42: Dataguise hortonworks insurance_feb25

Dashboard

Page 43: Dataguise hortonworks insurance_feb25

Entitlement Reports

Page 44: Dataguise hortonworks insurance_feb25

Audit Reports

Page 45: Dataguise hortonworks insurance_feb25

Sample Secure Business Workflow in an Enterprise

Page 46: Dataguise hortonworks insurance_feb25

Sample End to End Flow

Page 47: Dataguise hortonworks insurance_feb25

Sample End to End Flow

CISO/CPO: Set policy per data feed type

Page 48: Dataguise hortonworks insurance_feb25

Sample End to End Flow

Data Asset Owner: Provenance metadata

Page 49: Dataguise hortonworks insurance_feb25

Sample End to End Flow

IT/Set Process: Run Discovery to detect sensitive data

Metadata to repository (Atlas)

Page 50: Dataguise hortonworks insurance_feb25

Sample End to End Flow

IT/Set Process: Use Metadata to set access control in Ranger

Page 51: Dataguise hortonworks insurance_feb25

Sample End to End Flow

Run Masking/Encryption to protect sensitive data

Metadata incl. lineage to repository (Atlas)

Page 52: Dataguise hortonworks insurance_feb25

Sample End to End Flow

IT/Set Process: Use Metadata to set access control in Ranger

Page 53: Dataguise hortonworks insurance_feb25

Sample End to End Flow

Data Asset owner adds annotations & adds to Data Asset Index

Page 54: Dataguise hortonworks insurance_feb25

Sample End to End Flow

Data Scientist browses available data sets and makes access request

Page 55: Dataguise hortonworks insurance_feb25

Sample End to End Flow

Data owner approves request

Sets access control in Ranger

Page 56: Dataguise hortonworks insurance_feb25

Sample End to End Flow

Data Scientist runs data mining/BI/Analytics

Page 57: Dataguise hortonworks insurance_feb25

Sample End to End Flow
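
The end-to-end flow on the preceding slides can be sketched as a small state model: discovery records sensitivity metadata, protection marks the tagged columns, the data asset owner approves an access request, and reads are checked against that state. Everything here is hypothetical (the `DataAsset` class, the function names, the asset and user names), and plain Python structures stand in for the metadata repository and the access-control layer. No Atlas or Ranger API is used.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    name: str
    owner: str
    tags: dict = field(default_factory=dict)          # column -> set of sensitivity tags
    protected: set = field(default_factory=set)       # columns already masked or encrypted
    approved_users: set = field(default_factory=set)  # users granted access by the owner


def run_discovery(asset, findings):
    """Record discovery findings, given as (column, tag) pairs, as metadata."""
    for column, tag in findings:
        asset.tags.setdefault(column, set()).add(tag)


def run_protection(asset):
    """Mark every tagged column as protected (the masking/encryption itself is out of scope)."""
    asset.protected |= set(asset.tags)


def approve_request(asset, approver, user):
    """Only the data asset owner may approve an access request."""
    if approver != asset.owner:
        raise PermissionError(f"{approver} does not own {asset.name}")
    asset.approved_users.add(user)


def can_read(asset, user, column):
    """Approved users may read a column if it is untagged or already protected."""
    if user not in asset.approved_users:
        return False
    return column not in asset.tags or column in asset.protected
```

In this sketch a tagged column stays unreadable until protection has run, which mirrors the ordering of the flow: discover first, protect next, then open up controlled access.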

Page 58: Dataguise hortonworks insurance_feb25

Ranger and Knox: Building on the Vision of Comprehensive Security

Syed Mahmood

Page 59: Dataguise hortonworks insurance_feb25

Security Challenges of Data Lake

Central repository of critical and sensitive data

Data maintained over long duration

External ecosystem is in flux

Users can access and analyze data in new and different ways

Page 60: Dataguise hortonworks insurance_feb25

Differentiator 1: Comprehensive Approach to Security

In order to protect any data system you must implement the following:

Administration: Central management and consistent security
•  How do I set policy across the entire cluster?

Authentication: Authenticate users and systems
•  Who am I/prove it?

Authorization: Provision access to data
•  What can I do?

Audit: Maintain a record of data access
•  What did I do?

Data Protection: Protect data at rest and in motion
•  How can I encrypt at rest and over the wire?

Page 61: Dataguise hortonworks insurance_feb25

HDP Security: Comprehensive, Complete, Extensible

Security in HDP is the most comprehensive, complete and extensible for Hadoop

Administration: Central management and consistent security
•  Single administrative console to set policy across the entire cluster: Apache Ranger

Authentication: Authenticate users and systems
•  Authentication for perimeter and cluster; integrates with existing Active Directory and LDAP solutions: Kerberos | Apache Knox

Authorization: Provision access to data
•  Consistent authorization controls across all Apache components within HDP: Apache Ranger

Audit: Maintain a record of data access
•  Record of data access events across all components that is consistent and accessible: Apache Ranger | Apache Atlas

Data Protection: Protect data at rest and in motion
•  Encrypts data in motion and data at rest; refer to partner encryption solutions for broader needs: HDFS TDE with Ranger KMS

Page 62: Dataguise hortonworks insurance_feb25

Differentiator 2: Security Built into the Platform

[Figure: Hortonworks Data Platform 2.3 architecture, running on nodes 1 to N]

•  Governance & Integration: Data Lifecycle & Governance (Falcon, Atlas); Data Workflow (Sqoop, Flume, Kafka, NFS, WebHDFS)
•  Data Access, on YARN: Data Operating System: Batch (MapReduce), Script (Pig), Search (Solr), SQL (Hive), NoSQL (HBase, Accumulo, Phoenix), Stream (Storm), In-memory (Spark), Others (ISV Engines); Tez and Slider
•  Data Management: HDFS (Hadoop Distributed File System)
•  Security: Administration, Authentication, Authorization, Auditing, Data Protection (Ranger, Knox, Atlas, HDFS Encryption)
•  Operations: Provisioning, Managing, & Monitoring (Ambari, Cloudbreak, Zookeeper); Scheduling (Oozie)
•  Deployment Choice: Linux, Windows, On-Premise, Cloud

Page 63: Dataguise hortonworks insurance_feb25

Security Built into the Platform

•  Security is consistently administered across data access engines
•  Build or retire applications without impacting security

[Figure: Hortonworks Data Platform 2.3 architecture, repeated from the previous slide]

Page 64: Dataguise hortonworks insurance_feb25

Security in Hadoop with HDP (HDP 2.3)

Centralized Security Administration with Ranger

Authentication: Who am I/prove it?
•  Kerberos
•  API security with Apache Knox

Authorization: What can I do?
•  Fine-grain access control with Apache Ranger

Audit: What did I do?
•  Centralized audit reporting with Apache Ranger

Data Protection: Can data be encrypted at rest and over the wire?
•  Wire encryption in Hadoop
•  HDFS Encryption with Ranger KMS

Page 65: Dataguise hortonworks insurance_feb25

Apache Ranger: Comprehensive security for Enterprise Hadoop

Page 66: Dataguise hortonworks insurance_feb25

Centralized Security with Ranger

Centralized platform

•  Centralized platform to define, administer and manage security policies consistently

•  Define security policy once and apply it to all the applicable components across the stack

Page 67: Dataguise hortonworks insurance_feb25

Page 68: Dataguise hortonworks insurance_feb25

Centralized Security with Ranger

Centralized platform
•  Centralized platform to define, administer and manage security policies consistently
•  Define security policy once and apply it to all the applicable components across the stack

Fine-grained security definition
•  Administer security for:
–  Database
–  Table
–  Column
–  LDAP Groups
–  Specific Users
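
Fine-grained policy definition at the database, table and column level, granted to groups or specific users, can be sketched as below. This is a simplified, hypothetical model: the `Policy` class and `is_allowed` function are not Ranger's policy format or API, and shell-style wildcard matching on resource names is an assumption made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    """One allow rule: a resource pattern plus the users and groups it applies to."""
    database: str
    table: str
    column: str
    users: frozenset = frozenset()
    groups: frozenset = frozenset()


def is_allowed(policies, user, user_groups, database, table, column):
    """Deny by default; allow when any policy matches both resource and principal."""
    for p in policies:
        resource_matches = (
            fnmatchcase(database, p.database)
            and fnmatchcase(table, p.table)
            and fnmatchcase(column, p.column)
        )
        principal_matches = user in p.users or bool(p.groups & set(user_groups))
        if resource_matches and principal_matches:
            return True
    return False


if __name__ == "__main__":
    policies = [
        Policy("claims", "fnol", "loss_*", groups=frozenset({"adjusters"})),
        Policy("claims", "fnol", "ssn", users=frozenset({"fraud_lead"})),
    ]
    print(is_allowed(policies, "alice", ["adjusters"], "claims", "fnol", "loss_date"))
    print(is_allowed(policies, "alice", ["adjusters"], "claims", "fnol", "ssn"))
```

Defining the rule once as data, separate from any one engine, is what lets the same policy be applied consistently wherever the resource is accessed.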

Page 69: Dataguise hortonworks insurance_feb25

Page 70: Dataguise hortonworks insurance_feb25

Centralized Security with Ranger

Centralized platform
•  Centralized platform to define, administer and manage security policies consistently
•  Define security policy once and apply it to all the applicable components across the stack

Fine-grained security definition
•  Administer security for:
–  Database
–  Table
–  Column
–  LDAP Groups
–  Specific Users

Deep visibility
•  Administrators have complete visibility into the security administration process

Page 71: Dataguise hortonworks insurance_feb25

Page 72: Dataguise hortonworks insurance_feb25

Authorization and Auditing with Ranger

[Figure: Ranger architecture. Boxes shown: Enterprise Users, Hadoop Components, Ranger Administration Portal, Ranger Policy Server, Ranger Audit Server, Integration API, Legacy Tools and Data Governance, RDBMS, HDFS.]

•  Components shown with a Ranger Plugin: HDFS, HBase, Hive Server2, Knox, Storm, Solr, Kafka, YARN, TBD
•  Legend: Currently Supported in HDP 2.2; Future Additions

Page 73: Dataguise hortonworks insurance_feb25

Apache Atlas is Now Included in HDP

[Figure: Apache Atlas components. Labels shown: Knowledge Store, Audit Store, Models, Type-System, Policy Rules, Taxonomies, Tag Based Policies, Data Lifecycle Management, Real Time Tag Based Access Control, REST API, Services (Search, Lineage, Exchange).]

•  Healthcare: HIPAA, HL7
•  Financial: SOX, Dodd-Frank
•  Energy: PPDM
•  Retail: PCI, PII
•  Other: CWM

REST API: Modern, flexible access to Atlas services, HDP components and external tools

Search: SQL-like DSL (Domain Specific Language). Support for keyword, faceted and full-text searches

Lineage: Capture all SQL runtime activity on HiveServer2, providing lineage for both data and schema

Exchange: Leverage existing metadata by importing it from ETL tools, ERP systems and data warehouses. Export metadata to downstream systems

Page 74: Dataguise hortonworks insurance_feb25

Apache Atlas Vision 2015

Metadata Services
•  Business Taxonomy - classification
•  Operational Data – Model for Hive: DB, Tables, Col,
•  Centralized location for all metadata inside HDP
•  Single Interface point for Metadata Exchange with platforms outside of HDP.
•  Search & Prescriptive Lineage – Model and Audit

[Figure: Apache Atlas shown with Hive, Ranger, Falcon, Kafka and Storm]

Page 75: Dataguise hortonworks insurance_feb25

Summary

The Insurance Data Landscape has Changed
•  The insurance industry is joining and analyzing data which has never been analyzed before
•  Many of these sources can be “murky” and sensitive
•  Traditional PII/PHI data sources ingested into Hadoop need to be:
–  Discovered
–  Protected

Protecting PII/PHI data is not optional for Insurers, TPAs and Brokers; it is a Requirement

Page 76: Dataguise hortonworks insurance_feb25

Questions?

Page 77: Dataguise hortonworks insurance_feb25

Call to Action

Additional Information:
•  Data Protection Optimized for Insurance Big Data – A Dataguise and Hortonworks Capability Overview
•  Hortonworks: Comprehensive Security in Hadoop – Solving Security in Hadoop Whitepaper
•  Hortonworks: Building Governance into Big Data – Whitepaper