
Dr. Bhavani Thuraisingham The University of Texas at Dallas (UTD) February 2013 Cloud Computing for Assured Information Sharing





Team Members
• Sponsor: Air Force Office of Scientific Research
• The University of Texas at Dallas
  – Dr. Murat Kantarcioglu; Dr. Latifur Khan; Dr. Kevin Hamlen; Dr. Zhiqiang Lin; Dr. Kamil Sarac
• Sub-contractors
  – Prof. Elisa Bertino (Purdue)
  – Ms. Anita Miller; Late Dr. Bob Johnson (North Texas Fusion Center)
• Collaborators
  – Late Dr. Steve Barker, King's College London (EOARD)
  – Dr. Barbara Carminati; Dr. Elena Ferrari, University of Insubria (EOARD)

Outline
• Objectives
• Assured Information Sharing
• Layered Framework
• Our Research
• Education
• Acknowledgements:
  – Research funded by the Air Force Office of Scientific Research
  – Education funded by the National Science Foundation

Objectives
• Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.
• Our research on cloud computing is based on Hadoop, MapReduce, and Xen.
• Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
• Xen is a virtual machine monitor developed at the University of Cambridge, England.
• Our goal is to build a secure cloud infrastructure for assured information sharing applications.
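The MapReduce model that Hadoop implements can be illustrated with a minimal, single-process sketch (hypothetical example for exposition only; real Hadoop jobs are written against the Java MapReduce API and run across many nodes):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def reduce_phase(pairs, reducer):
    """Group pairs by key (the 'shuffle' step) and reduce each group."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, reducer(key, [v for _, v in group])

# Classic word count: the mapper emits (word, 1); the reducer sums counts.
def word_mapper(line):
    for word in line.split():
        yield word, 1

def sum_reducer(word, counts):
    return sum(counts)

lines = ["secure cloud computing", "assured cloud sharing"]
counts = dict(reduce_phase(map_phase(lines, word_mapper), sum_reducer))
assert counts["cloud"] == 2  # "cloud" appears once in each input line
```

In a real Hadoop deployment the map and reduce phases run on different nodes and the shuffle moves data over the network; the dataflow, however, is the same.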

Information Operations Across Infospheres: Assured Information Sharing

Scientific/Technical Approach
• Conduct experiments as to how much information is lost as a result of enforcing security policies in the case of trustworthy partners
• Develop more sophisticated policies based on role-based and usage-control-based access control models
• Develop techniques based on game-theoretic strategies to handle partners who are semi-trustworthy
• Develop data mining techniques to carry out defensive and offensive information operations

Accomplishments
• Developed an experimental system for determining information loss due to security policy enforcement
• Developed a strategy for applying game theory to semi-trustworthy partners; simulation results
• Developed data mining techniques for conducting defensive operations against untrustworthy partners

Challenges
• Handling dynamically changing trust levels; scalability
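The game-theoretic strategy for semi-trustworthy partners can be sketched as a repeated sharing game. The payoff values and the tit-for-tat style strategy below are invented for illustration, not the actual strategies from the research:

```python
PAYOFF = {  # (we_share, they_share) -> our payoff per round (illustrative values)
    (True, True): 3,    # mutual truthful sharing
    (True, False): -1,  # we shared, the partner withheld
    (False, True): 4,   # we withheld, the partner shared
    (False, False): 0,  # neither shared
}

def tit_for_tat(partner_history):
    """Share truthfully on the first round, then mirror the partner's last move."""
    return True if not partner_history else partner_history[-1]

def play(partner_moves):
    """Total payoff of tit-for-tat against a fixed sequence of partner moves."""
    history, total = [], 0
    for their_move in partner_moves:
        our_move = tit_for_tat(history)
        total += PAYOFF[(our_move, their_move)]
        history.append(their_move)
    return total

# Sustained cooperation pays off; a defecting partner is punished
# from the second round onward.
assert play([True, True, True]) == 9
assert play([False, False, False]) == -1
```

A probing strategy would extend this by occasionally withholding to test how the partner responds, which is one way to estimate a partner's trust level over time.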

Objectives
• Develop a framework for secure and timely data sharing across infospheres
• Investigate access control and usage control policies for secure data sharing
• Develop innovative techniques for extracting information from trustworthy, semi-trustworthy and untrustworthy partners

Budget
• FY06-08: AFOSR $300K; state match $150K

[Diagram: components holding data/policy for Agency A, Agency B, and Agency C each publish their data/policy to a shared coalition data/policy store.]

Our Approach
• Integrate the Medicaid claims data and mine the data; next, enforce policies and determine how much information has been lost (trustworthy partners); prototype system; application of semantic web technologies
• Apply game theory and probing to extract information from semi-trustworthy partners
• Conduct active defence and determine the actions of an untrustworthy partner
  – Defend ourselves from our partners using data mining techniques
  – Conduct active defence: find out what our partners are doing by monitoring them, so that we can defend ourselves in dynamic situations


Policy Enforcement Prototype
Dr. Mamoun Awad (postdoc) and students


Layered Framework for Assured Cloud Computing

[Figure 2. Layered framework for assured cloud computing. Layers, top to bottom: User Interface; HIVE/SPARQL/Query; Hadoop/MapReduce/Storage; XEN/Linux/VMM; Secure Virtual Network Monitor. Cross-cutting cloud monitors handle XACML policies, risks/costs, and QoS/resource allocation.]

Secure Query Processing with Hadoop/MapReduce

• We have studied clouds based on Hadoop
• Query rewriting and optimization techniques designed and implemented for two types of data:
  (i) Relational data: secure query processing with HIVE
  (ii) RDF data: secure query processing with SPARQL
• Demonstrated with XACML policies
• Joint demonstration with Kings College and University of Insubria
  – First demo (2011): each party submits their data and policies; our cloud manages the data and policies
  – Second demo (2012): multiple clouds

Fine-grained Access Control with Hive: System Architecture

Table/view definition and loading: users can create tables as well as load data into tables. Further, they can also upload XACML policies for the tables they are creating. Users can also create XACML policies for tables/views. Users can define views only if they have permissions for all tables specified in the query used to create the view. They can also either specify or create XACML policies for the views they are defining.
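The view-definition rule above can be sketched as a simple permission check: a user may define a view only if they hold permissions on every base table referenced by the view's query. All names below are illustrative, not the actual Hive/XACML implementation:

```python
# Hypothetical per-user table permissions (illustrative data).
table_permissions = {
    "alice": {"claims", "providers"},
    "bob": {"claims"},
}

def can_create_view(user, base_tables):
    """Allow view creation only if the user can read all referenced base tables."""
    return set(base_tables) <= table_permissions.get(user, set())

assert can_create_view("alice", ["claims", "providers"])       # has both tables
assert not can_create_view("bob", ["claims", "providers"])     # lacks "providers"
```

In the actual system this decision would be rendered by the XACML policy decision point rather than an in-memory dictionary.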

CollaborateCom 2010


SPARQL Query Optimizer for Secure RDF Data Processing

[Architecture diagram. Components: web interface; server backend; data preprocessor (N-Triples converter, prefix generator, predicate-based splitter, predicate-object-based splitter); MapReduce framework (parser, query validator and rewriter, XACML PDP, plan generator, plan executor, query rewriter by policy). Inputs: new data and queries; output: query answers.]

Goals: build an efficient storage mechanism using Hadoop for large amounts of data (e.g., a billion triples); build an efficient query mechanism for data stored in Hadoop; integrate with Jena.

Developed a query optimizer and query rewriting techniques for RDF data with XACML policies, implemented on top of Jena.
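The "query rewriting by policy" idea can be illustrated with a toy sketch: before a query executes, triple patterns over predicates the requesting role may not see are rewritten to match nothing. This is a simplified stand-in for the XACML-driven rewriter, with invented data and role names:

```python
# A tiny in-memory triple store (illustrative data).
triples = [
    ("patient1", "hasDiagnosis", "flu"),
    ("patient1", "hasZipCode", "75080"),
    ("patient2", "hasDiagnosis", "cold"),
]

# Per-role deny list standing in for XACML policy decisions.
denied_predicates = {"analyst": {"hasZipCode"}}

def rewrite_and_run(role, wanted_predicate):
    """Answer a single-predicate query, rewritten under the role's policy."""
    if wanted_predicate in denied_predicates.get(role, set()):
        return []  # the rewritten query matches nothing for denied predicates
    return [(s, o) for s, p, o in triples if p == wanted_predicate]

assert rewrite_and_run("analyst", "hasDiagnosis") == [
    ("patient1", "flu"), ("patient2", "cold")]
assert rewrite_and_run("analyst", "hasZipCode") == []  # policy blocks this
```

The real optimizer additionally chooses how to split and join the rewritten patterns as MapReduce jobs over the Hadoop-resident triples.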

IEEE Transactions on Knowledge and Data Engineering, 2011

Demonstration: Concept of Operation

[Diagram: Agencies 1 through n submit data through the user interface layer to two engines — fine-grained access control with Hive for relational data, and the SPARQL query optimizer for secure RDF data processing for RDF data.]

RDF-Based Policy Engine

[Diagram: policies, ontologies, and rules expressed in RDF feed the JENA RDF engine over RDF documents; an inference engine/rules processor (e.g., Pellet) provides reasoning; the interface to semantic web technology was developed by UT Dallas.]

RDF-based Policy Engine on the Cloud

[Architecture diagram: a user interface layer accepts a high-level policy specification, which passes through a policy translator, a policy parser layer, and a policy/graph transformation layer (policy/graph transformation rules); a regular-expression query translator, data controller, and provenance controller operate over RDF, XML, and relational (DB) stores; access control and redaction policies (traditional mechanisms) are applied and query results returned.]

• A testbed for evaluating different policy sets over different data representations; also supports provenance as a directed graph and viewing policy outcomes graphically
• Determines how access is granted to a resource as well as how a document is shared
• Users specify policies, e.g., access control, redaction, and release policies
• Parses a high-level policy into a low-level representation
• Supports graph operations and visualization; policies are executed as graph operations
• Executes policies as SPARQL queries over large RDF graphs on Hadoop
• Supports policies over traditional data and its provenance

IFIP Data and Applications Security, 2010, ACM SACMAT 2011

Integration with Assured Information Sharing

[Diagram: Agencies 1 through n submit RDF data and policies; SPARQL queries pass through the user interface layer to the RDF data preprocessor and the policy translation and transformation layer, then to the MapReduce framework for query processing over Hadoop HDFS, which returns the result.]

Architecture

[Diagram: the policy engine serves Agencies 1 through n through a user interface layer and a connection interface; RDF graphs (models) and SPARQL queries run against an RDBMS connection, a cloud-based store, or local text storage; provenance is tracked, and each agency's policy set (Policy 1 … Policy n) combines access control and redaction policies.]

Key Feature 1: Policy Reciprocity
• Agency 1 wishes to share its resources only if Agency 2 also shares its resources with it
• Use our combined policies
• Allow agents to define policies based on reciprocity and mutual interest amongst cooperating agencies
• SPARQL query:
  SELECT B FROM NAMED uri1 FROM NAMED uri2 WHERE P
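The reciprocity rule can be sketched as a symmetric check over declared sharing relationships. The data and function names below are illustrative, not the actual combined-policy engine:

```python
# Declared sharing relationships: (owner, requester) -> owner shares with requester.
shares_with = {
    ("Agency1", "Agency2"): True,
    ("Agency2", "Agency1"): True,
    ("Agency3", "Agency1"): False,
}

def reciprocal_access(requester, owner):
    """Grant access only when sharing runs in both directions."""
    return (shares_with.get((owner, requester), False)
            and shares_with.get((requester, owner), False))

assert reciprocal_access("Agency2", "Agency1")      # mutual sharing holds
assert not reciprocal_access("Agency1", "Agency3")  # Agency 3 does not reciprocate
```

In the engine itself this condition would be evaluated as a combined policy over the named graphs of both agencies, as in the SPARQL form above.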

Key Feature 2: Develop and Scale Policies
• Agency 1 wishes to extend its existing policies with support for constructing policies at a finer granularity
• The policy engine provides:
  – A policy interface that should be implemented by all policies
  – The ability to add newer types of policies as needed

Key Feature 3: Justification of Resources
• Agency 1 asks Agency 2 for a justification of resource R2
• Policy engine
  – Allows agents to define policies over provenance
  – Agency 2 can provide the provenance to Agency 1, but protect it by using access control or redaction policies

Key Feature 4: Development Testbed
• The policy framework provides three configurations:
  – A standalone version for development and testing
  – A version backed by a relational database
  – A cloud-based version, which achieves high availability and scalability while maintaining low setup and operation costs

Secure Storage and Query Processing in a Hybrid Cloud

• The use of hybrid clouds is an emerging trend in cloud computing
  – Ability to exploit public resources for high throughput
  – Yet better able to control costs and data privacy
• Several key challenges
  – Data design: how to store data in a hybrid cloud? The solution must account for the data representation used (unencrypted/encrypted), public cloud monetary costs, and query workload characteristics
  – Query processing: how to execute a query over a hybrid cloud? The solution must provide query rewrite rules that ensure the correctness of a generated query plan over the hybrid cloud
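The data-design decision can be sketched as a per-partition placement rule that weighs sensitivity against workload. The capacity threshold and location names below are invented for illustration, not the actual design algorithm:

```python
def place_partition(sensitive, monthly_queries, private_capacity_queries=10_000):
    """Choose a storage location for one data partition (illustrative rule)."""
    if sensitive:
        # Sensitive data goes to the public cloud only in encrypted form,
        # and only when the workload needs public-cloud throughput.
        if monthly_queries > private_capacity_queries:
            return "public-encrypted"
        return "private"
    # Non-sensitive data: use the public cloud when the workload exceeds
    # what the private cluster can serve.
    if monthly_queries > private_capacity_queries:
        return "public"
    return "private"

assert place_partition(sensitive=True, monthly_queries=500) == "private"
assert place_partition(sensitive=True, monthly_queries=50_000) == "public-encrypted"
assert place_partition(sensitive=False, monthly_queries=50_000) == "public"
```

A fuller model would also fold in per-query monetary cost on the public side and the overhead of querying encrypted representations.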

Hypervisor Integrity and Forensics in the Cloud

[Diagram: cloud integrity and forensics span the stack — applications; guest OSes (Linux, Solaris, XP, MacOS); the virtualization layer (Xen, vSphere); the hardware layer.]

• Secure control flow of hypervisor code: integrity via an in-lined reference monitor
• Forensics data extraction in the cloud: with multiple VMs, de-map (isolate) each VM's memory from physical memory

Cloud-based Malware Detection

[Diagram: a stream of known malware or benign executables is buffered; feature extraction and selection using the cloud feeds training and model updates for an ensemble of classification models, with models kept or removed as the stream evolves; an unknown executable goes through feature extraction and is classified as malware or benign.]

• A cloud MapReduce framework is used to extract and select features from each chunk
• A 10-node cloud cluster is 10 times faster than a single node
• Very effective in a dynamic framework, where malware characteristics change rapidly
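The streaming-ensemble idea in the diagram can be sketched as follows: keep a fixed-size ensemble, add a model trained on each new chunk, and evict the worst performer so the ensemble tracks drifting malware characteristics. Models are stubbed here as (name, accuracy) pairs; the size and scores are invented for illustration:

```python
ENSEMBLE_SIZE = 3  # illustrative fixed ensemble size

def update_ensemble(ensemble, new_model):
    """Add the new chunk's model, then evict the least accurate one."""
    candidates = ensemble + [new_model]
    candidates.sort(key=lambda m: m[1], reverse=True)  # best accuracy first
    return candidates[:ENSEMBLE_SIZE]

# A model trained on the newest chunk displaces the weakest old model.
ensemble = [("m1", 0.80), ("m2", 0.75), ("m3", 0.60)]
ensemble = update_ensemble(ensemble, ("m4", 0.90))
assert [name for name, _ in ensemble] == ["m4", "m1", "m2"]
```

In the actual system the accuracy used for eviction would be measured on recent labeled chunks, and a majority vote over the surviving models classifies each unknown executable.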

Identity Management Considerations in a Cloud
• A trust model that handles (i) various trust relationships, (ii) access control policies based on roles and attributes, (iii) real-time provisioning, (iv) authorization, and (v) auditing and accountability
• Several technologies have to be examined to develop the trust model: service-oriented technologies; standards such as SAML and XACML; and identity management technologies such as OpenID
• Does one size fit all? Can we develop a trust model applicable to all types of clouds, such as private, public, and hybrid clouds? The identity architecture has to be integrated into the cloud architecture.

Education
• NSF capacity-building grant on assured cloud computing
  – Introduce cloud computing into several cyber security courses
• Completed courses
  – Data and Applications Security
  – Data Storage
  – Digital Forensics
  – Secure Web Services
  – Computer and Information Security
  – Capstone course: one course that covers all aspects of assured cloud computing
• Week-long course to be given at Texas Southern University

Directions
• Secure VMM (virtual machine monitor) and VNM (virtual network monitor)
  – Exploring the Xen VMM and examining security issues
  – Developing automated techniques for VMM introspection
  – Will examine VMM issues January 2012
• Integrate secure storage algorithms into Hadoop
• Identity management
• Social network systems on the cloud (e.g., use the Storm framework)