CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room with LDAP, Context and...

HDAP: A Breakthrough in Directory Technology Bringing Together LDAP, Context, and Big Data

description

Michel Prompt, Chairman & CEO, Radiant Logic. There’s a sea change coming in terms of scaling identity and access management. This session will look at what’s next in directory technology, scalability, and possibility.

Transcript of CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room with LDAP, Context and...

Page 1: CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room with LDAP, Context and Big Data

HDAP:

A Breakthrough in Directory Technology Bringing Together LDAP, Context, and Big Data

Page 2

What We’ll Cover Today

• What is HDAP?
• Why HDAP?
• Why even LDAP?
• Evaluating the models for structured data
• Hierarchical model and LDAP
• The requirements/drivers for more scalability
• Using Identity and Context Virtualization to build a Federated Identity Service (FID)
• Why FID is essential
• Powering a new use case: Contextual Search
• How HDAP works/Performance

Page 3

What is HDAP?

Page 4

A Next-Gen LDAP Directory Driven by Hadoop and Search Technology

• HDAP is a highly available version of LDAP that offers better performance and increased scalability.
• Now, you may be thinking:
• LDAP is already very fast and scalable.
• And who needs LDAP anyway? Shouldn’t we do as Ian Glazer says, and “kill IdM in order to save it”?
• But HDAP goes beyond LDAP, delivering much more and doing it all much faster.

7/15/2013

Page 5


Why HDAP?

Page 6

To Bring New Life to the Heart of IT: People and What They Do

• Identity remains essential to IT because people are often the center of activities.
• While there are multiple use cases, one of the key functions of identity is to act as an integration point.
• As such, identity management is at the center of application integration.
• We need a way to store identities and their attributes, but is LDAP still relevant?
• Do we really need a hierarchical system when the world is moving toward these models?
• Path
• Graph
• Directed Graph
• Relational

Page 7

Roadmap: The Role of Identity and Context Virtualization in the Technology Food Chain

Company Confidential

Page 8

Are the Hierarchies of LDAP Still Necessary?

• The Protocol
• The Schema
• The Storage: Hierarchy
• Searching and Navigation: Traversing the Tree
• Searching by Attributes
• Navigation: One level or subtree. There are not many ways to navigate a tree:
• First, you enumerate the children.
• Then you repeat the process for each child node.
• So you either believe that a hierarchical system is sufficient, or you don’t.

• The storage
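The two ways of navigating a tree listed above correspond to LDAP’s one-level and subtree search scopes. Below is a minimal sketch over a hypothetical in-memory tree; the DNs and the `TREE`, `one_level`, and `subtree` names are illustrative only and are not any LDAP server’s API. One-level enumerates the children of the base; subtree returns the base plus everything found by enumerating children and repeating for each child.

```python
from collections import deque

# Hypothetical directory tree: each DN maps to the DNs of its immediate children.
TREE = {
    "o=example": ["ou=people,o=example", "ou=groups,o=example"],
    "ou=people,o=example": ["uid=alice,ou=people,o=example",
                            "uid=bob,ou=people,o=example"],
    "ou=groups,o=example": ["cn=admins,ou=groups,o=example"],
}


def one_level(base):
    """One-level scope: enumerate the immediate children of the base entry."""
    return list(TREE.get(base, []))


def subtree(base):
    """Subtree scope: the base entry plus all of its descendants.

    Enumerate the children of each node, then repeat for each child
    (breadth-first here, using a queue instead of recursion).
    """
    found, queue = [], deque([base])
    while queue:
        dn = queue.popleft()
        found.append(dn)
        queue.extend(TREE.get(dn, []))
    return found


print(one_level("o=example"))
print(subtree("ou=groups,o=example"))
```

As in LDAP, the subtree result includes the base entry itself, while the one-level result does not.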

Page 9

The World of Data

• Structured (SQL)
• Unstructured (Search)

Page 10

Structured Data: The Three Models and Their Respective Installed Bases

• Relational: SQL database
• Network/Graph: graph database
• Hierarchical: hierarchical database

Page 11

The Three Models

• These three models are similar in terms of what you can represent with them, but they are optimized for different functions.
• Relational (SQL) is the most ubiquitous, for good reasons:
• It is the most complete model and extremely flexible.
• ACID properties make it great for capturing and updating information, and it’s optimized for non-redundant writes.
• But it’s also slow for navigation, ad hoc queries, and search.
• Graphs and hierarchies belong to the same family; after all, trees are “DAGs,” or directed acyclic graphs:
• Slow for writes and updates (no ACID properties, in general)
• Fast for navigation, ad hoc queries, and search

Page 12

Object/Entity, Attribute, Value/Keyword

[Diagram: an object/entity with Attributes 1 to 4, each holding one or more keyword/value pairs]
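The object/attribute/value structure above is how an LDAP entry is organized: one entry, several attributes, and each attribute holding one or more values. Here is a minimal sketch of that shape using a hypothetical entry; the `values` and `has_value` helpers are illustrative names, not part of any LDAP library.

```python
# Hypothetical entry: one object/entity, several attributes, each holding
# one or more values (LDAP attributes can be multi-valued).
entry = {
    "dn": "uid=alice,ou=people,o=example",
    "attributes": {
        "cn": ["Alice Smith"],
        "mail": ["alice@example.com", "asmith@example.com"],
        "title": ["Engineer"],
    },
}


def values(entry, attr):
    """All values of one attribute; attribute names compare case-insensitively."""
    for name, vals in entry["attributes"].items():
        if name.lower() == attr.lower():
            return vals
    return []


def has_value(entry, attr, value):
    """Simple exact-match test on one attribute/value pair (real LDAP servers
    apply per-attribute matching rules, which this sketch ignores)."""
    return value in values(entry, attr)


print(values(entry, "MAIL"))
```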

Page 13

Object, Relationship, Data Model

[Diagram: objects linked by relationships]

Page 14

Network Data Model

Page 15

Hierarchical Data Model


Page 16

Relational Data Model (ERM, ORM, & UML)

Tables/Entities/Object & Relations

Page 17

From Graph to Functions to E/R

Page 18

From E/R to Semantic Model

[Diagram: a subject and an object linked by verbs]

Page 19

How the Models Stack Up

• Write and Update: Relational faster; Graph/Hierarchy slower
• Query, Search, and Navigation/Traversal: Relational slower; Graph/Hierarchy faster

Page 20

SQL is the Workhorse for Modern Data Management

[Diagram: the data management chain: ETL, MDM/CDI, data warehouse, analytics/BI, search, big data, and unstructured data; integration: SQL]

Page 21

LDAP is Key to Identity Management

[Diagram: the identity management chain: (ETL) sync engine, provisioning, MDM, metadirectory, analytics/SIEM, search, and big data; integration: LDAP (along with web services and SQL) and virtualization]

Page 22

Why Should Identity Management Be Separate from the Rest of the Chain?

[Diagram: identity management alongside ETL, MDM/CDI, data warehouse, analytics/BI, search, and big data (SIEM); integration: directory, web services, and SQL]

Page 23

Identity and Context Virtualization Process

Page 24

Foundation for an Identity Service: Building a Global Virtual Identifier and Global Virtual Registry

Page 25

Solution: Building a Global List with No Duplicates
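One way to picture a global list with no duplicates: union the identities from every source and merge those that share a correlation key, giving each person one global identifier with links back to each source identity. This is a minimal sketch under assumptions: the source data, the `employee_id` correlation key, and the `gid-` identifier format are all hypothetical, and real correlation rules are usually richer than a single exact-match key.

```python
# Hypothetical source records. "employee_id" is an assumed correlation key;
# real deployments may need fuzzier correlation rules.
AD = [
    {"source_id": "CN=Alice,OU=Users", "employee_id": "1001"},
    {"source_id": "CN=Bob,OU=Users", "employee_id": "1002"},
]
HR_DB = [
    {"source_id": "row-17", "employee_id": "1001"},
    {"source_id": "row-42", "employee_id": "1003"},
]


def build_global_list(sources):
    """Union every source, merging records that share the correlation key.

    Each person appears once, with a global identifier and links back to
    each source identity (a simple "global virtual registry").
    """
    registry = {}
    for source_name, records in sources.items():
        for rec in records:
            key = rec["employee_id"]
            person = registry.setdefault(
                key, {"global_id": f"gid-{key}", "links": []})
            person["links"].append((source_name, rec["source_id"]))
    return list(registry.values())


people = build_global_list({"AD": AD, "HR": HR_DB})
for p in people:
    print(p["global_id"], p["links"])
```

Alice exists in both sources but appears once in the result, linked to both her AD entry and her HR row.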

Page 26

Link Identity to Context, Regrouping Objects into Sentences and Sentences into Contexts

Page 27

Solution: Gather Attributes and Join Them to Build a Virtualized Global Profile
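Once each person is linked to their source identities, a global profile can be built by gathering the attributes from each linked source and joining them. The sketch below assumes hypothetical in-memory attribute stores and a simple union rule for multi-valued attributes; it applies no source priority or conflict resolution, which a production system would likely need.

```python
# Hypothetical attribute stores, keyed by each source's local identifier.
STORES = {
    "AD": {"CN=Alice,OU=Users": {"mail": ["alice@example.com"],
                                 "memberOf": ["Admins"]}},
    "HR": {"row-17": {"title": ["Engineer"],
                      "mail": ["alice@example.com"]}},
}


def global_profile(links):
    """Join the attributes of every linked source identity into one profile.

    Values of the same attribute are unioned, keeping first-seen order and
    dropping duplicates. No conflict resolution or source priority is applied.
    """
    profile = {}
    for source_name, source_id in links:
        attrs = STORES.get(source_name, {}).get(source_id, {})
        for attr, vals in attrs.items():
            merged = profile.setdefault(attr, [])
            for v in vals:
                if v not in merged:
                    merged.append(v)
    return profile


alice = global_profile([("AD", "CN=Alice,OU=Users"), ("HR", "row-17")])
print(alice)
```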

Page 28

Integration and Cache/Storage Layer

• A system made of two parts:
• Integration layer based on virtualization
• Storage layer (persistent cache):
• LDAP (up to R1 V 6.1)
• HDAP (based on Hadoop/Lucene/Solr, V 7.0)

Page 29

Why We Need a Federated Identity That’s Based on Virtualization and Stored in HDAP Directories

Page 30

The World of Access Keeps Expanding

• App sourcing and hosting: SaaS apps, apps in public clouds, partner apps, apps in private clouds, on-premises enterprise apps
• App access channels: enterprise computers, enterprise-issued devices, public computers, personal devices
• User populations: employees, contractors, customers, partners, members

Page 31

The Challenges of Implementing an Enterprise IdP: How to Handle Different Internal Security Domains?

[Diagram: cloud apps, federation, IdP, authentication and SSO, and enterprise identity data sources (??), labeled “Implementation”]

Page 32

A Federated Identity Hub Manages Authentication and Attributes to Support the IdP

[Diagram: identity sources (AD Forest/Domain A, AD Forest/Domain B, databases, directories, internal enterprise apps), the IdP, and federation to cloud apps]

Page 33

Federated Identity Service and Provisioning

[Diagram: FID as reference store between internal systems (legacy applications and their respective stores, such as AD and Sun LDAP, via LDAP/SQL/SPML) and external systems (cloud apps, via SPML and SCIM)]

Page 34

Virtual View Based on Org Chart

[Diagram: a virtual tree rooted at the top manager, showing the full management hierarchy]
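An org-chart view like this can be derived from flat entries that each carry a manager reference: group people by manager, then build the tree down from the top manager. This is a minimal sketch assuming a hypothetical `MANAGER` mapping with no cycles and a made-up `ou=orgchart` suffix for the virtual DNs; it is not how any particular product computes its views.

```python
# Hypothetical flat entries: person -> manager (None marks the top manager).
# Assumes the manager links contain no cycles.
MANAGER = {"alice": None, "bob": "alice", "carol": "alice", "dave": "bob"}


def org_chart(manager):
    """Build a nested {person: {direct_report: {...}}} tree from manager links."""
    reports = {}
    for person, boss in manager.items():
        reports.setdefault(boss, []).append(person)

    def build(person):
        return {r: build(r) for r in reports.get(person, [])}

    return {top: build(top) for top in reports.get(None, [])}


def virtual_dn(person, manager, suffix="ou=orgchart"):
    """DN of a person in the virtual view: walk up the management chain."""
    rdns = []
    while person is not None:
        rdns.append(f"uid={person}")
        person = manager[person]
    return ",".join(rdns + [suffix])


print(org_chart(MANAGER))
print(virtual_dn("dave", MANAGER))
```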

Page 35

Virtual View Based on Location

[Diagram: a virtual tree organized by country, then state, then city]
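A location view regroups the same flat entries by attribute values instead of by manager: one tree level per attribute. The sketch below assumes hypothetical entries carrying `country`, `state`, and `city` attributes; `virtual_view` is an illustrative helper, not a product API.

```python
# Hypothetical flat entries with location attributes.
PEOPLE = [
    {"uid": "alice", "country": "US", "state": "CA", "city": "Springfield"},
    {"uid": "bob", "country": "US", "state": "CA", "city": "Riverside"},
    {"uid": "carol", "country": "US", "state": "NY", "city": "Albany"},
]


def virtual_view(entries, levels):
    """Regroup flat entries into a tree with one level per attribute in `levels`.

    Inner nodes are dicts keyed by attribute value; leaves are lists of uids.
    """
    root = {}
    for e in entries:
        node = root
        for attr in levels[:-1]:
            node = node.setdefault(e[attr], {})
        node.setdefault(e[levels[-1]], []).append(e["uid"])
    return root


view = virtual_view(PEOPLE, ["country", "state", "city"])
print(view)
```

Assuming entries also carried role and territory attributes, the same function called with `["role", "location", "territory"]` would produce a view organized like the one on the next slide.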

Page 36

Virtual View Based on Role, Location, and Territory

[Diagram: a virtual tree organized by role, location, and territory]

Page 37

New Use Case: Contextual Search

Page 38


Webster’s Definition of “Context”

From the Latin contextus, “a joining together,” past participle of contexere, “to weave together.”

1. The parts of a sentence, paragraph, discourse immediately next to or surrounding a specified word or passage and determining its exact meaning [to quote a remark out of context] (Language Representation)

2. The whole situation, background, or environment relevant to a particular event, personality, creation, etc. (Perception)

Page 39


Trees as a Representation of Sentences

Page 40


Trees as a Way to Represent Sentences and Context

Page 41

Searching for HDAP on Google

Page 42

Diving into one sentence from the contextual search result

Page 43

Navigating the different sentences returned in the context search: Account The Great Outdoors purchased Order 21

Page 44

Navigating sentences returned in the search: SalesRep Nancy Davolio has account The Great Outdoors
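The two sentences above can be treated as subject/verb/object triples, and contextual navigation as hopping from an entity to every sentence that mentions it, then on to the entities at the other end. This is a minimal sketch under that assumption; the triple encoding and the `sentences_about` and `context` helpers are illustrative and do not describe how the product stores or searches sentences.

```python
# The two sentences from the slides, as (subject, verb, object) triples.
SENTENCES = [
    ("Nancy Davolio", "has account", "The Great Outdoors"),
    ("The Great Outdoors", "purchased", "Order 21"),
]


def sentences_about(entity):
    """Every sentence in which the entity is the subject or the object."""
    return [s for s in SENTENCES if entity in (s[0], s[2])]


def context(entity, depth=2):
    """Sentences reachable from `entity` within `depth` hops, where each hop
    moves to the other end (subject or object) of a matching sentence."""
    seen, frontier, found = {entity}, [entity], []
    for _ in range(depth):
        next_frontier = []
        for e in frontier:
            for s in sentences_about(e):
                if s not in found:
                    found.append(s)
                for other in (s[0], s[2]):
                    if other not in seen:
                        seen.add(other)
                        next_frontier.append(other)
        frontier = next_frontier
    return found


for subj, verb, obj in context("Order 21"):
    print(subj, verb, obj)
```

Starting from Order 21, the first hop finds the purchase sentence and the second hop reaches the sales rep who owns the account.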

Page 45

HDAP: RadiantOne High-Availability LDAP Based on Lucene/ZooKeeper (Sub-components of Hadoop)

Page 46

Current Implementation of LDAP Servers is Based on B-Tree Indexation

• An LDAP directory is a hierarchical database with this architecture:
• A set of entries, indexed by a main index: the directory tree
• A set of indexes to support attribute search (one per attribute)
• The core technology over the last 10 years was to implement the tree as a set of B-tree indexes. B-trees can scale to hundreds of millions of entries.

[Diagram: entries indexed by a B-tree]
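The architecture described above (entries, a main index, and one index per attribute) can be sketched in a few lines. Python’s standard library has no B-tree, so this sketch swaps in a sorted list searched with the `bisect` module, which gives the same ordered, binary-search lookup behavior in memory; a real B-tree stores that ordering in balanced disk pages. `OrderedIndex` and `Directory` are hypothetical names, and lowercasing values is a rough stand-in for LDAP’s case-insensitive matching rules.

```python
import bisect


class OrderedIndex:
    """Stand-in for a B-tree index: (key, entry_id) pairs kept in sorted order,
    so exact lookups use binary search. A real B-tree keeps the same ordering
    in balanced disk pages; this sketch keeps it in one in-memory list."""

    def __init__(self):
        self._pairs = []

    def add(self, key, entry_id):
        bisect.insort(self._pairs, (key, entry_id))

    def lookup(self, key):
        i = bisect.bisect_left(self._pairs, (key,))
        hits = []
        while i < len(self._pairs) and self._pairs[i][0] == key:
            hits.append(self._pairs[i][1])
            i += 1
        return hits


class Directory:
    """Entries keyed by DN (standing in for the main tree index), plus one
    ordered index per attribute to support attribute search."""

    def __init__(self):
        self.entries = {}
        self.indexes = {}

    def add(self, dn, attrs):
        self.entries[dn] = attrs
        for name, vals in attrs.items():
            index = self.indexes.setdefault(name.lower(), OrderedIndex())
            for v in vals:
                # Lowercasing approximates a case-insensitive matching rule.
                index.add(v.lower(), dn)

    def search(self, attr, value):
        index = self.indexes.get(attr.lower())
        return index.lookup(value.lower()) if index else []


d = Directory()
d.add("uid=alice,o=example", {"cn": ["Alice Smith"], "title": ["Engineer"]})
d.add("uid=bob,o=example", {"cn": ["Bob Jones"], "title": ["Engineer"]})
print(d.search("title", "engineer"))
```

Note that an attribute with no index returns nothing here; this per-attribute indexing requirement is the constraint that the inverted-index approach on the last slide removes.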

Page 47

From Lucene to Hadoop to ZooKeeper

• Hadoop is an offshoot of the Lucene/Nutch project, which aimed to create an open source search engine.
• Lucene is the search and index part of the search engine.
• Hadoop is the distributed storage (HDFS) and compute (batch-oriented MapReduce) engine, offering very high throughput on a large cluster of commodity servers.
• Many components and sub-projects came out of the Hadoop project.
• ZooKeeper is a low-level component for managing configuration and replication for a large number of nodes in a Hadoop cluster.

Page 48

[Diagram: HDAP architecture, a distributed VDS + Lucene index on each node of a cluster of commodity servers]

• LDAP and other protocols (XML/JSON/HTTP): front end, including the LDAP front-end components (BER encoding, etc.)
• VDS core (Java web app, one core per JVM): LDAP processing (add/update/del), query processing and caching, indexing and queries
• Schema, etc.: XML configuration (<fields>, <types>) and VDS config
• ZooKeeper for VDS: distributed configuration manager that manages configuration and state per node; add nodes, define a new leader, and swap nodes in and out dynamically
• Scale out: add more VDS nodes for faster queries and more documents (millions of entries, millions of users)
• Replication (leader/followers, replica 1 to replica n): add more replicas (followers) for better throughput (queries/sec) and fault tolerance
• Soft commit: in memory, near real-time
• Hard commit: flushed to disk
• Per node, we are getting 60,000 LDAP queries/sec before VDS and 30,000 queries/sec after VDS
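The diagram distinguishes a soft commit (in memory, near real-time) from a hard commit (flushed to disk). The toy model below illustrates only that distinction: documents become searchable after a soft commit but are durable only after a hard commit. It is a hypothetical sketch, not Solr’s or the product’s API; `ToyIndex` and its JSON file format are made up for illustration.

```python
import json
import os
import tempfile


class ToyIndex:
    """Toy model of the soft/hard commit distinction (not Solr's API).

    add()          buffers a document; it is not searchable yet.
    soft_commit()  makes buffered documents searchable from memory.
    hard_commit()  also writes the searchable documents to disk.
    """

    def __init__(self, path):
        self.path = path
        self.pending = []    # added, not yet visible to searches
        self.visible = []    # searchable, held in memory
        self.durable = 0     # number of visible documents persisted to disk

    def add(self, doc):
        self.pending.append(doc)

    def soft_commit(self):
        self.visible.extend(self.pending)
        self.pending.clear()

    def hard_commit(self):
        self.soft_commit()
        with open(self.path, "w") as f:
            json.dump(self.visible, f)
            f.flush()
            os.fsync(f.fileno())
        self.durable = len(self.visible)

    def search(self, field, value):
        return [d for d in self.visible if d.get(field) == value]


path = os.path.join(tempfile.mkdtemp(), "index.json")
idx = ToyIndex(path)
idx.add({"uid": "alice"})
print(idx.search("uid", "alice"))   # not visible before a commit
idx.soft_commit()
print(idx.search("uid", "alice"), idx.durable)
idx.hard_commit()
print(idx.durable)
```

The trade-off this models: soft commits keep search results fresh cheaply, while hard commits pay for disk I/O to survive a crash.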

Page 49

Initial Performance Tests (LDAP Search)

• HDAP (VDS + Lucene), 10M entries:
• 1 node: 30k/sec
• 2 nodes: 65k/sec
• 3 nodes: 95k/sec
• 4 nodes: 130k/sec
• 5 nodes: 149k/sec
• Google daily average load: 3 million q/minute, or 50,000 q/sec

[Chart: LDAP search throughput (queries/sec, 0 to 160,000) vs. number of nodes (1 to 5)]
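The figures above can be checked with a little arithmetic: per-node throughput, speedup relative to one node, and scaling efficiency (speedup divided by node count, where 1.0 means perfectly linear). The numbers below are the slide’s own; the `scaling` helper is just an illustrative calculation.

```python
# Figures from the slide: number of nodes -> LDAP searches/sec (10M entries).
RESULTS = {1: 30_000, 2: 65_000, 3: 95_000, 4: 130_000, 5: 149_000}


def scaling(results):
    """Per-node throughput, speedup vs. one node, and scaling efficiency.

    Efficiency = speedup / nodes; 1.0 means perfectly linear scaling.
    """
    base = results[1]
    rows = []
    for nodes, qps in sorted(results.items()):
        speedup = qps / base
        rows.append((nodes, qps, qps / nodes,
                     round(speedup, 2), round(speedup / nodes, 2)))
    return rows


# The slide's Google figure: 3 million queries/minute, converted to per second.
GOOGLE_QPS = 3_000_000 / 60

for nodes, qps, per_node, speedup, eff in scaling(RESULTS):
    print(f"{nodes} node(s): {qps:>7,} q/s  {per_node:>8,.0f} per node  "
          f"speedup {speedup:.2f}  efficiency {eff:.2f}")
print(f"Google average: {GOOGLE_QPS:,.0f} q/s")
```

Relative to the single-node figure, the 2- to 4-node results work out slightly above linear (efficiency of about 1.06 to 1.08) and the 5-node result slightly below (about 0.99); by the same arithmetic, two nodes already exceed the 50,000 q/sec figure quoted for Google.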

Page 50

The Architecture of the RadiantOne Federated Identity Service

• Acting as an abstraction layer between applications and the underlying identity silos, virtualization isolates applications from the complexity of backends.

[Diagram: virtualization by model through aggregation, correlation, and integration; populations A, B, and C, groups, and roles; LDAP, SQL, web services/SOA, and REST; apps A to F; contexts and services]


Page 52

HDAP Uses a Key/Value System Based on Search Technology: Inverted Tree

• Everything is automatically indexed in HDAP, so you can search the directory the same way you search Google.
• An inverted tree is not necessarily balanced; you could have some paths that are very shallow while others are very deep.

[Diagram: inverted tree]
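To make “everything is automatically indexed” concrete, here is a sketch of an inverted index, the key/value structure at the heart of search engines such as Lucene: each (attribute, token) key maps to the set of entries that contain it, so any attribute, or all attributes at once, can be searched without declaring indexes in advance. This is an illustrative assumption about the idea on the slide, not Lucene’s API or HDAP’s implementation; whitespace tokenization and the `*` any-attribute key are simplifications chosen for the sketch.

```python
from collections import defaultdict


class InvertedIndex:
    """Inverted index: each (attribute, token) maps to the set of entry DNs
    containing it, so every attribute is searchable without declaring
    per-attribute indexes in advance."""

    ANY = "*"   # pseudo-attribute for "search everywhere" queries

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, dn, attrs):
        for attr, vals in attrs.items():
            for value in vals:
                for token in value.lower().split():
                    self.postings[(attr.lower(), token)].add(dn)
                    self.postings[(self.ANY, token)].add(dn)

    def search(self, *tokens, attr=ANY):
        """DNs containing all tokens (AND), in one attribute or in any."""
        sets = [self.postings.get((attr.lower(), t.lower()), set())
                for t in tokens]
        return set.intersection(*sets) if sets else set()


idx = InvertedIndex()
idx.add("uid=nancy,o=example",
        {"cn": ["Nancy Davolio"], "title": ["Sales Representative"]})
idx.add("uid=bob,o=example",
        {"cn": ["Bob Jones"], "title": ["Sales Manager"]})
print(idx.search("sales"))            # Google-style search across all attributes
print(idx.search("nancy", "sales"))   # both tokens must match
```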