
DATABASE MANAGEMENT SYSTEMS

IN DATA INTENSIVE ENVIRONMENTS

Leon Guzenda

Chief Technology Officer

DMW2004 3/16/04 Copyright Objectivity, Inc. 2004

AGENDA

• Introduction

• Issues and Approaches

• Summary & Resources

Objectivity, Inc. & Objectivity/DB

Objectivity Corporate Information

Object Database Management for:

• Data intensive applications that manipulate complex data

• High throughput systems

• Very large volumes of data

Main Markets

• Government

• Scientific

• Telecommunications

• Engineering

• Manufacturing

• Complex IT

Product Highlights

• High Performance with complex data

• Scalability and High Availability

• Fully Distributed

• Interoperability

- C++, Java, Smalltalk, SQL and XML

- Linux, LynxOS, Unix and Windows

• Productivity

- Eclipse IDE

- Eliminates the object to DB mapping layer

SCALABILITY

• Data Volume - 890 Terabytes [BaBar]

• Throughput – Ingested 32 Terabytes per Day [Benchmark]

In a recent benchmark with Objectivity/DB running on 64 Irix processors (600 MHz), CXFS and a 100 Terabyte SAN we achieved:

• An ingest rate of 32 Terabytes per day (input, correlate and commit)

• Simultaneous queries from 32 processors running at close to 100% CPU capacity

• Simultaneous movement and deletion of aged data to a long term repository

• Simultaneous Users – 100s of Thousands [SprintPCS]
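As a rough sanity check on the benchmark figure, the sketch below converts 32 Terabytes per day into a sustained rate. It assumes decimal terabytes (10^12 bytes) and, for the per-processor figure, an even spread of the ingest load over all 64 processors; neither assumption is stated on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # assumption: decimal terabytes, not 2**40 bytes


def sustained_rate_mb_per_s(terabytes_per_day: float) -> float:
    """Convert a daily volume in TB into a sustained rate in MB/s."""
    return terabytes_per_day * TB / SECONDS_PER_DAY / 10**6


ingest = sustained_rate_mb_per_s(32)
per_cpu = ingest / 64  # assumption: load spread evenly over 64 processors

print(f"{ingest:.0f} MB/s overall, {per_cpu:.1f} MB/s per processor")
# prints: 370 MB/s overall, 5.8 MB/s per processor
```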

Issues and Approaches

ISSUES

• Describing complex data

• Exponentially increasing data volumes

• Sharing data across sites

• Querying huge datasets

• Cost of Ownership

DESCRIBING COMPLEX DATA

Approaches:

• Old Way

- Definitions buried in header files

- Language-specific schema language (DDL/SQL)

• Current Approaches

- Unified Modeling Language [UML]

- XML

• Trends

- Java Data Objects [JDO]

- Grid Database Access and Integration Services

- Higher level schemas and ONTOLOGIES
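To illustrate the move from definitions buried in header files toward a shared description such as XML, the sketch below defines a nested data type once, in the application language, and derives an XML rendering from that single definition. The `Event` and `Hit` types and the `to_xml` helper are hypothetical, invented for illustration; this is not an Objectivity/DB or JDO interface.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Hit:  # hypothetical example type
    channel: int
    energy: float


@dataclass
class Event:  # hypothetical example type holding nested objects
    event_id: int
    hits: List[Hit] = field(default_factory=list)


def _leaf(tag: str, value) -> ET.Element:
    """Build a simple element whose text is the string form of value."""
    elem = ET.Element(tag)
    elem.text = str(value)
    return elem


def to_xml(obj, tag: str = None) -> ET.Element:
    """Walk a dataclass instance and emit one XML element per field."""
    elem = ET.Element(tag or type(obj).__name__)
    for f in fields(obj):
        value = getattr(obj, f.name)
        if is_dataclass(value):
            elem.append(to_xml(value, f.name))
        elif isinstance(value, list):
            container = ET.SubElement(elem, f.name)
            for item in value:
                container.append(
                    to_xml(item) if is_dataclass(item) else _leaf("item", item)
                )
        else:
            elem.append(_leaf(f.name, value))
    return elem


event = Event(event_id=7, hits=[Hit(3, 1.5), Hit(4, 0.25)])
xml_text = ET.tostring(to_xml(event), encoding="unicode")
print(xml_text)
```

The point of the design is that the class definition is the only place the structure is written down; the XML form is generated rather than maintained by hand.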

DATA VOLUMES

Approaches:

• Old Way

- Keep data in compressed files and index them in a DBMS

- Proprietary tape archives

• Current Approaches

- Store everything in an ODBMS (lower overheads than an RDBMS)

- Hierarchical storage systems (HPSS etc.)

• Trends

- Solid State Disks at the front end, commodity disks at the back end

- Heterogeneous Storage Area Networks [SAN], e.g. CXFS

- Fiber Optic processor-to-SAN switches

- Grid enablement (totally distributed archives)
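The tiered arrangement in the trends above (fast storage at the front end, commodity disks behind it, an archive for aged data) can be sketched as a simple age-based placement rule. The tier names and age thresholds below are assumptions chosen only for illustration; a real hierarchical storage system would use its own policies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Tier:
    name: str
    max_age: timedelta  # data older than this belongs in a later tier


# Hypothetical tiers and thresholds, not taken from the slides.
TIERS = [
    Tier("solid-state", timedelta(days=1)),
    Tier("commodity-disk", timedelta(days=90)),
    Tier("archive", timedelta.max),
]


def tier_for(last_access: datetime, now: datetime) -> str:
    """Return the first tier whose age limit covers the data's age."""
    age = now - last_access
    for tier in TIERS:
        if age <= tier.max_age:
            return tier.name
    return TIERS[-1].name


now = datetime(2004, 3, 16)
hot = tier_for(now - timedelta(hours=2), now)
warm = tier_for(now - timedelta(days=30), now)
cold = tier_for(now - timedelta(days=400), now)
print(hot, warm, cold)  # prints: solid-state commodity-disk archive
```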

SHARING DATA ACROSS SITES

Approaches:

• Old Way

- Transfer files/disks/tapes

- Filesystem-level security, or none

• Current Approaches

- Distributed databases and the World Wide Web

- High bandwidth networks

- Authentication and secure transport layers

• Trends

- Grid enablement

- Federated databases

- Ultra-high bandwidth networks and remote replication

- Flexible, localized security mechanisms

Distributed Federations

[Diagram labels: Organization X (Users X1, X2, X3); Organization Y (User Y1); database A; A2 (replica of A); A3 (replica of A)]

Distributed Federations

[Diagram labels: Organization X (User X1, mobile and detached; Users X2, X3); Organization Y (User Y1); database A; A2 (replica of A); A3 (replica of A)]
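One way to picture a federation with replicas is a catalog that maps a logical database to its copies, with each user resolved to a copy at their own site when one is reachable. The sketch below uses the labels A, A2 and A3 from the diagram, but the placement of replicas at sites, the site names and the `resolve` function are assumptions made for illustration; this is not how Objectivity/DB is documented to choose a replica.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    name: str
    site: str
    reachable: bool = True


# Assumed placement: the slide does not say which replica lives where.
CATALOG: Dict[str, List[Replica]] = {
    "A": [Replica("A", "X"), Replica("A2", "X-mobile"), Replica("A3", "Y")],
}


def resolve(database: str, user_site: str, catalog=CATALOG) -> Replica:
    """Prefer a reachable replica at the user's site, else any reachable one."""
    replicas = [r for r in catalog[database] if r.reachable]
    if not replicas:
        raise LookupError(f"no reachable replica of {database}")
    local = [r for r in replicas if r.site == user_site]
    return (local or replicas)[0]


# A detached mobile user can reach only the replica carried on the device.
DETACHED = {
    "A": [
        Replica("A", "X", reachable=False),
        Replica("A2", "X-mobile"),
        Replica("A3", "Y", reachable=False),
    ],
}

at_y = resolve("A", "Y").name
detached = resolve("A", "X-mobile", DETACHED).name
print(at_y, detached)  # prints: A3 A2
```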

QUERYING HUGE DATASETS

Approaches:

• Old Way

- Hold metadata (indexes and relationships) in a searchable file

• Current Approaches

- Hold metadata in an RDBMS and data in files

- Hold metadata and data in an ODBMS

• Trends

- Adaptations of text search engines

- Distributed Parallel Query Engines

- Specialized search accelerators
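The "metadata in an RDBMS and data in files" approach listed above can be sketched with an in-memory SQLite table that records where each record sits inside a flat data file. The table layout, keys and payloads are hypothetical, and SQLite stands in for whatever RDBMS a site would actually use.

```python
import os
import sqlite3
import tempfile

# Hypothetical records; real payloads would be large binary objects.
records = {"run1": b"alpha" * 3, "run2": b"beta" * 2, "run3": b"gamma"}

# The database stores only the location of each record, not the data itself.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE idx (key TEXT PRIMARY KEY, offset INTEGER, length INTEGER)")

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    for key, payload in records.items():
        db.execute("INSERT INTO idx VALUES (?, ?, ?)", (key, f.tell(), len(payload)))
        f.write(payload)


def fetch(key: str) -> bytes:
    """Look up the record's location in the index, then read it from the file."""
    row = db.execute(
        "SELECT offset, length FROM idx WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    offset, length = row
    with open(path, "rb") as data:
        data.seek(offset)
        return data.read(length)


result = fetch("run2")
print(result)  # prints: b'betabeta'
os.remove(path)  # clean up the temporary data file
```

The index and the data can drift apart if a file is moved or rewritten, which is one motivation for the alternative of holding metadata and data together in one database.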

Current Architecture

Queries run synchronously within the client

[Diagram labels: APPLICATION; DBA Tools; Language Interfaces; Object & Schema Managers; Query & Index Managers; Storage & Transaction Managers; Networking & Event Managers; Lock Servers; Data “Page” Servers; Mass Storage]

Parallel Query Engine [PQE]

Queries run asynchronously and in parallel, either locally or distributed

[Diagram labels: APPLICATION; DBA Tools; Language Interfaces; Object & Schema Managers; Query & Index Managers; Storage & Transaction Managers; Networking & Event Managers; Lock Servers; Data “Page” Servers; PQE]
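The general idea of running a query asynchronously and in parallel can be sketched as a scatter-gather over partitions: each partition is scanned by its own worker and results are merged as they arrive. The partitions, server names and functions below are hypothetical, and a thread pool stands in for separate data servers; this is not the PQE interface.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical partitions standing in for data held by separate data servers.
PARTITIONS = {
    "server-1": [3, 18, 42, 7],
    "server-2": [25, 1, 64],
    "server-3": [11, 90, 5, 33],
}


def scan(server: str, predicate) -> list:
    """Scan one partition and return only the values matching the predicate."""
    return [v for v in PARTITIONS[server] if predicate(v)]


def parallel_query(predicate) -> list:
    """Submit one scan per partition and merge results as each completes."""
    results = []
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        futures = [pool.submit(scan, server, predicate) for server in PARTITIONS]
        for future in as_completed(futures):
            results.extend(future.result())
    return sorted(results)  # sort so the answer does not depend on arrival order


matches = parallel_query(lambda v: v > 20)
print(matches)  # prints: [25, 33, 42, 64, 90]
```

Because the predicate is applied inside each scan, only matching values are returned to the caller, which is the same reason for moving predicate handling closer to the data.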

PQE and Search Accelerator

Queries run asynchronously and in parallel, but with Predicate Management within the Search Accelerator

[Diagram labels: APPLICATION; DBA Tools; Language Interfaces; Object & Schema Managers; Query Manager; Storage & Transaction Managers; Networking & Event Managers; Lock Servers; Data Servers; PQE; Search Accelerator (FPGA & RAM)]

COST OF OWNERSHIP

Approaches:

• Old Way

- Build It Yourself (many hidden costs)

- Run It Yourself

• Current Approaches

- Use Commercial Off The Shelf [COTS] software

- Open Source

- Commodity hardware & tiered storage

• Trends

- Heterogeneous storage

- Grid Enablement

- Resource and Skill Brokers (Future)

SUMMARY

SUMMARY

• Database languages are still evolving

• Data throughput is increasing and system latency is decreasing

• Sharing data across sites still presents many challenges

• Querying vast datasets will become faster and cheaper

• Software vendors are wrestling with Open Source issues

• Startup costs are still high, but the trends are downward

• Grid enablement will help

• Keep working on the Standards!

RESOURCES

• http://www.objectivity.com

• Technical Overview

• Data Sheets and White Papers

• Free downloadable Java and C++ evaluation software and tutorials

• Global Grid Forum: http://www.ggf.org

• Email: [email protected]

ANY QUESTIONS?