Iod 2013 Jackman Schwenger

Scientific Research with DBaaS on IBM PureApplication System & PureData System for Transactions IPT – 1961A Tom Jackman, DRI Maria N. Schwenger, IBM Vikram Khatri, IBM © 2013 IBM Corporation


Transcript of Iod 2013 Jackman Schwenger

Page 1: Iod 2013 Jackman Schwenger

Scientific Research with DBaaS on IBM PureApplication System & PureData System for Transactions

IPT – 1961A

Tom Jackman, DRI

Maria N. Schwenger, IBM

Vikram Khatri, IBM

© 2013 IBM Corporation

Page 2: Iod 2013 Jackman Schwenger

Please note

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Page 3: Iod 2013 Jackman Schwenger

Acknowledgements and Disclaimers

Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.

The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

© Copyright IBM Corporation 2012. All rights reserved.

• U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM, the IBM logo, ibm.com, WebSphere, DB2, PureSystems, PureData and PureApplication System are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

Other company, product, or service names may be trademarks or service marks of others.

Page 4: Iod 2013 Jackman Schwenger

Assumptions

What we expect you to know

• You have a good understanding of cloud computing concepts

• You have a reasonable working-level knowledge of relational database design, principles, and architecture

o Some knowledge of the DB2 database and its features (e.g., DB2 HADR, DB2 pureScale)

• You are familiar with the IBM PureSystems family

o You are aware of the value of pattern-based deployments in IBM PureSystems

• Application architecture knowledge preferred, but not essential

• Knowledge of DBaaS principles is highly appreciated!

Page 5: Iod 2013 Jackman Schwenger

Agenda

What is this presentation all about?

• The Nature of Scientific Data

o One client’s perspective

o Scientific Data (SD) vs Business Data (BD)

o High reliability and availability for SD management

• DataBase-as-a-Service (DBaaS)

o Why DBaaS and why now?

o Scientific research and DBaaS

o DBaaS in PureSystems

Page 6: Iod 2013 Jackman Schwenger

About Desert Research Institute (DRI)

Applied research addressing environmental issues globally

• Non-profit research arm of the Nevada System of Higher Education

• More than 550 scientists, engineers and technicians

• Campuses in Reno and Las Vegas

• 60 specialized labs & research facilities (e.g., Virtual Reality lab)

• Non-tenured, entrepreneurial faculty

• 300 research projects happening on all continents

• $459 million in sponsored research projects since 2000

Page 7: Iod 2013 Jackman Schwenger

The Story

• Emergence of innovation-based economy

• Disruption by knowledge-based technology

• Non-traditional science institute (DRI) adapting

• Academia-Government-Industry partnerships

• Catalyzing change with IBM PureSystems

• New science, new engineering, new model

[Figure: Government (empowering, responsive, fiscally prudent), Academia (diffusive, relevant, sustainable) and Industry (differentiated, competitive, profitable) surrounding Society. Caption: Cooperating on shared values: innovation clustering]

Page 8: Iod 2013 Jackman Schwenger

Applied Innovation Center for Advanced Analytics

Supporting Nevada’s Economic Development with Innovation Services

● High Performance Computing

● Data Science & Engineering

● Cyber-physical Systems

● Advanced Visualization

DATA: acquiring, computing, processing, archiving, correlating, visualizing, exploring, analyzing, mining, …

Page 9: Iod 2013 Jackman Schwenger

Why is Scientific Data Important to You?

• SD has the characteristics of Big Data

• SD is your facilities data

• Your BD will become more like SD

• To remain competitive, you need research data

• SD is relevant to your region/planet/solar system/galaxy/universe

Sources:

Bob Violino, New IDC Research shows Impact of Big Data on High Performance Computing Systems, October 28, 2013

Gary M. Johnson, Convergence: HPC, Big Data & Enterprise Computing, October 28, 2013

Page 10: Iod 2013 Jackman Schwenger

The Evolution of Scientific Investigation

• Ancient Greece: Observation

• Renaissance – Enlightenment: Observation, Experimentation

• Industrial Revolution – Atomic Age: Observation, Experimentation, Theory

• Electronics Age: Observation, Experimentation, Theory, Computation

• Data and Communications Age: Observation, Experimentation, Theory, Computation, Telemetry

Page 11: Iod 2013 Jackman Schwenger

SD Management

• Structured, semi-structured or unstructured

• Heterogeneous (sources, units, types, dimensions)

• Reliance on arrays and other complex data structures

• Large data objects; sensitive to I/O & network performance

• Distributed data repositories

• Repositories are open, or not

• Datasets are cleansed, or not

• Many protocols, too few (persistent) standards

Increasing need for rigorous data provenance

Page 12: Iod 2013 Jackman Schwenger

SD is Heterogeneous

• Popular Formats: HDF5, netCDF, SEG-Y, FITS, Shapefile, XML, 3DXML, JSON

• Structures: raster, vector, point, relational, human-derived (documents, lab notes, social)

• Atomic Types * #: array, image, table, tuple, string, reference

* Structures can be composed of type float, double, integer, fixed-point, categorical, binary, string

# Data may be noisy and have associated uncertainties

Page 13: Iod 2013 Jackman Schwenger

Sources of SD

• In Situ sensing

o Sensor arrays

o Smart meters

o RFID

o Surveillance

• Remote sensing

o Active

o Passive

o Aircraft

o Orbital craft (satellite)

• Computed/Simulated

o Forecasts

o Hydro models

o Earth models

o Brain simulations

• Machine-derived

o Seismograms

o Gene sequencers

o Tomograms

o Accelerators

• Human-derived (text, media)

[Figure: block diagram of a sensor node: sensors and actuators connected through ADC/DAC and I/O to a μP with RAM, NVM and ROM, plus Tx/Rx]

Page 14: Iod 2013 Jackman Schwenger

Patterns of SD Database Design

• Design 0: File based approaches

o Ad hoc management system lacking high availability

• Design 1: RDBMS

o Data is relational or can be made relational

• Design 2: Metadata in RDBMS

o Only metadata abstraction is kept in relational database

• Design 3: Metadata in RDBMS with file pointers

o Metadata is kept in relational database

o File pointers to non-relational data also included in RDBMS

• Design 4: ETL subsets into a working RDBMS

o Spatially register, temporally synchronize, and coherently fuse data extractions for use in a “working” database

• Design 5: NoSQL DBMS’s
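Editor's sketch of Design 3 (metadata in an RDBMS with file pointers). This is an illustration only, not the presenters' implementation: Python's built-in sqlite3 stands in for the RDBMS, and the table name, column names, dataset names and file paths are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one metadata row per dataset, plus a pointer
# (path or URI) to the bulk file, which stays outside the database.
SCHEMA = """
CREATE TABLE dataset (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    format     TEXT NOT NULL,   -- e.g. HDF5, netCDF, FITS
    start_time TEXT NOT NULL,   -- ISO 8601 dates compare correctly as text
    end_time   TEXT NOT NULL,
    file_uri   TEXT NOT NULL    -- pointer to the non-relational data
)
"""

def build_catalog(rows):
    """Create an in-memory catalog and load the metadata rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO dataset (name, format, start_time, end_time, file_uri) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn

def find_files(conn, fmt, t0, t1):
    """Return file pointers for datasets of one format overlapping [t0, t1]."""
    cur = conn.execute(
        "SELECT file_uri FROM dataset "
        "WHERE format = ? AND start_time <= ? AND end_time >= ? "
        "ORDER BY start_time",
        (fmt, t1, t0),
    )
    return [row[0] for row in cur]

# Hypothetical catalog entries.
catalog = build_catalog([
    ("snowpack-2012", "netCDF", "2012-01-01", "2012-12-31", "/data/snow/2012.nc"),
    ("snowpack-2013", "netCDF", "2013-01-01", "2013-12-31", "/data/snow/2013.nc"),
    ("sky-tile-17", "FITS", "2013-03-01", "2013-03-02", "/data/sky/tile17.fits"),
])
hits = find_files(catalog, "netCDF", "2013-06-01", "2013-06-30")
```

The query touches only the small metadata table; the application then opens just the files whose pointers come back.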

Page 15: Iod 2013 Jackman Schwenger

Accessing Applications for SD

SD access patterns:

• Large and bursty

• Coupled to data analysis applications

o Data integration

o Feature extraction, segmentation

o Interpolation, regression, kriging

o Correlation

− ~O(N^2) complexity

o Pattern discovery

− naively, ~O(N^4) complexity

o Classification,

Access to software applications and hardware processors needs to be part of the design

[Figure: APP and Data placed at different points on a network, asking "Where are each of these located?"; in a Full Service Cloud, APP and Data sit together for minimal data movement]
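Editor's sketch of why correlation grows roughly as O(N^2). It assumes N counts the series being compared, which the slide does not state: every pair is evaluated once, so N series need N(N-1)/2 correlations. The series names and values are made up for illustration.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def pairwise_correlations(series):
    """Correlate every pair of named series: N*(N-1)/2 evaluations."""
    return {
        (a, b): pearson(series[a], series[b])
        for a, b in combinations(sorted(series), 2)
    }

# Hypothetical sensor readings.
readings = {
    "temp": [1.0, 2.0, 3.0, 4.0],
    "humidity": [8.0, 6.0, 4.0, 2.0],
    "pressure": [2.0, 4.0, 6.0, 8.0],
}
result = pairwise_correlations(readings)
```

Doubling the number of series roughly quadruples the work, which is one reason the slide argues for placing the application next to the data.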

Page 16: Iod 2013 Jackman Schwenger

Jim Gray’s Rules for Database-centric Computing

1. Scientific computing is increasingly data intensive

2. The applications need a scale-out architecture

3. Bring computations to data, rather than the other way

4. Design the database environment around 20 queries

5. Be agile, be modular, design for change

Page 17: Iod 2013 Jackman Schwenger

Examples of SD Databases

• Sloan Digital Sky Survey (SDSS), http://www.sdss.org

o Public data resource with JHU as lead institution

o 1) 5 band photometric, 2) redshift surveys

o 5 Tpx images, 120 TB processed, 35 TB catalog

o Rich application portfolio

• 1000 Genomes Project

o Part of the Bionimbus scientific cloud (Note ~0.5 TB/genome, ~1 TB/patient)

o Inst. for Genomics & Systems Biology at UChicago

o Human diversity project using Next Gen Sequencing (NGS)

Both SDSS and 1000 Genomes are member projects in the Open Science Data Cloud (OSDC).

Page 18: Iod 2013 Jackman Schwenger

Cloud-based, High-Availability, Distributed SD

[Figure, adapted from IBM GTO 2013: "The Contextual Enterprise", with a "Scientific" label. Structured, repeatable, linear data (transaction, client app, OLTP) feeds a Data Warehouse; unstructured, exploratory, dynamic data (sensor, RFID, text) feeds Hadoop & Streams; both meet in Content Accumulation and Integration]

Page 19: Iod 2013 Jackman Schwenger

In Summary

• SD is similar to Big Data – heterogeneous, multi-contextual

• There is no uniform infrastructure in science

o Solutions must be flexible and generally interoperable

• SD needs BD reliability and accessibility

• SD access is not generally transactional

o More typically involves large data extractions for analysis

• There are alternative approaches to reliable SD management

• RDBMS can be a practical approach to reliable SD access when coupled with application delivery

• As businesses embrace Big Data, they face similar challenges

What is DBaaS for science?

Why DBaaS for science?

How can DBaaS for science be implemented?

Page 20: Iod 2013 Jackman Schwenger

Why DBaaS for scientific research?

Optimization & integration for delivering higher value

• Scientific research is mainly based on HPC practices

o Often deals with unstructured data & file based processing

o Traditionally has not embraced high-availability business solutions

o Capital cost and funding are significant issues

• Scientific research is just starting to adopt RDBMS processing (where feasible)

o Process less data, and only the relevant data, producing results faster

o Improved consumability - forced to integrate with other (e.g., commercial, portal) applications to deliver the value

Today, scientific research is starting to rethink its participation, and possible new collaboration, in the different phases of the data lifecycle:

• Data Collection

• Data Integration

• Data Analytics

• Data Presentation

Page 21: Iod 2013 Jackman Schwenger

File vs. data driven processing

• File based processing

o Many VMs (VM 1 … VM N) read text files (TXT 1 …) directly

o Parallel or sequential (!!!) file reads

• Data driven processing

o Files loaded into PureData

o Single call to the database (parallelism)

o Only the relevant data set is returned to the user

[Figure: data volumes labelled TB Size, GB Size and MB Size; the VMs and DB2 databases of the two approaches shown side by side]
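Editor's sketch contrasting the two approaches. It is not the presenters' code: sqlite3 stands in for DB2 on PureData, and the table, station names and values are hypothetical. The file-style function scans every record in application code; the data-driven function issues a single query and receives only the summary.

```python
import sqlite3

def load_readings(rows):
    """Load (station, day, value) rows once into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reading (station TEXT, day TEXT, value REAL)")
    conn.executemany("INSERT INTO reading VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def mean_by_station_file_style(rows, day_from, day_to):
    """File-style processing: every record is read and filtered by the app."""
    totals = {}
    for station, day, value in rows:
        if day_from <= day <= day_to:
            s, n = totals.get(station, (0.0, 0))
            totals[station] = (s + value, n + 1)
    return {k: s / n for k, (s, n) in totals.items()}

def mean_by_station_db_style(conn, day_from, day_to):
    """Data-driven processing: one query; only the summary comes back."""
    cur = conn.execute(
        "SELECT station, AVG(value) FROM reading "
        "WHERE day BETWEEN ? AND ? GROUP BY station",
        (day_from, day_to),
    )
    return dict(cur.fetchall())

# Hypothetical readings.
rows = [
    ("reno", "2013-10-01", 10.0),
    ("reno", "2013-10-02", 14.0),
    ("vegas", "2013-10-01", 20.0),
    ("vegas", "2013-11-01", 99.0),
]
conn = load_readings(rows)
october = mean_by_station_db_style(conn, "2013-10-01", "2013-10-31")
```

Both functions give the same answer; the difference is how much data crosses the boundary between storage and application.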

Page 22: Iod 2013 Jackman Schwenger

What is Database as a Service (DBaaS)?

On PureSystems family (private cloud)

• Delivery of database functionality as a service

o Defines the architectural and operational approaches of a new service-oriented delivery

o Often defined as “Database in a Cloud”

• Characteristics of DBaaS architecture:

o Self-service interaction models to reduce complexity of database service delivery - on-demand usage, rapid self-provisioning and management of database instances

o Multi-tenancy capabilities

o Elasticity of workloads

o Multiple levels of high availability

o Automated resource management and monitoring

o Metering of database usage (to allow a charge-back functionality)
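Editor's sketch of the metering and charge-back characteristic. The record fields, tenant names and rates are hypothetical and are not taken from any IBM product; the point is only that metered usage per tenant can be turned into a charge.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UsageRecord:
    tenant: str              # team or project that owns the database instance
    cpu_core_hours: float
    storage_gb_days: float

# Hypothetical rates, in arbitrary currency units.
CPU_RATE = 0.10       # per core-hour
STORAGE_RATE = 0.02   # per GB-day

def charge_back(records, cpu_rate=CPU_RATE, storage_rate=STORAGE_RATE):
    """Sum metered usage per tenant and convert it to a charge."""
    charges = {}
    for r in records:
        cost = r.cpu_core_hours * cpu_rate + r.storage_gb_days * storage_rate
        charges[r.tenant] = charges.get(r.tenant, 0.0) + cost
    return {tenant: round(total, 2) for tenant, total in charges.items()}

# Hypothetical metering data for two tenants.
usage = [
    UsageRecord("hydrology", cpu_core_hours=100.0, storage_gb_days=500.0),
    UsageRecord("genomics", cpu_core_hours=40.0, storage_gb_days=2000.0),
    UsageRecord("hydrology", cpu_core_hours=20.0, storage_gb_days=0.0),
]
bill = charge_back(usage)
```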

Page 23: Iod 2013 Jackman Schwenger

Why DBaaS? Why now?

The 4 Vs: Volume, Variety, Velocity, Veracity

• Database sprawl and infrastructure growth are overwhelming

o With the growth of data, database infrastructure management has become hugely expensive and complicated, and has introduced many risks

• Self-service technology is needed

o Today we need “IT on demand” for fast business response while keeping up with compliance, reducing risk, and maintaining proper security

• Cost savings from virtualization & smart IaaS are “a must”

o Database needs/volumes grow while IT budgets are shrinking

• Data driven business decisions are the only way to go

o The business wants the data delivered faster, more simply and more reliably

• Cost-effectively scaling the data layer

o Companies are looking to replace the traditional expensive database/infrastructure model for scaling an enterprise level of SLAs

Page 24: Iod 2013 Jackman Schwenger

New Technical Concepts in DBaaS

• DB Instance: A live database instance

• DB Image: Similar to a HV/VM image, but for databases

o Database backup includes the metadata to reconstitute a deployment

• DB Clone: The act of creating a DB instance from a DB image

• DB Pattern: A saved set of provisioning parameters to encourage standardization on the application group side

• Workload Standard: A package that allows a level of customization for a DB under the virtual application or DB2 Service for Cloud

o Allows configuration of the OS, DB2 instance, DB2 database

o Linked with a workload such as OLTP, Datamart, etc.

• DBaaS: Defines the architectural and operational approaches of a new service-oriented delivery of database functionality (as a service)
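Editor's sketch of how the concepts above relate to one another, written as a toy data model. All class names, fields and functions are hypothetical; this is not the PureApplication System API.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class WorkloadStandard:
    name: str
    workload: str                                  # e.g. "OLTP" or "Datamart"
    settings: dict = field(default_factory=dict)   # OS / instance / database config

@dataclass(frozen=True)
class DBPattern:
    """A saved set of provisioning parameters."""
    name: str
    db2_level: str
    cores: int
    disk_gb: int
    standard: WorkloadStandard

@dataclass(frozen=True)
class DBImage:
    """A backup plus the metadata needed to reconstitute a deployment."""
    source_instance: str
    pattern: DBPattern

@dataclass(frozen=True)
class DBInstance:
    """A live database instance."""
    name: str
    pattern: DBPattern
    cloned_from: Optional[str] = None

def provision(name, pattern):
    """Create an instance from a saved pattern."""
    return DBInstance(name=name, pattern=pattern)

def take_image(instance):
    """Capture an image that remembers how the instance was deployed."""
    return DBImage(source_instance=instance.name, pattern=instance.pattern)

def clone(image, new_name):
    """DB Clone: the act of creating an instance from an image."""
    return DBInstance(new_name, image.pattern, cloned_from=image.source_instance)

# Hypothetical names and sizes.
oltp = WorkloadStandard("site-oltp", "OLTP", {"example_setting": "value"})
pattern = DBPattern("small-oltp", "10.5.0.1", cores=4, disk_gb=100, standard=oltp)
prod = provision("sensors_prod", pattern)
test_db = clone(take_image(prod), "sensors_test")
```

Because the image carries its pattern, the clone comes up with the same standardized configuration as its source.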

Page 25: Iod 2013 Jackman Schwenger

New operational approaches in DBaaS

• Single click provisioning of databases from patterns

o Linked with a workload such as OLTP, Data mart, etc.

• Database can be provisioned via cloning (from backup)

• The database might be a part of application pattern

• A database might be provisioned from another system - integration between PureApplication System and PureData System for Transactions

o Use a Workload Standard to enforce your best practices

• Logs and monitoring are available straight in the console

o Use context links to navigate for troubleshooting, management and monitoring

• New considerations on upgrades – system and workload upgrades

• Use of command line – only when feasible

Page 26: Iod 2013 Jackman Schwenger

Where is the database?

A Maximo deployment from pattern

Page 27: Iod 2013 Jackman Schwenger

Workloads standards and database patterns

Single click database deployment

Page 28: Iod 2013 Jackman Schwenger

DB2 HADR pattern in Virtual System on PureApplication System

• Match versions

• Match editions

Page 29: Iod 2013 Jackman Schwenger

Deploy PureData database as part of application pattern from PureApplication

New option added when PureData is registered

Page 30: Iod 2013 Jackman Schwenger

Manage Logging (Database Service Console)

• OS logs

• DB2 logs

• Agent logs

Bring the cursor over a file; an arrow link will pop up. Click it to download the log file.

Page 31: Iod 2013 Jackman Schwenger

Pre-integrated DB2 Monitoring

See detailed DB2 metrics from the Workload Console

Launches a new browser Tab/window in context to Database Overview page.

Page 32: Iod 2013 Jackman Schwenger

Further Drill Down: Detailed DB2 metrics

Can drill down & focus on “popular” problems

• Inflight Database Memory Dashboard

• Inflight Rogue Query Dashboard

• Inflight I/O Dashboard

• Inflight Locking Dashboard

• Inflight Logging Dashboard

• Inflight Utilities Dashboard

• Inflight Throughput Dashboard

Page 33: Iod 2013 Jackman Schwenger

IBM PureSystems & DBaaS

The ideal Platform as a Service (PaaS) for databases

• DBaaS provides a deep built-in integration of application and database server capabilities in a simple but powerful combination intended to simplify the way applications and databases are designed, deployed, run and managed.

• DBaaS offers single-click, pattern-based development and deployment via IBM-provided database patterns and workloads, which speeds up the deployment of new applications and databases and encourages the creation of reusable assets for consistent enterprise interactions.

• The ability to create custom patterns and workloads provides an optimized way of establishing and enforcing enterprise standards.

• Pattern-based management simplifies database development and deployment, while the built-in best practices yield optimized deployments right out of the box.

• DBaaS provides a simplified way of developing databases even for complex tasks like creating high availability and disaster recovery (HADR) or DB2 cluster setups.

Page 34: Iod 2013 Jackman Schwenger

What is new in DBaaS on PureApplication System

DBaaS 1.1.0.8 - Sept 2013

• Added support for DB2 v10.5 (AKA Kepler) and DB2 BLU (for data mart)

o IBM DB2 for BLU Acceleration Pattern was added

• Added HADR for OLTP (HA in same rack with auto failover) (not related to HADR in vSys)

• Increased max VM size to 16 cores and 2TB disk

• Allow manual scaling up for existing DBaaS VM (CPU/Memory/Disk)

• DB2 versions available on IPAS:

o a choice of DB2 10.5.0.1 (DB2 10.5 FP1)

o a choice of DB2 10.1.0.2 (DB2 10.1 FP2)

o a choice of DB2 9.7.0.8 (DB2 9.7 FP8)

NOTE: DBaaS 1.1.0.8 is available separately on Fix Central (9/26/13) from where it can be downloaded and imported as needed
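Editor's sketch of the level notation used in the list above, where 10.5.0.1 is shown as DB2 10.5 FP1. The function name is hypothetical, and it generalizes from the three levels on the slide by assuming the usual version.release.modification.fix-pack reading of the four parts.

```python
def describe_db2_level(level):
    """Turn a four-part level such as '10.5.0.1' into 'DB2 10.5 FP1'."""
    parts = level.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four numeric parts, got {level!r}")
    # The third part (modification) is not shown in the short label.
    version, release, _modification, fix_pack = (int(p) for p in parts)
    return f"DB2 {version}.{release} FP{fix_pack}"

# The three levels listed on the slide.
levels = ["10.5.0.1", "10.1.0.2", "9.7.0.8"]
labels = [describe_db2_level(v) for v in levels]
```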

Page 35: Iod 2013 Jackman Schwenger

Two key takeaways

How does DBaaS apply to your business?

1) Explore the value that SD might provide to your business

• SD is Big Data

• Scientific research is more motivated to collaborate than ever

• The PureSystems family provides an easy way to collaborate

2) Explore the value of DBaaS for your organization

• Rapid transformation in data delivery is required by the businesses today and is touching every side of our society

o Even more conservative environments like scientific research have to adapt to the new requirements to stay relevant

• IBM PureSystems provide an ideal platform in enabling the efficiency of database provisioning and management

• Use the patterns of expertise

o They deliver real value in time and resources savings for applications and databases alike.

• Embrace the change DBaaS brings to you and your organization

o Simplicity means automation, less risk, more reliable and cost effective data delivery for your business

Page 36: Iod 2013 Jackman Schwenger

Thank You

Your feedback is important!

• Access the Conference Agenda Builder to complete your session surveys

o Any web or mobile browser at http://iod13surveys.com/surveys.html

o Any Agenda Builder kiosk onsite

Questions?

Thomas Jackman, DRI/AIC, Technical Lead for Analysis & [email protected]

Maria Nichole Schwenger, IBM, PureSystems Technical Specialist, [email protected]

Page 37: Iod 2013 Jackman Schwenger

Learn More about IBM Cloud

Online

• ibm.com/cloud

• twitter.com/ibmcloud

• youtube.com/ibmcloud

Business Leadership Forums

• Connected Car is Mobile, Social, Cloud, Big Data – Tues, 10-11 a.m. in S. Pacific I

• Social, Mobile, Analytics, Cloud, and Beyond for the Automotive Industry – Tues, 4:30-5:45 p.m. in S. Pacific B

Technology Forums

Forty unique Cloud Sessions across 72 time slots – check your event guide for details!

Visit the EXPO Cloud Sessions

Cloud Booth

SoftLayer Booth

Connected Car

Page 38: Iod 2013 Jackman Schwenger

Backup Slides

Page 39: Iod 2013 Jackman Schwenger

DB2 deployment options in PureApplication System

• Virtual systems using DB2 hypervisor-edition images

o Provides patterns for common topologies

o Ability to create custom patterns

o Traditional configuration and administration model

o Automated provisioning of images into patterns

• DBaaS (Database-as-a-Service) using Database Patterns (virtual applications)

o Patterns are solutions derived from standardized industry best practices

o Simplified interaction model

o Highly standardized and automated

o Integrated life cycle management

o Shared between users/teams

• Connections to existing remote or existing local databases - an option for both Virtual Applications and Virtual Systems