Greenplum feature

Page 1: Greenplum feature

© Copyright 2012 EMC Corporation. All rights reserved.

Greenplum Database Overview

Michael Crutcher Greenplum Product Management

Page 2: Greenplum feature

Page 3: Greenplum feature

Page 4: Greenplum feature

Page 5: Greenplum feature

Greenplum Unified Analytic Platform

Page 6: Greenplum feature

GREENPLUM DATABASE

Industry-leading database with massively parallel performance to empower your analytics

Page 7: Greenplum feature

Extreme Performance for Analytics

• Optimized for BI and analytics

– Deep integration with statistical packages

– High performance parallel implementations

• Simple and automatic

– Just load and query like any database

– Tables are automatically distributed across nodes

• Extremely scalable

– MPP shared-nothing architecture

– All nodes can scan and process in parallel

– Linear scalability by adding nodes
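
The automatic distribution of table rows across nodes can be pictured with a short Python sketch. It is illustrative only and not Greenplum's implementation: the names `NUM_SEGMENTS`, `segment_for`, and `distribute` are hypothetical, and CRC32 merely stands in for whatever hash function the database uses on its distribution key.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment index with a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_column, num_segments=NUM_SEGMENTS):
    """Group rows by the segment that would store them."""
    placement = defaultdict(list)
    for row in rows:
        placement[segment_for(row[key_column], num_segments)].append(row)
    return placement


# Hypothetical sample table: 1000 rows keyed by customer_id.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1000)]
placement = distribute(rows, "customer_id")
for seg in sorted(placement):
    print(f"segment {seg}: {len(placement[seg])} rows")
```

Because every row with the same key lands on the same segment, each node can scan its own share independently, which is the property the "all nodes can scan and process in parallel" bullet relies on.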

Page 8: Greenplum feature

Performance Through Parallelism

[Architecture diagram]

Master Servers: query planning & dispatch

Network Interconnect

Segment Servers: query processing & data storage

External Sources: loading, streaming, etc.
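
The division of labor in the diagram (master plans and dispatches, segments process their local data) can be sketched in Python as follows. This is a toy model under stated assumptions: `SEGMENT_DATA`, `segment_scan`, and `master_query` are hypothetical names, threads stand in for separate segment servers, and the "query" is a simple filtered count and sum.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of a table held by one segment.
SEGMENT_DATA = [
    [3, 8, 15],
    [4, 16, 23],
    [42, 7],
    [1, 9, 11, 30],
]


def segment_scan(rows, threshold):
    """Work done on one segment: scan local rows, return a partial aggregate."""
    matching = [r for r in rows if r > threshold]
    return len(matching), sum(matching)


def master_query(threshold):
    """Master role: dispatch the same work to every segment, then gather."""
    with ThreadPoolExecutor(max_workers=len(SEGMENT_DATA)) as pool:
        partials = list(
            pool.map(lambda rows: segment_scan(rows, threshold), SEGMENT_DATA)
        )
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, total


count, total = master_query(threshold=10)
print(count, total)  # 6 137
```

Only the small partial results travel back to the master; the row data itself never leaves the segment that stores it.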

Page 9: Greenplum feature

Greenplum Delivers Choice & Flexibility

• Greenplum Data Computing Appliance

– Choose Greenplum Database and/or Hadoop modules in ¼-rack increments

– Scale up by adding your choice of additional modules

– Minimal time to value

• Greenplum Software Solutions

– Greenplum Database, Hadoop, & Chorus on your x86 hardware

– Flexibility for any workload or environment

– Perpetual or subscription licenses

Page 10: Greenplum feature

Core Functionality

Page 11: Greenplum feature

Component Overview

CLIENT ACCESS & TOOLS

• CLIENT ACCESS: ODBC, JDBC, OLEDB, MapReduce, etc.

• 3rd PARTY TOOLS: BI Tools, ETL Tools, Data Mining, etc.

• ADMIN TOOLS: Greenplum Command Center, Greenplum Package Manager

PRODUCT FEATURES

• LOADING & EXT. ACCESS: Petabyte-Scale Loading, Trickle Micro-Batching, Anywhere Data Access

• STORAGE & DATA ACCESS: Hybrid Storage & Execution (Row- & Column-Oriented), In-Database Compression, Multi-Level Partitioning, Indexes (Btree, Bitmap, etc.), External Table Support

• LANGUAGE SUPPORT: Comprehensive SQL, Native MapReduce, SQL 2003 OLAP Extensions, Programmable Analytics, Analytics Extensions (GeoSpatial, PL/R, PL/Java, PL/Python, PL/Perl)

GREENPLUM DATABASE ADAPTIVE SERVICES

• Multi-Level Fault Tolerance (RAID, Mirroring, DR with Data Domain Boost)

• Online System Expansion

• Workload Management

CORE MPP ARCHITECTURE

• Shared-Nothing MPP

• Parallel Query Optimizer

• Polymorphic Data Storage™

• Parallel Dataflow Engine

• gNet™ Software Interconnect

• Scatter/Gather Streaming™ Data Loading

Page 12: Greenplum feature

Most Powerful Data Loading Capabilities

• Industry-leading performance at 10+ TB per hour per rack

• Scatter-Gather Streaming™ provides true linear scaling

• Support for both large-batch and continuous real-time loading strategies

• Enable complex data transformations “in-flight”

• Transparent interfaces to loading via support for files, applications, and services

[Chart: SINGLE RACK COMPARISON of Greenplum, Oracle Exadata, Netezza, and Teradata]

Greenplum load rates scale linearly with the number of racks; the others do not. For example, two racks = >20 TB/hour.
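
The linear-scaling claim above amounts to simple arithmetic, sketched here in Python. The 10 TB/hour figure is the lower bound quoted on the slide; the function names and the 100 TB data volume are hypothetical, chosen only to illustrate the calculation.

```python
PER_RACK_TB_PER_HOUR = 10  # lower bound quoted on the slide ("10+ TB")


def min_load_rate(racks, per_rack=PER_RACK_TB_PER_HOUR):
    """Lower bound on aggregate load rate (TB/hour) under linear scaling."""
    if racks < 1:
        raise ValueError("racks must be at least 1")
    return racks * per_rack


def max_hours_to_load(data_tb, racks):
    """Upper bound on load time, since the rate used is a lower bound."""
    return data_tb / min_load_rate(racks)


print(min_load_rate(2))           # 20
print(max_hours_to_load(100, 2))  # 5.0
```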

Page 13: Greenplum feature

Polymorphic Table Storage™

• Storage types can be mixed within a table or database

– Four table types: heap, row-oriented AO, column-oriented AO, external

• Rich compression functionality, definable column by column

– Block compression: Gzip (levels 1-9), QuickLZ

– Stream compression: RLE (levels 1-4)

• Flexible indexing, partitioning, and more

[Figure: TABLE ‘CUSTOMER’ partitioned by month, Mar ’11 through Nov ’11; row-oriented storage for HOT DATA, column-oriented storage for COLD DATA]
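
Why column-oriented storage pairs well with run-length encoding can be shown with a small Python sketch. It is a generic illustration of RLE on a column, not Greenplum's on-disk format, and it does not model the compression levels listed above; the sample rows and the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical rows of a CUSTOMER-like table: (name, state).
rows = [
    ("alice", "CA"), ("bob", "CA"), ("carol", "CA"),
    ("dave", "NY"), ("erin", "NY"), ("frank", "TX"),
]

# Column-oriented layout: the values of one column are stored together,
# so repeated values sit next to each other and form long runs.
state_column = [state for _, state in rows]
encoded = rle_encode(state_column)
print(encoded)  # [('CA', 3), ('NY', 2), ('TX', 1)]
```

In a row-oriented layout the same state values are interleaved with names, so the runs that RLE exploits do not appear.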

Page 14: Greenplum feature

gNet Software Interconnect

A supercomputing-based “soft-switch” responsible for

– Efficiently pumping streams of data between motion nodes during query-plan execution

– Delivering messages, moving data, collecting results, and coordinating work among the segments in the system

Page 15: Greenplum feature

Parallel Query Optimizer

• Cost-based optimization looks for the most efficient plan

• Physical plan contains scans, joins, sorts, aggregations, etc.

• Global planning avoids sub-optimal ‘SQL pushing’ to segments

• Directly inserts ‘motion’ nodes for inter-segment communication

PHYSICAL EXECUTION PLAN FROM SQL OR MAPREDUCE

[Plan diagram; operators shown: Gather Motion 4:1 (Slice 3); Sort; HashAggregate; HashJoin; Redistribute Motion 4:4 (Slice 1); HashJoin; Hash; Hash; HashJoin; Hash; Broadcast Motion 4:4 (Slice 2); Seq Scan on motion; Seq Scan on customer; Seq Scan on lineitem; Seq Scan on orders]
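
The three motion types named in the plan (redistribute, broadcast, gather) can be sketched in Python. This is a simplified model, not the optimizer's actual code: lists stand in for segments, Python's built-in `hash` stands in for the database's hash function, and the `orders` and `customers` tables are hypothetical.

```python
from collections import defaultdict

NUM_SEGMENTS = 4  # matches the "4:4" and "4:1" motions in the plan


def redistribute(segments, key):
    """Redistribute Motion: send each row to the segment chosen by hashing key."""
    out = [[] for _ in range(NUM_SEGMENTS)]
    for rows in segments:
        for row in rows:
            out[hash(row[key]) % NUM_SEGMENTS].append(row)
    return out


def broadcast(segments):
    """Broadcast Motion: every segment receives a full copy of the rows."""
    everything = [row for rows in segments for row in rows]
    return [list(everything) for _ in range(NUM_SEGMENTS)]


def gather(segments):
    """Gather Motion: collect all segment results in one place (the master)."""
    return [row for rows in segments for row in rows]


def local_hash_join(left, right, key):
    """Hash join executed independently on one segment."""
    table = defaultdict(list)
    for row in right:
        table[row[key]].append(row)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


# Hypothetical tables, initially spread round-robin (not on the join key).
orders = [{"order_id": i, "cust_id": i % 5} for i in range(20)]
customers = [{"cust_id": c, "name": f"cust{c}"} for c in range(5)]
orders_segs = [orders[i::NUM_SEGMENTS] for i in range(NUM_SEGMENTS)]
cust_segs = [customers[i::NUM_SEGMENTS] for i in range(NUM_SEGMENTS)]

# Option 1: redistribute both sides on the join key, join locally, gather.
o = redistribute(orders_segs, "cust_id")
c = redistribute(cust_segs, "cust_id")
joined = gather(
    [local_hash_join(o[s], c[s], "cust_id") for s in range(NUM_SEGMENTS)]
)

# Option 2: leave the large table in place and broadcast the small one.
all_customers = broadcast(cust_segs)
joined_b = gather(
    [local_hash_join(orders_segs[s], all_customers[s], "cust_id")
     for s in range(NUM_SEGMENTS)]
)
print(len(joined), len(joined_b))  # 20 20
```

Both options produce the same join result; a cost-based planner would pick between them based on how much data each motion has to move.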

Page 16: Greenplum feature

Analytics Overview

Page 17: Greenplum feature

Analytical Capabilities Overview

[Diagram labels: GREENPLUM DATABASE; GREENPLUM HD; Greenplum gNet; Data Access & Query Layer; SQL; SQL 2003 OLAP; MapReduce; Stored Procedures; In-Database Analytics; Polymorphic Storage; ODBC; JDBC]

Page 18: Greenplum feature

In-Database Analytics: Categories

• Partner: SAS/HPA High Performance Analytics, SAS Scoring Accelerator

• Open-Source: Open Source Extensions

• User-written: User-Written Analytical Algorithms

• Embedded: GPDB Embedded Analytics

[Diagram labels: GREENPLUM DATABASE; Data Access & Query Layer; SQL; In-Database Analytics; ODBC; JDBC]

Page 19: Greenplum feature

Analytics Highlight: MADlib

Scalable in-database analytics

• Data-parallel

– Mathematical algorithms

– Statistical algorithms

– Machine learning algorithms

– Supports structured and unstructured data

• Open-source software

– Source accessibility

– Converges business, academic, and open-source communities
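
What "data-parallel" means for a statistical algorithm can be sketched in Python: each segment folds its own rows into a small state, and the states are merged at the end. This is a generic illustration of the pattern, not MADlib's API; the function names `transition`, `merge`, and `final` and the sample data are assumptions made for the example.

```python
from functools import reduce


def transition(state, x):
    """Fold one value into a segment-local state (count, sum, sum of squares)."""
    n, s, ss = state
    return (n + 1, s + x, ss + x * x)


def merge(a, b):
    """Combine the states produced by two segments."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def final(state):
    """Turn the merged state into the mean and population variance."""
    n, s, ss = state
    mean = s / n
    variance = ss / n - mean * mean
    return mean, variance


# Hypothetical data, already split across three segments.
segments = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]

# Each segment computes its partial state independently of the others.
partials = [reduce(transition, rows, (0, 0.0, 0.0)) for rows in segments]
mean, variance = final(reduce(merge, partials))
print(mean, variance)
```

The result is the same as a single pass over all six values, but no segment ever needs to see another segment's rows.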

Page 20: Greenplum feature

Manageability, Extensions

Page 21: Greenplum feature

Easy Manageability for Big Data

Single console for both Database and Hadoop

• Administration

– Start, Stop Database

– Recover, Rebalance Segments

• Interactive view of System Metrics

– Real-time

– Historic (configurable by time period)

• In-depth view for System Health

– Hardware health

– Software (Database, Hadoop)

• Query Monitoring

– Search, Prioritize, Cancel Queries

– View Query’s Execution Plan

• Workload Management

– Configure Resource Queues

– Prioritize Users

Page 22: Greenplum feature

Easy Extension Installation

Greenplum Package Manager

Greenplum supports easy deployment of numerous extensions such as MADlib, PL/Perl, PL/Java, and PostGIS.

[Diagram labels: Master Servers; Segment Servers]

Page 23: Greenplum feature

gNet for Hadoop

High Performance gNet for Hadoop Parallel Query Access

• Connect any data set in Hadoop to GPDB’s SQL engine

• Process Hadoop data in place

• Parallelize import/export of data from/to Hadoop thanks to GPDB’s market-leading data sharing performance

• Supported formats:

– Text (compressed and uncompressed)

– Binary

– Proprietary/user-defined

• GP HD 1.x, GP MR 1.x, CDH3u2

Page 24: Greenplum feature

High Availability, Back up, Support

Page 25: Greenplum feature

High Availability

• GPDB cluster

– 2 Master servers

– Multiple Segment servers

• Segment servers support multiple database instances

– Primary instances that actively process queries

– Standby mirror instances

• Block-level mirroring

– Low resource consumption

– Differential resync capable for fast recovery

[Diagram: Set of Active Segment Instances]
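
The idea behind differential resync can be sketched with a toy Python model: while a mirror is down, the primary records which blocks changed, and recovery copies only those blocks. This illustrates the general technique only; the `MirroredSegment` class and its methods are hypothetical and do not reflect Greenplum internals.

```python
class MirroredSegment:
    """Toy model of a primary/mirror pair with block-level change tracking."""

    def __init__(self, num_blocks):
        self.primary = [b""] * num_blocks
        self.mirror = [b""] * num_blocks
        self.mirror_up = True
        self.dirty = set()  # blocks changed while the mirror was down

    def write(self, block_no, data):
        """Write to the primary; mirror it now or remember it for later."""
        self.primary[block_no] = data
        if self.mirror_up:
            self.mirror[block_no] = data
        else:
            self.dirty.add(block_no)

    def mirror_failed(self):
        self.mirror_up = False

    def resync(self):
        """Copy only the changed blocks; return how many were copied."""
        copied = len(self.dirty)
        for block_no in self.dirty:
            self.mirror[block_no] = self.primary[block_no]
        self.dirty.clear()
        self.mirror_up = True
        return copied


seg = MirroredSegment(num_blocks=1000)
seg.write(1, b"a")       # mirrored immediately
seg.mirror_failed()
seg.write(2, b"b")       # tracked as dirty
seg.write(3, b"c")       # tracked as dirty
seg.write(2, b"bb")      # same block again, still one dirty entry
copied = seg.resync()
print(copied, seg.primary == seg.mirror)  # 2 True
```

Recovery here touches 2 blocks rather than all 1000, which is why a differential resync can be much faster than a full copy.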

Page 26: Greenplum feature

Backup/Restore with EMC Data Domain

• Integration options

– NFS: Data Domain device mounted as NFS storage

– DD Boost: Native, client-side deduplication. Supported in GPDB 4.2 and higher

• Drastic reduction in backup storage requirement

• Back up all segment servers in parallel directly to Data Domain

• Data Domain integrates seamlessly into standard Greenplum full backup, data export, and data restore procedures

[Diagram: Full Appliance + Data Domain, connected via Boost or NFS over 2 × 10 Gbit IP]
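
Client-side deduplication in general works by hashing chunks on the sender and transmitting only chunks the target has not seen. The Python sketch below shows that general idea; it is not the DD Boost protocol, and the `DedupStore` class, the fixed 4-byte chunk size, and the sample data are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy backup target that stores each unique chunk once, keyed by SHA-256."""

    def __init__(self):
        self.chunks = {}
        self.bytes_received = 0

    def missing(self, digests):
        """Report which digests the store does not hold yet."""
        return [d for d in digests if d not in self.chunks]

    def put(self, digest, chunk):
        self.chunks[digest] = chunk
        self.bytes_received += len(chunk)


def backup(data, store, chunk_size=4):
    """Client side: hash chunks locally and send only those the store lacks."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = set(store.missing(digests))
    for digest, chunk in zip(digests, chunks):
        if digest in needed:
            store.put(digest, chunk)
            needed.discard(digest)  # send a repeated chunk only once
    return digests  # the "recipe" needed to restore this backup


def restore(recipe, store):
    """Rebuild the original bytes from the stored chunks."""
    return b"".join(store.chunks[d] for d in recipe)


store = DedupStore()
monday = b"AAAABBBBCCCCDDDD"   # hypothetical first backup
tuesday = b"AAAABBBBCCCCEEEE"  # second backup, one chunk changed
recipe_1 = backup(monday, store)
sent_first = store.bytes_received
recipe_2 = backup(tuesday, store)
sent_second = store.bytes_received - sent_first
print(sent_first, sent_second)  # 16 4
```

The second backup transfers a quarter of the bytes of the first, which is the effect behind both the reduced storage requirement and the reduced network load of deduplicating before sending.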

Page 27: Greenplum feature

Backup/Restore with EMC Data Domain

Data Domain Replication

Backup and restore between remote and primary sites

• Ideal for configurations with RPO and RTO requirements that can be specified in hours

• Supports:

– Collection replication for DD Boost backup

– Directory-level replication for NFS backup

– Encryption over the WAN

[Diagram: Greenplum DCA and Data Domain at each of two sites, linked over LAN/WAN]

Page 28: Greenplum feature

Customer Support Services

• Remote Technical Support

– 24x7 technical support and remote troubleshooting

– Customer-managed case severity level

– Four-hour response objective

• Onsite Support (DCA Only)

– Installation of replacement parts

– Replacement parts shipped for next business day arrival

– GP SW upgrade included

• Proactive Service

– Secure remote monitoring for hardware (DCA)

– Notification of engineering technical advisories

– Built-in tools maximize stability and performance

• Secure Self-Help

– 24x7 access to eService support tools including knowledgebase, forums, and appropriately licensed software updates

Page 29: Greenplum feature

Other Relevant Greenplum Sessions

Session | Presenter | Times

Unified Analytics Platform Introduction | Brian Wilson | Tues 10:00-11:00, Thurs 1:00-2:00

Greenplum Hadoop Overview | Susheel Kaushik | Mon 10:00-11:00, Wed 4:15-5:15

Greenplum DCA Overview | Hanxi Chen | Mon 4:00-5:00, Thurs 10:00-11:00

Greenplum Analytics Workbench | Apurva Desai | Wed 8:30-9:30, Thurs 10:00-11:00

Analytics on Hadoop | Don Miner | Tues 11:30-12:30, Thurs 8:30-9:30

Big Data Driven Businesses in Action: Creating Real Business Value Using Greenplum UAP (Panel w/4 Customers) | Mike Maxey | Wed 4:15-5:15, Thurs 11:30-12:30

Analytics for Business Value: Collaboration | Josh Klahr | Mon 10:00-11:00, Wed 2:45-3:45

Disruptive Data Science: How Data Science and Big Data are Transforming Business, IT and People | Annika Jimenez, David Dietrich | Tues 4:15-5:15, Thurs 11:30-12:30

Page 30: Greenplum feature

Thank You

Page 31: Greenplum feature