Greenplum Overview

1 02/18/22 XLDB ‘09 Luke Lonergan [email protected]

Transcript of Greenplum Overview

Page 1: Greenplum Overview

05/04/23

XLDB ‘09

Luke Lonergan [email protected]

Page 2: Greenplum Overview

“Big” numbers for GP today

• 70K/day – Query Rate
• 6.5PB – Dataset Size
• +100GB/s – Analysis Rate
• +3GB/s – Net Loading Rate
• 100,000/s – Transaction Rate
• 56 TB/kW, 1.6 GB/s/kW – Power Rate
• 100s – Number of Data/Compute nodes
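As a rough sanity check on how these figures relate to each other, the arithmetic below can be run as-is. It assumes decimal units (1 PB = 10^15 bytes) and, purely for illustration, treats the dataset size and the rates as if they described one system; the slide does not say that they do.

```python
# Back-of-envelope arithmetic on the slide's headline numbers.
# Assumptions (not stated on the slide): decimal units, and all figures
# combined as if they came from a single deployment.
PB = 10**15
GB = 10**9
SECONDS_PER_DAY = 86_400

dataset_bytes = 6.5 * PB
analysis_rate = 100 * GB      # bytes/s, slide: "+100GB/s"
load_rate = 3 * GB            # bytes/s, slide: "+3GB/s"
queries_per_day = 70_000      # slide: "70K/day"

# Time to scan the whole dataset once at the analysis rate.
full_scan_hours = dataset_bytes / analysis_rate / 3600
# Time to load the whole dataset once at the net loading rate.
full_load_days = dataset_bytes / load_rate / SECONDS_PER_DAY
# Average query arrival rate implied by the daily figure.
queries_per_second = queries_per_day / SECONDS_PER_DAY

print(f"Full scan at 100 GB/s: {full_scan_hours:.1f} hours")
print(f"Full load at 3 GB/s:   {full_load_days:.1f} days")
print(f"Average query rate:    {queries_per_second:.2f} queries/s")
```

Under these assumptions a single full pass over 6.5 PB takes most of a day even at 100 GB/s, which is one way to read the later emphasis on keeping computation close to the data.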


Page 3: Greenplum Overview

Things I’ve Heard

• Tiered computing
  – Organizational / Political / Geographic boundaries require it
• Metadata computing for HEP
  – “10TB sounds small but it’s not easy”
• Processing for Radio Astronomy, HEP
  – Data intensive computing
  – Requires an efficient pipeline from raw to consumables


Page 4: Greenplum Overview

Thoughts

• A lot of plumbing! Moving data around, pipeline processing
  – Core engine should do this so the plumbing isn’t done over and over

• Need for specialized access methods and storage classes

• “Computing in data” is key to success
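One way to picture “computing in data” is a two-step aggregation: each node reduces its own rows locally, and only the small partial results travel to a coordinator. The sketch below is a toy model in plain Python, not Greenplum code; the `segments` data and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for rows stored on three separate data nodes.
# Each row is (key, value).
segments = [
    [("hep", 4), ("radio", 1)],
    [("hep", 2), ("radio", 5), ("hep", 1)],
    [("radio", 3)],
]

def local_aggregate(rows):
    """Runs where the data lives: collapse rows into per-key partial sums."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial

def merge_partials(partials):
    """Runs on the coordinator: combine the small partial results."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts per key
    return dict(total)

# Only the partials, not the raw rows, would cross the network.
partials = [local_aggregate(rows) for rows in segments]
totals = merge_partials(partials)
print(totals)
```

The design point is that the amount of data moved scales with the number of distinct keys per node rather than with the number of rows.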


Page 5: Greenplum Overview

GP Basic Features

• Access Methods
  – Compression, Column Store, Heap Store, External Tables, Indexes (GiST, GIN, R-tree, Bitmap, B-Tree, …)
  – Network Ingest / Export directly into parallel pipeline
  – Logical Partitioning by Range, List
• Parallel Programming Languages
  – SQL 2003 with Analytics
  – Map Reduce in Perl, Python, C, SQL, …
  – PL/R, python, perl, C, pgSQL, SQL, …
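To make “logical partitioning by range, list” concrete, here is a minimal sketch of how rows could be routed to partitions and how a query on a date interval could skip partitions that cannot match. It is an illustration in plain Python, not Greenplum's DDL or implementation; all partition names and boundaries are hypothetical.

```python
from datetime import date

# Hypothetical range partitions on a date column: half-open [start, end).
range_partitions = {
    "sales_2009_q1": (date(2009, 1, 1), date(2009, 4, 1)),
    "sales_2009_q2": (date(2009, 4, 1), date(2009, 7, 1)),
    "sales_2009_q3": (date(2009, 7, 1), date(2009, 10, 1)),
}

# Hypothetical list partitions on a region column.
list_partitions = {
    "sales_us": {"US-East", "US-West"},
    "sales_eu": {"EU"},
}

def route_by_range(day):
    """Pick the partition whose [start, end) interval contains the date."""
    for name, (start, end) in range_partitions.items():
        if start <= day < end:
            return name
    raise ValueError(f"no range partition covers {day}")

def route_by_list(region):
    """Pick the partition whose value list contains the region."""
    for name, values in list_partitions.items():
        if region in values:
            return name
    raise ValueError(f"no list partition contains {region!r}")

def prune_ranges(lo, hi):
    """Return only the partitions overlapping the query interval [lo, hi)."""
    return [
        name
        for name, (start, end) in range_partitions.items()
        if start < hi and lo < end
    ]

print(route_by_range(date(2009, 5, 15)))
print(route_by_list("EU"))
print(prune_ranges(date(2009, 3, 1), date(2009, 5, 1)))
```

A query restricted to March and April would, in this model, touch two of the three partitions and never read the third.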


Page 6: Greenplum Overview

From Enterprise Data Clouds

• Elastic / adaptive infrastructure for data warehousing and analytics
  – IT Operations deploy pools of low-cost commodity infrastructure
    • Physical servers, virtual infrastructure, or onramp to public cloud
  – DBAs and Analysts provision sandboxes and warehouses in minutes
    • Assemble the data they need (common, private, etc) for agile analytics

Proprietary & Confidential

[Diagram labels: DBA, Analyst; Warehouses (Consumer Division, Packaged Goods, Finance); Infrastructure (node pools with Free capacity); IT Operations]
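The provisioning model on this slide (a shared pool, warehouses carved out of it, free capacity left over) can be sketched as a toy allocator. This is an illustration only: the `NodePool` class, its methods, and the node counts are hypothetical and do not correspond to any Greenplum API.

```python
class NodePool:
    """Toy model of a shared pool of commodity nodes (hypothetical)."""

    def __init__(self, total_nodes):
        self.total_nodes = total_nodes
        self.warehouses = {}  # warehouse name -> number of nodes assigned

    @property
    def free_nodes(self):
        """Nodes not currently assigned to any warehouse."""
        return self.total_nodes - sum(self.warehouses.values())

    def provision(self, name, nodes):
        """Create a warehouse, or grow an existing one, from free capacity."""
        if nodes > self.free_nodes:
            raise RuntimeError(
                f"requested {nodes} nodes, only {self.free_nodes} free"
            )
        self.warehouses[name] = self.warehouses.get(name, 0) + nodes

    def release(self, name):
        """Drop a warehouse and return its nodes to the pool."""
        return self.warehouses.pop(name)

pool = NodePool(total_nodes=100)
pool.provision("finance", 16)
pool.provision("analyst_sandbox", 8)
pool.provision("analyst_sandbox", 8)  # elastic expansion of the same sandbox
print(pool.free_nodes)

returned = pool.release("finance")
print(returned, pool.free_nodes)
```

In this model, self-service amounts to letting DBAs and analysts call `provision` directly, while IT Operations only manages `total_nodes`.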

Page 7: Greenplum Overview

Use Case: Big Telco Data Mart Consolidation


Goals:
• Reduce maintenance and support costs from proliferation of data mart platforms
• Reduce risks and exposure due to data in shadow IT systems
• Break down silo walls – provide a unified way to find and access all data

Approach:
• Embrace data – encourage ‘physical consolidation’ in advance of data model unification
• Provide ‘self serve’ model to bring shadow IT into the light
• Allow unified data access and pragmatic ‘logical’ data model unification incrementally

[Diagram labels: Data Sources; US-West, 100 nodes]

Page 8: Greenplum Overview

Use Case: Big Ad Network Project Sandboxes


Goals:
• Remove IT barriers to analyst productivity and value creation
• Dramatically reduce IT resource constraints and delays – i.e. realize ideas sooner
• Combine centralized ‘EDW’ data with freshly discovered feeds and other useful sources

Approach:
• Self-serve creation of project warehouses in minutes – and elastically expand as needed
• Load new data feeds without requiring formal modeling
• Bring together any data within the EDC – even if globally distributed – and analyze

[Diagram labels: US-East, 100 nodes; Analyst’s New Warehouse; Analyst’s Private Data Feed; EDC; Self-Serve Dashboard]

Page 9: Greenplum Overview

GP is Software – Develop Now

• Download at:
  – Gpn.greenplum.com
  – Get the VMWare image or use it on OSX, Linux, Solaris


Page 10: Greenplum Overview

Think Big. Think Fast.