Kubernetes Operator for Massively Parallel Postgres

Kubernetes Operator for Massively Parallel Postgres. Goutam Tadi (@goutamtadi), Senior Software Engineer, Pivotal Software Inc. Email: [email protected]. PGConf India 2019. © Copyright 2018 Pivotal Software, Inc. All rights Reserved.

Transcript of Kubernetes Operator for Massively Parallel Postgres

Page 1

© Copyright 2018 Pivotal Software, Inc. All rights Reserved.

Kubernetes Operator for Massively Parallel Postgres

Goutam Tadi (@goutamtadi)

Senior Software Engineer, Pivotal Software Inc

Email: [email protected]

PGConf India 2019

Page 2

Agenda

● Intro to Greenplum

● Kubernetes 101

● Greenplum for Kubernetes

○ Components

■ Greenplum Operator

■ Greenplum Cluster

● Demo

Page 3

Massively Parallel Postgres

Greenplum

Page 4

Page 5

Greenplum Data Platform

[Diagram: Pivotal Greenplum Platform, "Next Generation Data Platform"]

● Analytical applications: Custom Apps, BI / Reporting, Machine Learning, AI (users: DS, Analysts, IT, Dev)

● Native interfaces

○ SQL: ANSI SQL, Teradata SQL, Other DB SQL

○ Apache MADlib: ML/Statistics/Graph

○ Python, R, Java, Perl, C: Programmatic

○ Apache SOLR: Text

○ PostGIS: GeoSpatial

● Pivotal Greenplum Platform: Massively Parallel (MPP), PostgreSQL Kernel, Petabyte Scale Loading, Query Optimizer (GPORCA), Workload Manager, Polymorphic Storage, Command Center, SQL Compatibility (Hyper-Q)

● Multi-structured data sources & pipelines: Structured Data (JDBC, ODBC), Local Storage, Other RDBMSes, Spark, GemFire, Cloud Object Storage, HDFS, JSON, Apache AVRO, Apache Parquet and XML, Kafka, ETL, Spring Cloud Data Flow

● Flexible deployment: On-Premises, Public Clouds, Private Clouds, Fully Managed Clouds

Page 6

Container Orchestration System

Kubernetes

Page 7

Greenplum on Kubernetes 101

Kubernetes Master

Page 8

Greenplum on Kubernetes 101

Kubernetes Master

kubelet kube-proxy docker

Node

kubelet kube-proxy docker

Node

Page 9

Greenplum on Kubernetes 101

Kubernetes Master

Pod

kubelet kube-proxy docker

Node

Pod

kubelet kube-proxy docker

Node

Page 10

Greenplum on Kubernetes 101

Kubernetes Master

Pod

kubelet kube-proxy docker

Node

Pod

kubelet kube-proxy docker

Node

Storage volumes

Page 11

Greenplum on Kubernetes 101

Kubernetes Master

Pod

Postgres Container

kubelet kube-proxy docker

Node

Pod

Postgres Container

kubelet kube-proxy docker

Node

Storage volumes

Page 12

Greenplum on Kubernetes 101

Kubernetes Master

Greenplum Service

Pod

Postgres Container

kubelet kube-proxy docker

Node

Pod

Postgres Container

kubelet kube-proxy docker

Node

Storage volumes

Page 13

Greenplum on Kubernetes

Node

Pod

segment-b-0

kubelet kube-proxy docker

Pod

segment-a-0

kubelet kube-proxy docker

Node

Storage volumes

Pod

master-0

kubelet kube-proxy docker

Pod

master-1

kubelet kube-proxy docker

Page 14

Massively Parallel Postgres on Kubernetes

Greenplum for Kubernetes

Page 15

Components

● Greenplum Operator

● Greenplum Cluster

Page 16

Greenplum Operator

apiVersion: "greenplum.pivotal.io/v1"
kind: "GreenplumCluster"
metadata:
  name: my-greenplum
spec:
  masterAndStandby:
    ….
  segments:
    ….

Greenplum Operator

Greenplum Cluster

CREATE / UPDATE / DELETE
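
The CREATE / UPDATE / DELETE labels on this slide suggest the usual operator pattern: compare the declared GreenplumCluster resource with what currently exists and act on the difference. The following is a minimal Python sketch of that decision only. The function name `plan_action`, the dict-shaped specs, and the `count` field are hypothetical and chosen for illustration; this is not the Greenplum Operator's actual code.

```python
from typing import Optional


def plan_action(desired: Optional[dict], observed: Optional[dict]) -> str:
    """Decide what one reconcile pass should do for a single cluster.

    desired:  spec taken from the GreenplumCluster manifest, or None if
              the resource has been deleted.
    observed: spec the running cluster was last built from, or None if
              no cluster exists yet.
    Both shapes are assumptions made for this illustration.
    """
    if desired is None and observed is None:
        return "NOOP"
    if desired is None:
        return "DELETE"
    if observed is None:
        return "CREATE"
    if desired != observed:
        return "UPDATE"
    return "NOOP"


# Hypothetical specs; "count" is an illustrative field, not a documented one.
small = {"masterAndStandby": {}, "segments": {"count": 1}}
large = {"masterAndStandby": {}, "segments": {"count": 2}}

print(plan_action(large, None))   # manifest applied, nothing running yet
print(plan_action(large, small))  # manifest changed
print(plan_action(None, large))   # manifest deleted
```

Keeping the decision a pure function of desired and observed state is what makes the deployment declarative: the same manifest applied twice yields no further action.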

Page 17

Greenplum Cluster

Namespace: default

Kubernetes Cluster

Master StatefulSet

master-0 master-1

Page 18

Greenplum Cluster

Namespace: default

Kubernetes Cluster

Primary StatefulSet

segment-a-0 segment-a-1

Mirror StatefulSet

segment-b-0 segment-b-1

Master StatefulSet

master-0 master-1
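
The pod names on this slide follow the Kubernetes StatefulSet convention of `<statefulset-name>-<ordinal>`. The sketch below shows how the names arise and which pods would appear when the segment count grows. The StatefulSet names `master`, `segment-a`, and `segment-b` are inferred from the pod names on the slide, the function names are hypothetical, and the layout of one master plus one standby with one mirror per primary is an assumption.

```python
from typing import Dict, List


def statefulset_pods(name: str, replicas: int) -> List[str]:
    """Pod names Kubernetes assigns to a StatefulSet: <name>-<ordinal>."""
    return [f"{name}-{i}" for i in range(replicas)]


def cluster_pods(segment_count: int) -> Dict[str, List[str]]:
    """Pods per StatefulSet for the layout drawn on the slide.

    Assumes one master plus one standby, and one mirror per primary.
    """
    return {
        "master": statefulset_pods("master", 2),
        "segment-a": statefulset_pods("segment-a", segment_count),  # primaries
        "segment-b": statefulset_pods("segment-b", segment_count),  # mirrors
    }


def pods_added_on_expand(old_count: int, new_count: int) -> List[str]:
    """Pods that did not exist before the segment count was raised."""
    before = {p for pods in cluster_pods(old_count).values() for p in pods}
    after = [p for pods in cluster_pods(new_count).values() for p in pods]
    return [p for p in after if p not in before]


print(cluster_pods(2))
print(pods_added_on_expand(2, 3))
```

Because ordinals are stable, expanding from two to three segments only adds pods at the end of each segment StatefulSet and leaves existing pods and their names untouched.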

Page 19

Greenplum Cluster

Namespace: default

Kubernetes Cluster

Primary StatefulSet

segment-a-0 segment-a-1

Mirror StatefulSet

segment-b-0 segment-b-1

ConfigMap

Master StatefulSet

master-0 master-1

Page 20

Greenplum Cluster

Namespace: default

Kubernetes Cluster

Primary StatefulSet

segment-a-0 segment-a-1

Mirror StatefulSet

segment-b-0 segment-b-1

ConfigMap

Master StatefulSet

master-0 master-1

Page 21

Greenplum Cluster

Namespace: default

Kubernetes Cluster

Primary StatefulSet

segment-a-0 segment-a-1

Mirror StatefulSet

segment-b-0 segment-b-1

ConfigMap

Master StatefulSet

master-0 master-1

psql query

Page 22

Benefits

● Declarative style deployments

● Auto cluster initialization

● Fast deployments

● Easy to expand

● Delete the compute and retain storage for later use

Page 23

Demo

https://youtu.be/d8X2BXSg07Q

● Install Greenplum Operator

● Install Greenplum Cluster

● Failover Scenarios

● Expand Greenplum Cluster

Page 24

Future

● Auto Failover

● Auto Cluster Rejoin

● PVC Snapshot

● Pod Affinity

Page 25

Transforming How The World Builds Software


Page 26


Resources

http://greenplum-kubernetes.docs.pivotal.io

https://network.pivotal.io