
White Paper

EMC Solutions Group

Abstract

This white paper provides an overview on how you can use EMC® Greenplum™ Data Computing Appliance SAN Mirror on EMC VNX™ to deploy local backup and recovery solutions for EMC Greenplum data analytics environments.

April 2012

EMC GREENPLUM DATA COMPUTING APPLIANCE SAN MIRROR ON EMC VNX

EMC Greenplum DCA, EMC VNX, EMC SnapView, EMC NetWorker, EMC Data Domain

• Use your existing VNX SAN backup

• Deploy EMC NetWorker for managed backup

• Deploy EMC Data Domain for data deduplication and scalability


Copyright © 2012 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All trademarks used herein are the property of their respective owners.

Part Number H10537


Table of contents

Executive summary
  Business case
  Solution overview
  Key recommendations

Introduction
  Purpose
  Scope
  Audience
  Terminology

Solution overview
  Introduction
  How DCA mirrors data
  Mirroring data to the SAN with SAN Mirror
  Solution architecture
  Hardware resources
  Software resources

Key technology components
  Introduction to the components
  EMC Greenplum DCA
  EMC Greenplum Database
  EMC VNX
  EMC SnapView
  EMC PowerPath
  EMC NetWorker
  EMC Data Domain deduplication storage system
  EMC Data Domain Boost

Design and configuration
  Introduction
  SAN configuration
  VNX configuration
  Allocation and mounting of SAN devices on the DCA
  Moving mirrors

Local backup and recovery
  Introduction to local backup and recovery
  Backup
    Backup options
    Backup steps
  Recovery
    Recovery options
    Recovery steps

Conclusion
  Summary
  Key findings

References
  White papers
  Product documentation


Executive summary

Business case

Today’s data warehouses need to address an increasingly broad range of capabilities, driven by changing business requirements and enabled by advances in technology.

Businesses are being driven to respond to real-time events to improve operational efficiency, to meet or exceed service level agreements, to respond to real-time market conditions, or to predict the next change. Analytics performed on the most recent data are of the greatest value to these organizations.

Changes to regulatory requirements have had a major impact on data warehouses, which require broader protection and security of business and customer data.

Data warehouses are experiencing exponential data growth, driven in part by the addition of unstructured or semi-structured data (such as Internet clickstream data) to market, financial, and customer data. The cleansing and transformation of data as it is loaded into the warehouse further increases the value of this data to an organization. Businesses cannot afford to lose this data or the value that was added in the process.

Data warehouses have become mission-critical systems for organizations. The need for enterprise-class protection of the data in data warehouses has become an implicit business requirement. The size of the database, including historical data, and the critical nature of the data introduce new challenges to backup and recovery of data warehouses. Organizations require solutions that:

• Provide effective protection and security, including the ability to replicate data offsite

• Recover data quickly, without the need to recreate current or historical data

• Are simple and efficient to deploy and manage, and nondisruptive to data warehouse operations

New systems, such as the EMC® Greenplum® Data Computing Appliance (DCA), are capable of loading data at high speeds and in near real time, which avoids the need for overnight or weekly “snapshots”. As a result, the data in the warehouse is more current and of higher value to the organization.

Solution overview

Greenplum has created the DCA, a purpose-built data analytics and business intelligence (BI) database system for Big Data. The DCA is offered in a variety of configurations to meet different customer sizing and performance needs.

The DCA Storage Area Network (SAN) Mirror on EMC VNX™ for local backup and recovery solution enables organizations to:

• Use their existing VNX SAN storage infrastructure by enabling them to integrate EMC Greenplum Database™ into their existing SAN-based backup and recovery procedures

• Increase primary storage space on their DCA system


This solution provides a recoverable instance of a Greenplum Database in the event that a local recovery is required, or a site-level disaster requires a remote restore from backup media.

Key recommendations

The key recommendations of this solution are to:

• Use DCA SAN Mirror architecture to offload the backup of the Greenplum Database environment from the DCA

• Use EMC NetWorker® or a similar backup application to manage the backup, which reduces risk and ensures reliable backup and recovery for the Greenplum Database environment

• Use EMC Data Domain® storage systems for their data deduplication and scalability


Introduction

Purpose

The purpose of this white paper is to describe the validation of a solution for the EMC Greenplum DCA using an EMC VNX SAN storage array. This white paper details the functionality of the solution, and shows how the DCA SAN Mirror configuration is set up and how you achieve backup and recovery.

Scope

The scope of this white paper is to:

• Document the configuration components used in this solution

• Identify some best practices that EMC Professional Services can use when implementing customer-specific DCA backup and restore configurations

Audience

The primary audience of this white paper is EMC Professional Services and EMC customers looking to understand how a local backup and recovery solution for the Greenplum DCA can be achieved.

It is assumed that readers of this document have basic awareness of and familiarity with EMC products, such as Greenplum DCA, VNX SAN storage array, NetWorker backup application, and Data Domain backup storage appliance.

Terminology

This paper includes the following terminology.

Table 1. Terminology

Term | Definition
Business intelligence (BI) | The effective use of information assets to improve the profitability, productivity, or efficiency of a business. IT professionals use this term to refer to the business applications and tools that enable such information usage. The source of information is frequently the data warehouse.
Data warehousing (DW) | The process of organizing and managing the information assets of an enterprise. IT professionals often refer to the physically stored data content in some databases that are managed by database management software as the data warehouse. They refer to applications that manipulate the data stored in such databases as DW applications.
Data Computing Appliance (DCA) | The DCA is a purpose-built, highly scalable, parallel DW appliance that architecturally integrates database, compute, storage, and network into an enterprise-class, easy-to-implement system.
Massively parallel processing (MPP) | A type of distributed computing architecture where tens to hundreds of processors team up to work concurrently to solve large computational problems.
Storage area network (SAN) | A network of storage disks and disk subsystems. By treating all of a company's storage as a single resource, disk maintenance and routine tasks are easier to schedule and control.
Fibre Channel over Ethernet (FCoE) | An encapsulation of Fibre Channel (FC) frames over Ethernet networks. This enables Fibre Channel to use 10 Gigabit Ethernet networks (or higher speeds) while preserving the Fibre Channel protocol.
Converged Network Adaptor (CNA) | Combines the functionality of a host bus adapter (HBA) with a network interface controller (NIC).


Solution overview

Introduction

The DCA normally maintains two copies of customer data within the appliance. The primary copy of the data is used for queries. The mirrored copy of the data resides in a separate database instance on a separate Segment Server and receives updates from the primary server. The mirrored copy is only accessed if there is a failure on the Segment Server where the primary copy resides. With both a primary copy and a mirrored copy of the data, this solution provides resiliency while offering the highest level of performance for Data Warehousing/Business Intelligence (DW/BI) tasks.

In a SAN Mirror configuration, the second copy of the data moves from the internal disks to the SAN-based storage. The DCA retains the primary copy of the data to maximize query performance. The SAN Mirror copy is updated with writes, but it is not read unless a Primary segment instance becomes inaccessible.

By keeping the mirrored copy on the SAN, customers can use EMC local replication, and backup and recovery technologies, such as EMC SnapView™ and Data Domain Boost, to create local full copies or point-in-time images for backup and disaster recovery. Another benefit of the SAN Mirror configuration is that it nearly doubles the internal capacity available for primary storage, because the internal disks no longer hold the mirrored copy.

How DCA mirrors data

The DCA is a fully redundant system, where a mirrored copy of the data is maintained and processing continues even if one or more components in the system fail. The system can process large amounts of data by distributing the load across several servers or hosts. A Greenplum Database is similar to a loosely coupled array of individual highly customized PostgreSQL databases, which all work together to present a single database image.

The Master Server is the entry point to the Greenplum Database system. The Master Server contains all of the metadata required to distribute transactions across the system, but it does not hold any user data. The Master Server coordinates the work with the other database instances in the system, which handle data processing and storage.

When you deploy a Greenplum Database system, you have the option to make a mirror of the segment instances. If one copy of each segment's data remains available, mirroring allows the database to remain operational if a Segment Instance or Segment Server goes down. Primary segment instances replicate to Mirror segment instances at the subfile level. To guarantee that both Primary and Mirror segments present the same crash-consistent database image, Primary segments wait for Mirror segments to synchronize data for transaction commits and periodic database checkpoints.

On a Primary segment failure, the Mirror segment goes into change tracking mode and saves a list of changes made to its data. Once the underlying segment failure is resolved, an online recovery process copies only the changed data to the failed segment or makes a full copy.

A DCA is a self-contained system, where each Segment Server has its own internal storage. Six Primary segments and six Mirror segments are stored on each Segment Server. Corresponding Primary and Mirror segments are distributed over separate Segment Servers in the DCA.

Figure 1 shows a conceptual diagram of the mirrored configuration. To keep the diagram uncluttered, it shows only three Primary and three Mirror segments per Segment Server. Typically, for a DCA, each Segment Server has six Primary and six Mirror segments.

Figure 1. Mirrored configuration


As shown in Figure 2, if a Segment Server fails, the Mirror segments for the Primary segments that reside on the failed server become active.

Figure 2. Segment Server failure
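The failover and change-tracking behavior described in this section can be sketched in a few lines of Python. This is a simplified illustration, not the Greenplum implementation: the class and function names, the ring-style mirror placement, the host names, and the use of block numbers to represent changes are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentPair:
    """One Primary/Mirror pair. Names and fields are illustrative only."""
    content_id: int
    primary_host: str
    mirror_host: str
    active_role: str = "primary"   # which copy currently serves queries
    tracked_changes: list = field(default_factory=list)


def place_pairs(hosts, segments_per_host):
    """Assumed ring placement: each mirror lives on the next host, so a
    Primary and its Mirror never share a Segment Server."""
    pairs = []
    for h, host in enumerate(hosts):
        mirror_host = hosts[(h + 1) % len(hosts)]
        for s in range(segments_per_host):
            pairs.append(SegmentPair(h * segments_per_host + s, host, mirror_host))
    return pairs


def fail_host(pairs, failed_host):
    """Activate the Mirror of every Primary that resided on the failed host."""
    for pair in pairs:
        if pair.primary_host == failed_host and pair.active_role == "primary":
            pair.active_role = "mirror"   # change tracking starts here


def write(pair, block_id):
    """While the Mirror is active, remember which blocks were changed."""
    if pair.active_role == "mirror":
        pair.tracked_changes.append(block_id)


def recover(pair):
    """Incremental recovery: return only the blocks changed during the outage.
    Switching roles back afterwards is not modelled in this sketch."""
    to_copy = sorted(set(pair.tracked_changes))
    pair.tracked_changes.clear()
    return to_copy
```

For example, with four Segment Servers and six Primary segments each, `fail_host(pairs, "sdw1")` activates six Mirror segments, and `recover()` on one of them returns only the blocks written while its Primary was down.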

You can also mirror the Master Host to a separate Standby Master Host. The Standby Master Host serves as a warm standby if the Primary Master Host becomes non-operational. The transaction log replication process keeps the Standby Master Host up to date.

Mirroring data to the SAN with SAN Mirror

In a SAN Mirror configuration, the mirrored segments move from internal storage to an array in the SAN. A SAN Mirror configuration has the advantage of “unlocking” the data so it is not held completely within the DCA. Therefore, it is easier for you to implement other operational facilities, such as backup offload, which provide higher data storage integrity, reliability, and availability.

A SAN Mirror configuration increases the DCA's internal capacity. While a standard DCA configuration stores both primary and mirrored data internal to the appliance, a SAN Mirror configuration moves the mirrored data to the SAN.
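The capacity effect can be illustrated with a small calculation. The sketch below assumes that Primary and Mirror segments need equal space, so it returns an upper bound; the paper itself says the gain is "nearly" double. The function name and the 40 TB figure in the usage note are hypothetical.

```python
def primary_capacity_tb(usable_internal_tb, san_mirror):
    """Return the internal capacity available to Primary segments, in TB.

    Assumption for illustration: Primary and Mirror segments need equal
    space, so a standard configuration gives each half of internal storage.
    """
    if usable_internal_tb < 0:
        raise ValueError("capacity must be non-negative")
    if san_mirror:
        return usable_internal_tb        # mirrors live on the SAN
    return usable_internal_tb / 2        # mirrors share the internal disks
```

With a hypothetical 40 TB of usable internal storage, the function returns 20 TB for a standard configuration and 40 TB for a SAN Mirror configuration.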


Figure 3 shows a conceptual diagram of the SAN Mirror configuration.

Figure 3. SAN Mirror configuration


Solution architecture

Figure 4 illustrates the architectural layout used to validate the DCA SAN Mirror on VNX for a local backup and recovery solution. Table 2 and Table 3 provide details of the hardware and software resources used in this solution.

Figure 4. Solution architecture

Hardware resources

Table 2 lists the hardware used to validate the solution.

Table 2. Hardware components

Equipment | Quantity | Configuration
EMC Greenplum DCA | 1 | 2 x Greenplum DB standard modules
EMC VNX | 1 | EMC VNX5500™ high bandwidth
EMC Data Domain | 1 | EMC Data Domain DD890™
Proxy server | 1 | 12 core/48 GB memory
Fibre Channel switch | 2 | 8 Gb 48 port
Ethernet switch | 2 | 10 Gb 24 port


Software resources

Table 3 lists the software used to validate the solution.

Table 3. Software components

Software | Version
EMC Greenplum DCA | 1.2.0.0
EMC Greenplum Database | 4.2.1.0
EMC VNX Block OE | 05.31.000.5.704
EMC PowerPath® | 5.6
EMC NetWorker | 7.6.3
EMC Data Domain OS | 5.1


Key technology components

Introduction to the components

This section identifies and describes the key technology components deployed in the solution environment. The components include:

• EMC Greenplum DCA

• EMC Greenplum Database

• EMC VNX

• EMC SnapView

• EMC PowerPath

• EMC NetWorker

• EMC Data Domain deduplication storage system

• EMC Data Domain Boost

EMC Greenplum DCA

The DCA is a purpose-built, highly scalable, parallel DW appliance. The DCA architecturally integrates database, computing, storage, and network resources into an enterprise-class, easy-to-implement system. The DCA offers the power of massively parallel processing (MPP) architecture, the fastest data loading capacity, and the best price-to-performance ratio, without the complexity and constraints of proprietary hardware.

Key features of the DCA

• The DCA uses Greenplum Database software, which is based on MPP architecture. MPP harnesses the combined power of all available compute servers to ensure maximum performance.

• The base architecture of the DCA is designed with scalability and growth in mind. This enables organizations to extend their DW/BI capability in a modular way. By expanding from a Greenplum Database Module (quarter-rack) up to twelve full racks with minimal downtime, you can achieve linear gains in capacity and performance.

• Greenplum Database software supports incremental growth (scale out) of the data warehouse through its ability to automatically redistribute existing data across newly added computing resources.

• The DCA employs a high-speed interconnect bus that provides database-level communication between all servers in the DCA. It is designed to accommodate access for rapid backup and recovery and data load rates (also known as ingest).

• Excellent performance is provided by effective use of the combined power of servers, software, network, and storage.

• The DCA can be installed and available onsite within 24 hours of the customer receiving delivery.

• The DCA uses innovative industry-standard commodity hardware rather than specialized or proprietary hardware.


• The DCA is offered in multiple rack appliance configurations to achieve the maximum flexibility and scalability for organizations faced with terabyte to petabyte scale data opportunities.

The DCA is composed of Standard, Capacity, Hadoop (HD), or Data Integration Accelerator (DIA) modules. Each module consists of four servers. The DCA is configured and scaled according to application requirements:

• A DCA configuration starts with one rack containing one Standard or one Capacity Module

• Up to four modules can be configured per rack

• Additional racks can be configured, up to 12 racks
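The scaling limits above translate into simple sizing arithmetic. The following sketch uses only the figures stated in this section (four servers per module, up to four modules per rack, up to 12 racks); the function names are hypothetical, and Master Servers are not counted.

```python
SERVERS_PER_MODULE = 4     # "Each module consists of four servers"
MAX_MODULES_PER_RACK = 4   # "Up to four modules can be configured per rack"
MAX_RACKS = 12             # "Additional racks can be configured, up to 12 racks"


def racks_needed(modules):
    """Smallest number of racks that holds the requested number of modules."""
    if modules < 1:
        raise ValueError("a DCA starts with at least one module")
    racks = -(-modules // MAX_MODULES_PER_RACK)   # ceiling division
    if racks > MAX_RACKS:
        raise ValueError("exceeds the 12-rack limit")
    return racks


def module_servers(modules):
    """Number of module servers, excluding Master Servers."""
    return modules * SERVERS_PER_MODULE
```

For the two Standard modules used to validate this solution, `racks_needed(2)` gives one rack and `module_servers(2)` gives eight servers. The stated limits imply a maximum of 48 modules.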

Table 4 describes the main components of the DCA.

Table 4. Components of the DCA

Item | Description
Greenplum Database | An MPP database server, based on PostgreSQL open-source technology. It is explicitly designed to support BI applications and large, multi-terabyte data warehouses.
Greenplum Database system | An associated set of Segment Instances and a Master Instance running on an array, which can be composed of one or more hosts.
Master Servers | The servers that run the master database, responsible for the automatic parallelization of queries.
Segment Servers | The servers that run the segment instances and perform the real work of processing and analyzing the data.
Interconnect Switch | Provides high-speed communication between Master and Segment Servers. It consists of two switches to communicate requests from the Master to the Segment Servers, between Segment Servers, and to provide high-speed access to the Segment Servers for quick parallel loading of data across all Segment Servers.
Admin Switch | Provides the management interface between the servers and additional racks.
Greenplum DB Standard Module | A DCA Greenplum Database module that runs the Greenplum Database and consists of four Segment Servers, each with 12 x 600 GB SAS disk drives.
Greenplum DB Capacity Module | A DCA Greenplum Database module that runs the Greenplum Database and consists of four Segment Servers, each with 12 x 2 TB SATA disk drives.
Hadoop (HD) Module | A DCA module that consists of four Segment Servers that run the Greenplum Hadoop Community Edition (CE) software.
Data Integration Accelerator (DIA) Module | A module designed for fast data integration and parallel, batch, and micro batch data loading. A DIA consists of four servers using certified partner software.


For more information about the DCA, refer to the white paper EMC Greenplum Data Computing Appliance: Architecture, Performance, and Functions - A Detailed Review.

EMC Greenplum Database

The EMC Greenplum Database is a shared-nothing, MPP architecture, designed for BI and analytical processing. The core shared-nothing MPP architecture enables massive data storage, loading, and processing with unlimited linear scalability. Each server node acts as a self-contained database management system that owns and manages a distinct portion of the overall data. This architecture allows you to start small and scale for additional capacity and performance, while online, up to the largest multi-petabyte data warehouse.

The Greenplum Database provides automatic parallelization of data and queries, with no need for manual partitioning or tuning. All data is automatically partitioned across all nodes of the system, and queries are planned and executed using all nodes working together in a highly coordinated fashion. Dynamic Query Prioritization automatically balances running queries across resources and allows database administrators (DBAs) to control query priorities in real time. High-performance loading, using MPP Scatter/Gather Streaming™ technology, enables data to be loaded at frequent intervals (for example, every five minutes), while maintaining extremely high data ingest rates.
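The idea of automatically partitioning rows across all segments can be sketched with a hash of a distribution key. This is an illustration only: `zlib.crc32` is a stand-in chosen for the example and is not the hash function that Greenplum Database uses, and the function names are hypothetical.

```python
import zlib
from collections import Counter


def segment_for(key, num_segments):
    """Map a distribution-key value to a segment index.

    zlib.crc32 is a stand-in hash for this sketch, not the hash function
    used internally by Greenplum Database.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(keys, num_segments):
    """Count how many rows each segment would receive."""
    counts = Counter(segment_for(k, num_segments) for k in keys)
    return [counts.get(i, 0) for i in range(num_segments)]
```

For example, `distribute(range(100_000), 48)` spreads 100,000 rows over 48 Primary segments, the number implied by two modules of four Segment Servers with six Primary segments each. Because the mapping depends only on the key, every node can locate a row without consulting a central directory.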

Using Greenplum’s Polymorphic Data Storage™ technology, DBAs can select the storage, execution, and compression settings that suit the way that a specific table is accessed. With this feature, you have the choice of row- or column-oriented storage and processing for any table or partition. Greenplum’s in-database compression uses compression technology to increase performance and dramatically reduce the space required to store data. You can expect to see a three- to ten-times disk space reduction with a corresponding increase in effective I/O performance.
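As a hedged illustration of per-table storage choices, the sketch below builds a `CREATE TABLE` statement for a compressed, column-oriented table. The storage options shown (`appendonly`, `orientation`, `compresstype`, `compresslevel`) and the `DISTRIBUTED BY` clause reflect Greenplum Database 4.x syntax as we understand it, so verify them against the documentation for your release. The helper function and the table and column names are hypothetical.

```python
def column_table_ddl(table, columns, distribution_key,
                     compresstype="zlib", compresslevel=5):
    """Build DDL for a compressed, column-oriented table.

    Illustrative helper only: it does no quoting, so it must not be used
    with untrusted input.
    """
    if distribution_key not in dict(columns):
        raise ValueError("distribution key must be one of the columns")
    cols = ",\n    ".join(f"{name} {sqltype}" for name, sqltype in columns)
    return (
        f"CREATE TABLE {table} (\n    {cols}\n)\n"
        f"WITH (appendonly=true, orientation=column, "
        f"compresstype={compresstype}, compresslevel={compresslevel})\n"
        f"DISTRIBUTED BY ({distribution_key});"
    )


# Hypothetical clickstream table used only as an example.
print(column_table_ddl(
    "clicks",
    [("event_id", "bigint"), ("url", "text"), ("ts", "timestamp")],
    "event_id",
))
```

A row-oriented table would use `orientation=row` instead. The choice is made per table or partition, which is the point of the Polymorphic Data Storage feature described above.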

The EMC VNX family delivers innovation and enterprise capabilities for file, block, and object storage in a scalable, easy-to-use solution. This next-generation storage system combines powerful and flexible hardware with advanced efficiency, management, and protection software to meet the demanding needs of today’s enterprises.

All of this is available in a choice of systems ranging from affordable entry-level solutions to high-performance, petabyte-capacity configuration, which service the most demanding application requirements.

The VNX operating environment enables Microsoft Windows and Linux/UNIX clients to share files in multi-protocol (NFS and CIFS) environments. Simultaneously, it supports iSCSI, Fibre Channel, and Fibre Channel over Ethernet (FCoE) access for high-bandwidth and latency-sensitive block applications.

The VNX series is powered by the Intel Quad Core Xeon 5600 series with a 6 Gb/s SAS drive back-end and can provide up to 10 GB/s bandwidth for data warehouse applications.

For this solution, EMC used VNX5500.

EMC Greenplum Database

EMC VNX


EMC SnapView

SnapView can be used to create local point-in-time snapshots and full-copy clones of production data for nondisruptive backup. The snapshot images and fractured clones are then available for mounting on a secondary server. The snapshot images can also be repurposed for backups, decision support, or testing. If the primary server's access to the production database is interrupted, SnapView snapshot images and clones ensure reliable and quick access to the data from a secondary server. Additionally, data from a snapshot image or clone can be restored to its source LUN if data corruption occurs on the source LUN.

EMC PowerPath

EMC PowerPath is server-resident software that enhances performance and application availability. It works with the storage system to intelligently manage I/O paths, and supports multiple paths to a logical device. PowerPath provides automated failover and recovery. If there is a hardware failure, PowerPath detects the path failure and redirects the I/O to another path. PowerPath returns the faulty path to service once it has been repaired. PowerPath also performs non-disruptive, transparent load-based testing to ensure that a data path can carry the workload presented to it.

EMC NetWorker

EMC NetWorker, a backup and recovery software application, is a high-capacity, easy-to-use data storage management solution that protects and helps to manage data across an entire network. NetWorker simplifies the storage management process and reduces the administrative burden by automating and centralizing data storage operations.

The NetWorker product suite includes the following components:

• NetWorker client: Communicates with the NetWorker server and provides backup and recovery functionality for the node on which it resides. It is installed on all nodes that are backed up to the NetWorker server.

• NetWorker storage node: Data can be backed up directly to devices that are attached to a NetWorker server or to a NetWorker storage node. A NetWorker storage node is a storage device, physically attached to another computer, whose backup operations are controlled by the NetWorker server.

• NetWorker server: The host running the NetWorker server software that contains the online indexes and provides backup and recovery services to the clients on the same network.

• NetWorker Management Console: Manages all of the NetWorker servers and clients. The Console also provides reporting and monitoring capabilities for all NetWorker servers and clients.

EMC Data Domain deduplication storage system

The EMC Data Domain deduplication storage system dramatically reduces the amount of disk storage needed to retain and protect enterprise data. By identifying redundant data as it is being stored, Data Domain provides a storage footprint that is up to 30 times smaller, on average, than the original dataset. Backup data can then be efficiently replicated and retrieved over existing networks for streamlined disaster recovery and consolidated tape operations. This allows Data Domain appliances to integrate seamlessly into database architectures, maintaining existing backup strategies with no changes to scripts, backup processes, or system architecture.


The Data Domain appliance is a fast, cost-effective, and scalable single-controller deduplication storage solution for disk-based backup and network-efficient disaster recovery.

The Data Domain Stream-Informed Segment Layout (SISL™) scaling architecture enables the fast inline deduplication throughput of the appliance. A CPU-centric approach to deduplication delivers high throughput while minimizing the number of disk spindles required.

EMC Data Domain Boost

Data Domain Boost extends the backup optimization benefits of the Data Domain deduplication storage system by distributing parts of the deduplication process to the backup server or application client. Data Domain Boost dramatically increases throughput speeds, minimizes backup LAN load, and improves backup server utilization.


Design and configuration

The DCA contains two EMC Connectrix MP-8000B network switches. Each switch has 24 Ethernet ports and eight FC ports, which makes 16 FC ports available for SAN connectivity. Each FC port can sustain approximately 800 MB/s. Depending on the bandwidth required, not all 16 ports need to be connected to the SAN.
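As a rough check of these figures, the aggregate FC bandwidth and the number of ports needed for a target throughput can be estimated with simple arithmetic. The shell sketch below is illustrative only: the function names are hypothetical, and it assumes that every port sustains the approximate 800 MB/s quoted above.

```shell
#!/bin/sh
# Rough sizing arithmetic for the DCA FC ports (illustrative sketch).
# Assumption: each FC port sustains about 800 MB/s, as quoted in the text.
PORT_MBPS=800

# Aggregate bandwidth in MB/s for a given number of FC ports.
aggregate_mbps() {
    echo $(( $1 * PORT_MBPS ))
}

# Minimum number of ports needed to reach a target bandwidth (rounds up).
ports_needed() {
    echo $(( ($1 + PORT_MBPS - 1) / PORT_MBPS ))
}

aggregate_mbps 16     # two switches x eight FC ports -> 12800
ports_needed 6000     # example target of 6,000 MB/s  -> 8
```

Real sustained throughput depends on the workload and on the SAN and array configuration, so treat the result as an upper bound rather than a sizing guarantee.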

By default, FCoE is not configured on the switches; therefore, FCoE must be enabled individually on each of the MP-8000B’s internal Ethernet ports for SAN connectivity. We performed the following steps:

1. Added a VLAN classifier rule to dynamically classify the Ethernet packets on an untagged interface into the VLANs.

2. Added rules to the VLAN classifier groups.

3. Created a Converged Enhanced Ethernet (CEE) map and configured the bandwidth for each group.

4. Enabled FCoE on the VLAN interface.

5. Configured the interfaces to converged mode.

6. Activated the VLAN classifier group.

7. Set FCoE priorities.

8. Applied the CEE provision map.

The MP-8000Bs also support trunking. The trunking feature optimizes the performance of the external SAN connection by allowing a group of inter-switch links (ISLs) to merge into a single logical link. Trunking is automatically implemented for any eligible ISLs after you install the trunking license. The license must be installed on each switch that participates in trunking. EMC recommends that the trunking license be installed on both Connectrix MP-8000B switches within the DCA, and on the SAN switches to which the DCA is connected.

Each Master and Segment Server in the DCA has a Converged Network Adapter (CNA). A CNA combines the functionality of a host bus adapter (HBA) with that of a network interface controller (NIC), so a separate HBA is not required for each Segment and Master Server when implementing SAN Mirror.

Once you update the switch configuration, each CNA on each Master and Segment Server appears as a separate initiator to be zoned on the SAN.

Introduction

SAN configuration


Figure 5 and Table 5 show the port assignment for the two EMC Connectrix MP-8000B switches in the DCA.

Figure 5. Port assignments

Table 5. Port assignments

Port           Assignment
CEE 0 – 15     Segment Servers
CEE 16 – 17    Master Server
CEE 18 – 19    ETL/Data Domain
CEE 20 – 23    Inter rack
FC 0 – 7       SAN Mirror

VNX configuration

When sizing the VNX storage configuration, first select the VNX product that will satisfy the expected business requirement. Then select the number of front-end modules needed to provide the host connectivity that drives the required bandwidth. Finally, configure enough drives not only to accommodate the capacity needed, but also to spread the data over enough drives to deliver the performance needed.

Before designing or configuring a VNX array to support the SAN Mirror solution, it is extremely important that you fully analyze the customer’s workload and requirements. EMC recommends using EMC VNX performance specialists to ensure that expectations are met and the best configuration for the customer’s needs is achieved.

For this solution, EMC chose the VNX5500 high-bandwidth model. The VNX5500 uses its integrated connectors to support up to two back-end bus connections and two 8 Gb FC connections for each storage processor. The VNX5500 also accommodates two I/O expansion modules for additional connections. VNX Block OE 31.5 introduced a firmware enhancement that allows the VNX5500 to recognize and use a second back-end bus I/O module in one of the two I/O expansion slots. By inserting a back-end bus I/O module into one of the two expansion slots, you can add four more back-end disk bus loops. This can bring the back-end sustained-read bandwidth to over 6,000 MB/s, the theoretical maximum that the SP memory subsystem can support. To fully realize the 6,000+ MB/s read bandwidth when a single connection protocol is used, the remaining I/O module must be configured for 8 Gb FC connections. The FC module, together with the integrated FC ports on the SP, can support over 3,000 MB/s per SP (two integrated 8 Gb FC ports and four additional FC ports from the expansion FC I/O module).

Figure 6 shows the recommended connection model that can drive the VNX5500 to the 6,000+ MB/s read level by using 8 Gb FC connections.

Figure 6. Recommended connection model

To match the storage capacity of the internal disks of a DCA in a SAN Mirror setup, EMC created LUNs of the same size as the data partitions on the DCA Segment Servers. Table 6 shows the amount of storage required by a DCA with two Greenplum DB Standard Modules, as used in this solution.

Table 6. Storage requirements

Server                              DCA           VNX
Each Segment Server (total of 8)    2 x 2.7 TB    2 x 2.7 TB
Master Server                       1 x 2.1 TB    1 x 2.1 TB
Standby Master Server               1 x 2.1 TB    1 x 2.1 TB

EMC created a storage pool of 120 x 600 GB 10k rpm SAS drives, from which sixteen 2.7 TB and two 2.1 TB RAID 5 (4+1) thick pool LUNs were provisioned.

To support SnapView Clones, EMC created a storage pool of 40 x 2 TB 7.2k rpm NL-SAS drives, from which sixteen 2.7 TB and two 2.1 TB RAID 6 (6+2) thick pool LUNs were provisioned.

To support SnapView Snapshots, EMC created a storage pool of 25 x 600 GB 10k rpm SAS drives, from which 36 x 275 GB RAID 5 (4+1) thick pool LUNs were provisioned. EMC determined the size and configuration of the reserved LUN pool (RLP) based on the customer workload profile and the frequency of the snapshots taken on the array.
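The pool sizes above can be sanity-checked against the LUN capacities with back-of-the-envelope arithmetic. The shell sketch below is a rough estimate, not an EMC sizing method: the function names are hypothetical, it assumes decimal units (1 TB = 1,000 GB), and it ignores hot spares, pool metadata, and formatting overhead.

```shell
#!/bin/sh
# Capacity arithmetic for the pools described above (illustrative sketch).
# Assumptions: decimal units (1 TB = 1000 GB); hot spares, pool metadata,
# and formatting overhead are ignored.

# Usable GB of a pool: drives x drive size x data / (data + parity).
pool_usable_gb() {
    # $1 = drive count, $2 = drive size in GB, $3 = data drives, $4 = parity drives
    echo $(( $1 * $2 * $3 / ($3 + $4) ))
}

# Total GB of the mirror LUN set: sixteen 2.7 TB plus two 2.1 TB LUNs.
mirror_luns_gb() {
    echo $(( 16 * 2700 + 2 * 2100 ))
}

echo "Mirror LUNs required:    $(mirror_luns_gb) GB"
echo "SAS pool (RAID 5 4+1):   $(pool_usable_gb 120 600 4 1) GB usable"
echo "Clone pool (RAID 6 6+2): $(pool_usable_gb 40 2000 6 2) GB usable"
echo "RLP pool (RAID 5 4+1):   $(pool_usable_gb 25 600 4 1) GB usable for $(( 36 * 275 )) GB of LUNs"
```

Under these assumptions, each pool holds its LUNs with some headroom: for example, about 57,600 GB of usable SAS capacity against 47,400 GB of mirror LUNs.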

[Figure 6 diagram labels: SP-A and SP-B; SAS back-end I/O SLIC; 8 Gb FC front-end I/O SLIC; back-end SAS connections; front-end 8 Gb FC connections]


The VNX5500 has 6,989 MB of cache available per SP for read and write. In this solution, EMC assigned 500 MB per SP to read cache and the remainder to write cache. Because the OS and Greenplum Database handle read-ahead, and table scans yield a low rate of read hits, 500 MB of read cache is sufficient. Because we worked with block sizes of 512 KB, the cache page size was set to 16 KB. The low/high watermarks were set to 50 percent/60 percent to provide ample write-burst headroom in the write cache and to reduce the number of write cache lookups for possible rehits when we performed reads.
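The cache figures above translate into the following amounts. This shell sketch is illustrative only: the function names are hypothetical, and it assumes that the watermarks are percentages of the write cache and uses integer arithmetic, so results are rounded down.

```shell
#!/bin/sh
# SP cache arithmetic from the figures above (illustrative sketch).
TOTAL_CACHE_MB=6989   # cache available per SP
READ_CACHE_MB=500     # read cache assigned per SP
WRITE_CACHE_MB=$(( TOTAL_CACHE_MB - READ_CACHE_MB ))

# MB of write cache that corresponds to a watermark percentage (rounded down).
watermark_mb() {
    echo $(( WRITE_CACHE_MB * $1 / 100 ))
}

# Number of cache pages covered by one I/O (both sizes in KB).
pages_per_io() {
    echo $(( $1 / $2 ))
}

echo "Write cache per SP:           ${WRITE_CACHE_MB} MB"
echo "Low watermark (50 percent):   $(watermark_mb 50) MB"
echo "High watermark (60 percent):  $(watermark_mb 60) MB"
echo "Cache pages per 512 KB block: $(pages_per_io 512 16)"
```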

When configuring storage pools, LUNs, and SP parameters for a customer, it is important to follow best practices and to consult with a local EMC VNX specialist.

Allocation and mounting of SAN devices on the DCA

With the SAN zoned and the VNX configured, EMC performed the following steps:

1. Created storage groups for each DCA Master and Segment Server.

2. Registered the DCA Master and Segment Server initiators on the VNX.

3. Added hosts to storage groups.

4. Added LUNs to storage groups.

EMC PowerPath was installed on all DCA Master and Segment Servers to manage the multiple SAN paths available to each LUN and to present each LUN under a single PowerPath pseudo name. PowerPath was configured to have only one active path to the SP that owns the LUN. This reduced the overhead of ordering the large 512 KB data blocks, in contrast to a typical OLTP setup, where multiple active paths are used because service latency is a primary concern.

With PowerPath installed, we performed the following steps to configure and mount the LUNs:

1. Scan buses:

echo "- - -" > /sys/class/scsi_host/host1/scan
echo "- - -" > /sys/class/scsi_host/host2/scan

2. Configure new paths/LUNs to logical devices:

powermt config

3. Save config:

powermt save

4. Create an XFS file system on the PowerPath pseudo device:

Master Servers

mkfs -t xfs -f /dev/emcpowera

Segment Servers

mkfs -t xfs -f /dev/emcpowera
mkfs -t xfs -f /dev/emcpowerb

5. Create a folder to which the PowerPath pseudo device can mount:

Segment Servers

mkdir /data1/san_mirror
mkdir /data2/san_mirror

6. Mount the PowerPath pseudo device:

Master Servers

mount -o noatime,inode64,allocsize=16m /dev/emcpowera /data/master

Segment Servers

mount -o noatime,inode64,allocsize=16m /dev/emcpowera /data1/san_mirror
mount -o noatime,inode64,allocsize=16m /dev/emcpowerb /data2/san_mirror

7. Set the read-ahead:

Master Servers

blockdev --setra 16384 /dev/emcpowera

Segment Servers

blockdev --setra 16384 /dev/emcpowera
blockdev --setra 16384 /dev/emcpowerb
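The value passed to blockdev --setra is expressed in 512-byte sectors, so 16384 corresponds to 8 MiB of read-ahead. The small shell sketch below shows the conversion in both directions; the function names are hypothetical.

```shell
#!/bin/sh
# Convert a blockdev read-ahead value (512-byte sectors) to KiB and back.
SECTOR_BYTES=512

ra_sectors_to_kib() {
    echo $(( $1 * SECTOR_BYTES / 1024 ))
}

ra_kib_to_sectors() {
    echo $(( $1 * 1024 / SECTOR_BYTES ))
}

ra_sectors_to_kib 16384   # -> 8192 (KiB, that is, 8 MiB)
ra_kib_to_sectors 8192    # -> 16384 (sectors)
```

The value currently in effect for a device can be read back with blockdev --getra followed by the device name.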

For the devices to be mounted automatically after a reboot, you must update /etc/fstab on each individual server to include the PowerPath pseudo name of the SAN devices presented to that server.
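A minimal sketch of generating such /etc/fstab entries follows. It is an assumption-laden illustration rather than the procedure used in this solution: the function name is hypothetical, the mount options are copied from the mount commands above, and the last two fields (dump and fsck order) are set to 0, a common choice for XFS file systems.

```shell
#!/bin/sh
# Print an /etc/fstab entry for a PowerPath pseudo device (illustrative sketch).
# Assumption: the same XFS mount options used in the mount commands above;
# the dump and fsck-order fields are set to 0.
MOUNT_OPTS="noatime,inode64,allocsize=16m"

fstab_entry() {
    # $1 = pseudo device, $2 = mount point
    printf '%s %s xfs %s 0 0\n' "$1" "$2" "$MOUNT_OPTS"
}

# Segment Server example; review the output before appending it to /etc/fstab.
fstab_entry /dev/emcpowera /data1/san_mirror
fstab_entry /dev/emcpowerb /data2/san_mirror
```

The sketch only prints the lines so that they can be reviewed first. Note that the read-ahead set with blockdev --setra is a runtime setting that /etc/fstab does not preserve, so it has to be reapplied after a reboot.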

Moving mirrors

Greenplum Database is initialized on a DCA using dca_setup, with the option of placing the mirrors on either local storage or SAN storage. For a Greenplum Database on a DCA that was already initialized with mirrors on local storage, the Greenplum Database utility gpmovemirrors can move the mirrors to the SAN. Moving the mirrors is an online process for the Segment Servers. Before using the gpmovemirrors utility, you must run dca_setup to create a config file for gpmovemirrors. The config file specifies the host address, port, and system filespace location of the current mirror, followed by the host address, port, replication port, and system filespace location of the new mirror, in the following format:

[<filespace1_fsname>[:<filespace2_fsname>:...]<old_address>:<port>:<system_filespace_location> [<new_address>:<port>:<replication_port>:<system_filespace_location>[:<fselocation>:...]]

For example:

sdw1-1:50000:/data1/mirror/gpseg11 sdw1-1:50000:51000:/data1/san_mirror/gpseg11

You can find the host address, ports, and system filespace location information in the gp_segment_configuration and pg_filespace tables. For more information, refer to the Greenplum Database 4.2 Administrator Guide.
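When many mirrors are moved, the config lines can be generated rather than typed by hand. The shell sketch below builds one line in the form of the example in this section; the helper name and its argument order are hypothetical, and it assumes that the mirror stays on the same host and port, with only the directory changing and a replication port added.

```shell
#!/bin/sh
# Build one gpmovemirrors config line (illustrative sketch, hypothetical helper).
# Assumption: old and new mirror share the same host address and port, as in
# the example in this section; only the directory changes.
move_line() {
    # $1 = host address, $2 = port, $3 = replication port,
    # $4 = old mirror directory, $5 = new mirror directory
    printf '%s:%s:%s %s:%s:%s:%s\n' "$1" "$2" "$4" "$1" "$2" "$3" "$5"
}

move_line sdw1-1 50000 51000 /data1/mirror/gpseg11 /data1/san_mirror/gpseg11
```

In this solution the config file was created by dca_setup, so a helper like this is only a convenience for inspecting or reproducing individual lines.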


Note: You can allocate and mount the SAN Mirrors on the Segment Servers with the database online.

As the Master Server and Standby Master Server also require offloading of their operations to the VNX, EMC performed these steps:

1. Removed the Standby Master Server (smdw):

gpinitstandby -r

2. Mounted the SAN device on the Standby Master Server using this syntax (all one line):

mount -o noatime,inode64,allocsize=16m /dev/emcpowera /data/master

3. Initialized the Standby Master Server:

gpinitstandby -s smdw

4. Stopped the Master Instance:

gpstop -m

5. Activated the Standby Master Server:

gpactivatestandby -f -d /data/master/gpseg-1/

6. Deleted the master data folder on the Master Server (mdw):

rm -r /data/master/*

7. Mounted the SAN device on the Master Server using this syntax (all one line):

mount -o noatime,inode64,allocsize=16m /dev/emcpowera /data/master

8. Initialized mdw as a Standby Master Server:

gpinitstandby -s mdw

9. Stopped the Master Instance:

gpstop -m

10. Activated the Master Server (mdw):

gpactivatestandby -f -d /data/master/gpseg-1/

11. Deleted the master data folder on the Standby Master Server (smdw):

rm -r /data/master/*

12. Initialized the Standby Master Server (smdw):

gpinitstandby -s smdw

Note: This procedure requires a short period of downtime because the database must be restarted during these operations.


Local backup and recovery

Introduction to local backup and recovery

Once you implement the DCA SAN Mirror, you have a full working copy of the Greenplum Database available on the VNX SAN storage array. This working copy can then be integrated into the customer’s existing SAN-based backup and recovery scheme.

Before you back up the SAN Mirror data copy, you need to take a point-in-time snapshot image or a full clone copy. To choose the best option, consider the following:

• SnapView Clone backup of SAN Mirror

Pros

− No performance impact to the DCA if there are high rates of change

Cons

− Twice the VNX disk capacity is required

• SnapView Snapshot backup of SAN Mirror

Pros

− Less disk capacity required (about 10 percent of SAN Mirror) versus clone

− Can take multiple snapshots without consuming a lot of capacity

Cons

− Potential performance impact if there is a high rate of change on the DCA

Backup

Backup options

Once there is a snapshot or clone of SAN Mirror, you need to back it up to a disk- or tape-based backup target. Many EMC solutions are available, such as:

• EMC NetWorker with EMC Data Domain and DD Boost

• EMC NetWorker with EMC Data Domain and Network File System (NFS)

• EMC NetWorker with EMC Data Domain and direct SCSI

For this solution, we used EMC NetWorker with EMC Data Domain and DD Boost because of these advantages:

• Simplified device setup and configuration by using wizards

• Increased aggregate backup throughput

• Integrated NetWorker advanced reporting of the Data Domain storage systems

• Available as a kit from EMC Professional Services


Backup steps

Use these steps to back up a SnapView Clone of SAN Mirror to EMC Data Domain using EMC NetWorker and DD Boost:

1. Check the consistency of the Segment Instances (see note 1).

2. Synchronize the SnapView Clones with DCA mirrors and masters.

3. Force the checkpoint in the transaction log (see note 2).

4. Flush the file system buffers.

5. Perform a consistent fracture of the SnapView Clones.

6. Add the clones to the proxy storage group.

7. Scan the buses on the proxy server for new devices.

8. Run PowerPath discovery on the proxy server.

9. Mount the clones to the proxy server.

10. Back up the clones to EMC Data Domain over DD Boost using EMC NetWorker.

11. Unmount the clones from the proxy server.
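Steps 7 and 8 repeat, on the proxy server, the bus scan and PowerPath discovery shown earlier for the DCA servers. A minimal sketch of the scan as a loop over all SCSI hosts follows. The function name and its optional directory argument are hypothetical; the argument exists only so that the loop can be exercised against a scratch directory instead of the real sysfs tree.

```shell
#!/bin/sh
# Rescan every SCSI host for new devices (illustrative sketch).
# Assumption: the sysfs layout /sys/class/scsi_host/hostN/scan used earlier in
# this paper. Writing to the real sysfs files requires root privileges.
rescan_scsi_hosts() {
    base="${1:-/sys/class/scsi_host}"
    for host in "$base"/host*; do
        # Skip entries without a scan file (also covers an unmatched glob).
        [ -e "$host/scan" ] || continue
        echo "- - -" > "$host/scan"
        echo "rescanned ${host##*/}"
    done
}

# Example use on the proxy server (as root):
# rescan_scsi_hosts
```

After the scan, PowerPath discovery uses the powermt config command shown earlier in this paper.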

Note 1: Check consistency of the Segment Instances

Because backups are taken from the Mirror Segments, you need to check that all Mirror Segments are up and synchronized with the Primary Segments:

select * from gp_segment_configuration where preferred_role = 'm' and (status = 'd' or (status = 'u' and mode = 'r'));

Note: The preceding SQL command returns zero rows if all mirrors are in a consistent state.
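In a scripted backup, this check can act as a gate before the clones are synchronized. The shell sketch below is one possible wrapper, not the script used in this solution: it assumes that psql is on the PATH and can connect to the master, and the database name postgres, the PSQL override variable, and the function name are placeholders chosen for illustration.

```shell
#!/bin/sh
# Gate a backup on the mirror consistency query (illustrative sketch).
# Assumptions: psql is on the PATH and can reach the master; the database name
# "postgres" and the PSQL override variable are placeholders for illustration.
PSQL="${PSQL:-psql}"

MIRROR_CHECK_SQL="select count(*) from gp_segment_configuration
 where preferred_role = 'm'
   and (status = 'd' or (status = 'u' and mode = 'r'));"

# Returns 0 only when no mirror is down or resynchronizing,
# 1 when at least one mirror is inconsistent, and 2 if the query fails.
mirrors_consistent() {
    bad=$("$PSQL" -At -d "${PGDATABASE:-postgres}" -c "$MIRROR_CHECK_SQL") || return 2
    [ "$bad" -eq 0 ]
}

# Example use:
# mirrors_consistent && echo "mirrors consistent, safe to continue"
```

The wrapper counts the offending rows instead of listing them, so a nonzero count simply stops the backup; the original select statement can then be run by hand to see which mirrors are affected.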

Note 2: Force checkpoint in transaction log

A checkpoint is a point in the transaction log sequence at which all data files have been updated to reflect the information in the log. All data files are flushed to disk. By default, a checkpoint occurs every five minutes. Forcing a checkpoint before the SnapView Clones are consistently fractured means that fewer transactions have to be replayed at the time of recovery. This is the checkpoint SQL command:

checkpoint;


Recovery

Recovery options

If there is a disaster, such as data loss, several recovery options are available, depending on how you implemented the backups and how far back in time the database has to be recovered.

• Recover from SnapView Clone

If a disaster has occurred since the last sync, you can recover from the clone by performing a reverse sync in which only changed blocks are updated.

• Recover from SnapView Snapshot

If a disaster has occurred since the last snapshot, you can recover from the snapshot by rolling back changes.

• Recover from backup

If a disaster occurred before the last sync or snapshot, you need to recover from a backup on EMC Data Domain.

Recovery steps

Use these steps to recover from a backup on EMC Data Domain:

1. Fracture the SnapView Clones if they are synchronized with the DCA mirrors and masters.

2. Add the clones to the proxy server storage group.

3. Scan the buses on the proxy server for new devices.

4. Run PowerPath discovery on the proxy server.

5. Mount the clones to the proxy server.

6. Restore the clones from a backup using EMC NetWorker.

7. Unmount the clones from the proxy server.

8. Stop Greenplum Database.

9. Unmount the DCA mirror and master LUNs from the DCA.

10. Reverse sync the clones.

11. Mount the DCA mirror and master LUNs to the DCA.

12. Determine the current Master Server (see note 1).

13. Start the Greenplum Database Master Instance (see note 2).

14. Check the consistency of the Segment Instances.

15. Reverse the roles of the primary and mirror instances so that the primaries are running off the SAN (see note 3).

16. Stop the Greenplum Database Master Instance from running (see note 4).

17. Start Greenplum Database in production mode.

18. Check the catalogs (see note 5).


19. Synchronize the Mirror and the Primary Segments (see note 6).

20. Return the Primary and Mirror Segments to their preferred roles (see note 7).

Note 1: Determine the current Master Server

In normal operation, one of the Master Servers is the active server from the database’s perspective. When you recover the database, it is necessary to determine the current Master Server, because the Standby Master Server may have been the active Master Server at the time the backup was run.

You must run the commands for starting, stopping, and querying the database against the Master Server.

The current active Master Server is found by comparing the dbid and standby_dbid entries in the text file $MASTER_DATA_DIRECTORY/gp_dbid on both the Master Server (mdw) and Standby Master Server (smdw).

Table 7. Determining the current Master Server

If:                                                     Then the current Master Server is:
mdw has dbid==1 and standby_dbid does not exist         mdw
smdw has dbid==1 and standby_dbid does not exist        smdw
mdw and smdw have standby_dbid, and mdw has dbid==1     mdw
mdw and smdw have standby_dbid, and smdw has dbid==1    smdw
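In every row of Table 7, the current Master Server is the host whose gp_dbid file reports dbid 1, so the check can be scripted. The shell sketch below is a hedged illustration: the function names are hypothetical, and it assumes that the file contains lines of the form "dbid = 1" and, when a standby is configured, "standby_dbid = N". Adjust the parsing if the file layout on your system differs.

```shell
#!/bin/sh
# Decide from a gp_dbid file whether a host is the current master (sketch).
# Assumption: the file holds lines of the form "dbid = 1" and, when a standby
# is configured, "standby_dbid = N"; adjust the parsing if the layout differs.

# Print the dbid value found in the given gp_dbid file.
read_dbid() {
    awk -F'[ =]+' '$1 == "dbid" { print $2 }' "$1"
}

# Succeeds when the file reports dbid 1, which Table 7 maps to the current master.
is_current_master() {
    [ "$(read_dbid "$1")" = "1" ]
}

# Example use on mdw or smdw:
# is_current_master "$MASTER_DATA_DIRECTORY/gp_dbid" && echo "current master: $(hostname)"
```

Run the check on both mdw and smdw; exactly one of them is expected to succeed.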

Note 2: Start the Greenplum Database Master Instance

To perform maintenance tasks, such as changing the role of the Segment Instances, you must start the Master Instance. To start the Master Instance, use this command:

gpstart -m -a

With only the Master Instance running, only connections to the Master Instance in utility mode are accepted:

PGOPTIONS='-c gp_session_role=utility' psql

Note 3: Reverse the Segment Instance roles

You can change the roles of the Segment Instances so that the Segments on the SAN devices have the role of primaries and the Segments on the local disks have the role of mirrors. Make this change because only the data for the Segment Instances that had the role of mirrors on the SAN devices is backed up.

The Greenplum Database tracks the role, mode, and status of the Segment Instances in the gp_segment_configuration table.

• Mode, where:

s = synchronized

c = change logging

r = resynching

• Status, where:

u = up

d = down

• Role, where:

p = primary

m = mirror

You can alter the mode, status, and roles for each of the Segment Instances with these commands:

update gp_segment_configuration set mode='c',status='u',role='p' where preferred_role='m' and content>-1;
update gp_segment_configuration set mode='s',status='d',role='m' where preferred_role='p' and content>-1;

Note 4: Stop the Greenplum Database Master Instance from running

Run this command to stop the Master Instance:

gpstop -m -a

Note 5: Check the catalogs

Run the gpcheckcat utility with the -A option, which checks the catalogs of all of the databases. If this check and the consistency check return no issues, the database is considered healthy.

gpcheckcat -A

Note 6: Synchronize the Mirror and the Primary Segments

Once you verify the database catalogs, the Mirror Segments on the local disks can be fully recovered from the Primary Segments on the SAN devices with this command:

gprecoverseg -F -a

Note 7: Return the Primary and Mirror Segments to their preferred roles

Once the mirrors on the local disks are synchronized with the primaries on the SAN devices, the Segment Instances can be returned to their preferred roles of primaries on local disks and mirrors on SAN devices with this command:

gprecoverseg -r -a


Conclusion

Summary

In today’s data warehousing environments, where data continues to grow exponentially, operational backup and recovery are expected to occur in real time to further improve operational efficiencies and meet backup and restore window targets. By using existing VNX SAN storage infrastructure investments, Greenplum customers can ensure peak database performance by offloading their database backup operations using EMC Greenplum Data Computing Appliance SAN Mirror on EMC VNX, which delivers enterprise-level backup and restore reliability for their EMC Greenplum data analytics environments.

Key findings

Deploying an EMC DCA SAN Mirror on EMC VNX configuration to protect Greenplum data provides the following benefits:

• Backup and recovery of a Greenplum Database system using EMC technologies, such as VNX SAN storage, NetWorker, and Data Domain.

• Use of the customer's existing VNX storage infrastructure and practices to streamline and enhance the backup and recovery capabilities of the Greenplum Database system.

• Use of VNX SnapView to create either a snapshot- or clone-based point-in-time consistent view of a Greenplum Database for backup to a remote disk- or tape-based backup target.

• A continuous database checking and validation procedure that ensures the SAN-based data copy is consistent.

• Online offloading of the DCA Segment Server Mirror Instance data to the VNX SAN, which increases the available capacity of the DCA.

Note: DCA SAN Mirror on VNX is professionally installed by EMC. Before implementing SAN Mirror solutions, consult your local EMC Performance Specialists.


References

White papers

For more information, see the following white papers:

• EMC Greenplum Data Computing Appliance: Architecture, Performance, and Functions – A Detailed Review

• Deploying EMC VNX Unified Storage Systems for Data Warehouse Applications

Product documentation

For more information, see the following product documents:

• EMC VNX Family

• EMC Data Domain Product Overview

• EMC NetWorker

• EMC PowerPath Family

• EMC SnapView

• Greenplum Database 4.2 Administrator Guide