© 2013 IBM Corporation. This document is for IBM and IBM Business Partner use only.
Client Technical Specialist, IBM System Storage
TS7600 ProtecTIER Virtual Tape De-duplication
Revision: 07 January 2013
Agenda
Introduction
IBM ProtecTIER TS7610/TS7620
► VTL configuration
► OST configuration
► FSI/CIFS configuration
De-duplication algorithms
ProtecTIER family
Backup replication and disaster recovery
Identifying opportunities
Equipment sizing
Introduction
Backup/restore model: implementation options
[Figure: backup/restore implementation options. Application and backup servers copy data from primary storage to secondary storage: a tape library with tape drives, a VTL, NAS, LAN-attached disk, SAN-attached disk, or DAS. One option is marked as the focus of this training module.]
Tape virtualization solution models
Source: IBM System Storage TS7650 GW and TS7620 Redbook, SG24-7652 (www.redbooks.ibm.com)
[Figure: tape virtualization models. A backup/restore client and a TSM backup/restore server on a SAN write either to a physical tape library (TS3100, TS3200, TS3500, other; slots and drives managed by a library manager) or to a VTL, where a software emulator presents virtual tape in front of a disk storage repository (V7000, DS3500, DS5000, other). Models shown: TS7620, TS7650, TS7680 (VTL, appliance or gateway) and TS7720, TS7740 (mainframe VTL/VTS).]
Why virtualize the backup/restore process?
1. Improve the restore process
a. Better RTO (Recovery Time Objective)
b. The backup resides on disk
c. The implemented model is disk-to-disk
2. Improve the RPO (Recovery Point Objective)
a. More frequent incremental backups
b. Use of virtualized disk for copies
3. Improve the backup process
a. Parallel processes
b. Several simultaneous backups/restores are possible
4. Optimize the network infrastructure for remote backup
a. Improve RTO and RPO during recovery
[Figure: backup window over time, real tape drives vs. virtual tape drives.]
IBM ProtecTIER - TS7620 - VTL
What is ProtecTIER?
It is a server used for data backup and restore.
It presents itself to backup servers in one of three ways:
1. Virtual tape library (VTL)
► Robot, cartridges and tape drives
2. Logical disk drives
► Symantec Open Storage Technology (OST), integrated with NetBackup
3. File system shares
► File System Interface (FSI), supports the CIFS protocol
► Used for backup/restore through a backup application
► Exports shares on the IP network
It uses a disk repository to store the backup data.
What is ProtecTIER?
Available in two configurations:
► Appliance: contains the server and the repository
► Gateway: the server accesses the repository on the SAN
The ProtecTIER server is based on System x
ProtecTIER is the software, which runs on Linux
Repository space is optimized through:
► A data de-duplication algorithm
► Data compression
The algorithm is called HyperFactor
Remote data replication over TCP/IP
ProtecTIER family
TS7650G Gateway
► 3958-DD5
► Repository up to 1.0 PB (usable)
TS7620 Appliance Express
► 3959-SM2
► Repository (usable capacity):
● 5.4 TB or
● 11.0 TB
[Figure: TS7650 Appliance, TS7610 Appliance Express, TS7650 Gateway and TS7620 Appliance Express, with the VTL, OST and FSI interfaces.]
ProtecTIER Virtual Tape Library
The backup software sees the data as being written to cartridges
ProtecTIER stores and restores the data directly on disk
The data in the repository is de-duplicated
[Figure: application writing through ProtecTIER to a repository with 5.5 or 11 TB of usable physical capacity.]
TS7620 ProtecTIER Deduplication Appliance Express
Same enterprise-proven ProtecTIER technology
VTL & CIFS performance, up to:
– 145 MB/s backup
– 190 MB/s restore
OST performance, up to:
– 130 MB/s backup
– 170 MB/s restore
Two configurations: 5.4 TB & 11 TB
– Usable capacity, not raw
– Field-upgradeable by the customer
TS7620 Appliance Hardware
Building block: integrated server, storage and ProtecTIER de-duplication software
– 3U enclosure fits in a standard 19-inch rack
– Storage: twelve 2 TB NL-SAS drives, RAID 6
– Server: 6-core Intel Xeon E5645 (Westmere) 2.4 GHz processor
– Memory: 48 GB RAM
TS7620 deployment
Shipped preconfigured as:
► TS3500 virtual library
► 16 LTO3 virtual drives, balanced evenly across both FC ports
► 200 GB cartridge size
► 16 virtual import/export slots
► Small model (5.4 TiB)
● 400 virtual slots & 400 virtual cartridges
► Medium model (11 TiB)
● 540 virtual slots & 540 virtual cartridges
► The configuration can be modified by the customer
Application interface support
► VTL
● 2 x 8 Gb FC ports for host connectivity
● 2 x 1 Gb copper Ethernet for ProtecTIER native replication
● 2 x 1 Gb copper Ethernet for the customer network
► OST, CIFS
● 2 x 1 Gb Ethernet ports for host connectivity
● 2 x 1 Gb copper Ethernet for ProtecTIER native replication
● 2 x 1 Gb copper Ethernet for the customer network
[Figure: backup clients on the LAN; the backup server is attached over the SAN to the TS7620 ProtecTIER VTL.]
OST configuration
ProtecTIER and Open Storage Technology (OST)
• The OST API integrates ProtecTIER with Symantec NetBackup
• Enables backup to disk without tape library emulation
• The OST API plug-in is installed on the NetBackup media server
• The OST API separates the backup logic from the ProtecTIER logic
• Supports data and control transfer between the ProtecTIER and backup servers
[Figure: the NetBackup server (NetBackup policy and control, ProtecTIER OST plug-in) connects over a TCP/IP network to the ProtecTIER server.]
OST architecture
[Figure: OST architecture. NetBackup clients back up to and restore from NetBackup media servers. The NetBackup master server holds the catalog and the EMM configuration database. On the media servers, data movement (bptm/bpdm), the resource manager and the disk service use the OpenStorage API plug-ins (AdvancedDisk, BasicDisk, shared disk and the ProtecTIER plug-in) to reach the ProtecTIER OST interface on a ProtecTIER gateway or appliance; remote access is used for configuration and management.]
OST operation
[Figure: the media server writes the backup to a ProtecTIER system (Copy 1, kept 2 weeks); ProtecTIER replicates it to a second ProtecTIER system (Copy 2, kept 3 months) and the NetBackup catalog is updated.]
• Choose which images to duplicate. Enables filtering for SLAs plus space and bandwidth utilization.
• Apply different retention periods. The second image is an independent copy.
Benefits
– Reduces workload on the media server
– Catalog awareness of off-site images
– Faster and more flexible operations
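To make "independent copy, different retention" concrete, here is a minimal Python sketch. ImageCopy and copies_on are hypothetical names for illustration only, not NetBackup or OST API calls; the backup date is arbitrary and "3 months" is approximated as 90 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ImageCopy:
    """One copy of a backup image with its own retention (hypothetical model)."""
    location: str
    created: date
    retention: timedelta

    @property
    def expires(self) -> date:
        return self.created + self.retention


backup_day = date(2013, 1, 7)  # arbitrary example date
copy1 = ImageCopy("local ProtecTIER", backup_day, timedelta(weeks=2))
copy2 = ImageCopy("remote ProtecTIER", backup_day, timedelta(days=90))  # "3 months" taken as 90 days

for copy in (copy1, copy2):
    print(copy.location, "expires on", copy.expires)
# local ProtecTIER expires on 2013-01-21
# remote ProtecTIER expires on 2013-04-07


def copies_on(day: date, copies) -> list:
    """Copies still retained on a given day; each expires independently."""
    return [c.location for c in copies if day < c.expires]
```

Expiring Copy 1 does not affect Copy 2, which is the point of treating the second image as an independent copy.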
Main benefits of OST
Defines a new disk type, the Logical Storage Unit (LSU), which can be duplicated, moved or shared among multiple NetBackup media servers
Backups can be replicated to a remote site or copied to tape under full NetBackup control
Supports the Machine-to-Machine solution (max. 12 nodes) and cascaded replication, integrated with NetBackup
Full or partial recovery of replicated backup images through a NetBackup user interface
FSI/CIFS configuration
ProtecTIER File System Interface (FSI)
Presents ProtecTIER as a NAS backup target
Used for data backup and restore through a backup application
► CommVault
► Tivoli Storage Manager
► EMC NetWorker
► Symantec NetBackup
► Symantec Backup Exec
ProtecTIER FSI is not meant to be used as a general-purpose NAS server
Uses HyperFactor to de-duplicate data
Mirrors backups over TCP/IP, reducing the bandwidth needed on the links
[Figure: CIFS clients on the IP network reach the ProtecTIER server over SMB/CIFS; an emulation mapping layer connects this to the ProtecTIER native interface.]
ProtecTIER emulates Windows file system behavior and presents CIFS clients with a virtualized hierarchy of:
► File systems
► Directories
► Files
Different file systems can reside within the ProtecTIER repository
Samba/CIFS is used internally
The Samba VFS (virtual file system) layer is mapped to the native ProtecTIER file system interface (FSI)
Authentication in a Windows domain
ProtecTIER must belong to one of the following Windows environments:
Active Directory
► Contains the ProtecTIER system
► Authentication is performed on the AD server using Kerberos
Workgroups
► Users are defined within the ProtecTIER system
► ProtecTIER is the authentication server
File system and share view in ProtecTIER
TSM server example: defining the ProtecTIER IP address and share
DEFine DEVclass PT1 DEVType=FILE MOUNTLimit=32 MAXCAPacity=16G DIRectory=\\10.200.40.1\sharename1
► The DIRectory format is \\FSI_IP\CIFS_name
De-duplication algorithms
What is data de-duplication?
Chunking: dividing the data into units in order to find duplicates.
Unit: a block or a file
Repository: contains unique chunks
Chunking methods:
File based: the file is the chunk (dedup used in file systems)
Block based: the object is split into blocks (dedup used on disk)
Format/content aware: for example, PowerPoint (the slides are the units)
[Figure: a data object / data stream split into chunks.]
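A minimal Python sketch of the first two chunking methods, assuming plain byte strings and an arbitrary block size (format/content-aware chunking depends on the file format and is not shown):

```python
from typing import List


def file_based_chunks(data: bytes) -> List[bytes]:
    """File-based chunking: the whole object (file) is a single chunk."""
    return [data]


def block_based_chunks(data: bytes, block_size: int) -> List[bytes]:
    """Block-based chunking: split the object into fixed-size blocks.

    The last block may be shorter than block_size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


stream = b"FEEADCBA"
print(file_based_chunks(stream))      # [b'FEEADCBA'], one chunk for the whole stream
print(block_based_chunks(stream, 3))  # [b'FEE', b'ADC', b'BA']
```

With file-based chunking a single changed byte makes the whole file "new"; block-based chunking confines the change to the blocks that contain it.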
What is data de-duplication? Chunks
Processing:
Chunks are identified and processed.
A value associated with each chunk's content (hash number, digital signature, fingerprint) is computed.
Calculation methods:
Hashing
Binary comparison
[Figure: data object / data stream FEEADCBA; the repository stores only the unique chunks F, E, D, C, B, A.]
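The fingerprint-and-store idea can be sketched in Python as follows, assuming 1-byte chunks so that the stream matches the FEEADCBA example and SHA-1 as the fingerprint; deduplicate and rebuild are illustrative names, and real systems use much larger chunks.

```python
import hashlib
from typing import Dict, List, Tuple


def deduplicate(chunks: List[bytes]) -> Tuple[Dict[str, bytes], List[str]]:
    """Store each distinct chunk once, keyed by its SHA-1 fingerprint.

    Returns the repository (fingerprint -> chunk) and the "recipe":
    the ordered list of fingerprints needed to rebuild the stream.
    """
    repository: Dict[str, bytes] = {}
    recipe: List[str] = []
    for chunk in chunks:
        fingerprint = hashlib.sha1(chunk).hexdigest()
        if fingerprint not in repository:
            # New content: keep the bytes.
            repository[fingerprint] = chunk
        # Duplicate or not, the stream only records a pointer (the fingerprint).
        recipe.append(fingerprint)
    return repository, recipe


def rebuild(repository: Dict[str, bytes], recipe: List[str]) -> bytes:
    """Follow the pointers to reassemble the original stream."""
    return b"".join(repository[fp] for fp in recipe)


stream = b"FEEADCBA"
chunks = [stream[i:i + 1] for i in range(len(stream))]  # 1-byte chunks, as in the figure
repo, recipe = deduplicate(chunks)
print(len(chunks), "chunks in,", len(repo), "unique chunks stored")  # 8 chunks in, 6 unique chunks stored
```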
Hashing algorithms: MD5 and SHA-1
MD5: 16-byte hash
– # echo "The Quick Brown Fox Jumps Over the Lazy Dog" | md5sum
  9d56076597de1aeb532727f7f681bcb0
– # echo "The Quick Brown Fox Dumps Over the Lazy Dog" | md5sum
  5800fccb352352308b02d442170b039d
SHA-1: 20-byte hash
– # echo "The Quick Brown Fox Jumps Over the Lazy Dog" | sha1sum
  f68f38ee07e310fd263c9c491273d81963fbff35
– # echo "The Quick Brown Fox Dumps Over the Lazy Dog" | sha1sum
  d4e6aa9ab83076e8b8a21930cc1fb8b5e5ba2335
Change one letter and the hash value is completely different.
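The same experiment can be reproduced with Python's standard hashlib module. Because echo appends a newline, this sketch hashes each string plus "\n" so it feeds the same bytes as the shell pipeline; it prints the digests instead of asserting the exact values shown on the slide.

```python
import hashlib

# echo appends a newline, so include "\n" to hash the same bytes
# that the shell pipeline feeds to md5sum / sha1sum.
original = "The Quick Brown Fox Jumps Over the Lazy Dog\n".encode("ascii")
changed = "The Quick Brown Fox Dumps Over the Lazy Dog\n".encode("ascii")

for name in ("md5", "sha1"):
    h1 = hashlib.new(name, original)
    h2 = hashlib.new(name, changed)
    print(f"{name}: {h1.digest_size}-byte digest")
    print("  Jumps:", h1.hexdigest())
    print("  Dumps:", h2.hexdigest())
    print("  different:", h1.hexdigest() != h2.hexdigest())
```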
A table is maintained with the hash values and the location of the data
Does the hash value already exist? Yes: the data is discarded
As more data is written:
► The table grows in size
► Table lookups take longer
► Backup time is affected
Example of table size:
► SHA-1 algorithm: hash value = 20 bytes
► Repository size: 50 TB
► Chunk size: 16 KB
[Figure: hashing algorithm flow. For each chunk, compute the hash value and check whether it already exists. Yes: the data is a duplicate and only a pointer is kept. No: store the new data and update the hash index.]
$$\frac{50{,}000{,}000{,}000\ \text{KB}}{16\ \text{KB}} = 3{,}125{,}000{,}000\ \text{entries}$$
$$3{,}125{,}000{,}000 \times 20\ \text{bytes} = 62.5\ \text{GB} \approx 63\ \text{GB}$$
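The index-size arithmetic as a small Python function, assuming decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) as in the slide, and counting only the 20-byte hash per entry; the pointer to the data location that each entry also needs would make the real index larger.

```python
def hash_index_size_bytes(repository_bytes: int, chunk_bytes: int, entry_bytes: int) -> int:
    """Size of a hash index holding one entry per chunk.

    Only the hash value is counted per entry, so this is a lower bound.
    """
    entries = repository_bytes // chunk_bytes
    return entries * entry_bytes


KB = 1000          # decimal units, matching the slide's arithmetic
TB = 1000 ** 4

size = hash_index_size_bytes(50 * TB, 16 * KB, 20)   # SHA-1 = 20-byte hash
print(50 * TB // (16 * KB), "entries")               # 3125000000 entries
print(size / 1000 ** 3, "GB")                         # 62.5 GB (~63 GB)
```

Because the entry count scales linearly with repository size, doubling the repository (or halving the chunk size) doubles the index.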
Hashing algorithm: the collision problem
There is a probability that two chunks with different bytes generate the same hash value, causing a collision.
The hashing algorithm then discards the data, and the information is lost.
Reference: http://preshing.com/20110504/hash-collision-probabilities
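To put the collision risk in numbers, the birthday-bound approximation p ≈ 1 - exp(-k(k-1) / 2N), for k chunks and N = 2^b possible values of a b-bit hash, can be evaluated directly. This sketch assumes an ideal, uniformly distributed hash; the chunk count reuses the 50 TB / 16 KB example.

```python
import math


def collision_probability(num_chunks: float, hash_bits: int) -> float:
    """Approximate probability that at least two of num_chunks distinct
    chunks share a hash value, for an ideal hash of hash_bits bits.

    Birthday approximation: p = 1 - exp(-k(k-1) / (2N)), with N = 2**hash_bits.
    math.expm1 keeps precision when the probability is extremely small.
    """
    n_values = 2.0 ** hash_bits
    exponent = num_chunks * (num_chunks - 1) / (2.0 * n_values)
    return -math.expm1(-exponent)


# 50 TB repository in 16 KB chunks -> 3.125e9 chunks, SHA-1 (160-bit) hash
print(collision_probability(3.125e9, 160))
# For contrast, a 32-bit hash reaches roughly 50% with only ~77,000 values
print(collision_probability(77_163, 32))
```

The probability is never exactly zero for any hash, which is the risk this slide describes; the hash width determines how small it is.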
Basic concept of the ProtecTIER algorithm
[Figure: backup servers send a new data stream to the ProtecTIER server; HyperFactor consults a memory-resident index and only "filtered" data is written to the repository.]
Only 4 GB of memory-resident index is needed to map 1 PB of physical disk.
HyperFactor Algorithm Overview
ProtecTIER data de-duplication is performed inline.
1. A new data stream is sent to the ProtecTIER server.
► It is received and analyzed by HyperFactor.
2. For each data element, HyperFactor searches the Memory Resident Index to locate the data in the repository that is most similar to the data element.
3. The similar data from the repository is read.
4. A binary differential between the new data element and the data read from the repository is computed,
► resulting in the delta difference.
5. The delta is compressed (LZH) and written to the disk repository.
6. The Memory Resident Index is updated with the location of the new data that has been added.
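HyperFactor itself is proprietary, so the following is only a generic illustration of steps 2 to 5 (find similar data, compute a binary differential, compress the delta), not ProtecTIER's implementation. Assumptions: Python's difflib stands in for the binary differencing, a brute-force similarity search stands in for the memory-resident index, and zlib (DEFLATE) stands in for LZH compression; most_similar, binary_delta, apply_delta and stored_size are hypothetical names.

```python
import difflib
import zlib
from typing import List, Tuple, Union

# A delta is a list of operations: ("copy", start, end) reuses bytes from
# the reference; ("insert", data) carries bytes that are genuinely new.
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def most_similar(candidates: List[bytes], new: bytes) -> bytes:
    """Toy stand-in for the index lookup: pick the most similar candidate."""
    return max(candidates,
               key=lambda ref: difflib.SequenceMatcher(None, ref, new, autojunk=False).ratio())


def binary_delta(reference: bytes, new: bytes) -> List[Op]:
    """Describe `new` as copies from `reference` plus inserted literals."""
    matcher = difflib.SequenceMatcher(None, reference, new, autojunk=False)
    delta: List[Op] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))
        # "delete": bytes only in the reference are simply not copied
    return delta


def apply_delta(reference: bytes, delta: List[Op]) -> bytes:
    """Rebuild the new data element from the reference and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += reference[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def stored_size(delta: List[Op]) -> int:
    """Bytes written for the new literals after compression (zlib here)."""
    literals = b"".join(op[1] for op in delta if op[0] == "insert")
    return len(zlib.compress(literals))


repository = [b"customer record 0001: balance=100.00; status=active\n" * 20,
              b"completely unrelated log line\n" * 20]
new_element = repository[0].replace(b"balance=100.00", b"balance=250.00", 1)

reference = most_similar(repository, new_element)
delta = binary_delta(reference, new_element)
assert apply_delta(reference, delta) == new_element
print(len(new_element), "bytes received,", stored_size(delta), "bytes stored for the delta")
```

Only the changed bytes become literals, so the stored delta is a small fraction of the element received.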
Post-processing versus inline processing
Post-processing de-duplication
• Backups run first
• The de-dup algorithm runs afterwards
• Requires extra disk space to hold the interim full-sized copy of the backup
• Used when the de-dup algorithm is not fast enough to run inline
Inline de-duplication (e.g., HyperFactor)
• De-dup runs as part of the backup process
• Uses less disk
• Once the save is done, the entire process is done
• Only possible with a fast de-dup algorithm such as ProtecTIER HyperFactor
Backup replication and disaster recovery
ProtecTIER Native Replication w/ TS7620
[Figure: spoke sites replicate over IP-based native replication (NR) links to a hub ProtecTIER gateway at the central/DR site, where the main-site backup server can clone virtual cartridges to a physical tape library.]
Up to 12 branch offices (spokes) are supported per target TS7650 (hub).
The TS7620 is supported as a hub with a limit of 4 spokes.
TS7620 as Spoke
The TS7620 is best suited in a replication topology as a spoke, replicating to a TS7650 Appliance or Gateway hub.
Physical bandwidth is the primary concern for a TS7620 spoke. ProtecTIER native replication replicates only unique data, so the bandwidth required depends on the achieved de-duplication ratio.
Bandwidth sizing example:
► TS7620 spoke backing up 500 GB of data daily
► Average dedupe ratio measured or estimated at ~10:1
► Every daily backup (all 500 GB) must be replicated to the data center TS7650G hub
► 12-hour replication window
► 500 GB / 10 / 12 h ≈ 4.17 GB/h, so a physical bandwidth pipe of about 1.2 MB/s is required
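The bandwidth arithmetic as a reusable Python helper, assuming decimal units (1 GB = 1,000 MB) and ignoring protocol overhead; the function name is illustrative.

```python
def replication_bandwidth_mb_s(daily_backup_gb: float, dedup_ratio: float,
                               window_hours: float) -> float:
    """Physical bandwidth (MB/s, decimal) needed to replicate one day's unique data.

    Only unique (post-dedup) data crosses the link, so the nominal
    backup volume is divided by the dedup ratio first.
    """
    unique_gb = daily_backup_gb / dedup_ratio
    gb_per_hour = unique_gb / window_hours
    return gb_per_hour * 1000 / 3600


mb_s = replication_bandwidth_mb_s(500, 10, 12)
print(f"{mb_s:.2f} MB/s, about {mb_s * 8:.1f} Mbit/s")  # 1.16 MB/s, about 9.3 Mbit/s
```

A lower achieved dedup ratio raises the requirement proportionally: at 5:1 the same workload needs twice the bandwidth.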
TS7620 as Hub
If deployed as a hub, the maximum number of spokes is limited to 4
► as opposed to 12 spokes for a TS7650 hub
Capacity planning for TS7620 as hub:
► The TS7620 hub daily nominal workload (i.e., all spokes' pre-dedupe replication workload plus local backup) should not exceed 500 GB
► Data is only de-duplicated at the spokes/sources. It is not de-duplicated again at the hub against other spokes, so even if multiple spokes back up very similar data, the data will appear different at the hub
► Bottom line: for planning purposes, always treat replicated data at the hub as part of the 500 GB/day overall backup limit
► Example of a "maximum" use case:
● TS7620 used as hub + 4 spokes
● Hub performs 100 GB of daily backups
● (500 - 100) / 4 = 100: on average, each of the 4 spokes can replicate up to 100 GB daily (although they may back up more than is replicated)
Hub performance implications: performance will not be an issue if the capacity guideline is maintained
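A minimal Python check of the TS7620 hub guideline, using the 500 GB/day and 4-spoke limits from this slide; per_spoke_allowance_gb is a hypothetical helper name.

```python
TS7620_HUB_DAILY_LIMIT_GB = 500   # nominal daily workload guideline for a TS7620 hub
TS7620_HUB_MAX_SPOKES = 4


def per_spoke_allowance_gb(hub_local_backup_gb: float, spokes: int) -> float:
    """Average daily nominal replication each spoke may send to a TS7620 hub.

    Replicated data counts against the hub's 500 GB/day guideline because it
    is not de-duplicated again across spokes at the hub.
    """
    if not 1 <= spokes <= TS7620_HUB_MAX_SPOKES:
        raise ValueError(f"a TS7620 hub supports 1 to {TS7620_HUB_MAX_SPOKES} spokes")
    remaining = TS7620_HUB_DAILY_LIMIT_GB - hub_local_backup_gb
    if remaining < 0:
        raise ValueError("hub local backups alone exceed the daily guideline")
    return remaining / spokes


print(per_spoke_allowance_gb(100, 4))   # 100.0
```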
ProtecTIER sizing
Qualifying the opportunity for the TS7620
Is the solution suitable for the customer?
Do the capacity and performance characteristics meet the customer's requirements?
Does the data type benefit from the HyperFactor (dedup) algorithm?
► Poor fit: encrypted data, compressed data, etc.
Is TS7620 interoperability in this environment certified?
Datasheet suggestion: "...and is ideal for:
Customers experiencing significant data growth
Weekly full backups of 3 TB or less
Daily incremental backups of 1 TB or less
Customers looking to make backup and recovery improvements without making radical changes"
Capacity and Performance Requirements for TS7620 5.5 TB Configuration
Capacity: the recommended backup workload for the 5.5 TB configuration is 500 GB or less per day
► Daily backups exceeding 500 GB may be suitable with a relatively low data change rate, but because the data change rate cannot be accurately gauged without measurement, it is recommended to assume 15-20% by default
► The following table illustrates how physical space consumption is derived from three general factors: backup size, retention, and data change rate
Performance: the customer workload cannot exceed the TS7620 performance capability of 145 MB/s for VTL or 130 MB/s for FSI-CIFS
► Both backup and restore throughput requirements should be taken into consideration
Daily backup | Retention (days) | Change rate | Required physical space | Dedupe ratio
--- | --- | --- | --- | ---
300 GB | 30 | 10% | 1.17 TB (300 + 300 × 29 × 0.1 GB) | 7.6:1
300 GB | 30 | 20% | 2.04 TB | 4.5:1
300 GB | 60 | 20% | 3.84 TB | 4.6:1
300 GB | 90 | 10% | 3 TB | 9.0:1
500 GB | 30 | 10% | 1.95 TB | 7.7:1
500 GB | 30 | 20% | 3.4 TB | 4.4:1
500 GB | 60 | 10% | 3.45 TB | 8.7:1
500 GB | 60 | 15% | 4.93 TB | 6.1:1
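The table rows follow one formula, visible in the first row: required space = daily backup + daily backup × (retention - 1) × change rate, and dedupe ratio = (daily backup × retention) / required space. A Python sketch of that formula, plus a rough ingest-time check against the 145 MB/s figure, assuming decimal units, one backup per day, and no compression (as in the table):

```python
def required_physical_gb(daily_gb: float, retention_days: int, change_rate: float) -> float:
    """Physical space estimate used in the sizing tables: the first backup is
    stored in full, and each of the remaining (retention_days - 1) backups
    adds only its changed data. Compression is ignored, as in the tables.
    """
    return daily_gb + daily_gb * (retention_days - 1) * change_rate


def dedupe_ratio(daily_gb: float, retention_days: int, change_rate: float) -> float:
    nominal_gb = daily_gb * retention_days   # what the backup application believes it stored
    return nominal_gb / required_physical_gb(daily_gb, retention_days, change_rate)


def backup_hours(daily_gb: float, throughput_mb_s: float) -> float:
    """Time to ingest one daily backup at a sustained throughput (decimal units)."""
    return daily_gb * 1000 / throughput_mb_s / 3600


for daily, days, rate in [(300, 30, 0.10), (500, 60, 0.15)]:
    space_tb = required_physical_gb(daily, days, rate) / 1000
    print(f"{daily} GB x {days} days @ {rate:.0%}: {space_tb:.2f} TB, "
          f"{dedupe_ratio(daily, days, rate):.1f}:1")

print(f"500 GB at 145 MB/s: {backup_hours(500, 145):.2f} h")
```

The throughput check covers ingest only; restore throughput should be checked the same way against the restore figures.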
Capacity and Performance Requirements for TS7620 11 TB Configuration
Capacity: the recommended backup workload for the 11 TB configuration is 1 TB or less per day
► Daily backups exceeding 1 TB may be suitable with a relatively low data change rate, but because the data change rate cannot be accurately gauged without measurement, it is recommended to assume 15-20% by default
► The following table illustrates how physical space consumption is derived from three general factors: backup size, retention, and data change rate
Performance: the customer workload cannot exceed the TS7620 performance capability of 145 MB/s
► Both backup and restore throughput requirements should be taken into consideration
Daily backup | Retention (days) | Change rate | Required physical space | Dedupe ratio
--- | --- | --- | --- | ---
600 GB | 30 | 10% | 2.34 TB | 7.6:1
600 GB | 30 | 20% | 4.08 TB | 4.5:1
600 GB | 60 | 20% | 7.68 TB | 4.6:1
600 GB | 90 | 10% | 6 TB | 9.0:1
1 TB | 30 | 10% | 3.9 TB | 7.7:1
1 TB | 30 | 20% | 6.8 TB | 4.4:1
1 TB | 60 | 10% | 6.9 TB | 8.7:1
1 TB | 60 | 15% | 9.85 TB | 6.1:1
De-duplication factor calculation
Workload | "New" data after HyperFactor processing | Disk usage after LZH compression | Accumulated on disk | Nominal
--- | --- | --- | --- | ---
10 TB full backup #1 | 10 TB | 5 TB | 5 TB | 10 TB
1 TB incremental | ~300 GB | ~150 GB | 5.15 TB | 11 TB
1 TB incremental | ~300 GB | ~150 GB | 5.3 TB | 12 TB
1 TB incremental | ~300 GB | ~150 GB | 5.45 TB | 13 TB
1 TB incremental | ~300 GB | ~150 GB | 5.6 TB | 14 TB
1 TB incremental | ~300 GB | ~150 GB | 5.75 TB | 15 TB
10 TB full backup #2 | 1 TB | 500 GB | 6.25 TB | 25 TB
HyperFactor ratio: 25 TB nominal / 6.25 TB accumulated = 4:1
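The same accumulation in a few lines of Python, using the GB figures from this example; integer GB values keep the arithmetic exact.

```python
# (nominal size written, size on disk after HyperFactor + LZH compression), in GB
FULL_1 = (10_000, 5_000)     # first full: 10 TB nominal, 5 TB compressed
INCREMENTAL = (1_000, 150)   # 1 TB incremental: ~300 GB new -> ~150 GB compressed
FULL_2 = (10_000, 500)       # second full: 10 TB nominal, 1 TB new -> 500 GB compressed

workload = [FULL_1] + [INCREMENTAL] * 5 + [FULL_2]

nominal = stored = 0
for step_nominal, step_stored in workload:
    nominal += step_nominal
    stored += step_stored
    print(f"nominal {nominal / 1000:g} TB, accumulated {stored / 1000:g} TB")

ratio = nominal / stored
print(f"HyperFactor ratio {ratio:g}:1")   # HyperFactor ratio 4:1
```

The ratio is cumulative: it keeps rising as long as each new backup adds much more nominal data than physical data.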
Support portal on PartnerWorld
Architecting a Solution – Capacity Planner
Suggested good and bad candidates
De-duplication results will vary based on the backup data change rate
Certain data types are prone to higher change rates due to their internal makeup and format
Good candidates
► Databases (uncompressed, unencrypted)
► Operating system and application software packages
► Text files, log files (usually dedupe very well)
► Email (PST, DBX, Domino DB and similar files)
► Snapshots (filer snaps, VMware images, BCVs)
Problematic candidates
► Images, video (JPEG, GIF, TIF, MPEG, others), unless redundant
► Compressed and encrypted files (unless redundant)
► Rendering or seismic data
► CAD/CAM (depending on the type)
Technical and Delivery Assessment Product Checklist
Official pre-sale qualification document for the TS7620
A checklist that outlines a set of intuitive qualifiers for a TS7620 order
► No need for deep technical expertise to evaluate
► No review conference call required, unlike TS7650G TDAs
The checklist will be posted at the following location by GA:
► BPs: http://partners.boulder.ibm.com/src/assur30i.nsf/WebIndex/SA933
Key takeaways
ProtecTIER de-dup eliminates redundant data during backups
► One of the most effective ways to manage exponential data growth
De-dup stores more backup data on less disk
► A disk-efficiency technology
ProtecTIER delivers fast backups and, above all, faster restores
Reduces the bandwidth required for IP data replication between remote sites
HyperFactor is the patented de-duplication algorithm and guarantees 100% data integrity
► Unlike hashing algorithms, where a collision means data loss
Simplifies the implementation of disaster recovery solutions
Emulates a tape library: robotics commands, LTO tape drives, virtual cartridges and slots
Two models: Gateway and Appliance, both with SAN connections via Fibre Channel
End