ddn - Teratec · ddn.com Any statements or representations around future events are subject to...

ddn.com © 2016 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others.

Any statements or representations around future events are subject to change.

Big Data for the life sciences:
data access (performance, flexibility, security and privacy)

2016/06

DataDirect Networks, Inc.

James Coomer [email protected]

Trends in high-performance storage

►Global Distribution
►Continued pressure to improve economics
►Managing scale and new data paradigms
►Managing performance for diverse workloads
►Global data distribution

Life sciences & storage: trends

►Global Distribution
►Continued pressure to improve economics
►Managing scale and new data paradigms
►Managing performance for diverse workloads
►Global data distribution
►Applying security

Life sciences are a challenge for storage today

►Continued pressure to improve economics
►Managing scale and new data paradigms
►Managing performance for diverse workloads
►Global data distribution
►Applying security

About security

Some context

►Security has not been a major priority for HPC storage systems
•Designs were oriented towards performance and scalability
•No direct connection to the Internet, and secured physical access
►But today, HPC storage is NO LONGER limited to scratch and user directories
•A single cluster is used for several types of workload
•Dedicated hardware is unsatisfactory in terms of cost/performance
•Secured data must be accessible and visible ONLY to authorized people
►Per-user/per-node authentication
►Audit capability
►Data encryption

Data isolation with Docker

►Virtualization in the life sciences brings isolation capabilities (multi-tenancy)

[Diagram: a COMPUTE layer hosting many containers]

Data isolation with Docker

►The Lustre file system provides good performance, but breaks the isolation
►All containers would see all data, restricted only by POSIX permissions/ACLs

[Diagram: the containers on COMPUTE all attach to LUSTRE and see the whole namespace: / with /fast, /work and /scratch, and per-user subdirectories /jen, /eva and /bob]

Data isolation with Docker

►Use of a "Kerberized" Linux to authenticate Docker containers to the file system

[Diagram: containers on COMPUTE, a KERBEROS SERVER, and LUSTRE with its namespace (/ with /fast, /work and /scratch, and per-user subdirectories /jen, /eva and /bob). Numbered flow:
1. credentials(A)
2. credA
3. mount(credA)
4. /filesetA]
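The four numbered steps in the diagram can be walked through with a toy model. This is only a sketch under stated assumptions: `ToyKerberosServer`, `ToyLustre`, the principal name "A" and the HMAC-based ticket are hypothetical names invented for illustration, not real Kerberos or Lustre APIs.

```python
import hashlib
import hmac


def _sign(service_key: bytes, principal: str) -> str:
    # Toy "ticket": an HMAC over the principal name (assumption, not real Kerberos).
    return hmac.new(service_key, principal.encode(), hashlib.sha256).hexdigest()


class ToyKerberosServer:
    """Hypothetical stand-in for the KERBEROS SERVER box in the diagram."""

    def __init__(self, service_key: bytes):
        self._service_key = service_key  # secret shared with the file system
        self._passwords = {}

    def register(self, principal: str, password: str) -> None:
        self._passwords[principal] = password

    def issue(self, principal: str, password: str):
        # Steps 1 and 2: receive credentials(A), return credA.
        if self._passwords.get(principal) != password:
            raise PermissionError("bad credentials")
        return principal, _sign(self._service_key, principal)


class ToyLustre:
    """Hypothetical stand-in for the LUSTRE box in the diagram."""

    def __init__(self, service_key: bytes, filesets: dict):
        self._service_key = service_key
        self._filesets = filesets  # principal -> fileset path

    def mount(self, cred):
        # Steps 3 and 4: receive mount(credA), return /filesetA only.
        principal, tag = cred
        if not hmac.compare_digest(tag, _sign(self._service_key, principal)):
            raise PermissionError("invalid ticket")
        return self._filesets[principal]


key = b"demo-shared-secret"
kdc = ToyKerberosServer(key)
kdc.register("A", "s3cret")
lustre = ToyLustre(key, {"A": "/filesetA"})

cred_a = kdc.issue("A", "s3cret")   # steps 1 and 2
print(lustre.mount(cred_a))         # steps 3 and 4
```

The point of the sketch is that the file system never sees the password: it only checks a credential issued by a third party it trusts, and hands back a single fileset rather than the whole namespace.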

Data isolation with Docker

►Two key elements for isolating data:
►The nodemap concept, which associates filesets with NIDs (Lustre containers)
►Use of subdirectory mounts

[Diagram: containers on COMPUTE, a KERBEROS SERVER, and LUSTRE with its namespace (/ with /fast, /work and /scratch, and per-user subdirectories /jen, /eva and /bob). A nodemap sits in front of LUSTRE. Numbered flow:
1. credentials(A)
2. credA
3. mount(credA)
4. /filesetA]
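The combination of a nodemap and a subdirectory mount can be sketched as a lookup from client address to fileset, followed by a path translation. This is a simplified model, not Lustre code: `ToyNodemap` is a hypothetical class, NIDs are approximated by plain IP ranges, and the `/work/eva` and `/work/bob` filesets are assumed for the example.

```python
import ipaddress
import posixpath


class ToyNodemap:
    """Hypothetical model: client address range -> fileset (subdirectory)."""

    def __init__(self):
        self._entries = []  # list of (network, fileset)

    def add(self, cidr: str, fileset: str) -> None:
        self._entries.append((ipaddress.ip_network(cidr), fileset))

    def fileset_for(self, client_ip: str) -> str:
        addr = ipaddress.ip_address(client_ip)
        for network, fileset in self._entries:
            if addr in network:
                return fileset
        raise PermissionError(f"{client_ip} is not covered by any nodemap")

    def resolve(self, client_ip: str, client_path: str) -> str:
        """Translate a path as seen by the client into the full namespace."""
        fileset = self.fileset_for(client_ip)
        # Normalise against "/" first so that ".." cannot climb out of the fileset.
        inside = posixpath.normpath(posixpath.join("/", client_path))
        return posixpath.normpath(fileset + inside)


nodemap = ToyNodemap()
nodemap.add("10.0.1.0/24", "/work/eva")
nodemap.add("10.0.2.0/24", "/work/bob")

# The client sees "/", the file system serves only its fileset.
print(nodemap.resolve("10.0.1.7", "/data/sample.bam"))
print(nodemap.resolve("10.0.2.9", "/"))
```

In this model a container mapped to one fileset has no path that names another tenant's data, which is a stronger guarantee than relying on POSIX permissions alone.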

Performance of Lustre clients in containers

Life sciences are a challenge for storage today

How do we meet this challenge?

►Managing performance for diverse workloads

I/O performance in the life sciences

►Small I/O
►Small files
►mmap() I/O
•Often a consequence of using Java
•GPFS can prove sub-optimal under these conditions: "The structure of the Linux kernel requires GPFS to perform additional synchronization operations in order to maintain mmap semantics. These operations hurt performance."
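For readers unfamiliar with the access pattern, here is a minimal sketch of reading a file through mmap() in fixed-size blocks using Python's standard `mmap` module. The helper names `make_test_file` and `mmap_read_checksum` and the 32 KiB default block size are assumptions chosen for illustration; this is not the benchmark used in the talk.

```python
import mmap
import os
import tempfile


def make_test_file(size: int) -> str:
    """Create a temporary file of `size` bytes with a repeating pattern."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 251 for i in range(size)))
    return path


def mmap_read_checksum(path: str, block_size: int = 32 * 1024) -> int:
    """Map the file read-only and sum its bytes, one block at a time."""
    total = 0
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), block_size):
                # Slicing the mapping triggers page faults, not read() calls.
                total += sum(mm[offset:offset + block_size])
    return total


path = make_test_file(100_000)
try:
    print(mmap_read_checksum(path, block_size=32 * 1024))
finally:
    os.remove(path)
```

With mmap() the application issues no explicit read calls: data arrives through page faults, so the file system client has to serve page-sized requests efficiently.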

Example of mmap() I/O calls

[Chart: mmap() Read Performance (1MB Block Size). Series: lustre-1.8.9, lustre-2.5.2, DDN branch. Vertical axis from 0 to 500.]

[Chart: mmap() Read Performance (32K to 1 MB Block Size). Block sizes: 32K, 128K, 512K, 1024K. Series: Lustre-1.8.9, DDN branch. Vertical axis from 0 to 500.]

Sanger Institute Benchmarks

[Chart: Lustre 2.5 Client Performance, Human Genetics samtools workflow. Vertical axis: Runtime (Hours), from 0 to 50. Clients compared: 2.5.1 and 2.5.23-ddntest15.]

Samtools is 20% faster thanks to DDN's Lustre optimizations

Lustre Metadata Performance
MDS scalability (single directory)

[Chart: Lustre Metadata Scalability (Unique) (32 clients, 1024 mount points). Horizontal axis: Number of Threads (16, 32, 64, 128, 256, 512, 1024). Vertical axis: ops/sec, from 0 to 250000. Series: File Creation, File Removal, Dir Creation, Dir Removal, File Creation(DNE), File Removal(DNE). Annotations on the chart: "100% by DNE", "50% of File creation", "multiple mount points", "Directory Creation".]

The small I/O problem

www.ellexus.com

[Chart: Number of reads (in millions) per 20 sec interval, from 0 to 35, against Time (minutes), from 0 to 150. Series: 1B reads, 2B-4KB reads.]
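To get a feel for why very small reads add up, the sketch below counts how many unbuffered read calls are needed to consume the same file at different request sizes. It is an illustration on a local temporary file, not a reproduction of the measurement in the chart; `count_read_calls` and the 64 KiB file size are assumptions made for the example.

```python
import os
import tempfile


def count_read_calls(path: str, request_size: int) -> int:
    """Read the whole file with unbuffered reads and count data-returning calls."""
    calls = 0
    # buffering=0 disables Python's buffer, so each read() maps to one system call.
    with open(path, "rb", buffering=0) as f:
        while f.read(request_size):
            calls += 1
    return calls


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\0" * (64 * 1024))  # 64 KiB of data

try:
    for size in (1, 4096, 1024 * 1024):
        print(f"{size:>8} B requests -> {count_read_calls(path, size)} calls")
finally:
    os.remove(path)
```

On a parallel file system each of those calls may carry a network round trip and lock traffic, which is why a workload dominated by 1-byte reads is a problem even when the data volume is small.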

Deploying SSDs under the PFS is not a satisfactory answer

[Chart: Parallel Filesystem on IME Demo Cluster, 14x1U servers with SSDs (50GB/s available). Horizontal axis: IO Request Size (KB), ticks at 1, 10, 100, 1000. Vertical axis: Throughput (MB/s), from 0 to 2000. Series: Avg IOR Tput (MB/s).]

DDN | IME Application I/O Workflow

[Diagram: COMPUTE at the top, a tier of IME SERVERs in the middle, and the PARALLEL FILESYSTEM at the bottom]

►Insertion of an additional tier of distributed NVM between the applications and the PFS
►Client-server architecture
►Transparent to users
►Transparent to applications
►Data is reorganized before it is flushed to the PFS

Log structuring in IME

Consider two different application I/O patterns (write)

"Burst buffer blocks" (BBBs) are the buffers generated on the client side.

Note that the contents of a BBB may or may not be aligned.

The same storage method is used for both BBBs, despite the qualitative difference in their contents.

[Diagram: two write patterns, labelled Sequential and Non-sequential]
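The idea of storing sequential and non-sequential writes the same way, and reorganizing the data only when it is flushed, can be sketched with a small log-structured buffer. This is a conceptual toy, not the IME implementation: `ToyLogBuffer` and its methods are hypothetical names, and real burst buffer blocks are far more elaborate.

```python
class ToyLogBuffer:
    """Hypothetical log-structured write buffer (illustration only)."""

    def __init__(self):
        self._log = []  # (offset, data) records in arrival order

    def write(self, offset: int, data: bytes) -> None:
        # Sequential and non-sequential writes take the same path:
        # each one is simply appended to the log.
        self._log.append((offset, bytes(data)))

    def flush(self):
        """Replay the log, then return contiguous extents sorted by offset."""
        image = {}
        for offset, data in self._log:
            for i, byte in enumerate(data):
                image[offset + i] = byte  # later writes overwrite earlier ones
        extents = []
        for pos in sorted(image):
            if extents and extents[-1][0] + len(extents[-1][1]) == pos:
                extents[-1][1].append(image[pos])  # extend the current extent
            else:
                extents.append((pos, bytearray([image[pos]])))  # start a new one
        self._log.clear()
        return [(start, bytes(buf)) for start, buf in extents]


sequential = ToyLogBuffer()
for off, data in [(0, b"AAAA"), (4, b"BBBB"), (8, b"CCCC")]:
    sequential.write(off, data)

scattered = ToyLogBuffer()
for off, data in [(8, b"CCCC"), (0, b"AAAA"), (4, b"BBBB")]:
    scattered.write(off, data)

# Both patterns reach the back end as one large, ordered extent.
print(sequential.flush())
print(scattered.flush())
```

Because the write path is an append in both cases, its cost does not depend on the application's access pattern; the ordering work is deferred to the flush, where the parallel file system receives large contiguous extents.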

IME vs parallel file systems: performance for a single shared file (SSF)

Thank You! Keep in touch with us

Tokyu Bancho Building 8F, 6-2 Yonbancho, Chiyoda-ku, Tokyo 102-0081
TEL: 03-3261-9101
FAX: 03-3261-9140

company/datadirect-networks
@ddn_limitless
[email protected]