Small File File Systems USC Jim Pepin

Page 1

Small File File Systems

USC
Jim Pepin

Page 2

Level Setting

Small files are ‘normal’ for lots of people
  Metadata substitute (a lot of image data is handled this way)
  Comes from the ‘pc’/desktop world
  These users have discovered ‘hpc’ but don’t want to change programs (not even MPI)
  Find ways to help (best is ‘rewrite’, but that is not reasonable to expect)
Small files are deadly to most file systems
  Some more than others
  Impact of ‘cluster’ systems

Page 3

Level Setting

Disks
  SATA
    Not fast
    Reliability issues
    Cheap
  Fast disk (15k etc.)
    Not cheap
    Fast
  People are looking at ‘cheap’
    Drives better backup/maintainability solutions
Distributed doesn’t mean ‘faster’
Virtualization can be your enemy (in some ways)

Page 4

Basics

1800-node cluster
  Presents special problems
  Myrinet ‘interconnect’
  Ethernet (Gb) data plane
  Fibre Channel disk/tape data plane (2 Gb/s)
    256+ disk/tape devices
15+ file servers
250+ TB disk
Tape backup
  DR site

Page 5

Basics

QFS base FS
  Archiving and distributed access
  Sun thing
Local parallel FS on nodes
NFS
  Issues around it
“Condo” disk versus condo nodes

Page 6

Basics

Three types of file systems
  Parallel FS on compute nodes (temp)
    Exception on ‘condo’ nodes
  Small files
    More directory transactions
    Small frames win
    No stripes
  Large files
    More data transactions
    Jumbo frames win
    Stripes win
    Tuning is stripe factors and block sizes
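
The small-file versus large-file contrast above can be made concrete with a little arithmetic. The sketch below is only an illustration under a simplifying assumption that is not from the slides: blocks are laid out round-robin, one block per device. The function name and parameters are hypothetical and do not correspond to any QFS or other file system setting.

```python
import math

def layout(file_bytes, block_bytes, stripe_width):
    """Return (blocks allocated, devices touched, fraction of allocated space used).

    Assumes round-robin striping with one block per device, which is a
    simplification chosen for illustration only.
    """
    # Even an empty file is charged one block in this simple model.
    blocks = max(1, math.ceil(file_bytes / block_bytes))
    # A file cannot spread over more devices than it has blocks.
    devices = min(blocks, stripe_width)
    # Share of the allocated space that actually holds file data.
    used = file_bytes / (blocks * block_bytes)
    return blocks, devices, used

# A 4 KiB file with 64 KiB blocks and an 8-wide stripe lands on one device
# and uses 1/16 of its block, so striping buys nothing.
small = layout(4 * 1024, 64 * 1024, 8)

# A 1 GiB file with the same settings spreads over all 8 devices.
large = layout(2**30, 64 * 1024, 8)
```

Under this model, stripe factor and block size only start to matter once a file spans several blocks, which is consistent with the point that stripes win for large files and do nothing for small ones.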

Page 7

Small Files

Examples
  Genomics Group
    Tens of thousands of files in a single directory
  Natural Language Group
    50-250k files in a directory
    Many nodes accessing the same stuff
      Dictionaries
Backups are ‘slower’ / ‘harder’
  Reasons
    Updating directory data
    Blocking of data on tape
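
To see whether a directory falls into the pattern described above, it can help to count its files and compare that with the amount of data they hold. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and it looks at a single directory level only.

```python
import os

def profile_directory(path):
    """Return (file count, total bytes, mean file size) for one directory level."""
    count = 0
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Count regular files only; subdirectories and symlinks are skipped.
            if entry.is_file(follow_symlinks=False):
                count += 1
                # Each stat is a metadata lookup, the cost that dominates
                # when files are small.
                total += entry.stat(follow_symlinks=False).st_size
    mean = total / count if count else 0.0
    return count, total, mean
```

A directory that reports a very high file count with a small mean size costs roughly one metadata lookup per file while moving very little data, which is the situation that makes directory updates and tape blocking expensive during backups.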

Page 8

Small Files

Ways to help
  “Faster” disk (helps metadata/directory space)
  Distributed file access (QFS)
    Metadata still a ‘block’
      Read/write locks
      Updating for distributed access
    Next version scales better (lock improvements)
No free lunch
  Special-purpose file systems and/or local space on cluster nodes (replication)
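
The "local space on cluster nodes (replication)" option can be sketched as a staging step that runs before an unmodified program starts. This is an assumption-laden illustration, not the procedure used at USC: the function name, the marker-file convention, and the directory layout are all hypothetical.

```python
import os
import shutil

def stage_to_local(shared_dir, local_root):
    """Copy a read-only shared directory to node-local space once, then reuse it.

    Returns the local path the job should read from. The ".staged" marker file
    is a hypothetical convention; this sketch does not guard against several
    processes on the same node staging at the same time.
    """
    name = os.path.basename(os.path.normpath(shared_dir))
    target = os.path.join(local_root, name)
    marker = target + ".staged"
    if not os.path.exists(marker):
        # One bulk copy replaces many per-file lookups against the shared
        # file system during the job itself.
        shutil.copytree(shared_dir, target, dirs_exist_ok=True)
        # Record completion so later jobs on this node skip the copy.
        open(marker, "w").close()
    return target
```

The program then reads from the returned path instead of the shared one, so the code itself does not change. The cost is extra local disk space and stale copies if the shared data is updated, which is the "no free lunch" part.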

Page 9

Next generation

Why change needed
  NFS doesn’t cut it
    Why?
  GPFS
    Helps some
  10Gb hosts on ‘data plane’
    Next month
  RAM disk for ‘metadata’?

Page 10

Next generation

Storage management solutions
  SRB and friends
  Database-based solutions
  Lustre possible
  Object storage
    Performance for small files/objects is a question in my mind
All these have potential but…
  Back to ‘don’t change code’
  “Virtualization” conundrum
How to build massively parallel data spaces
  HPCS/other projects