- be able to figure out what storage they need based on measurements at their - know ... · 2015....

When the class is over, the students will …
- be able to figure out what storage they need based on measurements at their home institution
- know what their filesystem options are and some of the tradeoffs associated with each one

Who is this class designed for?
- People relatively new to HPC
- System administrators who need technical detail and some understanding of what information is needed to make decisions
- Decision makers who need information necessary to make decisions and some technical detail


Page 1: - be able to figure out what storage they need based on measurements at their - know ... · 2015. 1. 23. · Scenario: You buy two different solutions, for different purposes. One


- Notice that the fundamental problem isn’t “What filesystem should we deploy?”
- We don’t really care about the filesystem (or even whether we are using a “filesystem”) as long as it meets the storage needs.

- Users often can’t articulate what they need in terms of space, speed, IOPS, and I/O patterns (they aren't storage experts)
- You have a woefully small budget (of course you do)
- There is a range of needs for different problem domains and even within each domain
- - small files and lots of them
- - large files
- - write once, read never
- - read lots, seek lots
- Expertise in a single storage system is probably insufficient
- Expertise in multiple storage systems is difficult to find
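Since users often cannot say whether their workload is "small files and lots of them" or "large files", one way to get a first measurement is to count existing files by size. The following is a minimal sketch, not a tool from the talk: the bucket boundaries and the names `BUCKETS`, `bucket_for`, and `size_histogram` are assumptions chosen for illustration.

```python
import os
import stat
from collections import Counter

# Hypothetical bucket boundaries in bytes; adjust them to your own workloads.
BUCKETS = [
    (4 * 1024, "<= 4 KiB"),
    (1024 ** 2, "<= 1 MiB"),
    (1024 ** 3, "<= 1 GiB"),
]


def bucket_for(size):
    """Return the label of the first bucket whose upper bound fits `size`."""
    for limit, label in BUCKETS:
        if size <= limit:
            return label
    return "> 1 GiB"


def size_histogram(root):
    """Walk `root` and count regular files per size bucket."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                counts[bucket_for(st.st_size)] += 1
    return counts
```

Calling `size_histogram("/path/to/project")` on a representative directory gives a rough picture of the file-size mix. Walking a very large tree this way also loads the metadata service, so it is best run off-peak.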


Why aren’t there any right answers?
- Storage solutions are infrastructure. That implies that they should “just work” and be invisible.
- All storage solutions will be deficient in some way because no one solution can solve every problem and be invisible all the time.

What constitutes a wrong answer?

Scenario:
You buy a single solution that you tested rigorously in a small-scale environment. You really stressed the system, running lots of benchmarks and determining performance across a range of workloads.
Then, you release it to your users and it immediately crashes and won’t stay stable.

You eventually find that one application is misbehaving and causing the crash. You didn’t test that application. Now, no work can be done because there is only one place for people to store data.
This actually happens!

What do you do?


Scenario:
You buy two different solutions, for different purposes. One for long-term storage, the other for temporary/scratch storage.
People don’t use scratch because long-term is good enough, and the users don’t know enough.
So, their applications run slow (compared to using scratch), and the money and time spent on scratch was a waste.

Scenario:
You buy two different solutions, one for scratch, one for long-term storage. People start using scratch but treat it as a long-term, reliable storage system.
In essence, they get reliable storage from two places (from the user perspective) even though scratch isn’t backed up.
What happens when they delete data accidentally from scratch, or if files are lost through a system failure?

What you should take away from this talk:
- You should know what information you need to gather to make your decision(s)
- You should know what your options are
- You should know how others have approached this same problem


A filesystem is not just a place to store data.
Fancy filesystems and management tools can do interesting things with your data (like move it around, compress and decompress, analyze, track, etc.)
That is necessary for some things but doesn’t fall under the purview of a traditional filesystem.

Normally, we think about the filesystem as just a bunch of disks. It isn’t.
It is actually the entire set of disks, servers, networking, and software that goes into serving and managing data.
The data may be persistent long-term, or ephemeral.
The requirements of data access will control what storage device you get.

The filesystem isn’t the interesting or useful part.
The interesting and useful parts are HOW we use it.
The filesystem should be infrastructure – invisible as long as it is working.


Why is this important?
- Every group needs different things. If they all work together, things will go much better for everyone.
- - What does that mean – “better”? Happy users are productive. Happy managers and administrators give money. Happy administrators keep things running well and plan for the future user needs.
- - It is a big cycle of feedback, growth, and responsibility

Which kind of users will you have?
- Casual?
- Heavy?
- Important?
- Beginner?
- Advanced?

Different sets of users may have different clusters.
There may be multiple storage deployments and multiple clusters, each with different stakeholders.


How do these stakeholders play a role in the storage solution design and purchase process?

Users:
- ownership of data
- budget
- only they can tell you how quickly something needs to be accessed, but they often can’t articulate it
- What do they need the storage to do?

Management:
- licensing
- maintenance costs
- corporate and user relations, corporate partnerships
- return on investment. Was the money well-spent? How do we show that?

Admins:
- Administration and day-to-day monitoring and fixing
- How easy is it to do things (like changing quota, replacing failed parts, upgrading systems and software)?
- What are the interactions between all of these stakeholders?

What is the perspective of each group?
What is the give and take from each group?
(Answers to both of these questions depend entirely on your culture and environment)

Principal characteristic of each group:
- Users = storage is usable
- Managers and Leadership = Money and ownership concerns
- - University support staff includes purchasing people and planning.
- - Can a solution be purchased without running afoul of rules and regulations?
- Administrators = Keeping the storage working


You can’t get all of these with a single solution.
Which ones are important to you?

- Backups and archive space are different. Backups are there to recover previous state. Archive is uberlong-term storage (10+ years)

- Administrator storage is stuff like node images, log data, configuration info for PXE, etc.


- Speed: The solution with the best benchmark results might not be the best solution given your applications. Maybe the system with the highest benchmark is the slowest for <your most significant application>. What do IOPs mean in the context of speed?

- Space: The solution with the largest amount of space may not be the best solution given your applications. Maybe failures cause the filesystem to be unavailable far too often.

- Administrative burden – How easy is it to provision new storage for a group? Set quotas? Replace disks? Add space and speed?

- - Do you have the space, power and cooling for it? What does the initial installation take?

- Management – How is it monitored? Can you get usage information like performance, hotspots, and usage statistics from it? How easy is it to upgrade and is data safe over an upgrade?

- Reliability and Redundancy – How safe is your data? RAID5? RAID6? Are those safe? Do you need data replication? Metadata replication? What happens if a disk “goes away”? What happens if a storage server goes away?

- Cost – What do you have to sacrifice to get the speed, space, management burden, etc. needed?


- Features – Is quota management necessary? Automated file management? Integration with a backup or archive system?

- Support from vendor – When you have problems, how helpful is the vendor?
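The space and reliability criteria above pull against each other, and a small calculation makes the cost visible. The sketch below compares the usable capacity of a single RAID5 group (one disk's worth of parity) and a single RAID6 group (two disks' worth). The function names and the example of ten 4 TB disks are assumptions for illustration; hot spares, filesystem overhead, and rebuild risk are ignored.

```python
# Number of disks' worth of capacity consumed by parity in one RAID group.
RAID_PARITY = {"RAID5": 1, "RAID6": 2}


def usable_capacity_tb(disks, disk_tb, parity_disks):
    """Usable capacity of one parity-RAID group, ignoring all other overhead."""
    if disks <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (disks - parity_disks) * disk_tb


def compare(disks, disk_tb):
    """Map each RAID level to its usable capacity for the same set of disks."""
    return {
        level: usable_capacity_tb(disks, disk_tb, parity)
        for level, parity in RAID_PARITY.items()
    }


# Hypothetical group of ten 4 TB disks.
print(compare(10, 4))
```

For the same ten disks, RAID6 gives up one more disk of space than RAID5 in exchange for surviving a second disk failure in the group.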


Typical temporary storage defining characteristics:
- Files left on this system are reaped after some period of time
- Typically has smaller capacity than other storage locations (if there are other locations)
- Typically the fastest system
- Often designed for the heaviest users (biggest files, biggest jobs, most data)
- Reliability not as important?
- Backups are not taken
- May not be very good at small files

Long-term storage defining characteristics:
- May be involved in backups
- Reliability is very important
- Typically not very fast
- Typically very large and expandable, but expansion may be in disconnected units (à la new NFS server)

HPC system storage defining characteristics:
- Holds administrative data (like logs)
- Software
- If used for storing and accessing logs, it should be fast
- May store machine images (for VMs or bare-metal node installation), or any data needed to make nodes run (like PXE or node configuration info from CFEngine, Puppet, Ansible, Rocks, etc.).

Are these three storage solutions separate?
Maybe.
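The reaping behavior listed for temporary storage can be sketched as a dry run that only reports candidates and deletes nothing. This is an illustrative assumption, not the policy of any particular site: it uses modification time as the age criterion, and the name `find_stale_files` and the 30-day window in the usage note are hypothetical.

```python
import os
import stat
import time


def find_stale_files(root, max_age_days, now=None):
    """Return sorted paths of regular files under `root` whose last
    modification is older than `max_age_days`. Nothing is deleted."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

A call such as `find_stale_files("/scratch", 30)` would list what a 30-day policy would remove. Reviewing that list with users before enforcing anything helps when they have been treating scratch as long-term storage.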


- There are many options to choose from. You are not limited to this list.
- As long as you know your tradeoffs,
- We are focusing on the systems we have experience with
- For the others, we don’t talk about them because we have little/no experience with them

NFS is serial, but some solutions play games to make it parallel-ish. Isilon, NetApp, etc. are examples.
We focus on filesystems that are somehow shared. It might be through the VFS, or through HTTP or a custom protocol and API. Why? Non-shared filesystems are not terribly useful for Linux clusters.


Having a single server manage things reduces the administrative burden.


Administrative burden and expertise statements are anecdotal only.


Speed is limited by the single server.
Space is essentially unlimited. Another JBOD can be connected to the server through an external SAS cable, iSCSI, ATAoE, SAN, etc.
Management and administrative burden is small as long as the storage configuration is reliable.
Cost? Only costs what bare hardware does.
Features? Backups and archive? Lots of products work fine with NFS. Or, you can use the underlying filesystem to manage snapshots, backups, and archive.
- policies for managing things? Requires other software.

Reliability/Redundancy? True redundancy is harder.
Purpose? – Works best with low-volume, low-throughput kinds of stuff.
If the data can be cached well, NFS is good.
- Serving software (like Open MPI) from NFS works well in many places

Support? Who needs support for NFS? (You don’t need support for NFS, but you just might for the 10G network connection that isn’t getting full bandwidth, or the RAID card in the server, or the firmware for the JBOD, etc.)


Speed? – scales well as long as you keep adding disks and servers. Adding servers may require additional licenses.
Can have a mix of SSDs and spinning disk, but this requires planning to take best advantage of the capabilities of the SSDs and spinning disks.
Space? – unlimited at no increased cost other than hardware
Administrative burden?
- day-to-day
- upgrades
- installation and deployment
- vendor support – buying GPFS licenses entitles you to support
- - GPFS appliances can be purchased
- - make the distinction between technology and actual hardware appliance deployment
- cost – can be significant for the software

A GPFS cluster is the set of all servers and clients that are members of the management domain.
A client can’t simply mount a GPFS filesystem. The client must join the GPFS cluster in order to get the necessary configuration and authorization before it can mount any filesystems.

Cost? GPFS is a licensed product. It is available in per-client and per-NSD-server licenses.
Features?
- Integrates with Tivoli for backups and archive. It includes a built-in policy engine that manages file placement and migration.
- Speaks native InfiniBand

Purpose? GPFS can be configured to meet many purposes like database hosting, geo-replicated systems, scratch space, etc.
Reliability/Redundancy? Both data and metadata can be replicated for redundancy, or the underlying disk structure can be used for redundancy. If a single GPFS server disappears, as long as a quorum of servers is still available, and all necessary disks (aka NSDs) are available, GPFS will continue to operate.
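The availability condition in the last sentence can be written down as a small check. This is a simplified model for reasoning only and not GPFS code or its API: it assumes quorum means a strict majority of the designated quorum nodes, it does not model other quorum configurations such as tiebreaker disks, and the function names are hypothetical.

```python
def has_quorum(total_quorum_nodes, reachable_quorum_nodes):
    """True if a strict majority of the designated quorum nodes is reachable.
    Assumption: simple majority rule (half the nodes, rounded down, plus one)."""
    if not 0 <= reachable_quorum_nodes <= total_quorum_nodes:
        raise ValueError("reachable nodes must be between 0 and the total")
    return reachable_quorum_nodes >= total_quorum_nodes // 2 + 1


def can_operate(total_quorum_nodes, reachable_quorum_nodes,
                required_nsds, available_nsds):
    """Model of the condition in the notes: quorum holds AND every
    necessary disk (NSD) is still available."""
    all_disks_present = set(required_nsds) <= set(available_nsds)
    return has_quorum(total_quorum_nodes, reachable_quorum_nodes) and all_disks_present


# Hypothetical cluster: 3 quorum nodes, one server lost, all NSDs still served.
print(can_operate(3, 2, {"nsd1", "nsd2"}, {"nsd1", "nsd2"}))
```

Under this model, an even number of quorum nodes buys no extra failure tolerance over the next lower odd number, which is one reason odd counts are a common choice.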


OSSs can be active/active redundant.
MDSs are active/passive redundant per filesystem.

MDSs *only* handle permissions and path lookups.

Tradeoffs
- Speed – Unlimited as OSSs and OSTs are added
- Space – Unlimited as OSSs and OSTs are added
- Cost – Can be had for the cost of the (commodity) hardware
- Administrative burden – See support.
- Support – Can be bought, but since Lustre is free software, it is not required.
- - There are Lustre appliances that can be had
- - Who can you buy support from? Intel, Seagate (Xyratex), Terascala (appliances only?)
- Features
- - Speaks native InfiniBand and Ethernet
- - Snapshots? Only with LVM
- Anecdotal evidence says that kernel patches are required on clients, MDSs, and OSSs in order to use Lustre effectively
- Reliability and redundancy are handled at the hardware layer and by other software, not by Lustre


Speed – Unlimited as storage and director blades are added
Space – Unlimited as storage blades are added
Cost – No licensing costs, but the hardware and software are Panasas
Features?
- snapshots

Reliability/redundancy – Failure of any device will not cause the filesystem to fail
Management burden – it is a black box. Very little management or tuning required or allowed.


Do the tools increase or decrease complexity?
- Depends on the tool
- Depends on how many tools you have to operate


Which tradeoffs do we address with each one?

Advantages:
Speed? Lots of it
Cost? Small since most investment is in Lustre
Space? Lots of Lustre and lots of tape

Disadvantages:
Management burden? All eggs in one Lustre-shaped basket
Features? Can backups or archiving be automatically done?
What about policies? How are files in the scratch space reaped?

Purpose?
- NFS for job scripts and hand-editing of files
- Lustre for actual computation and storage
- tape for archival and backup purposes


Advantages:
- Expertise in only a single technology is needed

Disadvantages:
- If in a single system, failure stops everything
- If in multiple systems, max performance drops

Speed: Maximum possible
Space: Maximum possible
Reliability and redundancy: Depends on how it was built.


A single Panasas realm is mounted and used on every compute and login node.
Long-term storage is purchased by individual research groups in the form of individual storage servers. Each server provides 10-200 TiB of usable space, depending on what the research group has purchased.
Home space is split between four servers for all 1000+ users. Home space is limited to 5 GiB per user.

Advantages:
- Data is easily accessible

Disadvantages:
- Users can misuse storage through naiveté or malice.

NFS makes it easy to isolate problem users and servers.

Which tradeoffs do we address with each one?

NFS for long-term
- cost is small
- speed is small
- features - easy to back up
- reliable in aggregate (most things are up most of the time)
- management burden is small (replacing drives)
- Good aggregate performance

Panasas
- speed was good when first deployed, but hasn’t been expanded since the initial purchase
- management burden is small. Failed drives or blades send notifications and are easy to fix.


VMs typically don’t require much storage – just enough to boot from. But they do need speed for random reading and writing.
Possible solutions: NFS, iSCSI import, ATAoE import, DAS

Storage devices: SAS, SATA, SSD

For database hosting, random access to data is really important, and speed is paramount.
Possible solutions: same as above
Storage devices: RAM and SSD


What do the stakeholders care about? Identify who they are, what they want, and what they need.
Do you need quotas? Snapshots? Backup integration? Migration? Ingest and outgest?


HIPAA and ITAR
- These are very complex things.
- You need to know if you have this kind of data!
- You can go to jail, your organization can be sanctioned and lose all funding, university administrators can go to jail, etc.
- Your organization may have a compliance office to help deal with these kinds of data

ITAR – International Traffic in Arms Regulations
- Controls the export and import of defense-related articles.
- Designed to safeguard national security
- May apply to data and applications your researchers work with

HIPAA and PHI
- PHI (protected health information) is any health information that can be linked to a specific individual
- Designed to protect individuals’ privacy and anonymity in research trials


Getting the data we need about performance is hard.


Tradeoffs:
- Network speed
- Firewall speed
- Where is the data coming from?
- Science DMZ
- - How do you expose the filesystem in the DMZ?
- - Seems dangerous

What kind of data is it?
- large files?
- lots of small files?
- tarball?
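For the network speed tradeoff, a back-of-the-envelope estimate of ingest time is often enough to decide whether the link or the storage is the bottleneck. The sketch below is illustrative only: the function name `transfer_hours` and the default `efficiency` of 0.7 (the fraction of nominal link speed actually achieved) are assumptions to be replaced with measured values, and the per-file overhead of many small files is not modeled.

```python
def transfer_hours(data_tb, link_gbps, efficiency=0.7):
    """Hours needed to move `data_tb` terabytes (10**12 bytes each) over a
    link of `link_gbps` gigabits per second at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Hypothetical ingest: 10 TB over a 10 Gb/s link at the assumed efficiency.
print(round(transfer_hours(10, 10), 2), "hours")
```

Lots of small files usually transfer well below this estimate, which is one reason to ask whether the data can arrive as a tarball.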


PCIe SSD Examples
- Intel 910
- Fusion-io (now in SanDisk)
- Virident FlashMax

SATA/SAS SSD Examples
- Intel
- Samsung
- OCZ
- SanDisk
- Seagate
- Western Digital
- and many more

SAS
- 15k RPM
- 10k RPM
- 7.2k RPM

SATA
- 10k RPM
- 7.2k RPM
- 5.4k RPM


Why is it important to make this distinction between benchmarks and actual performance?
Why do we care? (Because your stakeholders care!)

There are sequential IOP and random IOP patterns.
What is the difference?
With sequential IOP patterns, performance is usually measured in MB/s or GB/s.
With random IOP patterns, performance is usually given in IOPS since it is a measure of how many unrelated requests can be handled per second.

How do those things affect performance?
Spinning disks are made for sequential IOP patterns, and normally top out at less than 500 IOPS.
SATA/SAS SSDs are made for random IOP patterns, and can get >10,000 IOPS.
PCIe SSDs can get >100,000 IOPS.
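The two units can be put on one scale, because throughput is the request rate multiplied by the request size. The sketch below converts the IOPS figures quoted above into MB/s. The 4 KiB request size and the function name `throughput_mb_s` are assumptions for illustration, not figures from the talk.

```python
def throughput_mb_s(iops, request_kib):
    """MB/s (10**6 bytes) delivered by `iops` requests of `request_kib` KiB each."""
    return iops * request_kib * 1024 / 1e6


# IOPS figures from the notes, with an assumed 4 KiB random request size.
for device, iops in [
    ("spinning disk", 500),
    ("SATA/SAS SSD", 10_000),
    ("PCIe SSD", 100_000),
]:
    print(f"{device}: {throughput_mb_s(iops, 4):.1f} MB/s at 4 KiB per request")
```

At 4 KiB per request, 500 IOPS works out to only about 2 MB/s, which is why a disk that streams sequential data quickly can still feel slow under random access.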


Storage segregation is another tradeoff.
You can isolate a group of users to reduce the impact on others, but that may reduce maximum performance.

Support vs. expertise is another tradeoff.
If you have little expertise, you may need to pay for support until the expertise is built.


What are these good at? (Perhaps they aren’t good for traditional HPC…)
A few of the features of some of these are:
- geo-replication
- federation
- self-healing
- RESTful API interface
- self-management
- lots o’ other features

The other solutions we talked about have a different focus (largely on performance).
These software packages and appliances aren’t necessarily focused on “performance first”.
