8/6/2019 (Redhat) Linux Important Stuff (34)
1/50
8/6/2019 (Redhat) Linux Important Stuff (34)
2/50
Picking the Right File & Storage
System for your Application
Ric Wheeler
Architect & Manager, Red HatJune 23, 2010
8/6/2019 (Redhat) Linux Important Stuff (34)
3/50
Overview
Introduction Local File Systems
Networked File Systems
Shared Disk File Systems
Storage Overview
New RHEL6 FS Features
Performance Results
Futures
8/6/2019 (Redhat) Linux Important Stuff (34)
4/50
File System Types
Local file systems ext3, ext4, xfs and btrfs
Network file systems
NFS and CIFS
Shared disk file systems
GFS2
Cloud file systems
8/6/2019 (Redhat) Linux Important Stuff (34)
5/50
Which File System is Best?
It always depends on your specific application andcircumstances
Budget?
Performance requirements?
Capacity needs?
Availability?
Robustness in the face of power outages & crashes?
IO Workload generated by your application? Different answers for every combination of answers!
8/6/2019 (Redhat) Linux Important Stuff (34)
6/50
Data Integrity over System Crash
Systems can fail for multiple reasons Power outage, hardware fault, software failure
Modern file systems use a journal mechanism tomaintain consistent state
Similar to a database transaction
Correctness tied to order that data makes it to safestorage
barrier support manages volatile storage device writecache
8/6/2019 (Redhat) Linux Important Stuff (34)
7/50
Alignment on Storage
Most storage has a preferred IO size and alignment Simple disks have a 512 byte IO size and alignment
need
New drives move to 4096 byte IO and alignment
Historic default to sector 63
Does not work for some storage at all
Can be a big performance hit for some sophisticatedstorage devices
8/6/2019 (Redhat) Linux Important Stuff (34)
8/50
Discard Support
File systems now issue discard hints to block layer Informs storage of unused ranges of blocks
Allows storage to keep an accurate picture of what isutilized
SSD devices see this as a TRIM command
Used for wear leveling, pre-erase, etc
SCSI devices see this as an UNMAP command
Used for thinly provisioned LUNs
8/6/2019 (Redhat) Linux Important Stuff (34)
9/50
Overview
Introduction Local File Systems
Networked File Systems
Shared Disk File Systems
Storage Overview
New RHEL6 FS Features
Performance Results
Futures
8/6/2019 (Redhat) Linux Important Stuff (34)
10/50
EXT3 Pros & Cons
10
ext3 is the most common file system in Linux
Most distributions have used it as their default
Applications tuned to its specific behaviors
Familiar to most system administrators
ext3 challenges
File system repair (fsck) time can be extremely long
Limited scalability - maximum file system size of 16TB
Can be significantly slower than other local file systems
8/6/2019 (Redhat) Linux Important Stuff (34)
11/50
EXT4 Pros & Cons
11
Ext4 has many compelling new features
Extent based allocation
Faster fsck time (up to 10x over ext3)
Delayed allocation
Higher bandwidth
Should be relatively familiar for existing ext3 users
Ext4 challenges
Large device support not finished in its user space tools
Limits supported maximum file system size to 16TB
Has different behavior over system failure
8/6/2019 (Redhat) Linux Important Stuff (34)
12/50
XFS Pros and Cons
XFS is very robust and scalable Very good performance for large storage
configurations and large servers
Many years of use on large (> 16TB) storage
Red Hat tests & supports up to 100TB
XFS challenges
Not as well known by many customers and field
support people Performance issues with meta-data intensive (small
file creation) workloads
12
8/6/2019 (Redhat) Linux Important Stuff (34)
13/50
BTRFS
Btrfs is the newest local file system Has its own internal RAID and snapshot support
Does full data integrity checks for metadata anduser data
Can dynamically grow and shrink
Supported in RHEL6 as a tech preview item
Developers very interested in feedback and testing
Not meant for production use!
8/6/2019 (Redhat) Linux Important Stuff (34)
14/50
ext3 is our default file system for RHEL5
ext4 is supported as a tech preview in (5.4)
xfs offered as a layered product (5.5+)
RHEL5 Local File Systems
14
8/6/2019 (Redhat) Linux Important Stuff (34)
15/50
RHEL6 Local FS Summary
FS write barrier enabled for ext3, ext4, gfs2 and xfs
FS tools warn about unaligned partitions
parted/anaconda responsible for alignment
Size Limitations XFS for any single node & GFS2 for clusters up to
100TB
Ext3 & ext4 supported < 16TB
8/6/2019 (Redhat) Linux Important Stuff (34)
16/50
Overview
Introduction
Local File Systems
Networked File Systems
Shared Disk File Systems
Storage Overview
New RHEL6 FS Features
Performance Results
Futures
8/6/2019 (Redhat) Linux Important Stuff (34)
17/50
NFS Overview
Supported by a huge range of hardware NFS servers range from consumer devices up to high
end NAS arrays
Performance varies with network & hardware
Scales up to very large file systems Popular uses
Users' home directories
Read-mostly workloads in scale out configurations ofdozens of nodes
See Steve Dickson's talk on NFS for details
8/6/2019 (Redhat) Linux Important Stuff (34)
18/50
NFS Limitations
Traditional NFS servers can be a bottleneck Parallel NFS (pNFS) is a new standard that allows
direct client to data connections
Object, block and file versions
Does not provide SMP-like coherency for clients Client A needs to wait to see data written by client B
Similar issue with newly created files in a directory
NFS V4.0 delegations improve this situation
8/6/2019 (Redhat) Linux Important Stuff (34)
19/50
CIFS and Samba
Samba is a server that speaks Microsoft SMBprotocols
Allows RHEL to provide networked storage for windowsguests
CIFS is the client side file system that provides accessto SMB servers
Allows RHEL clients of windows or Samba servers
See Jeff Layton's CIFS or Simo Sorce's Samba talk for
details
8/6/2019 (Redhat) Linux Important Stuff (34)
20/50
Overview
Introduction
Local File Systems
Networked File Systems
Shared Disk File Systems
Storage Overview
New RHEL6 FS Features
Performance Results
Futures
8/6/2019 (Redhat) Linux Important Stuff (34)
21/50
Shared Disk File Systems
Design goal is to provide tight coherence and highavailability
Avoids most of the issues and lags seen with NFSclients and servers
Achieves this by aggressive use of distributed locks Requires shared storage
Shared disk file systems pay for this tighter coherency
Tend be slower than a dedicated local file system
Complex to set up and maintain
Application tuning needed to avoid lock thrashing
8/6/2019 (Redhat) Linux Important Stuff (34)
22/50
Choosing Between NFS & GFS2?
GFS2 is a layered product aimed at deployments thatneed high availability
Supported on clusters from 2-16 nodes
GFS1 support is dropped in RHEL6
Maximum FS size is 100TB Users are encouraged to review configuration with Red
Hat
NFS deployments are much easier to set up and
configure
8/6/2019 (Redhat) Linux Important Stuff (34)
23/50
Overview
Introduction
Local File Systems
Networked File Systems
Shared Disk File Systems
Storage Overview
New RHEL6 FS Features
Performance Results
Futures
8/6/2019 (Redhat) Linux Important Stuff (34)
24/50
Storage Systems Overview
Different types of storage have wildly varyingperformance characteristics
Random write?
Random read?
Streaming read? Streaming write?
File systems historically have been tuned to run beston traditional, single rotating disk drives
See Tom Coughlan's talk on storage for details
8/6/2019 (Redhat) Linux Important Stuff (34)
25/50
Traditional Spinning Disk
Spinning platters store data Modern drives have a large, volatile write cache (16+
MB)
Streaming read/write performance of a single S-ATA
drive can sustain roughly 100MB/sec Seek latency bounds random IO to the order of 50-100
random IO's/sec
This is the classic platform that operating systems &
applications are designed for Write barrier support needed on these devices
8/6/2019 (Redhat) Linux Important Stuff (34)
26/50
External Disk Arrays
External disk arrays can be extremely sophisticated Large non-volatile cache used to store data
IO from a host normally lands in this cache withouthitting spinning media
Performance changes Streaming reads and writes are vastly improved
Random writes and reads are fast when they hit cache
Random reads can be very slow when they miss cache No need for write barrier support on these devices
8/6/2019 (Redhat) Linux Important Stuff (34)
27/50
SSD Devices
S-ATA interface SSD's Streaming reads & writes are reasonable
Random writes normally slow
Random reads great!
PCI-e interface SSD's enhance performance acrossthe board
Both types of devices tend to use internal DRAM as abuffer
Some might need write barrier support
8/6/2019 (Redhat) Linux Important Stuff (34)
28/50
Overview
Introduction
Local File Systems
Networked File Systems
Shared Disk File Systems
Storage Overview
New RHEL6 FS Features
Performance Results
Futures
8/6/2019 (Redhat) Linux Important Stuff (34)
29/50
RHEL6 Support for Alignment
New standards allow storage to inform OS of preferredalignment and IO sizes
Few storage devices currently export the information
Partitions must be aligned using the new alignmentvariables
fdisk, parted, etc snap to proper alignment
FS tools warn of misaligned partitions
Red Hat engineering is actively working with partnersto verify and enhance this for our customers
8/6/2019 (Redhat) Linux Important Stuff (34)
30/50
RHEL6 Support for Discard
File system level feature that informs storage ofregions no longer in active use
SSD devices see this as a TRIM command and use it todo wear leveling, etc
Arrays see this as a SCSI UNMAP command and canenhance thin lun support
Discard support is off by default
Some devices handle TRIM poorly
Might have performance impact Test carefully and consult with your storage provider!
8/6/2019 (Redhat) Linux Important Stuff (34)
31/50
RHEL6 NFS Features
NFS version 4 is the default Per client configuration file can override version 4
Negotiates downwards to V3, V2, etc
Support for industry standard encryption types IPV6 Support added for NFS and CIFS
NFS clients fully supported in 6.0
NFS server support for IPV6 aimed at 6.1
8/6/2019 (Redhat) Linux Important Stuff (34)
32/50
Overview
Introduction
Local File Systems
Networked File Systems
Shared Disk File Systems
Storage Overview
New RHEL6 FS Features
Performance Results
Futures
8/6/2019 (Redhat) Linux Important Stuff (34)
33/50
Performance & Measurement
Workload, storage device and server type all have ahuge impact
Always measure your actual application on your realsystem if possible!
Same test run on different storage can give oppositeresults
Various file systems have special tuning that can help
See talks by our performance team Rao & Wagner
and Shakshober & Woodman
Making a File System Elapsed Time (sec)
8/6/2019 (Redhat) Linux Important Stuff (34)
34/50
Making a File System Elapsed Time (sec)Smaller is Better
S-ATA Disk - 1TB FS PCI-E SSD - 75GB FS
0
50
100
150
200
250
300
EXT3EXT4
XFS
BTRFS
Making a File System Elapsed Time (sec)
8/6/2019 (Redhat) Linux Important Stuff (34)
35/50
Making a File System Elapsed Time (sec)Smaller is Better (Zooming in on SSD)
PCI-E SSD - 75GB FS
0
0.5
1
1.5
2
2.5
3
3.5
4
EXT3EXT4
XFS
BTRFS
Creating Lots of Small Files Elapsed Time (sec)
8/6/2019 (Redhat) Linux Important Stuff (34)
36/50
Creating Lots of Small Files Elapsed Time (sec)Smaller is Better
S-ATA Disk - 100GB File PCI-E SSD - 50GB File
0
2000
4000
6000
8000
10000
12000
EXT3EXT4
XFS
BTRFS
Creating Lots of Small Files Elapsed Time (sec)
8/6/2019 (Redhat) Linux Important Stuff (34)
37/50
Creating Lots of Small Files Elapsed Time (sec)Smaller is Better (Zooming in on SSD)
PCI-E SSD - 50GB File
0
50
100
150
200
250
300
350
400
EXT3EXT4
XFS
BTRFS
Creating Lots of Small Files Elapsed Time (sec)
8/6/2019 (Redhat) Linux Important Stuff (34)
38/50
Creating Lots of Small Files Elapsed Time (sec)Smaller is Better
S-ATA Disk - 100GB File PCI-E SSD - 50GB File
0
2000
4000
6000
8000
10000
12000
EXT3EXT4
XFS
BTRFS
Creating Lots of Small Files Elapsed Time (sec)
8/6/2019 (Redhat) Linux Important Stuff (34)
39/50
Creating Lots of Small Files Elapsed Time (sec)Smaller is Better (Zooming in on SSD)
PCI-E SSD - 50GB File
0
50
100
150
200
250
300
350
400
EXT3EXT4
XFS
BTRFS
File System Repair Elapsed Time (secs)
8/6/2019 (Redhat) Linux Important Stuff (34)
40/50
y p p ( )Smaller is Better
S-ATA Disk - FSCK 1 Million Files PCI-E SSD - FSCK 1 Million Files
0
200
400
600
800
1000
1200
EXT3EXT4
XFS
BTRFS
File System Repair Elapsed Time (secs)
8/6/2019 (Redhat) Linux Important Stuff (34)
41/50
y p p ( )Smaller is Better (Zooming in on SSD)
PCI-E SSD - FSCK 1 Million Files
0
10
20
30
40
50
60
70
80
EXT3EXT4
XFS
BTRFS
Writing a Few Medium Files Elapsed Time (secs)
8/6/2019 (Redhat) Linux Important Stuff (34)
42/50
g p ( )Smaller is Better
S-ATA Disk - 20 1GB Files PCI-E SSD - 20 1GB Files
0
50
100
150
200
250
300
EXT3EXT4
XFS
BTRFS
Writing 1 Really Big File MB/sec
8/6/2019 (Redhat) Linux Important Stuff (34)
43/50
g y gBigger is Better
S-ATA Disk - 100GB File PCI-E SSD - 50GB File
0
50
100
150
200
250
300
350
400
450
EXT3
EXT4
XFS
BTRFS
Block Device
RHEL5 3 IO EXT3 EXT4 XFS l
8/6/2019 (Redhat) Linux Important Stuff (34)
44/50
RHEL5.3 IOzone EXT3, EXT4, XFS evalBigger is Better
EXT4DEV EXT4 BARRIER=0 XFS XFS Barrier=0
0
5
10
15
20
25
30
35
40
RHEL53 (120), IOzone PerformanceGeo Mean 1k points, Intel 8cpu, 16GB, FC
In Cache
Direct I/O
> Cache
PercentRe
lativetoEXT3
RHEL5 Oracle 10.2 Performance FilesystemsIntel 8 cp 16GB 2 FC MPIO AIO/DIO
8/6/2019 (Redhat) Linux Important Stuff (34)
45/50
Intel 8-cpu, 16GB, 2 FC MPIO, AIO/DIOBigger is Better
10 U 20 U 40 U 60 U
0.00
10000.00
20000.00
30000.00
40000.00
50000.00
60000.00
70000.00
80000.00
90000.00
100000.00
RHEL53 Base 8 cpus
ext3
RHEL53 Base 8 cpus
xfs
RHEL53 Base 8 cpus
ext4
Performance Summary
8/6/2019 (Redhat) Linux Important Stuff (34)
46/50
Performance Summary
Always measure performance of your application onyour real system!
No single file system out performs every other one
Expensive storage can hide performance issues
Retest when moving to a new OS or applicationversion
Faster is not always better
Trade offs include reduced data integrity
Less features like extended attributes, system security
Overview
8/6/2019 (Redhat) Linux Important Stuff (34)
47/50
Overview
Introduction
Local File Systems
Networked File Systems
Shared Disk File Systems
Storage Overview
New RHEL6 FS Features
Performance Results
Futures
Upcoming Local File System Features
8/6/2019 (Redhat) Linux Important Stuff (34)
48/50
Upcoming Local File System Features
Union mounts
Allow a read-write overlay on top of a read-only base filesystem
Useful for virt guests storage, thin clients, etc
Continuing to help lead btrfs development towards anenterprise ready state
Support for ext4 on larger storage
Enhanced XFS performance for meta-data intensive
workloads
Upcoming NFS Features
8/6/2019 (Redhat) Linux Important Stuff (34)
49/50
Upcoming NFS Features
PNFS support pNFS and more 4.1 features aimed at a minor 6.x
release
No commercial arrays support pNFS yet
Ongoing work on open source (GFS2, object, etc)pNFS servers
Working with standards body to add support forpassing extended attributes over NFS
Goal is to enable SELinux over NFS
8/6/2019 (Redhat) Linux Important Stuff (34)
50/50
Top Related