Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of...
Transcript of Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of...
![Page 1: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/1.jpg)
11
Persistent Memory and Linux:New Storage Technologies and Interfaces
Ric WheelerSenior ManagerKernel File & Storage TeamRed Hat, Inc.
![Page 2: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/2.jpg)
2
Overview
2
● What is Persistent Memory
● Challenges with Existing SSD Devices & Linux
● Crawl, Walk, Run with Persistent Memory Support
● Getting New Devices into Linux
● Related Work
![Page 3: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/3.jpg)
What is Persistent Memory?
![Page 4: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/4.jpg)
Persistent Memory
● A variety of new technologies are coming from multiple vendors
● Critical feature is that these new parts:● Are byte addressable● Do not lose state on power failure
● Critical similarities are that they are roughly like DRAM:● Same cost point● Same density● Same performance
![Page 5: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/5.jpg)
Similarities to DRAM
● If the parts are the same cost and capacity of DRAM● Will not be reaching the same capacity as traditional,
spinning hard drives● Scaling up to a system with only persistent memory
will be expensive● Implies a need to look at caching and tiered storage
techniques● Same performance as DRAM
● Will press our IO stack which already struggles to reach the performance levels of current PCI-e SSD's
![Page 6: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/6.jpg)
Challenges with Existing SSD Devices & Linux
![Page 7: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/7.jpg)
7
![Page 8: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/8.jpg)
8
Early SSD's and Linux
● The earliest SSD's look like disks to the kernel● Fibre channel attached high end DRAM arrays (Texas
Memory Systems, etc)● S-ATA and SAS attached FLASH drives
● Plugged in seamlessly to the existing stack● Block based IO● IOP rate could be sustained by a well tuned stack● Used the full block layer● Used a normal protocol (SCSI or ATA commands)
![Page 9: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/9.jpg)
9
PCI-e SSD Devices
● Push the boundaries of the Linux IO stack● Some devices emulated AHCI devices● Many vendors created custom drivers to avoid the
overhead of using the whole stack● Performance challenges
● Linux block based IO has not been tuned as well as the network stack to support millions of IOPS
● IO scheduling was developed for high latency devices
![Page 10: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/10.jpg)
Performance Limitations of the Stack
● PCI-e devices are pushing us beyond our current IOP rate● Looking at a target of 1 million IOPS/device
● Working through a lot of lessons learned in the networking stack● Multiqueue support for devices● IO scheduling (remove plugging)● SMP/NUMA affinity for device specific requests● Lock contention
● Some fixes gain performance and lose features
![Page 11: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/11.jpg)
Tuning Linux for an SSD
● Take advantage of the Linux /sys/block parameters● Rotational is key● Alignment fields can be extremely useful● http://mkp.net/pubs/storage-topology.pdf
● Almost always a good idea not to use CFQ
![Page 12: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/12.jpg)
12
Device Driver Choice● Will one driver emerge for PCI-e cards?
● NVMe: http://www.nvmexpress.org
● SCSI over PCI-e: http://www.t10.org/members/w_sop-.htm
● Vendor specific drivers● Most Linux vendors support a range of open drivers
● Open vs closed source drivers● Linux vendors have a strong preference for open
source drivers● Drivers ship with the distribution - no separate
installation● Enterprise distribution teams can fix code issues
directly
![Page 13: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/13.jpg)
Challenges Cross Multiple Groups
● Developers focus in relatively narrow areas of the kernel● SCSI, S-ATA and vendor drivers are all
different teams● Block layer expertise is a small community● File system teams per file system● Each community of developers spans multiple
companies
![Page 14: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/14.jpg)
Crawl, Walk, Run
![Page 15: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/15.jpg)
15
Persistent Memory & Byte Aligned Access
● DRAM is used to cache all types of objects – file system metadata and user data● Moving away from this model is a challenge● IO sent in multiples of file system block size ● Rely on journal or btree based updates for consistency ● Must be resilient over crashes & reboots● On disk state is the master view & DRAM state differs
● These new devices do not need block IO
![Page 16: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/16.jpg)
Crawl Phase
● Application developers are slow to take advantage of new hardware● Most applications will continue to use read/write “block
oriented” system calls for years to come● Only a few, high end applications will take advantage
of the byte addressable capabilities● Need to hide the persistent memory below our
existing stack● Make it as fast and low latency as possible!
![Page 17: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/17.jpg)
Crawling Along● Block level driver for persistent memory parts
● Best is one driver that supports multiple types of parts● Enhance performance of IO path
● Leverage work done to optimize stack for PCI-e SSD's● Target: millions of IOP's?
● Build on top of block driver● Block level caching● File system or database journals?● As a metadata device for device mapper, btrfs?
![Page 18: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/18.jpg)
Crawl Status
● Block drivers being developed
● Block caching mechanisms upstream● Bcache from Kent Overstreet moving into the
upstream kernel – http://bcache.evilpiepirate.org
● A new device mapper dm-cache target– Simple cache target can be a layer in device mapper
stacks.– Modular policy allows anyone to write their own policy– Reuses the persistent-data library from thin provisioning
● Vendor specific caching schemes
![Page 19: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/19.jpg)
Walk Phase
● Modify existing file system or database applications● Can journal mechanism be reworked to avoid barrier
operations and flushes?● Dynamically detect persistent memory and use for
metadata storage (inodes, allocation, etc)?● Provide both block structured IO and byte aligned IO
support● Block for non-modified applications● Memory mapped accessed for “power” applications
![Page 20: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/20.jpg)
Walk Phase Status
● File systems like btrfs have looked at dynamically steering load to SSDs● More based on the latency than on persistence
● Looking at MMAP interface to see what changes are needed
● Need to educate developers about the new parts & get parts in community hands
![Page 21: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/21.jpg)
Run Phase
● Develop new APIs at the device level for consumption by file and storage stack
● Develop new APIs and a programming model for applications
● Create new file systems that consume the devices API's and provide the application API● Nisha's talk on NVM interfaces Wed at 3:00PM
● Modify all performance critical applications to use new APIs
![Page 22: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/22.jpg)
Run Status
● Storage Network Industry Association (SNIA) Working Group on NVM● Working on a programming model for this class of
parts● http://snia.org/forums/sssi/nvmp
● Several new file systems for Linux being actively worked on
● Will be as painful as multi-threading or teaching applications to use fsync()!
![Page 23: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/23.jpg)
Getting New Devices into Linux
![Page 24: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/24.jpg)
24
What is Linux?
● A set of projects and companies● Various free and fee-based distributions● Hardware vendors from handsets up to mainframes● Many different development communities
● Can be a long road to get a new bit of hardware enabled● Open source allows any party to write their own file
system or driver ● Different vendors have different paths to full support ● No single party can get your feature into distributions
![Page 25: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/25.jpg)
25
Not Just the Linux Kernel
● Most features rely on user space components
● Red Hat Enterprise Linux (RHEL) has hundreds of projects, each with● Its own development community (upstream)● Its own rules and processes● Choice of licenses
● Enterprise Linux vendors● Work in the upstream projects● Tune, test and configure● Support the shipping versions
![Page 26: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/26.jpg)
26
The Life Span of a Linux Enhancement● Origin of a feature
● Driven through standards like T10, SNIA or IETF● Pushed by a single vendor● Created by a developer or at a research group
● Proposed in the upstream community● Prototype patches posted● Feedback and testing● Advocacy for inclusion
● Move into a community distribution like Fedora
● Shipped and supported by an enterprise distribution
![Page 27: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/27.jpg)
27
The Linux Community is Huge● Most active contributors in 3.8 kernel (changesets):
● No affiliation – 12.8%
● Red Hat – 9.0%
● Intel – 8.7%
● Unknown – 7.4%
● LINBIT – 4.8%
● Linaro – 4.6%
● Texas Instruments – 4.0%
● Vision Engraving Systems – 3.5%
● Samsung – 3.3%
● SUSE – 2.5%
● Statistics from: http://lwn.net/Articles/536768
![Page 28: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/28.jpg)
28
Linux Storage & File & MM Summit 2012
![Page 29: Persistent Memory and Linux...Similarities to DRAM If the parts are the same cost and capacity of DRAM Will not be reaching the same capacity as traditional, spinning hard drives Scaling](https://reader034.fdocuments.net/reader034/viewer/2022050605/5facb9c3a5099f2fa8659484/html5/thumbnails/29.jpg)
29
Resources & Questions
● Resources● Linux Weekly News: http://lwn.net/● Mailing lists like linux-scsi, linux-ide, linux-fsdevel, etc
● SNIA NVM TWG● http://snia.org/forums/sssi/nvmp
● Storage & file system focused events● LSF workshop● Linux Foundation & Linux Plumbers Events