A Look at PVFS, a Parallel File System for Linux

Will Arensman
Anila Pillai


Overview

Network File System (NFS)
Drawbacks of NFS
Parallel Virtual File System (PVFS)
Using PVFS
Demo
Conclusions
References

1. Network File System (NFS)

NFS is a client/server application developed by Sun Microsystems.

It lets a user view, store, and update files on a remote computer as though the files were on the user's local machine.

The basic function of the NFS server is to allow its file systems to be accessed by any computer on an IP network.

NFS clients access the server's files by mounting the server's exported file systems.

For example:

    /home/ann    server1:/export/home/ann

2. Drawbacks of NFS

Having all your data stored in a central location presents a number of problems:

Scalability: problems arise when the number of computing nodes exceeds the performance capacity of the machine exporting the file system. You could add more memory, processing power, and network interfaces to the NFS server, but you will soon run out of CPU, memory, and PCI slots. The higher the node count, the less file I/O bandwidth each node's processes end up with.

Availability: if the NFS server goes down, all the processing nodes have to wait until the server comes back up.

Solution: Parallel Virtual File System (PVFS)

3. Parallel Virtual File System (PVFS)

Parallel Virtual File System (PVFS) is an open source implementation of a parallel file system developed specifically for Beowulf-class parallel computers and the Linux operating system.

It is a joint project between Clemson University and Argonne National Laboratory.

PVFS has been released and supported under a GPL license since 1998.

File System: allows users to store and retrieve data using common file access methods (open, close, read, write)

Parallel: stores data on multiple independent machines with separate network connections

Virtual: exists as a set of user-space daemons storing data on local file systems

PVFS…

Instead of having one server exporting a file via NFS, you have N servers exporting portions of a file to parallel application tasks running on multiple processing nodes over an existing network.

The aggregate bandwidth exceeds that of a single machine exporting the same file to all processing nodes.

This works much the same way as RAID 0: file data is striped across all I/O nodes.

[Figure: RAID 0 (striping) of data blocks 1 to 10]
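The bandwidth argument on this slide can be made concrete with a back-of-the-envelope model. The sketch below is an idealized illustration, not a measurement: it assumes clients share a server equally and ignores protocol overhead and contention. The function names and the bandwidth parameters are hypothetical and are not part of NFS or PVFS.

```c
#include <assert.h>

/* Idealized per-client bandwidth when a single server feeds n_clients
 * equally. server_mb_s is the server's total deliverable bandwidth
 * (a parameter of this model only). */
double single_server_share_mb_s(double server_mb_s, int n_clients)
{
    assert(n_clients > 0);
    return server_mb_s / n_clients;
}

/* Idealized upper bound on aggregate bandwidth when file data is striped
 * over n_io_nodes, each limited by the slower of its disk and its
 * network link. */
double striped_aggregate_mb_s(double disk_mb_s, double net_mb_s,
                              int n_io_nodes)
{
    assert(n_io_nodes > 0);
    double per_node = disk_mb_s < net_mb_s ? disk_mb_s : net_mb_s;
    return per_node * n_io_nodes;
}
```

Under this model the single-server share shrinks as clients are added, while the striped bound grows with the number of I/O nodes, which is the qualitative point the slide makes.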

PVFS…

PVFS provides the following features in one package:

Allows existing binaries to operate on PVFS files without the need for recompiling

Enables user-controlled striping of data across disks on the I/O nodes

Robust and scalable

Provides high bandwidth for concurrent read/write operations from multiple processes to a common file

Ease of installation

Easily used: provides a cluster-wide consistent name space

PVFS file systems may be mounted on all nodes in the same directory simultaneously, allowing all nodes to see and access all files on the PVFS file system through the same directory scheme.

Once mounted, PVFS files and directories can be operated on with all the familiar tools, such as ls, cp, and rm.

PVFS Design and Implementation

In order to provide high-performance access to data stored on the file system by many clients, PVFS spreads data out across multiple cluster nodes, called I/O nodes.

By spreading data across multiple I/O nodes, applications have multiple paths to data through the network and multiple disks on which data is stored.

This eliminates single bottlenecks in the I/O path and thus increases the total potential bandwidth for multiple clients, or aggregate bandwidth.

Roles of nodes in PVFS:

1. COMPUTE NODES, on which applications are run

2. MANAGEMENT NODE, which handles metadata operations

3. I/O NODES, which store file data for PVFS file systems

Note: nodes may perform more than one role.

PVFS System Architecture

[Figure: PVFS system architecture]

PVFS Components

There are four major components to the PVFS system:

1. Metadata server (mgr)

2. I/O server (iod)

3. PVFS native API (libpvfs)

4. PVFS Linux kernel support

The first two components are daemons (server types) that run on nodes in the cluster.

1. The metadata server (or mgr)

File manager; it manages metadata for PVFS files.

A single manager daemon is responsible for the storage of and access to all the metadata in the PVFS file system.

Metadata: information describing the characteristics of a file, such as permissions, the owner and group, and, more importantly, the physical distribution of the file data.

PVFS Components

PVFS files are striped across a set of I/O nodes in order to facilitate parallel access.

The specifics of a given file distribution are described with three metadata parameters:

base I/O node number

number of I/O nodes

stripe size

These parameters, together with an ordering of the I/O nodes for the file system, allow the file distribution to be completely specified.

PVFS Components

Example: metadata for a file

    inode    1092157504
    base     2
    pcount   3
    ssize    65536

pcount: specifies the number of I/O nodes used for storing data (3 here)

base: specifies the first (or base) I/O node (node 2 here)

ssize: specifies the stripe size, the unit by which the file is divided among the I/O nodes (64 Kbytes here)

The user can set these parameters when the file is created; otherwise PVFS uses a default set of values.
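As a small illustration of how base and pcount select a file's I/O nodes, the sketch below lists the node numbers in striping order. It is only a sketch: the function name `file_nodes`, the `n_iods` parameter (total I/O nodes in the file system), and the wrap-around past the last node are assumptions, not taken from the PVFS sources.

```c
#include <assert.h>

/* Fill nodes[0..pcount-1] with the I/O node numbers that hold a file's
 * stripes, in striping order. n_iods and the modulo wrap-around are
 * assumptions of this sketch. */
void file_nodes(int base, int pcount, int n_iods, int *nodes)
{
    assert(n_iods > 0 && pcount > 0 && pcount <= n_iods);
    assert(base >= 0 && base < n_iods && nodes != 0);
    for (int i = 0; i < pcount; i++)
        nodes[i] = (base + i) % n_iods;
}
```

With the example metadata (base 2, pcount 3) and six I/O nodes, this yields nodes 2, 3, and 4.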

PVFS Components

PVFS file striping is done in a round-robin fashion.

Though there are six I/O nodes in this example, the file is striped across only three I/O nodes, starting from node 2, because the metadata file specifies such a striping.

Each I/O daemon stores its portion of the PVFS file in a file on the local file system on the I/O node.

The name of this file is based on the inode number that the manager assigned to the PVFS file (in our example, 1092157504).
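Round-robin striping can be sketched as a mapping from a logical file offset to an I/O node and an offset in that node's local file. This is a minimal illustration, not the PVFS implementation: the names `stripe_loc` and `locate` are hypothetical, and the sketch assumes each I/O daemon stores its stripes back to back in its local file and that node numbers wrap around modulo the total node count `n_iods`.

```c
#include <assert.h>
#include <stdint.h>

/* Where one byte of a striped file lives. */
struct stripe_loc {
    int      node;          /* I/O node number holding the byte */
    uint64_t local_offset;  /* offset within that node's local file */
};

/* Map a logical file offset to an I/O node under round-robin striping.
 * base, pcount, and ssize follow the metadata fields on the slides;
 * n_iods and the dense local layout are assumptions of this sketch. */
struct stripe_loc locate(uint64_t offset, int base, int pcount,
                         uint64_t ssize, int n_iods)
{
    assert(pcount > 0 && ssize > 0 && n_iods >= pcount && base >= 0);
    uint64_t stripe = offset / ssize;             /* global stripe index */
    uint64_t round  = stripe / (uint64_t)pcount;  /* full passes over the nodes */
    struct stripe_loc loc;
    loc.node = (base + (int)(stripe % (uint64_t)pcount)) % n_iods;
    loc.local_offset = round * ssize + offset % ssize;
    return loc;
}
```

For the example file (base 2, pcount 3, ssize 65536, six I/O nodes), offsets 0, 65536, and 131072 land on nodes 2, 3, and 4, and offset 196608 returns to node 2.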

PVFS Components

When application processes (clients) open a PVFS file, the PVFS manager informs them of the locations of the I/O daemons.

The clients then establish connections with the I/O daemons directly.

When a client wishes to access file data, the client library sends a descriptor of the file region being accessed to the I/O daemons holding data in the region.

The daemons determine what portions of the requested region they have locally and perform the necessary I/O and data transfers.
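To illustrate how a requested file region divides among the I/O daemons, the sketch below counts how many bytes of a region fall on each node of the file's stripe set. It is a simplified model with a hypothetical function name (`split_region`); the real descriptors exchanged between libpvfs and the iods are not shown on the slides and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Fill bytes_per_node[0..pcount-1] with the number of bytes of the region
 * [offset, offset + length) that fall on each I/O node of the file.
 * Index i is the i-th node in the file's striping order. */
void split_region(uint64_t offset, uint64_t length, int pcount,
                  uint64_t ssize, uint64_t *bytes_per_node)
{
    assert(pcount > 0 && ssize > 0 && bytes_per_node != 0);
    for (int i = 0; i < pcount; i++)
        bytes_per_node[i] = 0;

    uint64_t pos = offset;
    uint64_t end = offset + length;
    while (pos < end) {
        uint64_t stripe = pos / ssize;               /* stripe containing pos */
        uint64_t stripe_end = (stripe + 1) * ssize;  /* first byte past it */
        uint64_t chunk_end = stripe_end < end ? stripe_end : end;
        bytes_per_node[stripe % (uint64_t)pcount] += chunk_end - pos;
        pos = chunk_end;
    }
}
```

A 10000-byte read starting at offset 60000 with a 64 Kbyte stripe size touches only the first two nodes of the stripe set, so only those two daemons have data to transfer.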

PVFS Components

2. The I/O server (or iod)

It handles storing and retrieving file data stored on local disks connected to the node.

3. PVFS native API (libpvfs)

It provides user-space access to the PVFS servers.

This library handles the scatter/gather operations necessary to move data between user buffers and PVFS servers, keeping these operations transparent to the user.

For metadata operations, applications communicate through the library with the metadata server.

For data access, the metadata server is eliminated from the access path and the I/O servers are contacted directly.

This is key to providing scalable aggregate performance.

The following figures show data flow in the PVFS system for metadata operations and data access.

PVFS Components

For metadata operations, applications communicate through the library with the metadata server.

[Figure: Metadata access]

PVFS Components

The metadata server is eliminated from the access path; instead, I/O servers are contacted directly. libpvfs reconstructs file data from pieces received from the iods.

[Figure: Data access]

PVFS Components

4. PVFS Linux kernel support

The PVFS Linux kernel support provides the functionality necessary to mount PVFS file systems on Linux nodes.

This allows existing programs to access PVFS files without any modification.

This support is not necessary for PVFS use by applications, but it provides an extremely convenient means for interacting with the system.

The PVFS Linux kernel support includes:

a loadable module

an optional kernel patch to eliminate a memory copy

a daemon (pvfsd) that accesses the PVFS file system on behalf of applications

The daemon uses functions from libpvfs to perform these operations.

PVFS Components

The figure shows data flow through the kernel when the Linux kernel support is used.

Operations are passed through system calls to the Linux VFS layer. Here they are queued for service by pvfsd, which receives operations from the kernel through a device file.

It then communicates with the PVFS servers and returns data through the kernel to the application.

[Figure: Data flow through kernel. User space: app and pvfsd (connected to the PVFS servers); kernel space: VFS and /dev/pvfsd]

PVFS Application Interfaces

Applications on client nodes can access PVFS data on I/O nodes using one of three methods:

PVFS native API:

The PVFS native API provides a UNIX-like interface for accessing PVFS files. It also allows users to specify how files will be striped across the I/O nodes in the PVFS system.

Linux kernel interface:

The Linux kernel VFS module provides the functionality for adding new file-system support via loadable modules without recompiling the kernel. These modules allow PVFS file systems to be mounted in a manner similar to NFS. Once mounted, the PVFS file system can be traversed and accessed with existing binaries just as any other file system.

PVFS Application Interfaces

ROMIO MPI-IO interface:

ROMIO implements the MPI-2 I/O calls in a portable library. This allows parallel programmers using MPI to access PVFS files through the MPI-IO interface.

5. Using PVFS

1. Download and untar the pvfs and pvfs-kernel files.

    Available at: http://parlweb.parl.clemson.edu/pvfs/

2. Go to the PVFS directory:

    ./configure, make, make install
    make install on each node

3. Go to the PVFS kernel directory:

    ./configure --with-libpvfs-dir=../pvfs/lib
    make, make install
    cp pvfs.o /lib/modules/<kernel-version>/misc/
    make install, cp pvfs.o on each node

PVFS Installation

Metadata server needs:

1. mgr executable

2. .iodtab file: contains IP addresses and ports of I/O daemons

3. .pvfsdir file: permissions of the directory where metadata is stored

Run mkmgrconf to create .iodtab and .pvfsdir

PVFS Installation

Each I/O server needs:

    iod executable
    iod.conf file: describes the location of the PVFS data directory on the machine

Each client needs:

    pvfsd executable
    pvfs.o kernel module
    /dev/pvfsd device file
    mount.pvfs executable
    mount point

PVFS Installation

After installation:

1. Start the iods

2. Start mgr

3. Run pvfs-mount

PVFS API

1. Initialize a pvfs_filestat struct:

    struct pvfs_filestat {
        int base;    /* First node. -1 = default = 0 */
        int pcount;  /* # I/O nodes. default = all */
        int ssize;   /* Stripe size. default = 64K */
        int soff;    /* Not used. */
        int bsize;   /* Not used. */
    };
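Filling in the struct for a specific distribution might look like the sketch below. So that it compiles without the PVFS headers, it declares a local mirror of the struct shown on the slide; with PVFS installed, the library's own definition should be used instead. The helper name `make_dist` is hypothetical, and zeroing the unused fields is a precaution of this sketch, not a documented requirement.

```c
#include <assert.h>

/* Local mirror of the struct on the slide, declared here only so the
 * sketch is self-contained. */
struct pvfs_filestat {
    int base;    /* first I/O node */
    int pcount;  /* number of I/O nodes */
    int ssize;   /* stripe size in bytes */
    int soff;    /* not used */
    int bsize;   /* not used */
};

/* Hypothetical helper: build an explicit distribution request. */
struct pvfs_filestat make_dist(int base, int pcount, int ssize)
{
    assert(base >= 0 && pcount > 0 && ssize > 0);
    struct pvfs_filestat dist;
    dist.base = base;
    dist.pcount = pcount;
    dist.ssize = ssize;
    dist.soff = 0;   /* unused per the slide; zeroed as a precaution */
    dist.bsize = 0;  /* unused per the slide; zeroed as a precaution */
    return dist;
}
```

For example, `make_dist(2, 3, 65536)` describes the distribution from the earlier metadata example; a pointer to the result would be the `dist` argument when the file is opened.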

PVFS API

2. Open the file:

    pvfs_open(char *pathname, int flag, mode_t mode, struct pvfs_filestat *dist);

3. Have a look at your metadata:

    pvfs_ioctl(int fd, GETMETA, struct pvfs_filestat *dist);

PVFS Utilities

Copy files to PVFS:

    u2p -s <ssize> -b <base> -n <#nodes> <src> <dest>

Examine file distribution:

    pvstat <file>
    <file>: base = 0, pcount = 8, ssize = 65536

PVFS at U of A

Eagle

    Fast Ethernet (12.5 MB per second)
    Various SCSI, IDE hard drives (~12-18 MB per second)

Raven

    Myrinet (~160 MB per second)
    7 40 GB drives (~60 MB per second)


6. Demo

7. Conclusions

Pros:

    Higher cluster performance than NFS.
    Many hard drives act as one large hard drive.
    Works with current software.
    Best when reading/writing large amounts of data.

Cons:

    Multiple points of failure.
    Poor performance when using the kernel module.
    Not as good for "interactive" work.

8. References

1. The Parallel Virtual File System. Available at: http://www.parl.clemson.edu/pvfs/

2. P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System for Linux Clusters", Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327.

3. Thomas Sterling, "Beowulf Cluster Computing with Linux", The MIT Press, 2002.

4. W. B. Ligon III and R. B. Ross, "An Overview of the Parallel Virtual File System", Proceedings of the 1999 Extreme Linux Workshop, June 1999.

5. Network File System. Available at: http://www.redhat.com/docs/manuals/linux/RHL-7.2-Manual/ref-guide/ch-nfs.html

6. http://www.linuxuser.co.uk/articles/issue14/lu14-All_you_need_to_know_about-PVFS.pdf

Questions?