Sensitivity of Cluster File System Access to I/O Server Selection

Post on 02-Feb-2016


A. Apon, P. Wolinski, and G. Amerson

University of Arkansas

Overview

Benchmarking study
– Parallel Virtual File System (PVFS)
– Network File System (NFS)

Testing parameters include
– Pentium-based cluster node hardware
– Myrinet interconnect
– Varying number and configuration of I/O servers and client request patterns

Outline

File system architectures
Performance study design
Experimental results
Conclusions and future work

NFS Architecture

Client/server system
Single server for files

[Diagram: Node 0 (NFS server, holding DATAFILE) and Nodes 1 through N connected by a network switch. Each cluster node has a dual-processor Pentium, Linux, a hard disk, and lots of memory.]

PVFS Architecture

Also a client/server system
Many servers for each file
Fixed-size stripes in round-robin fashion

[Diagram: DATAFILE striped across Node 0, Node 1, and Node 2, which are connected by a network switch. Each cluster node still has a dual-processor Pentium, Linux, a hard disk, and lots of memory.]
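The round-robin striping described on this slide can be sketched as a mapping from a logical file offset to an I/O server and an offset within that server's share of the file. This is a minimal illustration, not PVFS code: `locate_byte` and `stripe_location` are hypothetical names, and the sketch assumes the first stripe lands on server 0.

```c
#include <assert.h>
#include <stdint.h>

/* Where one byte of the logical file lives under round-robin striping. */
typedef struct {
    int      server;        /* I/O server index, 0 .. num_servers-1 */
    uint64_t local_offset;  /* byte offset within that server's share */
} stripe_location;

/* Hypothetical helper, not part of the PVFS API.
 * Assumes stripe 0 is stored on server 0. */
stripe_location locate_byte(uint64_t file_offset, uint64_t stripe_size,
                            int num_servers)
{
    assert(stripe_size > 0 && num_servers > 0);

    /* Index of the stripe that contains this byte. */
    uint64_t stripe_index = file_offset / stripe_size;
    /* Number of complete passes over all servers before this stripe. */
    uint64_t round = stripe_index / (uint64_t)num_servers;

    stripe_location loc;
    loc.server       = (int)(stripe_index % (uint64_t)num_servers);
    loc.local_offset = round * stripe_size + file_offset % stripe_size;
    return loc;
}
```

For example, with 16KB stripes on four servers, byte 16384 is the first byte held by server 1, and byte 65536 wraps around to server 0 at local offset 16384.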

PVFS Architecture

One node is a manager node
– Maintains metadata information for files

Configuration and usage options include:
– Size of stripe
– Number of I/O servers
– Which nodes serve as I/O servers
– Native PVFS API vs. UNIX/POSIX API

Native PVFS API example

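    #include <fcntl.h>   /* O_RDONLY */
    #include <unistd.h>  /* SEEK_SET */
    #include <pvfs.h>

    int main() {
        int fd, bytes, bytes_read;
        /* fn, offset, and buf_ptr are set up in the elided code */
        fd = pvfs_open(fn, O_RDONLY, 0, NULL, NULL);
        ...
        pvfs_lseek(fd, offset, SEEK_SET);
        ...
        bytes_read = pvfs_read(fd, buf_ptr, bytes);
        ...
        pvfs_close(fd);
    }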

Performance Study Design

Goals
– Investigate the effect on cluster I/O when the NFS server or the PVFS I/O servers are also used as clients
– Compare PVFS with NFS

Performance Study Design

Experimental cluster
– Seven dual-processor Pentium III 1GHz, 1GB memory computers
– Dual EIDE disk RAID 0 subsystem in all nodes, measured throughput about 50MBps
– Myrinet switches, 250MBps theoretical bandwidth

Performance Study Design

Two extreme client workloads
– Local whole file (LWF)
    – Takes advantage of caching on the server side
    – One process per node, each process reads the entire file from beginning to end

[Diagram: Node 1, Node 2, ..., Node N]

Performance Study Design

Two extreme client workloads
– Global whole file (GWF)
    – Minimal help from caching on the server side
    – One process per node, each process reads a different portion of the file, balanced workload

[Diagram: Node 1, Node 2, ..., Node N]
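The difference between the LWF and GWF workloads comes down to which byte range each client process reads. The sketch below shows one way to compute that range. The slides say only that GWF portions are different and balanced, so the contiguous equal-block split (with the last client absorbing any remainder) is an assumption, and `client_range`, `byte_range`, and `workload` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { WORKLOAD_LWF, WORKLOAD_GWF } workload;

typedef struct {
    uint64_t start;   /* first byte this client reads */
    uint64_t length;  /* number of bytes this client reads */
} byte_range;

/* Hypothetical helper: byte range read by client `rank` (0-based). */
byte_range client_range(workload w, uint64_t file_size,
                        int rank, int num_clients)
{
    assert(num_clients > 0 && rank >= 0 && rank < num_clients);

    byte_range r;
    if (w == WORKLOAD_LWF) {
        /* LWF: every client reads the whole file, start to end. */
        r.start  = 0;
        r.length = file_size;
    } else {
        /* GWF (assumed layout): contiguous, equal-sized blocks. */
        uint64_t share = file_size / (uint64_t)num_clients;
        r.start  = (uint64_t)rank * share;
        /* The last client also reads any leftover bytes. */
        r.length = (rank == num_clients - 1) ? file_size - r.start : share;
    }
    return r;
}
```

Under LWF the same bytes are requested by every client, so a server can answer later requests from its cache. Under GWF each byte is requested only once, which is why caching gives minimal help.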

NFS Parameters

Mount on Node 0 is a local mount
– Optimization for NFS

NFS server may or may not participate as a client in the workload

PVFS Parameters

A preliminary study was performed to determine the “best” stripe size and request size for the LWF and GWF workloads
– Stripe size of 16KB
– Request size of 16MB
– File size of 1GB

All I/O servers for a given file participate in all requests for that file
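With a 16KB stripe and a 16MB request, each request covers 1024 stripes, so on a seven-node cluster it reaches every I/O server. The sketch below counts how many servers a single request touches under round-robin striping. `servers_touched` is a hypothetical helper, not a PVFS call, and binary units (16KB = 16384 bytes) are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of distinct I/O servers that hold
 * part of the byte range [offset, offset + length). */
int servers_touched(uint64_t offset, uint64_t length,
                    uint64_t stripe_size, int num_servers)
{
    assert(length > 0 && stripe_size > 0 && num_servers > 0);

    uint64_t first_stripe = offset / stripe_size;
    uint64_t last_stripe  = (offset + length - 1) / stripe_size;
    uint64_t stripes      = last_stripe - first_stripe + 1;

    /* Consecutive stripes go to consecutive servers, so the request
     * touches one server per stripe until all servers are covered. */
    return stripes < (uint64_t)num_servers ? (int)stripes : num_servers;
}
```

A request no larger than one stripe would touch one server (two if it straddles a stripe boundary), which is the case the chosen request size avoids.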

System Software

RedHat Linux version 7.1
Linux kernel version 2.4.17-rc2
NFS protocol version 3
PVFS version 1.5.3
PVFS kernel version 1.5.3
Myrinet network drivers gm-1.5-pre3b
MPICH version 1.2.1

Experimental Pseudocode

For all nodes
    Open the test file
    Barrier synchronize with all clients
    Get start time
    Loop to read/write my portion
    Barrier synchronize with all clients
    Get end time
    Report bytes processed and time

For Node 0
    Receive bytes processed, report aggregate throughput
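The final step on Node 0 can be sketched as below. The slides do not say exactly how the aggregate figure is computed. This sketch assumes it is the total bytes reported by all clients divided by the wall-clock time between the two barriers, and that 1 MB means 10^6 bytes. `aggregate_throughput_mbps` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: aggregate throughput in MB/s (1 MB = 10^6 bytes,
 * an assumption) from the per-client byte counts gathered on Node 0. */
double aggregate_throughput_mbps(const uint64_t *bytes_per_client,
                                 size_t num_clients,
                                 double start_time, double end_time)
{
    assert(bytes_per_client != NULL && end_time > start_time);

    /* Sum the bytes each client reported after the second barrier. */
    uint64_t total_bytes = 0;
    for (size_t i = 0; i < num_clients; i++) {
        total_bytes += bytes_per_client[i];
    }

    /* Times are in seconds, e.g. as returned by a wall-clock timer. */
    return ((double)total_bytes / 1.0e6) / (end_time - start_time);
}
```

Because both timestamps are taken right after a barrier, the elapsed time reflects the slowest client, so the result is a throughput for the whole group, not an average of per-client rates.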

Clearcache

Clear NFS client and server-side caches
– Unmount NFS directory, shutdown NFS
– Restart NFS, remount NFS directories

Clear server-side PVFS cache
– Unmount PVFS directories on all nodes
– Shutdown PVFS I/O daemons, manager
– Unmount pvfs-data directory on slaves
– Restart PVFS manager, I/O daemons
– Remount PVFS directories, all nodes

Experimental Parameters

Number of participating clients
Number of PVFS I/O servers
PVFS native API vs. UNIX/POSIX API

I/O servers (NFS as well as PVFS) may or may not also participate as clients

Experimental Results

NFS
PVFS native API vs. UNIX/POSIX API
GWF, varying server configurations
LWF, varying server configurations

NFS, LWF and GWF with and without server reading

PVFS, LWF and GWF, native PVFS API vs. UNIX/POSIX API

PVFS UNIX/POSIX API compared to NFS

PVFS, GWF using native API, servers added from Node 6 down

PVFS and NFS, GWF, 1 and 2 clients with/without server participating

PVFS, LWF using native API, servers added from Node 6 down

PVFS and NFS, LWF, 1, 2, 3 clients with/without servers participating

PVFS, LWF and GWF, separate clients and servers, seven nodes

Conclusions

NFS can take advantage of a local mount

NFS performance is limited by contention at the single server
– Limited to the disk throughput or the network throughput from the server, whichever has the most contention
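The single-server limit can be written as a simple bound. This is a rough model and not something measured in the study: it assumes the slower of the server's disk and network link caps the total, that clients share that total evenly, and that no data comes from cache. Both function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model: upper bound on total throughput from one server,
 * taken as the slower of its disk and its network link (no caching). */
double single_server_bound_mbps(double disk_mbps, double network_mbps)
{
    assert(disk_mbps > 0.0 && network_mbps > 0.0);
    return disk_mbps < network_mbps ? disk_mbps : network_mbps;
}

/* Hypothetical model: even split of that bound across the clients. */
double per_client_share_mbps(double disk_mbps, double network_mbps,
                             int num_clients)
{
    assert(num_clients > 0);
    return single_server_bound_mbps(disk_mbps, network_mbps)
           / (double)num_clients;
}
```

With the hardware figures from the study design (about 50MBps measured disk throughput, 250MBps theoretical Myrinet bandwidth), this model gives a bound of 50MBps for uncached reads, or 10MBps per client with five clients. Reads served from the server's cache, as in the LWF workload, are not limited by the disk term.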

Conclusions

PVFS performance generally improves (does not decrease) as the number of clients increases
– More improvement seen with the LWF workload than with the GWF workload

PVFS performance improves when the workload can take advantage of server-side caching

Conclusions

PVFS is better than NFS for all types of workloads where more than one I/O server can be used

PVFS UNIX/POSIX API performance is much lower than the performance using the PVFS native API
– May be improved by a new release of the Linux kernel

Conclusions

For a given number of servers, PVFS I/O throughput decreases when the servers also act as clients

For the workloads tested, PVFS system throughput increases to the maximum possible for the cluster when all nodes participate as both clients and servers

Observation

The drivers and libraries were upgraded constantly during these studies. However, our recent experience indicates that they are now stable and interoperate well.

Future Work

Benchmarking with cluster workloads that include both computation and file access

Expand the benchmarking to a cluster with a larger number of PVFS clients and PVFS servers