London Ceph Day: Ceph at CERN

Dan van der Ster, Computer Engineer & Dr. Arne Wiebalck, CERN

Transcript of London Ceph Day: Ceph at CERN

1. DSS: Data & Storage Services
   Building an organic block storage service at CERN with Ceph
   Dan van der Ster, Arne Wiebalck
   Ceph Day 2013, London, 9 October 2013
   CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it

2. DSS: CERN's mission and tools
   - CERN studies the fundamental laws of nature: Why do particles have mass? What is our universe made of? Why is there no antimatter left? What was matter like right after the Big Bang?
   - The Large Hadron Collider (LHC): built in a 27 km long tunnel, ~200 m underground; dipole magnets operated at -271 °C (1.9 K); particles do ~11,000 turns/sec, 600 million collisions/sec, ...
   - Detectors: four main experiments, each the size of a cathedral; DAQ systems processing PetaBytes/sec
   - Worldwide LHC Computing Grid (WLCG): a computer network to provide computing for LHC data analysis; CERN at the centre of 170 computing centres worldwide

3. DSS: CERN's mission and tools (no additional text on this slide)

4. DSS: Big Data at CERN
   - Physics data on CASTOR/EOS: RAW physics outputs; the LHC experiments produce ~10 GB/s, 25 PB/year
   - User data on AFS/DFS: home directories for 30k users; physics analysis development; project spaces (applications); desktop/server backups
   - Service data on AFS/NFS: databases, admin applications
   - Tape archival with CASTOR/TSM

        Service   Size      Files
        AFS       240 TB    1.9 B
        CASTOR    87.7 PB   317 M
        EOS       19.8 PB   160 M

   CERN developed CASTOR & EOS because, until very recently, our storage requirements were globally unique. Following the Google / Amazon / Facebook innovations, we are now trying to leverage community solutions.

5. DSS: IT (R)evolution at CERN
   Cloudifying CERN's IT infrastructure ...
   - Centrally-managed and uniform hardware: no more service-specific storage boxes
   - OpenStack VMs for most services: building for 100k nodes (mostly for batch processing)
   - Attractive desktop storage services: huge demand for a local Dropbox, Google Drive
   - Remote data centre in Budapest: more rack space and power, plus disaster recovery
   ... brings new storage requirements:
   - Block storage for OpenStack VMs: images and volumes
   - Backend storage for existing and new services: AFS, NFS, OwnCloud, Zenodo, ...
   - Regional storage: make use of the new data centre in Hungary
   - Failure tolerance, data checksumming, easy to operate, security, ...

6. DSS: Possible Solutions
   - GlusterFS: the cloud team at CERN found it wasn't stable enough; doesn't offer a block device for physical machines
   - NFS (NetApp): expensive; vendor lock-in
   - Ceph: interesting architecture (on paper); offers almost all the features we needed
   Early 2013 we started investigating Ceph ...

7. DSS: First steps
   Set up a small-scale test cluster:
   - 3 MON servers, 1 RADOS gateway (all VMs)
   - 8 OSD hosts with 4-5 disks each (ex-CASTOR)
   - Ceph 0.56.4, installed via "yum install ceph" on SLC 6.4
   - Various clients: kernel rbd driver, OpenStack, AI monitoring, ...
   (Diagram: MON / MON / MON, OSD / OSD / OSD, RGW, CLT1 / CLT2 / ...; an illustrative smoke test for this setup follows below.)
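Slide 7 only lists the ingredients of the testbed; the sketch below illustrates the kind of client-side smoke test such a setup allows. The pool and image names are invented here, and the commands are generic Ceph CLI rather than anything CERN-specific.

    # Cluster sanity: monitor quorum, overall health, all 8 OSD hosts up/in
    ceph -s
    ceph osd tree

    # Exercise the kernel rbd client mentioned on the slide
    ceph osd pool create rbdtest 128          # illustrative pool with 128 PGs
    rbd create rbdtest/vol0 --size 10240      # 10 GB test image
    rbd map rbdtest/vol0                      # shows up as /dev/rbd0 on the client
    mkfs.xfs /dev/rbd0 && mount /dev/rbd0 /mnt

    # Simple throughput check, in the spirit of the radosbench tests on the next slide
    rados bench -p rbdtest 60 write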
8. DSS: Early testing
   - Setup was easy: ~2 days for our 50 TB testbed
   - Passed our first functional tests: RADOS, RBD, RADOS GW, CephFS
   - Passed our (simple) interface tests: remove an OSD, change the replication size, delete an object in a PG, corrupt an object in a PG, OpenStack/Cinder
   - Passed our performance test: radosbench
   - Passed our community expectations: very quick and helpful responses to the issues we encountered

9. DSS: Issues during early testing
   - ceph-deploy did not work for us at the time
   - 2 rooms + 3 replicas = problem
   - "re-weight apocalypse": wrong ratio of RAM to OSDs
   - A flaky server caused Ceph timeouts and constant re-balancing; taking the server out fixed the problem, but the root cause is not understood (can a slow server slow down the whole cluster?)
   - The qemu-kvm RPM on the RHEL derivative SLC needs patching; a patched RPM is provided by Inktank

10. DSS: Issues during early testing (same slide, with an overlay): "The results of this initial testing allowed us to convince management to support a more serious Ceph prototype ..."

11. DSS: 12 racks of disk server quads

12. DSS: Our 3PB Ceph Cluster
    48 OSD servers:
    - Dual Intel Xeon E5-2650, 32 threads incl. HT
    - 64 GB RAM
    - Dual 10Gig-E NICs, only one connected
    - 24x 3 TB Hitachi data disks (Eco drive, ~5900 RPM)
    - 3x 2 TB Hitachi system disks, triple mirror
    5 monitors:
    - Dual Intel Xeon L5640, 24 threads incl. HT
    - 48 GB RAM
    - Dual 1Gig-E NICs, only one connected
    - 3x 2 TB Hitachi system disks, triple mirror

        [root@p01001532971954 ~]# ceph osd tree | head -n2
        # id    weight    type name      up/down    reweight
        -1      2883      root default

13. DSS: Fully Puppetized Deployment
    - Fully puppetized deployment; big thanks to eNovance for their module: https://github.com/enovance/puppet-ceph/
    - Automated machine commissioning:
      - Add a server to the hostgroup (osd, mon, radosgw)
      - OSD disks are detected, formatted, prepared, auth'd
      - ceph.conf is auto-generated
      - The last step is manual/controlled: service ceph start (see the sketch below)
    - We use mcollective for bulk operations on the servers: Ceph rpm upgrades, daemon restarts
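To make the manual, controlled last step of the commissioning workflow concrete, here is a minimal shell sketch of what an operator runs on a freshly prepared OSD server; the verification commands are generic Ceph CLI, not CERN-specific tooling.

    # On the new OSD server, once Puppet has detected, formatted and prepared
    # the disks and written ceph.conf, start the daemons by hand:
    service ceph start

    # Then, from any admin node, confirm the new OSDs have joined and watch
    # the cluster rebalance:
    ceph osd tree
    ceph -s
    ceph -w        # live view of PG states while data backfills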
14. DSS: Our puppet-ceph changes (https://github.com/cernceph/puppet-ceph/)
    - Yum repository support
    - Don't export the admin key: our puppet env is shared across CERN (get the key via a k5-auth'd scp instead)
    - New options: osd default pool size, mon osd down out interval, osd crush location
    - RADOS GW support (RHEL only); https support to be completed
    - /dev/disk/by-path OSDs: better handling of disk replacements
    - Unmanaged osd service: manual control of the daemon
    - Other OSD fixes: delay mkfs, don't mount the disks, ...
    - Needs some cleanup before pushing back to eNovance

15. DSS: Puppet-ceph TODO/Wish-list
    We have some further puppet work in mind:
    - Add arbitrary ceph.conf options
    - Move the OSD journal to a separate partition
    - SSD OSD journals
    - Use the udev triggers for OSD creation

16. DSS: Ceph Configuration
    - 11 data pools with 3 replicas each; mostly test pools for a few different use-cases
    - 1-4k PGs per pool; 19,584 PGs in total
    - Room/Rack in ceph.conf:
          osd crush location = room=0513-R-0050 rack=RJ35
    - Rack-wise replication:
          rule data {
              ruleset 0
              type replicated
              min_size 1
              max_size 10
              step take 0513-R-0050
              step chooseleaf firstn 0 type rack
              step emit
          }

17. DSS: Ceph Configuration (same slide, with the resulting CRUSH tree shown next to the rule)
        -1    2883     root default
        -2    2883       room 0513-R-0050
        -3    262.1        rack RJ35
        -15   65.52          host p05151113471870
        -16   65.52          host p05151113489275
        -17   65.52          host p05151113479552
        -18   65.52          host p05151113498803
        -4    262.1        rack RJ37
        -23   65.52          host p05151113507373
        -24   65.52          host p05151113508409
        -25   65.52          host p05151113521447
        -26   65.52          host p05151113525886
        ...

18. DSS: Service Monitoring (no additional text on this slide)

19. DSS: Service Monitoring
    A few monitoring helper scripts: https://github.com/cernceph/ceph-scripts
    - ceph-health-cron: report on the ceph health hourly (a rough sketch of the idea follows below)
    - cephinfo: python API to the ceph JSON dumps
    - cern-sls: example usage of cephinfo.py; compute and
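Slide 19 only names the scripts; as a rough illustration of the hourly health report idea, here is a minimal cron-style shell sketch. This is not the actual ceph-health-cron script from cernceph/ceph-scripts, and the mail recipient is a placeholder.

    #!/bin/bash
    # Minimal sketch: mail the admins whenever the cluster is not HEALTH_OK.
    # Intended to run hourly from cron; the recipient address is a placeholder.
    status=$(ceph health detail 2>&1)
    case "$status" in
        HEALTH_OK*) ;;   # healthy, stay quiet
        *) echo "$status" | mail -s "ceph cluster not healthy" ceph-admins@example.org ;;
    esac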