Scaling Servers and Storage for Film Assets

Mike Sundy, Digital Asset System Administrator
David Baraff, Senior Animation Research Scientist
Pixar Animation Studios

Description

In the past two years, Pixar has grown from a handful of Perforce servers to over 90 servers. In this session, members of the Pixar team will discuss how they met the challenges in scaling out and being prudent about storage usage, from automating server creation to de-duplicating the repositories.

Transcript of Scaling Servers and Storage for Film Assets

Page 1: Scaling Servers and Storage for Film Assets

Scaling Servers and Storage for Film Assets

Mike Sundy, Digital Asset System Administrator
David Baraff, Senior Animation Research Scientist

Pixar Animation Studios

Page 2: Scaling Servers and Storage for Film Assets
Page 3: Scaling Servers and Storage for Film Assets

•  Environment Overview
•  Scaling Storage
•  Scaling Servers
•  Challenges with Scaling

Page 4: Scaling Servers and Storage for Film Assets

Environment Overview

Page 5: Scaling Servers and Storage for Film Assets

Environment

As of March 2011:
•  ~1,000 Perforce users (80% of the company).
•  70 GB db.have.
•  12 million p4 ops per day (on the busiest server).
•  30+ VMware server instances.
•  40 million submitted changelists (across all servers).
•  On 2009.1, planning to upgrade to 2010.1 soon.

Page 6: Scaling Servers and Storage for Film Assets

Growth & Types of Data

Pixar grew from one code server in 2007 to 90+ Perforce servers storing all types of assets:
•  art – reference and concept art; inspirational art for the films.
•  tech – show-specific data, e.g. models, textures, pipeline.
•  studio – company-wide reference libraries, e.g. animation reference, config files, a Flickr-like company photo site.
•  tools – code for our central tools team and software projects.
•  dept – department-specific files, e.g. Creative Resources has "blessed" marketing images.
•  exotics – patent data, casting audio, data for live-action shorts, story gags, theme park concepts, the intern art show.

Page 7: Scaling Servers and Storage for Film Assets

Scaling Storage

Page 8: Scaling Servers and Storage for Film Assets

Storage Stats

•  115 million files in Perforce.
•  20+ TB of versioned files.

Page 9: Scaling Servers and Storage for Film Assets

Techniques to Manage Storage

•  Use the +S filetype for the majority of generated data. Saved 40% of storage on Toy Story 3 (1.2 TB). A typemap sketch follows this list.
•  Work with teams to migrate versionless data out of Perforce. Saved 2 TB by moving binary scene data out.
•  De-dupe files. Saved 1 million files and 1 TB.
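The +S filetype is applied through the server's typemap so that generated data only ever keeps a limited number of stored revisions. Below is a minimal Python sketch of scripting such a typemap change; the //tech/.../renders/... path and the revision count of 16 are made-up examples, not Pixar's actual mapping.

#!/usr/bin/env python
# Hypothetical sketch: append a +S ("keep only N stored revisions") entry
# to the typemap. The path and revision count below are illustrative only.
import subprocess

ENTRY = "binary+S16 //tech/.../renders/..."  # keep at most 16 stored revisions

def main():
    # Fetch the current typemap spec in its editable text form.
    spec = subprocess.check_output(["p4", "typemap", "-o"]).decode()
    if ENTRY in spec:
        return  # already present
    # TypeMap entries are tab-indented lines under the TypeMap: field,
    # which comes last in the spec, so appending at the end works.
    spec = spec.rstrip("\n") + "\n\t" + ENTRY + "\n"
    subprocess.run(["p4", "typemap", "-i"], input=spec.encode(), check=True)

if __name__ == "__main__":
    main()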

Page 10: Scaling Servers and Storage for Film Assets

De-dupe Trigger Cases

•  Re-submitting a file set where only some files actually changed:
     p4 submit file1 file2 ... fileN
     p4 submit file1 file2 ... fileN   # only file2 actually modified
•  Putting back the previous contents after an unwanted change:
     p4 submit file                    # contents: revision n
     # five seconds later: "crap!"
     p4 submit file                    # contents: revision n-1
•  Deleting a file and immediately re-adding the same contents:
     p4 delete file
     p4 submit file                    # user deletes file (revision n)
     # five seconds later: "crap!"
     p4 add file
     p4 submit file                    # contents: revision n

Page 11: Scaling Servers and Storage for Film Assets

De-dupe Trigger Mechanics

[Diagram: a row of archive files, repfile.14, repfile.15, repfile.24, repfile.26, repfile.34, and repfile.38, all holding identical data (AABBCC…), while repfile.25 holds different data (XXYYZZ…); they correspond to the revisions file#n, file#n+1, and file#n+2 produced by the cases on the previous slide.]

Page 12: Scaling Servers and Storage for Film Assets

De-dupe Trigger Mechanics

•  +F for all files; detect duplicates via checksums.
•  Safely discard the duplicate (sketched in Python below):

$ ln repfile.24 repfile.26.tmp
$ rename repfile.26.tmp repfile.26

[Diagram: repfile.24 (AABBCC…), repfile.25 (XXYYZZ…), and repfile.26 (AABBCC…); the duplicate repfile.26 becomes a hard link to repfile.24 via the temporary name, then the rename.]
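A minimal Python sketch of that discard step, assuming the trigger has already located the two candidate archive files; the checksum helper and file names are illustrative, not Pixar's actual trigger code.

import hashlib
import os

def checksum(path, bufsize=1 << 20):
    # MD5 of an archive file, read in chunks so large files stay cheap on RAM.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            digest.update(block)
    return digest.hexdigest()

def dedupe(original, duplicate):
    # Replace 'duplicate' with a hard link to 'original' when contents match.
    # The link is made under a temporary name and renamed over the duplicate,
    # so readers always see a complete file (the hardlink + rename on the slide).
    if checksum(original) != checksum(duplicate):
        return False
    tmp = duplicate + ".tmp"
    os.link(original, tmp)     # ln repfile.24 repfile.26.tmp
    os.rename(tmp, duplicate)  # rename repfile.26.tmp repfile.26
    return True

# Example with the slide's file names (paths are hypothetical):
# dedupe("repfile.24", "repfile.26")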

Page 13: Scaling Servers and Storage for Film Assets

Scaling Servers

Page 14: Scaling Servers and Storage for Film Assets

Scale Up vs. Scale Out

Why did we choose to scale out?
•  Shows are self-contained.
•  Performance of one depot won't affect another.*
•  Easy to browse other depots.
•  Easier administration/downtime scheduling.
•  Fits with workflow (e.g. no merging art).
•  Central code server – share where it matters.

Page 15: Scaling Servers and Storage for Film Assets

Pixar Perforce Server Spec

•  VMware ESX version 4.
•  RHEL 5 (Linux 2.6).
•  4 GB RAM.
•  50 GB "local" data volume (on EMC SAN).
•  Versioned files on NetApp GFX.
•  90 Perforce depots on a 6-node VMware cluster, plus a dedicated 2-node cluster for the "hot" tech show.
•  For more details, see our 2009 conference paper.

Page 16: Scaling Servers and Storage for Film Assets

Virtualization Benefits

•  Quick to spin up new servers.
•  Stable and fault tolerant.
•  Easy to remotely administer.
•  Cost-effective.
•  Reduces datacenter footprint, cooling, power, etc.

Page 17: Scaling Servers and Storage for Film Assets

Reduce Dependencies

•  Clone all servers from a VM template.
•  RHEL vs. Fedora.
•  Reduce triggers to a minimum.
•  Default tables and p4d startup options.
•  Versioned files stored on NFS.
•  VMs run on a cluster.
•  Can build a new VM quickly if one ever dies.

Page 18: Scaling Servers and Storage for Film Assets

Virtualization Gotchas

•  Had a severe performance problem when one datastore grew to over 90% full (a simple capacity check is sketched below).
•  Requires some jockeying to ensure load stays balanced across multiple nodes (manual vs. automatic).
•  Physical host performance issues can cause cross-depot issues.
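As a hedged illustration of the first gotcha, here is the kind of small capacity check that would have flagged the problem early. It assumes the volumes backing the Perforce VMs are visible as mounted paths on a monitoring host; the names, paths, and alert hook are all hypothetical.

import shutil

# Hypothetical mount points for the volumes backing the Perforce VMs.
DATASTORES = {
    "p4-datastore-01": "/vmfs/datastore01",
    "p4-datastore-02": "/vmfs/datastore02",
}

THRESHOLD = 0.90  # the 90%-full level that caused the performance problem

def check():
    for name, path in DATASTORES.items():
        usage = shutil.disk_usage(path)
        fraction = usage.used / usage.total
        if fraction >= THRESHOLD:
            # Replace the print with a real alert (mail, pager, etc.).
            print("WARNING: %s is %.0f%% full" % (name, 100 * fraction))

if __name__ == "__main__":
    check()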

Page 19: Scaling Servers and Storage for Film Assets

Speed of Virtual Perforce Servers

•  Used the Perforce Benchmark Results Database tools.
•  Virtualized servers delivered 95% of baseline performance on the branchsubmit benchmark.
•  85% of baseline performance on the browse benchmark (not as critical to us).
•  VMware's flexibility outweighed the minor performance hit.

Page 20: Scaling Servers and Storage for Film Assets

Quick Server Setup

•  Critical to be able to spin up new servers quickly.
•  Went from 2-3 days of setup to 1 hour.

1-hour setup:
•  Clone a p4 template VM (30 minutes).
•  Prep the VM (15 minutes).
•  Run the "squire" script to build out the p4 instance (8 seconds).
•  Validate and test (15 minutes).

Page 21: Scaling Servers and Storage for Film Assets

Squire

Script that automates p4 server setup. It sets up:
•  p4 binaries
•  metadata tables (protect/triggers/typemap/counters)
•  cron jobs (checkpoint/journal/verify)
•  monitoring
•  permissions (filesystem and p4)
•  init.d startup script
•  linkatron namespace
•  pipeline integration (for tech depots)
•  config files
A sketch of two of these tasks follows this list.
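The real squire script is internal to Pixar; below is a minimal Python sketch of just two of the tasks listed above, seeding metadata tables from templates and installing the checkpoint/verify cron jobs. The port, template paths, and schedule are invented for illustration.

import subprocess

P4PORT = "newdepot:1666"  # hypothetical new instance
TEMPLATES = {
    # hypothetical template files holding the default table specs
    "typemap":  "/usr/local/p4/templates/typemap.txt",
    "triggers": "/usr/local/p4/templates/triggers.txt",
    "protect":  "/usr/local/p4/templates/protect.txt",
}
CRONTAB = """\
# placeholder schedule: nightly checkpoint, weekly verify
0 2 * * *  p4 -p {port} admin checkpoint
0 4 * * 6  p4 -p {port} verify -q //...
"""

def load_tables():
    # Feed each template spec into the new server via 'p4 <table> -i'.
    for table, path in TEMPLATES.items():
        with open(path, "rb") as f:
            subprocess.run(["p4", "-p", P4PORT, table, "-i"],
                           input=f.read(), check=True)

def install_cron():
    # 'crontab -' replaces the current user's crontab with stdin.
    subprocess.run(["crontab", "-"],
                   input=CRONTAB.format(port=P4PORT).encode(), check=True)

if __name__ == "__main__":
    load_tables()
    install_cron()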

Page 22: Scaling Servers and Storage for Film Assets

Superp4

Script for managing p4 metadata tables across multiple servers.
•  Preferable to hand-editing 90 tables.
•  Database-driven (i.e. works from a list of depots).
•  Scopable by depot domain (art, tech, etc.).
•  Rollback functionality.

Page 23: Scaling Servers and Storage for Film Assets

Superp4 example

$ cd /usr/anim/ts3
$ p4 triggers -o
Triggers:
	noHost form-out client "removeHost.py %formfile%"

$ cat fix-noHost.py
def modify(data, depot):
    return [line.replace("noHost form-out",
                         "noHost form-in")
            for line in data]

$ superp4 -table triggers -script fix-noHost.py -diff

•  Copies the triggers to a restore dir.
•  Runs fix-noHost.py to produce new triggers, for each depot.
•  Shows me a diff of the above.
•  Asks for confirmation; finally, modifies the triggers on each depot.
•  Tells me where the restore dir is.
(A minimal sketch of this per-depot loop follows.)
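A minimal sketch of the per-depot loop described above; the depot-to-port map, the restore-directory layout, and the use of exec() in place of the slide's execfile() are assumptions, not the real superp4 code.

import difflib
import os
import subprocess

def run_p4(port, *args, **kwargs):
    # Run one p4 command against a single depot's server, returning stdout.
    cmd = ["p4", "-p", port] + list(args)
    return subprocess.run(cmd, check=True, capture_output=True, **kwargs).stdout

def apply_script(script_path, table, depots, restore_dir):
    # The user script must define modify(data, depot) -> list of new lines,
    # exactly like fix-noHost.py above.
    namespace = {}
    exec(compile(open(script_path).read(), script_path, "exec"), namespace)
    modify = namespace["modify"]

    os.makedirs(restore_dir, exist_ok=True)
    for depot, port in depots.items():
        old = run_p4(port, table, "-o").decode().splitlines(True)

        # Save the current table first so the run can be rolled back.
        with open(os.path.join(restore_dir, depot + "." + table), "w") as f:
            f.writelines(old)

        new = modify(old, depot)
        print("".join(difflib.unified_diff(old, new, depot, depot + " (new)")))

        if input("Apply to %s? [y/N] " % depot).lower() == "y":
            run_p4(port, table, "-i", input="".join(new).encode())

# Example (depot -> P4PORT map is hypothetical):
# apply_script("fix-noHost.py", "triggers",
#              {"ts3": "ts3:1666", "art": "art:1666"}, "/tmp/superp4.restore")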

Page 24: Scaling Servers and Storage for Film Assets

Superp4 options

$ superp4 -help
  -n                        Don't actually modify data.
  -diff                     Show diffs for each depot using xdiff.
  -category category        Pick depots by category (art, tech, etc.).
  -units unit1 unit2 ...    Specify an explicit depot list (regexps allowed).
  -script script            Python file to be execfile()'d; must define a function named modify().
  -table tableType          Table to operate on (triggers, typemap, …).
  -configFile configFile    Config file to modify (e.g. admin/values-config).
  -outDir outDir            Directory to store working files, and for restoral.
  -restoreDir restoreDir    Directory previously produced by running superp4, for when you screw up.

Page 25: Scaling Servers and Storage for Film Assets

Challenges With Scaling

Page 26: Scaling Servers and Storage for Film Assets

Gotchas

•  //spec/client filled up.
•  User-written triggers were sub-optimal.
•  "Shadow files" consumed server space.
•  Monitoring is difficult; cue templaRX and mayday.
•  Cap renderfarm ops.
•  Beware of automated tests and clueless GUIs.
•  verify can be dangerous to your health (cross-depot); see the sketch below.
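Because the depots share the same filer for versioned files, a fleet-wide p4 verify can turn into a cross-depot problem. A hedged sketch of running the verifies one depot at a time follows; the depot-to-port map and pause length are invented.

import subprocess
import time

# Hypothetical depot -> P4PORT map; in practice this would come from the
# same depot database that drives superp4.
DEPOTS = {
    "art":    "art:1666",
    "tools":  "tools:1666",
    "studio": "studio:1666",
}

PAUSE = 600  # seconds between depots, to give the filer a breather

def verify_all():
    for depot, port in DEPOTS.items():
        # -q: report problems only.
        subprocess.run(["p4", "-p", port, "verify", "-q", "//..."])
        time.sleep(PAUSE)

if __name__ == "__main__":
    verify_all()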

Page 27: Scaling Servers and Storage for Film Assets

Summary

•  Perforce scales well for large amounts of binary data.
•  Virtualization = fast and cost-effective server setup.
•  Use +S filetype and de-dupe to reduce storage usage.

Page 28: Scaling Servers and Storage for Film Assets

Q & A

Questions?