Scaling Servers and Storage for Film Assets


In the past two years, Pixar has grown from a handful of Perforce servers to over 90 servers. In this session, members of the Pixar team will discuss how they met the challenges in scaling out and being prudent about storage usage, from automating server creation to de-duplicating the repositories.


Mike Sundy, Digital Asset System Administrator

David Baraff, Senior Animation Research Scientist

Pixar Animation Studios

•  Environment Overview
•  Scaling Storage
•  Scaling Servers
•  Challenges with Scaling

Environment Overview

Environment

As of March 2011:
•  ~1000 Perforce users (80% of the company)
•  70 GB db.have
•  12 million p4 ops per day (on the busiest server)
•  30+ VMware server instances
•  40 million submitted changelists (across all servers)
•  On 2009.1, planning to upgrade to 2010.1 soon

Growth & Types of Data

Pixar grew from one code server in 2007 to 90+ Perforce servers storing all types of assets:
•  art – reference and concept art; inspirational art for a film.
•  tech – show-specific data, e.g. models, textures, pipeline.
•  studio – company-wide reference libraries, e.g. animation reference, config files, a Flickr-like company photo site.
•  tools – code for our central tools team, software projects.
•  dept – department-specific files, e.g. Creative Resources has "blessed" marketing images.
•  exotics – patent data, casting audio, data for live-action shorts, story gags, theme park concepts, the intern art show.

Scaling Storage

Storage Stats

•  115 million files in Perforce.
•  20+ TB of versioned files.

Techniques to Manage Storage

•  Use the +S filetype for the majority of generated data. Saved 40% of storage for Toy Story 3 (1.2 TB). (Typemap sketch below.)
•  Work with teams to migrate versionless data out of Perforce. Saved 2 TB by moving binary scene data out.
•  De-dupe files – saved 1 million files and 1 TB.
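The typemap itself isn't shown in the talk; as a rough sketch with hypothetical depot paths, +S can be applied to generated data automatically at add time via p4 typemap, so only the most recent revision(s) keep their archive contents:

$ p4 typemap
TypeMap:
	binary+S //tech/.../renders/....exr
	binary+S2 //tech/.../baked/....bgeo

Here binary+S purges all but the head revision's contents, and binary+S2 keeps the two most recent; older revisions stay in the history but take no archive space.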

De-dupe Trigger Cases

•  p4 submit file1 file2 ... fileN
   p4 submit file1 file2 ... fileN    # only file2 actually modified

•  p4 submit file    # contents: revision n
   # five seconds later: "crap!"
   p4 submit file    # contents: revision n-1

•  p4 delete file
   p4 submit file    # user deletes file (revision n)
   # five seconds later: "crap!"
   p4 add file
   p4 submit file    # contents: revision n
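The talk doesn't show how the de-dupe trigger is registered; purely as a hypothetical illustration (the script name and path are made up), a change-commit entry in the triggers table could look like:

Triggers:
	dedupe change-commit //... "/usr/local/bin/dedupe.py %change%"

A change-commit trigger fires after the changelist is committed, which is the point at which the new archive files exist and can be compared against earlier revisions.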

De-dupe Trigger Mechanics

[Diagram: archive files repfile.14, repfile.15, repfile.24, repfile.26, repfile.34, and repfile.38 all hold identical contents (AABBCC…), while repfile.25 holds different contents (XXYYZZ…); successive revisions file#n, file#n+1, file#n+2 can therefore point at byte-identical archive files.]

De-dupe Trigger Mechanics

[Diagram: repfile.24 and repfile.26 hold identical contents (AABBCC…); repfile.25 differs (XXYYZZ…).]

•  +F for all files; detect duplicates via checksums.
•  Safely discard the duplicate:

   $ ln repfile.24 repfile.26.tmp
   $ rename repfile.26.tmp repfile.26

[Diagram: the new hard link repfile.26.tmp points at repfile.24's data and is then renamed over repfile.26, so both revisions share a single copy on disk.]
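A minimal sketch of such a de-dupe pass, assuming +F archive files named repfile.N sitting in one archive directory and MD5 checksums for duplicate detection; this is an illustration, not Pixar's actual trigger code:

import hashlib
import os

def file_digest(path, bufsize=1 << 20):
    # Checksum a file's contents in chunks (MD5 here).
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def dedupe_archive_dir(archive_dir):
    # Hard-link byte-identical repfile.N archives so duplicates share one copy on disk.
    seen = {}  # digest -> path of the first archive seen with that content
    for name in sorted(os.listdir(archive_dir)):
        if not name.startswith("repfile."):
            continue
        path = os.path.join(archive_dir, name)
        digest = file_digest(path)
        original = seen.setdefault(digest, path)
        if original == path or os.path.samefile(original, path):
            continue  # first copy, or already hard-linked
        tmp = path + ".tmp"
        os.link(original, tmp)   # hard link to the surviving copy
        os.rename(tmp, path)     # atomically replace the duplicate

The hard link plus rename mirrors the diagram above: the duplicate's data blocks are released once no name references them, while Perforce still sees a distinct revision.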

Scaling Servers

Scale Up vs. Scale Out

Why did we choose to scale out?
•  Shows are self-contained.
•  Performance of one depot won't affect another.*
•  Easy to browse other depots.
•  Easier administration/downtime scheduling.
•  Fits with the workflow (e.g. no merging of art).
•  Central code server – share where it matters.

Pixar Perforce Server Spec

•  VMware ESX version 4.
•  RHEL 5 (Linux 2.6).
•  4 GB RAM.
•  50 GB "local" data volume (on an EMC SAN).
•  Versioned files on NetApp GFX.
•  90 Perforce depots on a 6-node VMware cluster – a special 2-node cluster for the "hot" tech show.
•  For more details, see our 2009 conference paper.

Virtualization Benefits

•  Quick to spin up new servers.
•  Stable and fault-tolerant.
•  Easy to administer remotely.
•  Cost-effective.
•  Reduces datacenter footprint, cooling, power, etc.

Reduce Dependencies

•  Clone all servers from a VM template.
•  RHEL vs. Fedora.
•  Reduce triggers to a minimum.
•  Default tables and p4d startup options (example below).
•  Versioned files stored on NFS.
•  VMs run on a cluster.
•  Can build a new VM quickly if one ever dies.
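As an example of the kind of default p4d startup options meant here (the paths and port are hypothetical, not Pixar's actual configuration), every instance can be launched the same way from its init.d script:

$ p4d -r /p4/<depot>/root -J /p4/<depot>/journal -L /p4/<depot>/log -p 1666 -d

where -r sets the server root, -J the journal file, -L the error log, -p the listen port, and -d runs p4d as a daemon.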

Virtualization Gotchas

•  Had a severe performance problem when one datastore grew to over 90% full.
•  Requires some jockeying to ensure load stays balanced across multiple nodes – manual vs. auto.
•  Physical host performance issues can cause cross-depot issues.

Speed of Virtual Perforce Servers

•  Used the Perforce Benchmark Results Database tools.
•  Virtualized servers reached 95% of physical performance on the branchsubmit benchmark.
•  85% on the browse benchmark (not as critical to us).
•  VMware's flexibility outweighed the minor performance hit.

Quick Server Setup

•  Critical to be able to spin up new servers quickly.
•  Went from 2-3 days for setup to 1 hour.

1-hour setup:
•  Clone a p4 template VM. (30 minutes)
•  Prep the VM. (15 minutes)
•  Run the "squire" script to build out the p4 instance. (8 seconds)
•  Validate and test. (15 minutes)

Squire

A script that automates p4 server setup. It sets up:
•  p4 binaries
•  metadata tables (protect/triggers/typemap/counters)
•  cron jobs (checkpoint/journal/verify)
•  monitoring
•  permissions (filesystem and p4)
•  init.d startup script
•  linkatron namespace
•  pipeline integration (for tech depots)
•  config files
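As a rough illustration of one of the steps squire automates (the real script isn't shown in the talk; the port, user, and template paths below are hypothetical), loading template metadata tables into a freshly started instance amounts to piping saved specs into "p4 <table> -i":

import subprocess

P4 = ["p4", "-p", "newdepot:1666", "-u", "p4admin"]  # hypothetical server and admin user

def load_spec(table, template_path):
    # Feed a saved template spec (protect, triggers, typemap, ...) to "p4 <table> -i".
    with open(template_path) as f:
        spec = f.read()
    p = subprocess.Popen(P4 + [table, "-i"], stdin=subprocess.PIPE,
                         universal_newlines=True)
    p.communicate(spec)
    if p.returncode != 0:
        raise RuntimeError("failed to load %s from %s" % (table, template_path))

for table in ("protect", "triggers", "typemap"):
    load_spec(table, "/p4/templates/%s.spec" % table)  # hypothetical template location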

Superp4

A script for managing p4 metadata tables across multiple servers.
•  Preferable to hand-editing 90 tables.
•  Database-driven (i.e. a list of depots).
•  Scopable by depot domain (art, tech, etc.).
•  Rollback functionality.

Superp4 example

$ cd /usr/anim/ts3
$ p4 triggers -o
Triggers:
	noHost form-out client "removeHost.py %formfile%"

$ cat fix-noHost.py
def modify(data, depot):
    return [line.replace("noHost form-out",
                         "noHost form-in")
            for line in data]

$ superp4 -table triggers -script fix-noHost.py -diff

•  Copies triggers to a restore dir.
•  Runs fix-noHost.py to produce new triggers, for each depot.
•  Shows me a diff of the above.
•  Asks for confirmation; finally, modifies triggers on each depot.
•  Tells me where the restore dir is.

Superp4 options

$ superp4 -help
  -n                         Don't actually modify data.
  -diff                      Show diffs for each depot using xdiff.
  -category category         Pick depots by category (art, tech, etc.).
  -units unit1 unit2 ...     Specify an explicit depot list (regexps allowed).
  -script script             Python file to be execfile()'d; must define a function named modify().
  -table tableType           Table to operate on (triggers, typemap, ...).
  -configFile configFile     Config file to modify (e.g. admin/values-config).
  -outDir outDir             Directory to store working files, and for restoral.
  -restoreDir restoreDir     Directory previously produced by running superp4, for when you screw up.
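A minimal sketch of the read/modify/diff/write cycle superp4 runs per depot (the depot-to-port map and helper names are hypothetical, and the real tool also handles categories, config files, and restore directories):

import difflib
import subprocess

DEPOTS = {"ts3": "ts3-p4:1666", "art": "art-p4:1666"}  # hypothetical depot -> P4PORT map

def run_p4(port, args, stdin=None):
    # Run one p4 command against one server and return its stdout as text.
    p = subprocess.Popen(["p4", "-p", port] + args, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = p.communicate(stdin)
    return out

def update_table(table, modify, dry_run=True):
    # modify(data, depot) is the user-supplied function, as in fix-noHost.py above.
    for depot, port in sorted(DEPOTS.items()):
        old = run_p4(port, [table, "-o"]).splitlines(True)
        new = modify(old, depot)
        diff = "".join(difflib.unified_diff(old, new, "old/" + depot, "new/" + depot))
        print(diff if diff else "%s: no change" % depot)
        if diff and not dry_run:
            run_p4(port, [table, "-i"], stdin="".join(new))  # write the edited spec back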

Challenges With Scaling

Gotchas

•  //spec/client filled up.
•  User-written triggers were sub-optimal.
•  "Shadow files" consumed server space.
•  Monitoring is difficult – cue templaRX and mayday.
•  Cap renderfarm ops.
•  Beware of automated tests and clueless GUIs.
•  verify can be dangerous to your health (cross-depot).

Summary

•  Perforce scales well for large amounts of binary data.
•  Virtualization = fast and cost-effective server setup.
•  Use the +S filetype and de-duping to reduce storage usage.

Q & A

Questions?