Large Scale Web Apps @Pinterest (Powered by Apache HBase)
May 5, 2014
What is Pinterest?
Pinterest is a visual discovery tool for collecting the things you love and discovering related content along the way.
Challenges @scale
• 100s of millions of pins/repins per month
• Billions of requests per week
• Millions of daily active users
• Billions of pins
• One of the largest discovery tools on the internet
Storage stack @Pinterest
• MySQL
• Redis (for persistence and cache)
• Memcache (consistent hashing)
(Diagram: the app tier implements manual sharding, with the sharding logic living in application code.)
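Before HBase, routing a read or write to the right MySQL shard looked roughly like the minimal sketch below. This is an illustration of hash-modulus routing only; the shard names, shard count, and the exact sharding function are assumptions, not Pinterest's actual scheme.

// Minimal sketch of app-tier manual sharding (illustrative only).
import java.util.Arrays;
import java.util.List;

public class ShardRouter {
    private final List<String> shards;  // hypothetical shard connection names

    public ShardRouter(List<String> shards) {
        this.shards = shards;
    }

    // Map an object id to a shard; every caller must agree on this function,
    // and resharding means rewriting data - one reason the deck moves to HBase.
    public String shardFor(long objectId) {
        int idx = (int) Math.floorMod(objectId, (long) shards.size());
        return shards.get(idx);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(
            Arrays.asList("db001", "db002", "db003", "db004"));
        System.out.println(router.shardFor(123456789L));  // -> db002
    }
}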
Why HBase?
• High write throughput - unlike MySQL/B-tree, writes never seek on disk
• Seamless integration with Hadoop
• Distributed operation
- Fault tolerance
- Load balancing
- Easily add/remove nodes
Non-Technical Reasons
• Large active community
• Large-scale online use cases
Outline
• Features powered by HBase
• SaaS (Storage as a Service)
- MetaStore
- HFile Service (Terrapin)
• Our HBase setup - optimizing for high availability and low latency
Applications/Features
• Offline
- Analytics
- Search indexing
- ETL/Hadoop workflows
• Online
- Personalized feeds
- Rich pins
- Recommendations
Personalized Feeds
Why HBase? Write-heavy load due to pin fanout.
(Diagram: pins from users I follow, plus recommended pins, fan out into each follower's personalized feed.)
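As an illustration of that fanout, one pin produces one feed-row write per follower. Below is a minimal sketch using the classic (pre-1.0) HBase client API; the table name, column family, and row-key layout are assumptions for illustration, not Pinterest's actual schema.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FeedFanout {
    // Fan a single pin out to every follower's feed row.
    public static void fanOut(Configuration conf, long pinId, long timestamp,
                              List<Long> followerIds) throws IOException {
        HTable feeds = new HTable(conf, "user_feeds");   // hypothetical table
        try {
            List<Put> puts = new ArrayList<Put>();
            for (long followerId : followerIds) {
                // One row per (follower, reverse timestamp) so newest pins sort first.
                byte[] rowKey = Bytes.add(Bytes.toBytes(followerId),
                                          Bytes.toBytes(Long.MAX_VALUE - timestamp));
                Put put = new Put(rowKey);
                put.add(Bytes.toBytes("f"), Bytes.toBytes("pin"), Bytes.toBytes(pinId));
                puts.add(put);
            }
            feeds.put(puts);  // a popular user means thousands of puts for one pin
        } finally {
            feeds.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        fanOut(conf, 42L, System.currentTimeMillis(), Arrays.asList(1L, 2L, 3L));
    }
}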
Rich Pins
Why HBase? Negative hits handled efficiently with Bloom filters.
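Row-level Bloom filters let a region server skip store files that cannot contain the requested key, so a miss rarely touches disk. A minimal sketch of enabling them at table-creation time; the table and family names are made up, and the exact class locations vary across HBase versions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.BloomType;

public class CreateRichPinsTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HColumnDescriptor family = new HColumnDescriptor("d");
        family.setBloomFilterType(BloomType.ROW);  // per-store-file row Bloom filter

        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("rich_pins"));
        table.addFamily(family);
        admin.createTable(table);
        admin.close();
    }
}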
Recommendations
Why HBase? Seamless data transfer from Hadoop.
(Diagram: recommendations are generated on a Hadoop 1.0 cluster, copied over by DistCP jobs, and loaded into the HBase + Hadoop 2.0 serving cluster.)
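Once the DistCP'd HFiles land on the serving cluster, HBase can adopt them without going through the write path. A minimal sketch using the standard bulk-load tool; the paths and table name are illustrative, since the deck does not show the exact job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoadRecommendations {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "recommendations");  // hypothetical table

        // Moves the HFiles produced offline (and copied over by DistCP)
        // directly into the table's regions - no write path, no WAL.
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path("/pinterest/recommendations/hfiles"), table);

        table.close();
    }
}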
SaaS
• Large number of feature requests
• 1 cluster per feature
• Scaling with organizational growth
• Need for “defensive” multi-tenant storage
• Previous solutions reaching their limits
MetaStore I
• Key-value store on top of HBase
• 1 HBase table per feature, with salted keys
• Pre-split tables
• Table-level rate limiting (online/offline reads/writes)
• No scan support
• Simple client API:
string getValue(string feature, string key, boolean online);
void setValue(string feature, string key, string value, boolean online);
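A rough illustration of what the key salting behind that API might look like: prefixing each key with a bucket derived from its hash spreads writes evenly across the pre-split regions while still allowing exact-key gets (MetaStore has no scans). The bucket count, key layout, and helper names below are assumptions, not the actual implementation.

import java.nio.charset.StandardCharsets;

public class SaltedKey {
    private static final int BUCKETS = 64;  // assumed to match the table's pre-split points

    // Prepend a two-digit salt so keys distribute across regions
    // while remaining retrievable by exact key.
    public static byte[] salted(String feature, String key) {
        int bucket = Math.floorMod((feature + key).hashCode(), BUCKETS);
        String row = String.format("%02d:%s:%s", bucket, feature, key);
        return row.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(new String(salted("rich_pins", "pin:12345"),
                                      StandardCharsets.UTF_8));
    }
}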
MetaStore II
(Diagram: clients issue gets/sets over Thrift to the MetaStore Thrift server, which applies salting and rate limiting; data lives in primary and secondary HBase clusters with master/master replication; ZooKeeper pushes notifications of MetaStore config changes such as rate limits and the primary cluster.)
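Table-level rate limiting in the Thrift server can be as simple as one limiter per (feature, operation class), applied before the HBase call. A sketch using Guava's RateLimiter; the limits and key scheme are illustrative, and the deck does not say which library Pinterest actually uses.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import com.google.common.util.concurrent.RateLimiter;

public class FeatureRateLimiter {
    private final ConcurrentMap<String, RateLimiter> limiters =
        new ConcurrentHashMap<String, RateLimiter>();

    // Limits would be loaded from the MetaStore config in ZooKeeper; hardcoded here.
    private RateLimiter limiterFor(String feature, boolean online, boolean read) {
        String key = feature + (online ? ":online" : ":offline") + (read ? ":read" : ":write");
        return limiters.computeIfAbsent(key, k -> RateLimiter.create(online ? 5000.0 : 500.0));
    }

    // Block the caller until the request fits under the feature's rate limit.
    public void throttle(String feature, boolean online, boolean read) {
        limiterFor(feature, online, read).acquire();
    }

    public static void main(String[] args) {
        FeatureRateLimiter rl = new FeatureRateLimiter();
        rl.throttle("rich_pins", true, true);   // online read
        System.out.println("request admitted");
    }
}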
HFile Service (Terrapin)
• Solve the bulk upload problem
• HBase-backed solution:
- Bulk upload + major compact
- Major compact to delete old data
• Design a solution from scratch using a mashup of:
- HFile
- HBase BlockCache
- Avoid compactions
- Low-latency key/value lookups
High Level Architecture I
(Diagram: ETL/batch jobs load/reload HFiles onto Amazon S3; HFile servers, each serving multiple HFiles, pull them down and answer key/value lookups from the client library/service.)
High Level Architecture II
• Each HFile server runs 2 processes:
- Copier: pulls HFiles from S3 to local disk
- Supershard: serves multiple HFile shards to clients
• ZooKeeper:
- Detecting alive servers
- Coordinating loading/swapping of new data
- Enabling clients to detect availability of new data (see the sketch after this list)
• Loader module (replaces DistCP):
- Triggers new data copy
- Triggers swap through ZooKeeper
- Updates ZooKeeper and notifies clients
• Client library understands sharding
• Old data deleted by a background process
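A rough sketch of how a client might watch ZooKeeper to notice that the loader has swapped in a new fileset version. The znode path and data format are assumptions; Terrapin's actual layout is not shown in the deck.

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class FilesetWatcher implements Watcher {
    private final ZooKeeper zk;
    private final String znode;
    private volatile String currentVersion;

    public FilesetWatcher(String zkQuorum, String znode) throws Exception {
        this.znode = znode;
        this.zk = new ZooKeeper(zkQuorum, 30000, this);
        refresh();
    }

    // Re-read the znode and re-arm the watch; called on every change notification.
    private void refresh() throws Exception {
        byte[] data = zk.getData(znode, true, null);
        currentVersion = new String(data, StandardCharsets.UTF_8);
        System.out.println("serving fileset version: " + currentVersion);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
            try {
                refresh();  // the loader updated the znode after a swap
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        new FilesetWatcher("zk1:2181,zk2:2181",
                           "/terrapin/filesets/recommendations/current");
        Thread.sleep(Long.MAX_VALUE);  // keep watching
    }
}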
Salient Features
• Multi-tenancy through namespacing
• Pluggable sharding functions - modulus, range, and more
• HBase BlockCache
• Multiple clusters for redundancy
• Speculative execution across clusters for low latency (sketched below)
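A minimal sketch of cross-cluster speculative execution: issue the read to the primary cluster, and if it has not answered within a small budget, hedge the request to the backup cluster and take whichever returns first. The cluster names, timeout, and fetch function are illustrative.

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SpeculativeRead {
    private static final ExecutorService pool = Executors.newCachedThreadPool();

    // Placeholder for a real key/value lookup against one cluster.
    static byte[] fetch(String cluster, byte[] key) {
        // ... issue the lookup to `cluster` ...
        return new byte[0];
    }

    static byte[] speculativeGet(final byte[] key) throws Exception {
        ExecutorCompletionService<byte[]> ecs = new ExecutorCompletionService<byte[]>(pool);
        ecs.submit(new Callable<byte[]>() {
            public byte[] call() { return fetch("terrapin-primary", key); }
        });
        // Wait briefly for the primary; on timeout, hedge with the backup cluster.
        Future<byte[]> first = ecs.poll(20, TimeUnit.MILLISECONDS);
        if (first != null) {
            return first.get();
        }
        ecs.submit(new Callable<byte[]>() {
            public byte[] call() { return fetch("terrapin-backup", key); }
        });
        return ecs.take().get();  // whichever cluster answers first wins
    }

    public static void main(String[] args) throws Exception {
        System.out.println(speculativeGet("some-key".getBytes()).length);
        pool.shutdown();
    }
}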
Setting up for Success
• Many online use cases/applications
• Optimize for:
- Low MTTR (high availability)
- Low latency (performance)
MTTR - I
(Diagram: DataNode states - LIVE → STALE after 20 sec without a heartbeat → DEAD after 9 min 40 sec.)
• Stale nodes are avoided:
- As candidates for reads
- As candidate replicas for writes
- During lease recovery
• Copying of under-replicated blocks starts once a node is marked “Dead”
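The stale-node behavior comes from NameNode settings like the ones below, shown here through the Hadoop Configuration API although they would normally live in hdfs-site.xml. The 20-second stale interval matches the slide; the rest of Pinterest's values are not shown.

import org.apache.hadoop.conf.Configuration;

public class StaleNodeSettings {
    public static void main(String[] args) {
        Configuration hdfs = new Configuration();
        // Skip stale DataNodes when choosing replicas to read from (HDFS-3703).
        hdfs.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
        // Skip stale DataNodes when placing new replicas for writes (HDFS-3912).
        hdfs.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);
        // Mark a DataNode stale after 20 seconds without a heartbeat.
        hdfs.setLong("dfs.namenode.stale.datanode.interval", 20 * 1000L);
        System.out.println(hdfs.get("dfs.namenode.stale.datanode.interval"));
    }
}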
MTTR - II
(Diagram: recovery pipeline - Failure Detection (30 sec ZooKeeper session timeout) → Lease Recovery (HDFS-4721) → Log Split (HDFS-3703 + HDFS-3912) → Recover Regions; total recovery time < 2 min.)
• Avoid stale nodes at each point of the recovery process
• Multi-minute timeouts become multi-second timeouts
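Failure detection itself is bounded by the region server's ZooKeeper session timeout. A small sketch of lowering it via the HBase configuration (normally set in hbase-site.xml); the 30-second value matches the slide.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FastFailureDetection {
    public static void main(String[] args) {
        Configuration hbase = HBaseConfiguration.create();
        // A dead region server is only noticed once its ZooKeeper session expires,
        // so a 30 second timeout caps the failure-detection part of MTTR.
        hbase.setInt("zookeeper.session.timeout", 30 * 1000);
        System.out.println(hbase.get("zookeeper.session.timeout"));
    }
}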
Simulate, Simulate, Simulate
Simulate “pull the plug” failures and “tail -f” the logs:
• kill -9 both datanode and region server
- Causes connection-refused errors
• kill -STOP both datanode and region server
- Causes socket timeouts
• Blackhole hosts using iptables
- Connect timeouts + “No route to host”
- Most representative of AWS failures
Performance
Configuration tweaks:
• Small block size, 4K-16K
• Prefix compression to cache more - when data is in the key, close to 4x reduction for some data sets
• Separation of RPC handler threads for reads vs. writes
• Short-circuit local reads
• HBase-level checksums (HBASE-5074)
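A sketch of what the column-family side of those tweaks looks like through the admin API. The table and family names are illustrative, and the 8K block size is just one value inside the slide's 4K-16K range.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class TunedTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        HColumnDescriptor family = new HColumnDescriptor("d");
        family.setBlocksize(8 * 1024);                          // small blocks (4K-16K range)
        family.setDataBlockEncoding(DataBlockEncoding.PREFIX);  // prefix compression of keys

        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("feeds_tuned"));
        table.addFamily(family);

        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.createTable(table);
        admin.close();
    }
}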
Hardware
• SATA (m1.xl/c1.xl) and SSD (hi1.4xl)
• Choose based on the limiting factor:
- Disk space - pick SATA for max GB/$$
- IOPS - pick SSD for max IOPS/$$, for clusters with heavy reads or heavy compaction activity
Performance (SSDs)
HFile Read Performance
• Turn off block cache for data blocks - reduces GC and heap fragmentation
• Keep block cache on for index blocks
• Increase “dfs.client.read.shortcircuit.streams.cache.size” from 100 to 10,000 (with short-circuit reads)
• Approx. 3x improvement in read throughput
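A sketch of those two tweaks with the standard APIs: disabling the block cache on a data-heavy column family (per the slide, index blocks stay cached), and raising the short-circuit stream cache in the client configuration. The family name is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;

public class HFileReadTuning {
    public static void main(String[] args) {
        // Family-level setting: stop caching data blocks to cut GC pressure
        // and heap fragmentation; index blocks remain cached.
        HColumnDescriptor family = new HColumnDescriptor("d");
        family.setBlockCacheEnabled(false);

        // Client-side setting used together with short-circuit local reads.
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("dfs.client.read.shortcircuit.streams.cache.size", 10000);

        System.out.println(family.isBlockCacheEnabled());
        System.out.println(conf.getInt("dfs.client.read.shortcircuit.streams.cache.size", 100));
    }
}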
Write Performance
• WAL contention when the client sets AutoFlush=true
• HBASE-8755
In the Pipeline...
• Building a graph database on HBase
• Disaster recovery - snapshot + incremental backup + restore
• Off-heap cache - reduce GC overhead and make better use of hardware
• Read path optimizations
And we are Hiring !!