High Availability - How to get 99.99% service availability - Designing clusters (DOs & DON'Ts)
Magnolia Performance Tuning and High Availability Best Practices
Posted: 18-Oct-2014
If you're not failing 90% of the time, then you're probably not working on
sufficiently challenging problems.
High Performance and Scalability with Magnolia
Marc Korthaus, CEO SysEleven GmbH
Magnolia Conference 2014
About SysEleven: Born out of macnews.de (1997 - 2010†)
Founded 2007 in Berlin; 2 datacenters, 50 people (40+ engineers); ISO 27001 certified; Akamai partner; SuperCache developer; specialized in application management; fully virtualized datacenter; a lot of big iron
"If there are two or more ways to do something, and one of those ways can result in a catastrophe, then someone will do it."
Why yet another Hosting Company?
• Way too expensive offers from enterprise hosters
• Nobody offered application expertise for scaling applications
• Strict budgets, but deep application knowledge
• … pressure makes diamonds
But some definitions first…
Scalability … is horizontal, i.e. means capacity: serving more users at the same time, reached
via clustering (session sharing, shared storage, load balancing, etc.).
Performance … is vertical, is about zippiness: means optimizing for time to first byte and general load time; needs bigger iron, caching (much cheaper, just do it) and
application optimization (a must).
… which actually means:
Capacity (Our job)
Performance (your job)
Typical CMS Setup
Clustered Magnolia Setup
Typical eCommerce Server Setup
[Architecture diagrams: an F5 load balancer in front of SuperCache (NginX & Couchbase) as cache frontend with a caching API; behind it an application cluster on shared storage; MySQL replication with DB writes going to the master and DB reads to the slaves; side systems: a FACT-Finder search cluster, a prudsys recommendation engine, and backend & syslog log processing.]
How to improve Performance
• "Normally" bigger iron will help, but it doesn't really help with bad application code
• SSDs (SAS vs. PCI cards), higher CPU frequency per thread, more memory
• General approach: optimize for "time to first byte"
• Database tuning (slow query log, read from slave, write to master)
• Profile your application: New Relic, JProfiler, hprof, AppDynamics, Dynatrace
• Caching, caching and caching
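The "read from slave, write to master" bullet can be sketched as a tiny query router. This is a minimal Python illustration, not a real database driver: the `QueryRouter` class and the connection names are invented for the example, and a production router would also have to keep transactions and read-your-own-writes on the master.

```python
import random

# Sketch of master/slave read-write splitting as described above:
# writes go to the master, plain reads are spread across the slaves.
# Connection names here are placeholders, not a real API.
class QueryRouter:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def route(self, sql):
        # Writes (and anything transactional) must hit the master;
        # plain SELECTs can be served by any replica.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return random.choice(self.slaves)
        return self.master

router = QueryRouter("master-db", ["slave-1", "slave-2"])
```

For example, `router.route("INSERT INTO t VALUES (1)")` returns the master, while a `SELECT` lands on one of the slaves.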
Scalability & Clustering
• JackRabbit on different nodes for author and app server (you already knew that)
• Big iron is good, too (hardware load balancers, web application firewalls, DDoS protection included)
• High-performance shared storage (go with NetApp or Isilon - don't even think of working with Linux NFS servers!)
• Database clustering: master/master & master/slave, maybe even sharding?
• Elasticity, automatic deployment (Puppet, Chef, Go), DevOps in general
Caching: Ehcache, Varnish & SuperCache
• Good ol' Ehcache is active by default (so far so good)
• Passive caching (Varnish): serves full pages (or snippets via ESI), no advanced flushing out of the box, needs a lifetime per URL, no shared memory backend across instances, no warm restart, slow start is painful
• Active caching (SuperCache): separate memory backend, infinite cache lifetime, active flushing via a mapping service, API-based, full flexibility
• Best of all: a client for Magnolia is in development
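The passive-vs-active distinction above can be made concrete with two toy caches in Python. This is an illustrative sketch, not Varnish's or SuperCache's actual implementation: the passive cache expires entries after a fixed lifetime per URL, while the active cache keeps entries forever and flushes them through a mapping of content IDs to cached URLs.

```python
import time

# Passive cache: every entry expires after a fixed TTL (Varnish-style).
class PassiveCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # url -> (value, stored_at)

    def put(self, url, value, now=None):
        self.store[url] = (value, time.time() if now is None else now)

    def get(self, url, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(url)
        if entry and now - entry[1] < self.ttl:
            return entry[0]
        return None  # expired or missing -> page must be re-rendered

# Active cache: entries live forever until a content update flushes
# them explicitly via a content-id -> URLs mapping (SuperCache-style).
class ActiveCache:
    def __init__(self):
        self.store = {}    # url -> value
        self.mapping = {}  # content_id -> set of urls

    def put(self, url, value, content_ids):
        self.store[url] = value
        for cid in content_ids:
            self.mapping.setdefault(cid, set()).add(url)

    def get(self, url):
        return self.store.get(url)

    def flush(self, content_id):
        # One content update invalidates exactly the pages that used it.
        for url in self.mapping.pop(content_id, set()):
            self.store.pop(url, None)
```

The design trade-off: the passive cache serves stale content until the TTL runs out (or misses needlessly after it), while the active cache stays fresh and warm but needs the application to report which content each page depends on.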
SuperCache in detail
[Diagram: an HTTP request passes a hardware load balancer to the proxy incl. API (NginX + Lua); a hit is answered from the cache memory backend (Couchbase Server); a miss goes to the Magnolia application servers; the Magnolia authoring server tags cached items.]
SuperCache - What's in it for you
• RESTful API (not tied to a single application), mapping & tags per cache item
• Infinite cache lifetime, no cold cache, even after startup
• Memory & disk backend, multi-level scalability
• DDoS-proof, very little CPU used (2 threads)
• ESI as a way to combine content from different applications
• HTTP layer 7 manipulation
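The ESI bullet deserves a concrete picture: the cache stores a page skeleton and splices in fragments from other applications at delivery time. A minimal Python sketch of the include-resolution step, assuming the standard `<esi:include src="..."/>` tag syntax (fragment URLs and contents here are invented; real ESI processors such as Varnish support more of the spec):

```python
import re

# Resolve <esi:include src="..."/> tags by splicing in fragments,
# which is how a cache can combine content of different applications.
ESI_TAG = re.compile(r'<esi:include src="([^"]+)"\s*/>')

def render(page, fetch_fragment):
    # Replace each include tag with the fragment its src points at.
    return ESI_TAG.sub(lambda m: fetch_fragment(m.group(1)), page)

# Illustrative fragments; in practice each would come from its own app.
fragments = {
    "/header": "<header>Shop</header>",
    "/cart": "<div>3 items</div>",
}
html = render(
    '<esi:include src="/header"/><p>Product</p><esi:include src="/cart"/>',
    fragments.__getitem__,
)
```

The payoff: the product page body can be cached for hours while the cart fragment stays per-user or short-lived, because each fragment carries its own cache policy.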
Cache hit rates - Measure your hit rate!
Customer A: HIT 50 % / MISS 50 %
Customer B: HIT 13 % / MISS 87 %
Customer C: HIT 71 % / MISS 29 %
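Measuring the hit rate is simple arithmetic over the cache's access log: hits divided by hits plus misses. A short Python sketch, assuming (purely for illustration) a log format where the cache status is the first field of each line:

```python
from collections import Counter

# Derive the cache hit rate from HIT/MISS markers in an access log.
# The log layout (status as the first field) is an assumed format.
def hit_rate(log_lines):
    counts = Counter(line.split()[0] for line in log_lines)
    total = counts["HIT"] + counts["MISS"]
    return counts["HIT"] / total if total else 0.0

sample = ["HIT /home", "MISS /search?q=a", "HIT /home", "HIT /about"]
rate = hit_rate(sample)  # 3 hits out of 4 requests -> 0.75
```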
How to ruin your cache hit rate
• Put sessions in the URL
• Do a lot of A/B testing (and implement it badly)
• Make a lot of content updates (≥ thousands of updates per day)
• Don't have pages with content, but a search-only site (…yes, it's a trend: "Don't use a CMS, just do a search!")
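The first anti-pattern above is worth spelling out: a session ID in the URL makes every visitor's URL unique, so the cache never gets a second hit on the same page. The usual fix is to normalize the cache key by stripping session-style parameters. A Python sketch (the parameter names in `SESSION_PARAMS` are illustrative):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Session-style query parameters to drop from the cache key
# (illustrative names; adjust to your application).
SESSION_PARAMS = {"sid", "jsessionid", "phpsessid"}

def cache_key(url):
    # Build a normalized key: same page + same real parameters
    # -> same cache entry, regardless of the visitor's session.
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), ""))
```

With this key, two visitors requesting `/p1?sid=abc` and `/p1?sid=xyz` share one cache entry instead of producing two misses.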
Time to first byte
[Bar chart comparing time to first byte on a 0.00-2.00 sec. scale: the raw application, "some other cache :-)", and SuperCache.]
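Time to first byte, as compared in the chart above, is the span from sending the request to receiving the first response byte. A minimal Python sketch of the measurement; the two callables stand in for a real socket send and recv and are assumptions of the example:

```python
import time

# Measure time to first byte (TTFB): start the clock when the request
# is sent, stop it when the first byte of the response arrives.
# `send_request` and `read_first_byte` are stand-ins for real I/O calls.
def measure_ttfb(send_request, read_first_byte):
    start = time.perf_counter()
    send_request()
    read_first_byte()
    return time.perf_counter() - start
```

In practice, tools like `curl -w '%{time_starttransfer}'` or browser devtools report the same metric without custom code.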