CouchConf-SF-Couchbase-Performance-Tuning
Transcript of CouchConf-SF-Couchbase-Performance-Tuning
Tuning Couchbase
Tim Smith, Engineer
I am:
● http://github.com/couchtim
● Support Engineer
● Sales Engineer
Introduction
Introduction
You are:
● Using Membase
● Using CouchDB
● Using it in production
● 100ms response
● 2ms response
● Smarter than I am
Simple, Fast, Elastic*
Membase
● 5-minute cluster setup
● Memcached API
● Memcached-fast responses (working set)
● Saturate network with minimal CPU
● Pleasant admin UI
● Rebalance on the fly
Simple, Fast*, Elastic
CouchDB
● Webophilic: JSON, RESTy HTTP, JavaScript
● Append-only, crash-only, MVCC (hide the fancy bits)
● Bidi replication, /_changes
● Replication & app-level sharding scale out
● Your data. Everywhere.
Simple*, Fast, Elastic
Couchbase Server 2.0
● Auto-sharding, clustered elasticity
● Caching, predictable low-latency
● Scatter-gather, incremental map/reduce
● Rich hands-on-your-data features
API Combo
Couchbase Server 2.0
● Both memcached binary protocol
● And CouchDB HTTP protocol
● SDK provides consistent interface
● Optionally synchronous persistence
Lots to Learn…
We welcome your discoveries, ingenious solutions and feedback.
But we’re not starting from scratch!
Best practices from Membase and CouchDB still apply.
● Hardware and system resources
● Client and API usage
● Data modeling
Hardware
Membase:
● RAM, RAM, RAM!
● Fast disk (throughput) helpful for write-intensive applications, and disk-heavy ops (rebalance)
● Network bandwidth may become an issue
● Adding more nodes can help with all three
Hardware—RAM
Proper cluster sizing is #1
● Couchbase.org wiki: Sizing Guidelines
● Main variables include total number of items, size of working set, replicas and per-item overhead
● Under-provisioning reduces elasticity
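As a rough illustration of how those variables interact, here's a back-of-envelope sketch. The 150-byte per-item overhead and every sample number below are made-up assumptions for illustration, not official guidance; use the wiki's Sizing Guidelines for real planning.

```python
# Back-of-envelope RAM sizing sketch (illustrative assumptions only).

def ram_needed_gb(items, avg_value_bytes, key_bytes, replicas,
                  working_set_pct, per_item_overhead=150):
    """Rough RAM estimate: metadata for every copy of every item,
    plus enough room to keep the working set resident."""
    copies = 1 + replicas                 # active + replica copies
    metadata = items * copies * (key_bytes + per_item_overhead)
    working_set = items * copies * avg_value_bytes * working_set_pct
    return (metadata + working_set) / 1024 ** 3

# 100M items, 2 KB values, 20-byte keys, 1 replica, 20% working set
print(round(ram_needed_gb(100_000_000, 2048, 20, 1, 0.20), 1))
```

Under-provisioning against an estimate like this is exactly what erodes elasticity: rebalance and warm-up both assume headroom.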
Hardware—Disk
Fast disk not too important…until it is
● Rebalance can move a lot of data around
● Especially when disk > RAM
● Warm-up time after node restart
● Under-provisioning reduces elasticity
● More nodes spread out the I/O
● SSD, RAID, the usual stuff
Hardware
CouchDB:
● CPU usage can be significant with view updates, replication filters, formatting via /_list and /_show
● Fast disk helpful for many applications, and disk-heavy ops (compaction)
● Separate data and views onto different filesystems to improve I/O
● RAM can’t hurt
Hardware—Cloud
Cloud hosting brings variability
● Disk bandwidth can occasionally drop
● Even identical instances may perform differently
● Large instances more reliable
● More instances provide redundancy
● Best bang-for-the-buck still an open question
Configuration
Membase client
● Use a Membase-aware smart client (spymemcached for Java, Enyim for C#)
● Or, run moxi on the client host
● Minimizes network hops, preserves bandwidth
● Value compression (often automatic)
Configuration
CouchDB client
● Caching, ETag / If-None-Match
● Compression, Accept-Encoding: gzip
● Keep-Alive
● By the way, Couchbase Single Server has some killer performance increases (coming soon to an Apache CouchDB release)
API Usage
Memcached API
● Binary protocol
● Multi-get and multi-set
● Incr, decr, append, prepend
● TTL expiration, get-and-touch
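To make the TTL and get-and-touch semantics concrete, here's a toy in-memory sketch. This illustrates the semantics only; it is not the Couchbase client API, and the class and method names are invented for the example.

```python
import time

# Toy store sketching memcached-style TTL and get-and-touch semantics.

class TinyCache:
    def __init__(self):
        self._data = {}                   # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires = time.monotonic() + ttl if ttl else None
        self._data[key] = (value, expires)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and time.monotonic() >= expires:
            del self._data[key]           # lazily expire on read
            return None
        return value

    def gat(self, key, ttl):
        """Get-and-touch: fetch the value and extend its TTL in one op."""
        value = self.get(key)
        if value is not None:
            self.set(key, value, ttl)
        return value

    def incr(self, key, delta=1):
        value = self.get(key)
        if value is None:
            return None
        self.set(key, int(value) + delta)  # note: this sketch drops any TTL
        return int(value) + delta

cache = TinyCache()
cache.set("session:42", "alice", ttl=30)
print(cache.gat("session:42", 60))   # fetches "alice", TTL pushed to 60s
cache.set("hits", 0)
print(cache.incr("hits"))            # 1
```

Get-and-touch matters for session-style data: one round trip keeps a hot item alive instead of a get followed by a touch.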
API Usage
CouchDB API
● HEAD vs. GET, ?limit=1
● ?startkey, not ?skip
● Use built-in reduce functions: _sum, _count, _stats; write views in Erlang
● Keep view index size in mind—emit just what you need
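The built-in reduces hand back per-key aggregates so clients never pull raw rows; here's a Python rendition of the aggregate that _stats computes over the numeric values a view emits (for illustration; the real reduce runs inside the server).

```python
# What CouchDB's built-in _stats reduce computes for emitted numbers.

def stats_reduce(values):
    return {
        "sum": sum(values),
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sumsqr": sum(v * v for v in values),  # enables variance client-side
    }

latencies_ms = [2, 3, 2, 100, 5]
print(stats_reduce(latencies_ms))
```

Because sum, count, min, max, and sumsqr are all associative, the server can combine partial reductions from the index without rescanning rows.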
API Usage
CouchDB API
● Use ?group_level to aggregate over structured keys
● Emit null, and use ?include_docs to get more data (faster view generation)
● Emit more data, so ?include_docs isn’t needed (avoid random I/O on query)
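A sketch of how ?group_level truncates structured keys before reducing, simulated in Python with a _count-style reduce; the sample rows are made up.

```python
from collections import defaultdict

# ?group_level=N truncates each array key to its first N elements
# before reducing, so one view answers per-day, per-month, per-year.

def group_count(rows, group_level):
    counts = defaultdict(int)
    for key, _value in rows:              # rows are (array_key, value)
        counts[tuple(key[:group_level])] += 1
    return dict(counts)

rows = [([2012, 1, 5], 1), ([2012, 1, 9], 1),
        ([2012, 2, 1], 1), ([2013, 6, 2], 1)]
print(group_count(rows, 2))  # per-month counts
print(group_count(rows, 1))  # per-year counts
```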
Modeling—Doc Size
Bundle related info into one document
● Fewer items → less caching overhead
● Reduce number of requests clients make
● Promotes server-side processing with _show functions
● More context available for flexible maps
Modeling—Doc Size
Break up serial items into separate docs
● E.g., comments, events, other “feeds”
● Each entry is self-described
● Avoids write contention on a container
● Avoid read/write of container contents just to make a small addition
● May be gathered with map/reduce view
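The feed model above can be sketched like so: each comment is its own document, and a map keyed on [post_id, created_at] collates a post's comments in order. This is simulated in Python (real map functions are JavaScript), and the field names are hypothetical.

```python
# "Separate docs" model: comments live as individual documents.
docs = [
    {"type": "comment", "post_id": "p1", "created_at": 3, "text": "nice"},
    {"type": "comment", "post_id": "p2", "created_at": 1, "text": "hmm"},
    {"type": "comment", "post_id": "p1", "created_at": 1, "text": "first!"},
]

def map_comments(doc):
    # Composite key collates each post's comments chronologically.
    if doc.get("type") == "comment":
        yield ([doc["post_id"], doc["created_at"]], doc["text"])

rows = sorted((kv for d in docs for kv in map_comments(d)),
              key=lambda kv: kv[0])
print(rows)
```

Adding a comment is then a single small write with no read-modify-write of a container document, so concurrent commenters never contend.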
Modeling—Key Size
Use short key values
● At the clustering layer, all keys are kept in RAM, tracked for replicas, etc.
● 255 bytes max length, but prefer short keys
● At CouchDB layer, id is likewise used in many places, and short ids are more efficient
● Semantic keys
Modeling—Indexes
Consider other index types
● Full-text integration
● Geo-spatial (can be used for non-spatial data, too)
● Hadoop connector w/ Couchbase Server (via TAP)
Modeling—K/V Tricks
Non-obvious models in key/value space
● Example: a level of indirection to “remove” a bunch of keys without enumerating them:
● Define a master key, e.g. 'obj_rev': 3
● Define subordinate attribute keys with the master value in the key name, e.g. 'obj_foo-3', 'obj_bar-3'
● Increment 'obj_rev', and rely on TTL to reap stale attribute items
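The indirection trick can be simulated against a plain dict. The key names follow the slide's example; a real deployment would also set TTLs on the attribute items so stale revisions get reaped.

```python
# Master-key indirection: bump one key to orphan a whole family of keys.
store = {}
store["obj_rev"] = 3
store["obj_foo-3"] = "foo value"
store["obj_bar-3"] = "bar value"

def read_attr(name):
    # Every read goes through the master key to build the real key name.
    rev = store["obj_rev"]
    return store.get("%s-%d" % (name, rev))

print(read_attr("obj_foo"))       # "foo value"

# To "remove" every attribute at once, just bump the master key; the
# -3 items become unreachable, and TTL would reap them later.
store["obj_rev"] += 1
print(read_attr("obj_foo"))       # None
```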
Diagnostic Stats
Monitoring Couchbase Server
● Ops/sec
● RAM usage vs. high/low water marks
● Growth of RAM usage (mem_used)
● Growth of metadata usage (ep_overhead)
Diagnostic Stats
Monitoring Couchbase Server
● RAM ejections for active/replica data (*eject*)
● Cache miss ratio (get_hits vs. ep_bg_fetched)
● Disk write queue size (ep_queue_size + flusher_todo)
● Disk space available
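One way to derive the headline numbers from such a stats snapshot; the sample values are made up, and treating ep_bg_fetched over total gets as the miss ratio is one reasonable reading of "get_hits vs. ep_bg_fetched".

```python
# Deriving miss ratio and write-queue depth from a stats snapshot
# (stat names from the slides; sample values invented).
stats = {
    "get_hits": 9500,
    "ep_bg_fetched": 500,     # gets that had to fall through to disk
    "ep_queue_size": 1200,
    "flusher_todo": 300,
}

miss_ratio = stats["ep_bg_fetched"] / (stats["get_hits"] + stats["ep_bg_fetched"])
write_queue = stats["ep_queue_size"] + stats["flusher_todo"]
print("cache miss ratio: %.1f%%" % (100 * miss_ratio))
print("disk write queue: %d items" % write_queue)
```

A rising miss ratio usually means the working set no longer fits in RAM; a growing write queue means disk throughput can't keep up with ingest.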
Diagnostic Stats
Error condition stats
● Disk write errors (*failed*)
● Uptime resets
● Out of memory conditions (*oom*)
● Swap usage
What’d I Miss?
Before questions, I want assertions.
That is, you’re smarter than I am
● …and you’ve got more experience
● What’s the most important tip you know?
● What mistakes did I make?
Thank You
Tim Smith
http://github.com/couchtim
@couchtim