Posted on 16-Apr-2017
Dynamic Language VMs
Ruby 1.9
Lourens Naude, WildfireApp.com
Background
Independent Contractor
Ruby / C / integrations
Well versed full stack
Architecture
WildfireApp.com
Social Marketing platform
Large whitelabel clients
Bursty traffic: Lady Gaga, EA, Gatorade etc.
RUBY VM INTERNALS ?
A GOOD CRAFTSMAN KNOWS HIS TOOLS
A BAD CRAFTSMAN BLAMES HIS TOOLS
Typical public facing apps
Interaction patterns
Request / response
Time
Event driven
Overheads
Data transfer (I/O)
Serialization / coercion (CPU)
VM allocation, symbol tables etc. (CPU + mem)
Business requirements (CPU)
Ruby daemon - strace
Process 5856 detached
% time     calls  syscall
------  --------  -------------
 89.69      5092  recvfrom
  5.35      5093  sendto
  2.49     26300  stat
  2.05     11004  clock_gettime
Ruby daemon - ltrace
% time     calls  function
------  --------  --------
 95.78    635173  memcpy
  1.38     25862  malloc
  0.79     14984  free
  0.60     11403  strcmp
System Resources
Data latency
CPU cache
Memory - local
Disk - local
Memory + disk - remote
Record retrieval with ORM
Fetch results (local/remote memory + disk)
Serialization + conversion (CPU)
Object instantiation (CPU + memory)
Optional memcached (local or remote memory)
RUBY ?
Conversion: rows to arrays
require 'benchmark'

Benchmark.bm do |b|
  b.report do
    1000.times { ActiveRecord::Base.connection.select_rows "SELECT * FROM users" }
  end
end

      user     system      total        real
  0.300000   0.040000   0.340000 (  0.505095)
Conversion: rows to hashes
require 'benchmark'

Benchmark.bm do |b|
  b.report do
    1000.times { ActiveRecord::Base.connection.select_all "SELECT * FROM users" }
  end
end

      user     system      total        real
  0.510000   0.050000   0.560000 (  0.719201)
Instantiation
require 'benchmark'

Benchmark.bm do |b|
  b.report do
    100_000.times { 'string'.dup }
  end
end

      user     system      total        real
  0.040000   0.000000   0.040000 (  0.043791)
Serialization load + dump
require 'benchmark'

Benchmark.bm do |b|
  b.report do
    100_000.times { Marshal.load(Marshal.dump('ruby string')) }
  end
end

      user     system      total        real
  1.660000   0.010000   1.670000 (  1.699882)
Roadmap
VM Architecture
Symbol table
Opcodes / instructions
Dispatch
Optimizations
Ruby language
Object model
Garbage Collection
Contexts and control flow
Concurrency
VM ARCHITECTURE
Changes
Ruby 1.8 artifacts
Parser && AST nodes
Object model
Garbage Collection
No immediate performance gains for String manipulation etc.
Codegen phase
Better optimization hooks
Faster runtime
AST AND CODEGEN
Abstract Syntax Tree (AST)
Structure
Grammar representation
Annotations attach semantics to nodes
Possible to refactor the tree: more nodes, less complexity
Example nodes
Literals, values and assignments
Method calls, arguments and return values
Jumps: if, else, iterators
Unconditional jumps: exceptions, retry etc.
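To see grammar-level nodes for literals, assignments, method calls and conditional jumps, Ruby's standard `ripper` library (bundled since 1.9) can dump an S-expression for a snippet. This is a sketch: Ripper reports parser events, so its tree is close to, but not the same as, the internal NODE tree YARV compiles from, and the exact output differs between Ruby versions.

```ruby
require 'ripper'
require 'pp'

# A modifier-if wrapping an assignment whose value is a method call (*).
source = "total = price * 2 if price > 0"

# Ripper.sexp returns nested arrays tagged with node types such as
# :program, :if_mod, :assign and :binary (nil on a syntax error).
sexp = Ripper.sexp(source)
pp sexp
```

Each tag corresponds to one grammar rule; `price` shows up as a method call node because it was never assigned as a local variable.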
Code generation
How it works
Converts the AST to compiled code segments
Reduces a tree to a linear and ordered instruction set
Fast execution: no tree walking + native code
Workflow
Preprocessing: AST refactoring (not performed by YARV)
Codegen: nodes to instruction sequences
Postprocessing: replace with optimal instruction sequences (peephole optimization)
Pre and postprocessing phases may be multiple passes
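The codegen and postprocessing steps can be sketched with a toy compiler: one recursive walk flattens a tree into an ordered instruction list, then a peephole pass rewrites short instruction patterns. The node shapes, instruction names and both rewrite rules are hypothetical, loosely modelled on YARV's `putobject`/`opt_plus`; a real optimizer has many more rules.

```ruby
# Hypothetical node shapes: [:lit, v], [:lvar, name],
# [:call, recv, op, arg], [:asgn, name, value], [:seq, stmt, ...]
def compile(node, out = [])
  case node.first
  when :lit  then out << [:putobject, node[1]]
  when :lvar then out << [:getlocal, node[1]]
  when :call
    _, recv, op, arg = node
    compile(recv, out)
    compile(arg, out)
    out << [:send, op, 1]                  # generic call with 1 argument
  when :asgn
    compile(node[2], out)
    out << [:dup]                          # assignment is an expression: keep its value
    out << [:setlocal, node[1]]
  when :seq
    stmts = node.drop(1)
    stmts.each_with_index do |stmt, i|
      compile(stmt, out)
      out << [:pop] if i < stmts.size - 1  # discard unused statement values
    end
  else
    raise ArgumentError, "unknown node #{node.first.inspect}"
  end
  out
end

# Operators with a (hypothetical) specialized instruction.
SPECIALIZED = { :+ => :opt_plus, :- => :opt_minus, :< => :opt_lt }

# Peephole pass: look at each instruction plus the tail already emitted.
def peephole(insns)
  out = []
  insns.each do |insn|
    if insn[0] == :send && SPECIALIZED.key?(insn[1])
      out << [SPECIALIZED[insn[1]]]        # send :+  =>  opt_plus
    elsif insn == [:pop] && out.size >= 2 &&
          out[-2] == [:dup] && out[-1][0] == :setlocal
      out.delete_at(-2)                    # dup; setlocal x; pop  =>  setlocal x
    else
      out << insn
    end
  end
  out
end

# Tiny stack machine, used only to check both sequences agree.
def run(insns)
  stack = []
  locals = {}
  insns.each do |op, operand|
    case op
    when :putobject then stack.push(operand)
    when :getlocal  then stack.push(locals.fetch(operand))
    when :setlocal  then locals[operand] = stack.pop
    when :dup       then stack.push(stack.last)
    when :pop       then stack.pop
    when :send, :opt_plus, :opt_minus, :opt_lt
      meth = op == :send ? operand : SPECIALIZED.key(op)
      arg = stack.pop
      stack.push(stack.pop.public_send(meth, arg))
    else
      raise ArgumentError, "unknown instruction #{op}"
    end
  end
  stack.last
end

# x = 1 + 2; x < 10
ast = [:seq,
       [:asgn, :x, [:call, [:lit, 1], :+, [:lit, 2]]],
       [:call, [:lvar, :x], :<, [:lit, 10]]]

raw = compile(ast)
opt = peephole(raw)
puts "#{raw.size} -> #{opt.size} instructions"   # prints "9 -> 7 instructions"
p run(raw) == run(opt)                           # prints true
```

Running the tree walk once and then rewriting a flat list is what makes the peephole step cheap: each rule only inspects the last couple of emitted instructions.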
LOOKUPS
Symbol / Hash tables
How it works
Constant time access to int/char indexed values
Table defaults: 11 bins, rehash once there are more than 5 entries per bin on average
Rehashing grows the bin count; lookup inside a bin is sequential (chained entries)
Lookup of methods, variables, encodings etc.
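The table behaviour above can be sketched as a small chained hash table: hashing picks a bin in constant time, a short chain inside the bin is scanned sequentially, and the table rehashes into more bins once the average chain would exceed 5 entries. The class name `StTable` and the growth rule (`bins * 2 + 1`) are assumptions for illustration; MRI's `st.c` is written in C and picks its new bin counts differently.

```ruby
# Hypothetical Ruby model of a chained hash table with st.c-like defaults.
class StTable
  INITIAL_BINS = 11
  MAX_DENSITY  = 5

  attr_reader :size

  def initialize
    @bins = Array.new(INITIAL_BINS) { [] }
    @size = 0
  end

  def bin_count
    @bins.size
  end

  def []=(key, value)
    entry = bin_for(key).find { |k, _| k.eql?(key) }
    if entry
      entry[1] = value                     # overwrite an existing key
    else
      rehash if (@size + 1) > bin_count * MAX_DENSITY
      bin_for(key) << [key, value]
      @size += 1
    end
  end

  def [](key)
    # Constant-time bin selection, then a sequential scan of the chain.
    entry = bin_for(key).find { |k, _| k.eql?(key) }
    entry && entry[1]
  end

  private

  def bin_for(key)
    @bins[key.hash % bin_count]
  end

  # Move every entry into a larger set of bins (growth rule is assumed).
  def rehash
    entries = @bins.flatten(1)
    @bins = Array.new(bin_count * 2 + 1) { [] }
    entries.each { |k, v| bin_for(k) << [k, v] }
  end
end

table = StTable.new
100.times { |i| table["method_#{i}"] = i }
p table["method_42"]   # prints 42
p table.bin_count      # prints 23: one rehash after the 55th entry
```

Keeping chains short by rehashing is what keeps method and variable lookup close to constant time as tables grow.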
Symbol
Entity with both a String and Number representation
Internally an ID, neither a String nor a Symbol object: an integer that points to a symbol table entry
Developer identifies by name, VM by int
Immutable for performance; watch out for memory (Ruby 1.9 never garbage collects symbols)
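A short sketch of the points above: a Symbol is interned, so two references to the same name are the same object and compare by identity, while equal Strings are distinct objects compared by content. The memory caveat applies to 1.9; since Ruby 2.2, dynamically created symbols can be garbage collected, so the growth count printed below depends on the Ruby version and is not a fixed figure.

```ruby
# The same name always maps to the same symbol table entry.
a = :user
b = "user".to_sym
puts a.equal?(b)              # true: one object

# Equal Strings are separate objects compared by content.
s1 = String.new("user")
s2 = String.new("user")
puts s1 == s2                 # true: same bytes
puts s1.equal?(s2)            # false: two objects

# The String view of a Symbol.
puts a.to_s                   # user

# Interning new names grows the symbol table. Symbolizing untrusted
# input this way leaked memory on 1.9, where symbols were never freed.
before = Symbol.all_symbols.size
syms = Array.new(1_000) { |i| "dynamic_sym_#{i}".to_sym }   # keep them referenced
puts Symbol.all_symbols.size - before
```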
VM INSTRUCTIONS
VM instructions / opcodes
Stateless functions
80+ currently
Generated from definitions at interpreter compile time
(building 1.9 requires an existing Ruby interpreter)
Instruction / opcode / operands notation
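The instruction / opcode / operands notation can be seen directly on MRI: `RubyVM::InstructionSequence` (available since 1.9) compiles a snippet and `disasm` prints one instruction per line with its operands. Instruction names and formatting vary between Ruby versions, so treat the listing as illustrative; on MRI the `a + 2` call typically appears as the specialized `opt_plus` rather than a generic method send.

```ruby
# Compile a snippet to YARV bytecode and print the human-readable listing.
iseq = RubyVM::InstructionSequence.compile("a = 1; a + 2")
listing = iseq.disasm
puts listing
```

Each line of the listing shows an offset, the opcode (e.g. `putobject`, `setlocal`, `opt_plus`) and its operands, such as the literal being pushed or the local variable slot being written.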
Categories and examples
variable: get or set local variable
class / module: definition
method / iterator: invoke method, call block
Optimization: specialized instructions replace calls to common operators such as +, >
Heap growth: mid to large app
>> 8 * 1.8
=> 14.4
>> 8 * 1.8 * 1.8
=> 25.92
>> 8 * 1.8 * 1.8 * 1.8
=> 46.656
>> 8 * 1.8 * 1.8 * 1.8 * 1.8
=> 83.9808
>> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
=> 151.16544
>> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
=> 272.097792
>> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
=> 489.7760256
Slot structure
typedef struct RVALUE {
    union {
        struct {
            VALUE flags;         /* 0 when free */
            struct RVALUE *next;
        } free;
        struct RObject object;
        struct RFloat  flonum;   /* "float" is a C keyword */
        ...
    } as;
} RVALUE;
Pointer layout
Self describing
Program data area and heap
RVALUE union can accommodate any ruby object
Frames, variable structures etc. well defined also
A slot is 40 bytes on a 64-bit architecture
Free list points to the next free slot
Ruby heap VS OS heap
Ruby heap
A slot is 20 bytes on a 32-bit architecture
A slot can point to additional data allocated on the OS / system heap
OS heap
Thus a 20-byte slot can reference a 2MB chunk on the system heap
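The slot versus system-heap split can be observed with MRI's `objspace` extension: a short String fits inside its slot, while a large one keeps only a header in the slot and points at a buffer malloc'ed on the system heap. `ObjectSpace.memsize_of` reports the memory attributed to an object; what it counts (slot included or not) differs between Ruby versions, so no exact figures are assumed here.

```ruby
require 'objspace'

small = "ruby"                    # embedded: the bytes live in the slot
large = "x" * (2 * 1024 * 1024)   # 2MB of character data on the system heap

puts ObjectSpace.memsize_of(small)
puts ObjectSpace.memsize_of(large)
```

Both objects occupy one slot each on the Ruby heap; the difference in reported size comes almost entirely from the external buffer.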
CRuby: Mark and Sweep
Conservative
If it cannot determine with certainty whether a value references an object, it assumes the object is in use
Two phase implementation
Mark phase: identifies and flags objects reachable from the current program context
Sweep phase: iterates through the object space and
free all objects not marked
unmark marked objects
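The two phases can be sketched as a toy collector over a hypothetical object graph: each object is a Ruby Struct with a `refs` list and a mark bit, and `roots` stands in for the current program context (stack, globals). This is an illustration, not CRuby's implementation; in particular it marks with an explicit stack, whereas the slide notes CRuby's marking is recursive, and it has no conservative scanning.

```ruby
# Hypothetical heap object: a name, outgoing references and a mark bit.
ToyObj = Struct.new(:name, :refs, :marked)

class ToyHeap
  attr_reader :objects

  def initialize
    @objects = []                  # the whole "object space"
  end

  def allocate(name, refs = [])
    obj = ToyObj.new(name, refs, false)
    @objects << obj
    obj
  end

  # Mark phase: flag every object reachable from the roots.
  def mark(roots)
    stack = roots.dup
    until stack.empty?
      obj = stack.pop
      next if obj.marked             # already visited (handles cycles)
      obj.marked = true
      stack.concat(obj.refs)
    end
  end

  # Sweep phase: free unmarked objects, unmark survivors for next cycle.
  def sweep
    freed, @objects = @objects.partition { |obj| !obj.marked }
    @objects.each { |obj| obj.marked = false }
    freed.map(&:name)
  end

  def gc(roots)
    mark(roots)
    sweep
  end
end

# Mirrors the ary1 / ary2 example: each Array holds three Strings.
heap = ToyHeap.new
ary1 = heap.allocate("ary1", %w(a b c).map { |s| heap.allocate(s) })
ary2 = heap.allocate("ary2", %w(d e f).map { |s| heap.allocate(s) })

freed_first  = heap.gc([ary1, ary2])
freed_second = heap.gc([ary2])     # ary1 is no longer a root
p freed_first                      # prints []
p freed_second                     # prints ["a", "b", "c", "ary1"]
```

Note that `sweep` walks every object, live or dead, which is why the work is proportional to heap size.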
Concerns
Performance
Runtime pauses
Work proportional to heap size
Prone to memory fragmentation (no compaction)
Recursive
Triggers
malloc_limit: every ~8MB (8,000,000 bytes) allocated through Ruby's malloc wrappers triggers GC
Not enough heap reserve
GC in action
# 4 objects: 1 Array, 3 Strings
ary1 = %w(a b c)
ary2 = %w(d e f)
# both ary1 and ary2 are reachable
ary1 = nil
# the Array formerly held by ary1, and its contents, are unreachable
Generational GC
Observations
The vast majority of objects (80%+) are short-lived
Expensive to account for long-lived objects
Partition by age and frequently collect short-lived ones
How it works
Restrict GC to the most recently allocated slots
These sub heaps are referred to as generations
Perform a full GC only when collecting the youngest generation fails to free enough memory
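A generational scheme can be sketched as a toy two-generation collector under a hypothetical object model: a minor GC traces and sweeps only the young generation, and survivors are promoted to old. Because old objects are not traced, any old object that points at a young one must be recorded in a remembered set by a write barrier, otherwise a minor GC could free a live young object. The full collection mentioned above is not shown; it would trace old objects too. Ruby 1.9 itself had no generational GC.

```ruby
# Hypothetical heap object with an age flag and a mark bit.
GenObj = Struct.new(:name, :refs, :old, :marked)

class GenHeap
  attr_reader :young, :old

  def initialize
    @young = []
    @old = []
    # Old objects that hold references to young ones (keyed by identity).
    @remembered = {}.compare_by_identity
  end

  def allocate(name)
    obj = GenObj.new(name, [], false, false)
    @young << obj
    obj
  end

  # Write barrier: every reference store goes through here so that
  # old -> young pointers end up in the remembered set.
  def write(parent, child)
    parent.refs << child
    @remembered[parent] = true if parent.old && !child.old
  end

  # Minor GC: trace only young objects, starting from the roots and the
  # remembered set; old objects are assumed live and not traversed.
  def minor_gc(roots)
    stack = roots + @remembered.keys.flat_map(&:refs)
    until stack.empty?
      obj = stack.pop
      next if obj.old || obj.marked
      obj.marked = true
      stack.concat(obj.refs)
    end
    freed, survivors = @young.partition { |obj| !obj.marked }
    survivors.each do |obj|          # promote survivors to the old generation
      obj.marked = false
      obj.old = true
    end
    @old.concat(survivors)
    @young = []
    @remembered.clear                # no young objects remain
    freed.map(&:name)
  end
end

heap = GenHeap.new
config = heap.allocate("config")                 # long-lived
heap.minor_gc([config])                          # config is promoted

1_000.times { |i| heap.allocate("tmp#{i}") }     # short-lived garbage
cache = heap.allocate("cache_entry")
heap.write(config, cache)                        # old -> young reference
freed = heap.minor_gc([config])
p freed.size                                     # prints 1000
p heap.old.map(&:name)                           # prints ["config", "cache_entry"]
```

The second minor GC only touches the 1,001 young objects; `config` is never re-traced, and `cache_entry` survives solely because the write barrier remembered `config`.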
CONCURRENCY
Threading
Changes
Native OS Threads
Ruby Thread == pthread
Multiple cores ftw!
but
Scheduling, synchronization and creation all go through syscalls
Much more expensive to spawn and switch than green threads
Global VM Lock (GVL)
How it works
Thread that owns the GVL is allowed to execute
Blocking operations should release the GVL
Automatically released when scheduled
C extensions: authors need not be concerned with synchronization
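A small sketch of the GVL's visible effect on MRI: threads that block (here `sleep` stands in for blocking I/O) release the lock and overlap, so four 0.2s waits finish in roughly 0.2s rather than 0.8s. CPU-bound Ruby code holds the GVL while it runs, so those threads take turns instead of using several cores; that timing depends on the machine and is only printed, not asserted.

```ruby
require 'benchmark'

# Blocking calls release the GVL, so the four waits overlap.
io_bound = Benchmark.realtime do
  4.times.map { Thread.new { sleep 0.2 } }.each(&:join)
end

# Pure Ruby computation keeps the GVL, so these threads interleave.
cpu_bound = Benchmark.realtime do
  4.times.map { Thread.new { 200_000.times { |i| i * i } } }.each(&:join)
end

puts format("4 sleeping threads:  %.2fs", io_bound)   # roughly 0.2s
puts format("4 CPU-bound threads: %.2fs", cpu_bound)
```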
Blocking VM operations
I/O
blocking reads and writes
DNS resolution or connects
Often have huge handshake overheads
Computations, processes and locks
Expensive Bignum ops blocked 1.8 interpreters
Process.waitpid
File locks
Releasing the GVL
Stable API: rb_thread_blocking_region()
Blocking function: slow system call / computation
Unblock function: called on Thread interrupt
Pitfalls
Cannot access VALUEs (objects) in blocking functions
No integration with Ruby's exception / error handler
Lightweight Concurrency
Fibers
Coroutines: 4KB stack size
Very fast user space context switches
Cooperative scheduling required
Fiber.yield pauses the activation record, which keeps context across multiple calls
Use cases
Generators
Blocking I/O - Neverblock
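The generator use case can be sketched with a Fiber: `Fiber.yield` suspends the block and hands a value back to the caller, and the next `resume` continues exactly where it stopped, with its local variables intact. The Neverblock-style I/O use case is not shown here.

```ruby
# An infinite Fibonacci generator; the caller decides how many to pull.
fib = Fiber.new do
  a, b = 0, 1
  loop do
    Fiber.yield a          # pause here and return a to the caller
    a, b = b, a + b        # resumes here on the next fib.resume
  end
end

first_ten = Array.new(10) { fib.resume }
p first_ten   # prints [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Scheduling is entirely cooperative: nothing runs inside the fiber until the caller calls `resume`, and control only comes back when the fiber yields.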
In the pipeline
MVM: Multiple Virtual Machines
Shared process state
Sandboxed per VM application state
Distribute VMs across available cores
Message passing for inter VM communication
Most Ruby deployments aren't thread safe
MVM is well suited for this
QUESTIONS ?