Continuous Integration on Steroids
Akbashev Alexander
Highload++ | November 07, 2016
Agenda
01. CI in HERE
02. Monitoring
03. Scalability
04. Jenkins
05. Nightmares Plugins
06. Morale
07. Q&A
01 Continuous Integration in HERE
Every change goes through a validation pipeline
[Diagram: Gerrit -> Gerrit Plugin -> Pre-submit Triggers -> Builds -> Tests]
Feedback goes from tests back to Gerrit
Feedback comes from every pipeline
Numbers
• 100k+ builds per day
  • Each “build” is the execution of one build/test job
  • The total number correlates with the number of commits
  • The number of builds matters less than the number of commits
• ~1.5k concurrent builds
  • High throughput is extremely important
  • Morning commit
  • Before lunch
  • “Last attempt for today”
• 1.3-2.5k executors
  • Raised on demand
  • Health checks
  • The Jenkins strategy is not optimized for the cloud
02 Monitoring
Collect information about every build in the system
[Diagram: Jenkins build -> Groovy Event Listener Plugin -> Fluentd -> InfluxDB -> Grafana]
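The slides do not show what a build event looks like by the time it reaches InfluxDB. As a rough sketch only, the function below formats one build event as a single point in InfluxDB line protocol. The measurement name `jenkins_build`, the tags `job` and `result`, and the field `duration_ms` are all hypothetical names chosen for illustration; in the setup described in the talk, Fluentd sits between Jenkins and InfluxDB and would do this formatting.

```python
def escape_tag(value: str) -> str:
    """Escape the characters that are special inside line-protocol tag values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def build_event_to_line(job: str, result: str, duration_ms: int, timestamp_ns: int) -> str:
    """Format one build event as a line-protocol point.

    All names here (measurement, tags, field) are illustrative assumptions,
    not the schema used in the talk.
    """
    tags = f"job={escape_tag(job)},result={escape_tag(result)}"
    # Integer fields carry an "i" suffix; the timestamp is in nanoseconds.
    return f"jenkins_build,{tags} duration_ms={duration_ms}i {timestamp_ns}"


if __name__ == "__main__":
    print(build_event_to_line("unit tests", "SUCCESS", 1200, 1478476800000000000))
```

Keeping the job name and the result as tags, and the duration as a field, would let a Grafana dashboard group build times by job and outcome.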
JVM stats are the best “canary”
[Diagram: the same pipeline, with the Jenkins JVM added as a second data source]
03 Scalability
What do we want to achieve?
• Keep feedback time (< 20 min.)
• Test as much as possible
• … with debug symbols
• … and code coverage information
• … and on physical devices
How to scale
• Increase number of executors
• Minimize job execution time
• Smart testing
How to increase number of executors?
• EC2 Plugin
• TestDroid
How to minimize job execution time
• Split tests by type
• Parallel execution
• Node as cache storage
• Shared compiler cache
• Profiling!
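The talk does not say how tests are distributed across parallel jobs. One common approach, shown here purely as an illustration, is to balance shards by historical test duration with a greedy "longest test first" rule. The function name, the duration data, and the shard count are hypothetical.

```python
import heapq


def split_into_shards(durations, shard_count):
    """Greedily assign tests to shards so that shard run times stay balanced.

    durations: dict mapping test name -> historical run time in seconds
               (hypothetical data; the talk does not describe its source).
    Returns a list of shards, each a list of test names.
    """
    # Min-heap of (accumulated seconds, shard index).
    heap = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(heap)
    shards = [[] for _ in range(shard_count)]
    # Longest tests first; ties are broken by name to keep the result stable.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return shards


if __name__ == "__main__":
    history = {"a": 10, "b": 7, "c": 5, "d": 3, "e": 2}
    print(split_into_shards(history, 2))  # [['a', 'd'], ['b', 'c', 'e']]
```

With balanced shards, the slowest shard bounds the feedback time, which is why the duration of the longest single test matters as much as the number of executors.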
04 Jenkins
Is Jenkins really that slow, or are we doing something wrong?
Jenkins is OK. But…
Surprise #1
Rotation costs a lot
Surprise #2
It works much better with nginx
tail -n 1000 jenkins.access.log | grep -c 'urt="-"'
407
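The command above counts, among the last 1000 access-log lines, those whose `urt` field is `-`. The slide does not define `urt`; a plausible reading, stated here as an assumption, is that it is a custom log-format field carrying nginx's `$upstream_response_time`, which nginx logs as `-` when a request is answered without being passed to the upstream (here, Jenkins). Under that assumption, the same count can be sketched as:

```python
from collections import deque


def count_requests_without_upstream(lines, last_n=1000):
    """Count log lines among the last `last_n` whose urt field is "-".

    Assumes a custom nginx log format that writes urt="<value>"; that format
    is inferred from the slide, not confirmed by it.
    """
    recent = deque(lines, maxlen=last_n)  # keeps only the last `last_n` lines
    return sum(1 for line in recent if 'urt="-"' in line)


if __name__ == "__main__":
    sample = [
        'GET /static/style.css 200 urt="-"',
        'GET /job/build/api/json 200 urt="0.120"',
        'GET /static/logo.png 200 urt="-"',
    ]
    print(count_requests_without_upstream(sample))
```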
Surprise #3
Some buttons are very dangerous
One fundamental issue
[Diagram: a single Master serving every Slave and all Users]
What can you find in a heap dump of an OOM-killed Jenkins?
Console logs
• Should be less than X MB
• Verbose output goes to a file
• “>” and “tee” are amazing!
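The slide recommends shell redirection (`>` and `tee`) to keep verbose output out of the console log. For build steps that are driven from a script rather than a shell, the same idea can be sketched as follows: the full output goes to a file, and only the last few lines are forwarded to the console. The function name and the default of 20 lines are illustrative assumptions.

```python
import subprocess
import sys
from collections import deque


def run_with_file_log(command, log_path, tail_lines=20):
    """Run `command`, write all of its output to `log_path`, and echo only
    the last `tail_lines` lines to stdout. Returns the exit code."""
    tail = deque(maxlen=tail_lines)
    with open(log_path, "w", encoding="utf-8") as log_file:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
        )
        for line in process.stdout:
            log_file.write(line)
            tail.append(line)
        return_code = process.wait()
    sys.stdout.writelines(tail)
    return return_code


if __name__ == "__main__":
    noisy = [sys.executable, "-c", "for i in range(1000): print('line', i)"]
    sys.exit(run_with_file_log(noisy, "verbose-output.log"))
```

The log file can then be archived as a build artifact, so the detail is still available when a build fails.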
Build history
• 2000 entries or 3 days
• Efficient rotator
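The slide gives only the two limits, not how they combine. The sketch below assumes one possible reading: a build is discarded as soon as it falls outside the newest 2000 builds or is older than 3 days. The function name and the shape of the input are hypothetical, and this is not the rotator plugin's actual code.

```python
from datetime import datetime, timedelta


def builds_to_delete(builds, now, max_count=2000, max_age=timedelta(days=3)):
    """Return the build numbers that the retention policy would discard.

    builds: list of (build_number, finished_at) tuples.
    Assumption: a build is discarded if it is beyond the newest `max_count`
    builds OR older than `max_age`.
    """
    newest_first = sorted(builds, key=lambda build: build[0], reverse=True)
    doomed = []
    for position, (number, finished_at) in enumerate(newest_first):
        if position >= max_count or now - finished_at > max_age:
            doomed.append(number)
    return doomed


if __name__ == "__main__":
    now = datetime(2016, 11, 7)
    history = [
        (1, now - timedelta(days=5)),
        (2, now - timedelta(days=1)),
        (3, now - timedelta(hours=1)),
    ]
    print(builds_to_delete(history, now))  # [1]
```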
Build artifacts
• Push to S3 directly from slaves
• Don’t store anything on master
05 Nightmares Jenkins Plugins
Limit on the number of builds: 20K
Groovy Event Listener Plugin
all events synchronized groovy compilation
fixed since 1.010 (Mar 10, 2016)
Limit on the number of builds: 40K
Warnings Plugin
Just another parser of console log
• parseConsole is “deprecated”
• parseFile is allowed
• 0 warnings are very appreciated :)
Limit on the number of builds: 60K
Timestamper Plugin
Tail needs not only “tail”
fixed since 1.8.5 (Aug 31, 2016)
Limit on the number of builds: 60K
EC2 Plugin
Full list of all images in AWS
fixed since 1.35 (Jun 30, 2016)
Limit on the number of builds: 90K
Robot Framework Plugin
Green chart costs 100 times more
Replaced by xUnit Plugin
Limit on the number of builds: 120K
Build Failure Analyzer Plugin
• One regexp
• One stream
• One thread
PR-57 is not accepted yet
Limit on the number of builds: 140K
Cleanup Workspace Plugin
`ü` breaks everything
PR-29 is not accepted yet
06 Morale
Final recommendations
• Think about scalability from the start
• Flakiness can be a huge problem
• Reduce memory allocations
• Cache as much as possible
• Failing builds can be expensive
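To see why flakiness grows into a huge problem at scale, consider a change that is validated by many jobs, all of which must pass. The sketch below assumes that each job fails spuriously with the same probability and that failures are independent. Both are simplifying assumptions, and the numbers are illustrative rather than taken from the talk.

```python
def spurious_failure_probability(flake_rate, job_count):
    """Probability that at least one of `job_count` jobs fails spuriously,
    assuming independent jobs with an identical per-job `flake_rate`."""
    return 1.0 - (1.0 - flake_rate) ** job_count


if __name__ == "__main__":
    # A 1% flake rate per job across 20 jobs: roughly an 18% chance
    # that a correct change is rejected at least once.
    print(round(spurious_failure_probability(0.01, 20), 2))  # 0.18
```

Each spurious rejection also triggers a re-run, which ties flakiness back to the last recommendation: failing builds can be expensive.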
Workflow
Slowness? Profile! Fix! Contribute!
Open source collaboration
Let’s make our life better ;)
Full list of our contributions related to this talk
• Jenkins
• ccache
• clcache
• EC2 Plugin
• S3 Plugin
• FluentD Plugin
• BuildRotator Plugin
• Groovy Event Listener Plugin
• Timestamper Plugin
• Robot Framework Plugin
• Build Failure Analyzer Plugin
• JVM GC Log Plugin for FluentD
Thank you
Contact
Akbashev Alexander
GitHub: Jimilian
E-mail: [email protected]