Developing a database server: software engineer's view
Laurynas Biveinis / Percona
laurynas.biveinis@{gmail|percona}.com
Big Data Strategy 2015, Vilnius
Which database server?
Percona Server
http://www.percona.com/software/percona-server
A drop-in compatible fork of MySQL
An open-source, relational database management system
Approaching 2,000,000 downloads
A part of MySQL ecosystem
Enabled by GNU General Public License
Forks abound
Healthy and thriving
Lots of politics
The main players, pt 1
The main players, pt 2
The main players, pt 3
Big Web Patches
The main players, pt 4
The main players, pt 5
The ecosystem is fragmented, but is it healthy?
One measure is code flow between the forks
A case of super_read_only
Facebook patch implemented it first
Facebook contributed it to WebScaleSQL
Percona Server merged it from WebScaleSQL, sent some bugfixes back to WebScaleSQL
Oracle re-implemented it from scratch for the next major MySQL release
MariaDB did not like it
Code is flowing (mostly) everywhere
Coopetition
Back to Percona Server
Tracks MySQL closely
Diagnostics and management
Performance and scalability
Why diagnostics and management?
Early Percona Server:
Ad-hoc patch for extra diagnostics by Percona consultants
Get billed-per-hour work done more efficiently
Why (InnoDB) performance and scalability?
In 2010, InnoDB was performing worse on a 4-core machine than on a 1-core one
And fixes were not forthcoming at the time
Addressed the need then, built the reputation since
Why not other features?
Feature benefit / feature cost ratio has to be very, very high
Case 1: implement low-hanging fruit
Case 2: implement extremely beneficial features
No rewrites, no refactorings, no code base cleanups
“Why not other features” brings us to lessons learned
Lesson 1: stand on the shoulders of giants
You probably do not need to write a DBMS from scratch
So find a good project to fork
Lesson 2: do not diverge
Do not add a single line of code difference without a very good reason
Unless your engineering team is as big as the upstream one
Improvements such as O(n²) -> O(n log n) algorithms are often not good enough in cold code paths
Plugins are very good
Lesson 3: listen to users
Easier said than done, especially if done right
Listening and then ignoring / downplaying users’ pain
Listening to the wrong users
We have the best users! :)
$$$ / €€€ add weight to users’ opinions
Both right and wrong
Lesson 4: Continuous QC
Was not something Percona Server had on Day One
MySQL always had an automated feature/regression testsuite
But 3rd parties did not always add tests for their features
Step 1: require developers to actually run the testsuite
Step 2: Jenkins per-push
Step 3: …
Lesson 5: wrong ways and slightly less wrong ways to do performance
A Performance Graph
[Figure: chart comparing Product A and Product B; vertical axis from 0 to 40000]
PRODUCT B IS BETTER !!1!
Same performance graph, different view
[Figure: Product A and Product B plotted over time, 00:00 to 00:06; vertical axis from 0 to 80000]
Is Product B still better?
How to provision capacity for B?
What response time guarantee will it give?
Will your automated failover work correctly in the presence of stalls?
[Figure: the same plot over time, 00:00 to 00:06; vertical axis from 0 to 80000]
Engineering low variance > engineering max peak performance
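The difference between the two views can be made concrete with a few summary statistics. This is a sketch only: the sample values below are invented for illustration and are not the numbers behind the graphs in the talk, and `summarize` is a hypothetical helper.

```python
import statistics

def summarize(samples):
    """Return mean, worst interval, and coefficient of variation of throughput."""
    mean = statistics.mean(samples)
    stdev = statistics.pstdev(samples)
    return {"mean": mean, "min": min(samples), "cv": stdev / mean}

# Invented per-interval throughput samples (operations per second).
product_a = [30000, 31000, 29000, 30000, 30500, 29500]  # steady
product_b = [70000, 0, 68000, 0, 72000, 500]            # high peaks, stalls

a = summarize(product_a)
b = summarize(product_b)

# Averaged over the whole run, B looks better...
print("mean:", a["mean"], "vs", b["mean"])
# ...but its worst interval and its relative spread tell a different story.
print("min: ", a["min"], "vs", b["min"])
print("cv:  ", round(a["cv"], 3), "vs", round(b["cv"], 3))
```

Reporting the minimum (or a low percentile) and a measure of spread alongside the mean is what exposes the stalls that a single averaged bar hides.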
Where does variance come from anyway?
From the query code path requesting resources with variable availability
C, C++, CPU, memory: caches, heap, mutexes, rwlocks
Memory/disk: data on disk, which could be cached
RDBMS: free space in the write-ahead log (WAL), etc.
Client-server and clusters: network roundtrips
Database servers love being in homeostasis
All the required resources for queries readily available
In the presence of unpredictable load
Do not make query threads work for this
Monitor them in the background and make them available as needed
In the presence of unpredictable workload
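The idea of keeping resources ready in the background can be sketched as a pool with low- and high-water marks. This is an illustration under simplifying assumptions, not how Percona Server or InnoDB implements it: the class `FreeBufferPool` and all of its methods are hypothetical, and "buffers" are just integers.

```python
import threading
from collections import deque

class FreeBufferPool:
    """Free-buffer pool topped up by a background thread, not by query threads."""

    def __init__(self, low_water, high_water):
        self._free = deque(range(high_water))
        self._low = low_water
        self._high = high_water
        self._next_id = high_water
        self._lock = threading.Lock()
        self._need_refill = threading.Event()
        self._stop = threading.Event()

    def take(self):
        """Query-thread path: pop a buffer, signal if the pool is running low."""
        with self._lock:
            buf = self._free.popleft() if self._free else None
            if len(self._free) < self._low:
                self._need_refill.set()
        # None means the pool ran dry; a real server would have to wait here.
        return buf

    def free_count(self):
        with self._lock:
            return len(self._free)

    def refill_once(self):
        """One maintenance pass: top the pool back up to the high-water mark."""
        with self._lock:
            added = 0
            while len(self._free) < self._high:
                self._free.append(self._next_id)
                self._next_id += 1
                added += 1
            self._need_refill.clear()
        return added

    def run_background(self):
        """Background loop: wake on the low-water signal and refill."""
        while not self._stop.is_set():
            if self._need_refill.wait(timeout=0.05):
                self.refill_once()

    def stop(self):
        self._stop.set()
```

In use, `run_background` would be started once in a `threading.Thread`, so that `take` stays a short, predictable operation regardless of load; the refill work and its variable cost are kept off the query path.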
If you want to develop a DBMS:
Find an existing one to fork!
And then do not diverge
Listen to your users
Control quality continuously
Ensure stable performance