BlackRay FOSS Asia 2010
1 FOSS Asia 2010
The State of the Engine
➔ Brief Technology Overview
➔ New SQL Parser (lemon/quex)
➔ User Defined Functions
➔ BlackRay as a storage engine
➔ Outlook: Realtime Data Updates
Brief BlackRay History
What is BlackRay?
● BlackRay is a relational, in-memory database
● Supports SQL, utilizes PostgreSQL drivers
● Fulltext (Tokenized) Search in Text fields
● Object-Oriented API Support
● Persistence via Files, Transaction support
● Scalable and Fault Tolerant
● Open Source, Open Community
● Available under the GPLv2
Current Release
Current 0.10.0 – Released December 2009
● Complete rewrite of SQL Parser (boost::spirit2)
● PostgreSQL client compatibility (via network protocol) to allow JDBC/ODBC... via PostgreSQL driver
● Rewritten CLI tools
● Major bugfixes (potential memory leaks)
● Better Authentication support for Instances
Technology Overview
Why call it Data Engine?
● BlackRay is a hybrid between a relational database and a search engine, thus we call it a „data engine“
● Database features:
  ● Relational structure, with Join between tables
  ● Wildcards and index functions
  ● SQL and JDBC/ODBC
● Search Engine Features:
  ● Fulltext retrieval (token index)
  ● Phonetic and similar approximation search
  ● Extremely low latency
BlackRay Architecture
[Architecture diagram. Labels: C++ API, Java API, Python API, PHP API, C# API; SQL Interface with Postgres clients; Management Server; Instance Server; Data Universe (RAM resident) with the 5-Perspective Index (L1: Dictionary, L2: Postings, L3: Row Index, L4: Multi-Tokens, L5: Multi-Values); Redo Log; Snapshots]
Data Universe
● BlackRay features a 5-Perspective Index
  ● Layer 1: Dictionary
  ● Layer 2: Postings
  ● Layer 3: Row Index
  ● Layer 4: Multi-Token Layer
  ● Layer 5: Multi-Value Layer
● Layer 1 and 2 comprise a fully inverted Index
● Statistics in this Index used for Query Plan Building
● All data - index and raw output - are held in memory
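As a rough illustration of how a dictionary and postings lists can form an inverted index, the following Python sketch indexes a single text column. It is a minimal sketch, not BlackRay's actual data layout: the names build_inverted_index and match_count are hypothetical, and tokenization is assumed to be a plain whitespace split.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Build a toy inverted index over one text column.

    Returns (dictionary, postings):
      dictionary: sorted list of distinct tokens (stand-in for Layer 1)
      postings:   token -> ascending list of row ids (stand-in for Layer 2)
    """
    postings = defaultdict(list)
    for row_id, text in enumerate(rows):
        # set() avoids listing the same row twice for a repeated token
        for token in set(text.lower().split()):
            postings[token].append(row_id)
    dictionary = sorted(postings)
    return dictionary, dict(postings)


def match_count(postings, token):
    """Rows containing the token: the kind of statistic a query
    planner could read straight from the index."""
    return len(postings.get(token, []))


rows = ["Mike Maier", "Anna Meyer", "Mike Smith"]
dictionary, postings = build_inverted_index(rows)
```

Because the match count per token is available without touching any row data, a planner can compare the selectivity of two conditions before evaluating either one.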
Core BlackRay Features
● Standard loaders enable high performance loading of data into tables
● Persistence is done via file based snapshots
● Snapshots enable data versioning and simple backups
● Basic ACID Transaction compliance is implemented in BlackRay, without crash recovery support.
Query Interfaces
● BlackRay implements the PostgreSQL server socket interface and binary APIs in Java, C++ and Python
● PostgreSQL compatible drivers can be utilized against BlackRay (JDBC/ODBC)
● Native API enables object oriented data access
● Performance of native APIs currently is substantially better than SQL via PostgreSQL drivers
● Dynamic query building is very efficient with native APIs
A New SQL Parser
A New SQL Parser - again?
● The 0.10 release included a much improved SQL parser, built with boost::spirit
● Quite solid, fast and simple to use
● However, boost deprecates spirit1
● boost::spirit2 is not compatible with spirit1, requiring a rewrite anyway
● Our impression: spirit2 requires too many resources and large grammars result in huge generated files
● Also: spirit and C++ templates do not mix well
What would be a better choice?
● Flex/Bison:
  ● The obvious choice of MySQL and PostgreSQL
  ● Two-step compile process, generates C not C++
  ● No Unicode support
● ANTLR:
  ● Odd grammar rules, not optimal for C++
  ● Recursive Descent parsers are not suited for SQL
● Lemon/QUEX
  ● Our new choice ;)
Lemon/Quex: Our experience
● Lemon:
  ● Lemon is part of SQLite
  ● Much more intuitive syntax than Flex syntax
● Quex:
  ● Generates tokenizers in C++
  ● Unicode and external Parser support
  ● Partially buggy but all issues were fixed within days
● Synopsis: Lemon/Quex are like Bison/Flex, just with Unicode and C++ support and maybe easier to debug
Current Progress
● Basic SQL Features are ported from spirit to Lemon/Quex
● The „issue-77“ branch contains all recent SQL parser code
● Unit-Testing and Database level testing very solid
● Will be part of the 0.11 release
Recent Additions
● Support for simple (single column) User Defined Functions is now complete
● Query portion (no subselect, no aggregate functions) is very stable
● Data Definition Language was added recently
  ● CREATE SCHEMA
  ● CREATE TABLE
  ● ALTER TABLE
  ● Index is created dynamically, so no CREATE INDEX required
User Defined Functions
● BlackRay was designed with support for Index functions that operate on data in tables
● Functions pre-compute index results, improving speed and enabling queries that are not possible otherwise
● Functions are called on data load, and also on queries.
● Functions must not maintain state outside of tables of the same instance they operate on.
A Sample Function
● Using functions in BlackRay:
  SELECT name FROM employee_table WHERE fx_phonetic (name) = 'mike';
● Functions need to be loaded beforehand:
  CREATE FUNCTION fx_phonetic(varchar, varchar) RETURNS int AS 'DIRECTORY/funcs', 'phonetic' ;
● The function must implement the BlackRay default function signature, which is almost identical to the MySQL and PGSQL signatures
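To illustrate the idea of an index function that is applied once on data load and again on the query term, here is a minimal Python sketch. It does not reproduce BlackRay's function signature or its phonetic rules: phonetic_key and PhoneticIndex are hypothetical names, and the spelling substitutions are a toy rule set chosen only for the example.

```python
def phonetic_key(word):
    """Toy phonetic normalization (an assumption, not BlackRay's rules):
    lowercase, fold a few similar-sounding spellings, collapse repeats."""
    word = word.lower()
    for src, dst in (("ey", "ai"), ("ei", "ai"), ("ay", "ai"), ("y", "i")):
        word = word.replace(src, dst)
    out = []
    for ch in word:
        # collapse runs of the same letter, e.g. "ll" -> "l"
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


class PhoneticIndex:
    """Pre-computes keys on load and applies the same function to query terms."""

    def __init__(self):
        self.rows = []
        self.by_key = {}

    def load(self, name):
        row_id = len(self.rows)
        self.rows.append(name)
        # the function runs at load time; its result is what gets indexed
        self.by_key.setdefault(phonetic_key(name), []).append(row_id)

    def search(self, term):
        # the function runs again at query time, on the search term only
        return [self.rows[i] for i in self.by_key.get(phonetic_key(term), [])]


index = PhoneticIndex()
for name in ("Maier", "Meyer", "Mayer", "Miller"):
    index.load(name)
```

With this toy rule set, index.search("Meier") finds the three spelling variants without evaluating the function against every stored row, which is the benefit of pre-computing index results.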
Current State
● User Defined Function Repository fully implemented
● All built-in functions ported to be compatible with User Defined Functions
● SQL support for User Defined Functions under way
● Will be part of the 0.11 release
Our Adventure: BlackRay As A Storage Engine
Why even bother?
● In Fall 2009, we embarked on a little adventure to implement BlackRay as a storage engine
● The old Engine had only a minimal SQL interface and we lacked the expertise to build it ourselves
● Plugging into the MySQL ecosystem seemed like a very pleasant choice
● The features of BlackRay would make it a good query cache for large disk tables.....
Our First Problem
BlackRay does not support a simple table scan.....
● It may seem strange, but due to its design as an in-memory index, we do not separate table and index
● Each column index basically is the data of the column
● BlackRay distinguishes select and output columns, both of which remain in RAM
● The index therefore was never designed to be forced back into a row format, for a simple table walk
Possible Solution?
So, we can walk the Index instead?
● Rather than scanning the table, it is possible to scan the index instead
● This only works for the columns marked „searchable“
● Causes nasty errors when trying to select against result-only columns
● In tokenized index columns, getting the data back out means concatenation with a blank between values – not nice, as tokenizing can follow complex rules
● Requires Refactoring of our Layer 3 (Row-Index)
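The problem with reading values back out of a tokenized column can be shown in a few lines of Python. This is a sketch under assumed tokenizer rules (lowercase, split on any run of non-alphanumeric characters); tokenize and rebuild_from_tokens are hypothetical names, not BlackRay functions.

```python
import re


def tokenize(text):
    """Toy tokenizer (assumed rules): lowercase, split on non-alphanumeric runs."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def rebuild_from_tokens(tokens):
    """What an index walk could hand back: tokens joined with a single blank."""
    return " ".join(tokens)


original = "O'Brien-Smith, J."
rebuilt = rebuild_from_tokens(tokenize(original))
```

Here rebuilt is "o brien smith j", so case, punctuation and the original separators are lost. The more complex the tokenizing rules, the further the reconstructed value can drift from what was loaded.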
Next Issue
Optimizing Queries
● The BlackRay Optimizer uses the Layer 1/2 (Inverted Index) and Layer 4 (Multi-Tokens) Data to choose a Query Path
● In BlackRay „SELECT text FROM t WHERE text LIKE '*pattern1*' AND text LIKE '*pattern2*'“ is extremely efficient as the inverted index has all the data
● Even with OR this is an efficient Query, due to the fact that we can immediately choose the smaller query first and eliminate double matches
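A minimal Python sketch of why such queries are cheap on an inverted index: each condition resolves to a postings list of row ids, and AND/OR become set operations on those lists. The function names and the sample row ids are hypothetical, and the real optimizer is certainly more involved than this.

```python
def intersect_all(postings_lists):
    """AND: start from the shortest postings list so the candidate
    set shrinks as early as possible."""
    if not postings_lists:
        return []
    ordered = sorted(postings_lists, key=len)
    result = set(ordered[0])
    for plist in ordered[1:]:
        result &= set(plist)
        if not result:
            break  # nothing can match any more
    return sorted(result)


def union_all(postings_lists):
    """OR: merge row ids; the set drops double matches."""
    result = set()
    for plist in postings_lists:
        result.update(plist)
    return sorted(result)


pattern1_rows = [1, 4, 7, 9, 12]   # rows matching the first pattern (made up)
pattern2_rows = [4, 12]            # rows matching the second pattern (made up)
```

The length of each postings list is known before any row is read, which is what lets the smaller condition be evaluated first.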
Next Issue
● Optimizing in the Storage Engine Interface?
  ● In BlackRay, the Optimizer uses the AST from the SQL Parser to figure out what to optimize
  ● Based on a field or single Index level, the number of matches really is not useful
  ● Without utilizing the Layer2 and Layer4 structures, we lose performance by several orders of magnitude
  ● Personal Opinion: The MySQL Optimizer really seems to like table scans, and tricking it with random vs sequential read cost did not do the trick
Functions in the Index
● Columns can take functions to be used on the data upon indexing, and when select is carried out
● The most common functions are
  – TOKENIZE – to support multi-token indexes
  – PHONETIC – match against defined phonetic rules
  – ALIAS – match a token against words with similar meaning
● Internally these functions could be considered Meta-Columns on the Index
● To be able to choose the proper column, we need to know what function was used in the select
Functions in the Index
Consider this Query:
  SELECT text FROM t WHERE fx_phonetic(text) LIKE 'maier%';
● Functions can take more than one parameter, and may be nested
● We could not quite figure out how to explain this to the MySQL Parser
● The function data would need to be available to the Index to choose where to look
Threading Models...
● BlackRay has a highly optimized Threading Model
● In RAM, we do not expect I/O-waits, so a model of two dedicated Threads per CPU core works really well
● Locking in the Index is built around this model
● „One Thread per Connection“ requires at least a careful review of the way critical data structures are accessed
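The difference between the two models is mainly how the number of threads is chosen. The Python sketch below sizes a fixed worker pool by core count instead of creating one thread per connection. It only illustrates the sizing idea: make_query_pool and run_queries are hypothetical names, no thread is pinned to a core, and standard CPython threads do not execute Python code in parallel, so this says nothing about BlackRay's actual C++ implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor

THREADS_PER_CORE = 2  # the ratio named on the slide


def make_query_pool():
    """Fixed-size pool: thread count follows the hardware, not the client count."""
    cores = os.cpu_count() or 1
    return ThreadPoolExecutor(max_workers=THREADS_PER_CORE * cores)


def run_queries(queries, handler):
    """Run every query on the shared pool; results keep the input order."""
    with make_query_pool() as pool:
        return list(pool.map(handler, queries))


# len stands in for a real query handler
results = run_queries(["a", "bb", "ccc"], len)
```

With a fixed pool, locking can be designed around a known, small number of threads; a thread-per-connection host removes that guarantee, which is why the data structure access paths would need review.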
.... Our Conclusion
● Currently, BlackRay really does not fit too well into the storage engine architecture
● Did we lose all hope? Absolutely not.....
● BlackRay Applications could really benefit from being able to utilize MySQL features, including the Archive Engine as well as temporary tables in Heap
● Thanks to the excellent Blog and postings by Brian Aker, which allowed us to not make all beginner mistakes ourselves
Outlook: Realtime Update/Insert
Current Challenges
● Bulk Updates
  ● BlackRay supports Insert and Delete via the Bulk Loader
  ● Updates are done via Insert & Delete
● Insert/Delete via API
  ● An API exists for Insert/Delete
  ● The Insert/Delete API is separate from the Query API
  ● Both APIs cannot be used in the same Thread
● Insert/Delete via SQL
  ● Currently Insert/Delete are not available via SQL
Supporting Insert/Delete
● Pull together Insert/Delete and Query APIs
  ● Take out the separate APIs
  ● Unified API will then support transactions
● Enable Insert/Delete via SQL
  ● Extend the SQL Grammar to include INSERT/DELETE
  ● Implement the functions via the unified API
● The Bulk Loader and SQL
  ● Rewrite of the Bulk Loader to utilize the unified API, rather than SQL
Performance Impact
● Insert and Delete have a severe performance impact on parallel queries
● Locking needs to be utilized to ensure transactional integrity, causing queries to stall on data modification
● Currently BlackRay uses sorted lists for the data dictionary and the other index layers
● For indexes that have frequent changes, it may be much more desirable to utilize other basic data structures underneath the index
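The trade-off of a sorted list can be sketched in Python with the standard bisect module. SortedDictionary is a hypothetical stand-in for a token dictionary, not BlackRay code: lookups are a binary search, but every insert or delete shifts all later elements.

```python
import bisect


class SortedDictionary:
    """Toy token dictionary kept as a sorted list."""

    def __init__(self):
        self.tokens = []

    def contains(self, token):
        # binary search: O(log n) comparisons
        i = bisect.bisect_left(self.tokens, token)
        return i < len(self.tokens) and self.tokens[i] == token

    def insert(self, token):
        # finding the slot is O(log n), but list.insert shifts every
        # later element, so the insert as a whole is O(n)
        i = bisect.bisect_left(self.tokens, token)
        if i == len(self.tokens) or self.tokens[i] != token:
            self.tokens.insert(i, token)

    def delete(self, token):
        i = bisect.bisect_left(self.tokens, token)
        if i < len(self.tokens) and self.tokens[i] == token:
            del self.tokens[i]  # also shifts later elements: O(n)


d = SortedDictionary()
for t in ("mike", "anna", "zoe", "anna"):
    d.insert(t)
d.delete("zoe")
```

For bulk loading, the list can be built once and sorted, so the shifting cost never shows. With frequent single-row changes it is paid on every modification while a lock is held, which is the motivation for considering other structures, such as balanced trees.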
Project Roadmap
Immediate Roadmap
● Planned 0.11.0 – Due in Fall 2010
  ● Pluggable Function architecture (loadable libraries)
  ● Make all index functions available in SQL
  ● Support for Prepared Statements (ODBC/JDBC)
  ● Improved thread and memory management (Perftools?)
● BlackRay Admin Console (Remora) 0.11
  ● Engine Statistics via GUI
  ● Cluster Node management
Shortterm Roadmap
● Planned 0.12.0 – Due in February 2012
  ● Realtime INSERT/UPDATE/DELETE
  ● SQL to support subselect
  ● Default aggregate functions (SUM/AVG/....)
  ● Fix several potential memory leaks (smart pointers)
● The 0.12 release should be the last pre-GA release
Midterm Roadmap
● Scalability Features
  ● Sharding & Partitioning Options
  ● Federated Search
● Fully portable snapshot format (across platforms)
● Query Performance Analyzer
● Improved Statistics Module with GUI
● BlackRay as a Storage Backend for SUN OpenDS LDAP Engine
Midterm Roadmap
● Security Features
  ● Improved User and Access Control concepts
  ● SSL for all connections
  ● External User Store (LDAP/OpenSSO/PAM...)
● Increased Platform support
  ● Windows 7 and Windows Server platforms
  ● Embedded platforms
● Other, random features by popular request.
The Team behind BlackRay
SoftMethod GmbH
● SoftMethod GmbH initiated the project in 2005
● Company was founded in 2004 and currently has 10 employees
● Focus of SoftMethod is high performance software engineering
● Product portfolio includes telco/contact center and LDAP applications
● SoftMethod also offers load testing and technical software quality assurance support.
Development Team
● Felix Schupp, Initiator and Project Sponsor
● Thomas Wunschel, Architect and Lead Developer
● Mike Alexeev, Key Contributor (SQL/Functions)
● Souvik Roy, Performance Analysis and Tools
● Simon Courtenage, C++ and boost expert
Wrap-Up
What to do next
● Get BlackRay:
  ● Register yourself on http://forge.softmethod.de
  ● SVN checkout available at http://svn.softmethod.de/opensource/blackray/trunk
● Get Involved
  ● Anyone can register and create tickets, news etc
  ● We have an active mailing list for discussion as well
● Contribute
  ● We require a signed Contributor agreement before granting commit access to the repository
Contact Us
● Website: http://www.blackray.org
● Twitter: http://twitter.com/dataengine
● Facebook: http://facebook.com/dataengine
● Mailing List: http://lists.softmethod.de
● Download: http://sourceforge.net/projects/blackray
● Felix: [email protected]