Introduction to kdb+ and kdb+ usage in Deutsche Bank
Andrey Babanin, TCAKDB team/Mercury team 12/02/2015
Global Technology Deutsche Bank
Exchange data increase
• Over the last 10 years, data volumes from the vast majority of exchanges have increased by several orders of magnitude
[Chart: number of trades and quotes on NYSE, billions per day, 2004-2012; vertical axis from 0.5 to 2.0]
Background
It has become more obvious each year that traditional relational DBMSs cannot cope effectively with such huge amounts of data:
• Traditional RDBMSs cannot efficiently receive and save tens or hundreds of thousands of ticks per second
• RDBMSs have no special functionality for operating on time series (data ordered by date and time)
• Traditional languages are also inconvenient for handling huge arrays of data and have no built-in support for such structures
KDB+ technology
• KDB+ is the main product of KX Systems
• KDB+ provides the same techniques for data in memory and on disk
• KDB+ comes with a powerful interpreted language, Q
• Over years of development and promotion, KX has won most of the top banks and investment companies as clients
www.kx.com
KDB+ feature set
• Column-based in-memory DB
• Stream data processing capability
• Integrated development and querying language
• Compression rates up to 10X
• No performance penalties on dynamically run queries
• Optimized in-memory and disk processing
• Interfacing capabilities with Python, R, Matlab and others
• Built-in map-reduce for many internal functions
• Data volumes of over 2 billion records per day
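The "built-in map-reduce" item can be illustrated with an average taken over date partitions. The sketch below is in Python rather than Q, and the partition names and prices are made up for illustration. It shows only the general idea (each partition is reduced to a partial result, and the partials are then combined), not how KDB+ implements it internally.

```python
from typing import Dict, List, Tuple

# Hypothetical data: one list of trade prices per date partition.
partitions: Dict[str, List[float]] = {
    "2015.02.10": [10.0, 12.0],
    "2015.02.11": [11.0],
    "2015.02.12": [9.0, 10.0, 17.0],
}


def map_partition(prices: List[float]) -> Tuple[float, int]:
    """Map step: reduce one partition to a (sum, count) pair."""
    return sum(prices), len(prices)


def reduce_average(partials: List[Tuple[float, int]]) -> float:
    """Reduce step: combine the partial pairs into one global average."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


partials = [map_partition(p) for p in partitions.values()]
overall_avg = reduce_average(partials)

# Averaging the per-partition averages would give a different (wrong) answer,
# which is why the map step has to carry both the sum and the count.
naive_avg = sum(s / n for s, n in partials) / len(partials)
```

Carrying (sum, count) pairs lets each partition be processed independently while still producing the exact global average.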
Performance
• KDB+ is developed specifically for 64-bit architecture
• It has built-in multi-threading
• Program code for KDB+ is quite compact and optimized
• KDB+ utilizes a broader range of Intel processor instructions
• Tests show that KDB+ performance is many times higher than that of other DBMSs
Data storage features
• KDB+ is a non-relational, column-oriented database
• KDB+ uses the same algorithms for data in memory and on disk
• Built-in database clustering over several parameters
Same data handling methods for different storage models
Data handling
• Built-in clustering over dates, physical partitions and columns
• KDB+ optimizes complicated insert and update queries
• Temporal data types, from nanoseconds to years, are built-in first-class data types
• KDB+ uses dynamic indexing for string data and also allows customized indexing scenarios
Two string types are available in KDB+:
• fast strings (symbols), which are indexed automatically and aliased with integers;
• simple char lists, which are slower.
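The idea behind "fast strings aliased with integers" is string interning: each distinct string is stored once and the column holds small integers instead. The Python sketch below is illustrative only. The class and method names are hypothetical, and the slides do not describe how KDB+ implements this internally.

```python
from typing import Dict, List


class SymbolTable:
    """Interns strings: each distinct string is stored once and mapped to an integer."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._names: List[str] = []

    def intern(self, name: str) -> int:
        """Return the integer alias for name, allocating a new one on first sight."""
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def name(self, sym_id: int) -> str:
        """Map an integer alias back to its string."""
        return self._names[sym_id]


symbols = SymbolTable()
tickers = ["DBK", "IBM", "DBK", "MSFT", "DBK"]
sym_column = [symbols.intern(t) for t in tickers]

# Filtering now compares integers instead of comparing strings character by character.
dbk = symbols.intern("DBK")
dbk_rows = [i for i, s in enumerate(sym_column) if s == dbk]
```

A plain char list has no such alias, so every comparison has to walk the characters, which is one reason the second string type is slower.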
Data transmission
• KDB+ is designed for a chained architecture
• Unified file and network I/O
• KDB+ also supports HTTP and WebSockets
Program interfaces and extensions
• KDB+ supports C code natively
• There are extension libraries for C/C++, Java, .NET, R, Matlab, Perl and Python
• KDB+ has base functionality for working with Thomson Reuters and Bloomberg
• It has libraries that support protocols such as Tibco, LBM and MAMA/MAMDA
• Many KDB+/Q extensions are also collected on code.kx.com
• Several Windows clients/IDEs are available for working with KDB+ databases
Comparing KDB+ to an RDBMS
• KDB+ is based on lists, which are ordered collections allowing duplicates, whereas SQL is based on sets, which are unordered collections of distinct elements.
• KDB+ stores data as contiguous items in column lists, whereas an RDBMS stores data as fields within non-contiguous rows.
• KDB+ table operations are vector operations on columns, whereas SQL operates on individual fields and rows.
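The difference between the two layouts can be shown with a small Python sketch. The table contents and the `notional` helper are hypothetical and serve only to contrast a list of rows with a set of ordered column lists.

```python
from typing import Any, Dict, List

# Row-oriented layout: one record per trade, with the fields stored together.
rows: List[Dict[str, Any]] = [
    {"sym": "DBK", "price": 10.0, "size": 100},
    {"sym": "IBM", "price": 20.0, "size": 50},
    {"sym": "DBK", "price": 11.0, "size": 200},
]

# Column-oriented layout: one ordered list per column; duplicates are allowed
# and position i in every list belongs to the same record.
columns: Dict[str, List[Any]] = {
    name: [r[name] for r in rows] for name in ("sym", "price", "size")
}


def notional(table: Dict[str, List[Any]]) -> List[float]:
    """Whole-column operation: price * size for every record at once."""
    return [p * s for p, s in zip(table["price"], table["size"])]


trade_notional = notional(columns)
total_size = sum(columns["size"])  # touches only the one column it needs
```

An aggregate such as `total_size` reads a single list in the column layout, whereas the row layout would have to visit every record and pick one field out of each.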
Major differences between an RDBMS and KDB+
Table Creation
• Traditional RDBMS: Tables defined declaratively using DDL and created on disk.
• KDB+ database: Tables created functionally in the q language.
Data Persistence
• Traditional RDBMS: Tables and related metadata held in an opaque repository. Tables are stored by row.
• KDB+ database: Serialized q entities stored in the O/S file system. No separate table metadata. Tables are stored by column.
Data Access
• Traditional RDBMS: Access to stored information is via DDL for metadata and SQL for data. Data must be retrieved into the program via a query.
• KDB+ database: Data directly accessible in q. Provides query and functional forms for table manipulation.
Memory Residency
• Traditional RDBMS: Tables reside on disk; query result sets reside in program memory.
• KDB+ database: Tables live in memory but can be persisted to disk. Column subsets are page faulted into memory for mapped tables.
Data Format
• Traditional RDBMS: Based on sets, which are unordered collections of distinct items. Data is stored in fields within rows, which are not contiguous.
• KDB+ database: Based on lists, which are ordered collections allowing duplicates. Data is stored as contiguous items in column lists.
Data Modification
• Traditional RDBMS: Persisted table modifiable via SQL (INSERT, UPDATE, etc.)
• KDB+ database: Memory-resident tables modifiable via q and q-sql. Persisted table modifiable only with append (upsert).
Data Programming
• Traditional RDBMS: SQL is declarative relational. Programs, called stored procedures, are written in a proprietary procedural language.
• KDB+ database: Programs are written in the integrated vector functional language q. Tables are first-class entities in q.
Transactions
• Traditional RDBMS: Support for transactions via COMMIT and ROLLBACK.
• KDB+ database: No built-in transaction support.
Q language
• Design based on set theory
• Terse language with single-symbol operators and functions
• Functional language with lambda calculus
• Contains its own SQL implementation
• Supports namespaces
• Supports automatic data types
• Includes a garbage collector
More Q examples
Q features
• Q works with big data directly
• Lists, dictionaries and tables are base data types
• Temporal data types are built-in
• Tables have a special set of attributes
• Q is interpreted
• Q has good network connectivity
• Highly integrated with Unix
• Development in Q is much faster than in other languages
Functional forms
The functional forms of select, update and delete can be used in any situation but are especially useful for programmatically generated queries, such as when column names are dynamically produced. The functional forms are,
?[t;c;b;a] / select
![t;c;b;a] / update and delete
where t is a table, a is a dictionary of aggregates, b is a dictionary of groupbys and c is a list of constraints.
The q interpreter parses the syntactic forms of select, exec, update and delete into their equivalent functional forms, so there is no performance difference.
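To make the roles of t, c, b and a concrete, here is a rough Python analogue of the functional select. It is a sketch and not Q: the table is modeled as a dictionary of column lists, constraints and aggregates are plain Python callables, and `functional_select` and `take` are hypothetical names. Constraints are applied in order, rows are grouped by the group-by expressions, and each aggregate is evaluated once per group.

```python
from typing import Any, Callable, Dict, List, Tuple

Table = Dict[str, List[Any]]


def take(t: Table, idx: List[int]) -> Table:
    """Return the rows of t at the given positions, column by column."""
    return {col: [vals[i] for i in idx] for col, vals in t.items()}


def functional_select(
    t: Table,
    c: List[Callable[[Table], List[bool]]],
    b: Dict[str, Callable[[Table], List[Any]]],
    a: Dict[str, Callable[[Table], Any]],
) -> Table:
    # Apply the constraints in order; each one narrows the table further.
    for constraint in c:
        mask = constraint(t)
        t = take(t, [i for i, keep in enumerate(mask) if keep])
    # Evaluate the group-by expressions and bucket row positions by key.
    by_cols = {name: expr(t) for name, expr in b.items()}
    n = len(next(iter(t.values()))) if t else 0
    groups: Dict[Tuple[Any, ...], List[int]] = {}
    for i in range(n):
        key = tuple(vals[i] for vals in by_cols.values())
        groups.setdefault(key, []).append(i)
    # Evaluate every aggregate once per group, with groups in sorted key order.
    result: Table = {name: [] for name in list(b) + list(a)}
    for key in sorted(groups):
        sub = take(t, groups[key])
        for name, value in zip(b, key):
            result[name].append(value)
        for name, agg in a.items():
            result[name].append(agg(sub))
    return result


trades: Table = {
    "sym": ["DBK", "IBM", "DBK", "IBM"],
    "price": [10.0, 20.0, 12.0, 19.0],
    "size": [100, 50, 300, 10],
}

# Rough analogue of: select maxpx: max price by sym from trades where size > 20
result = functional_select(
    trades,
    c=[lambda t: [s > 20 for s in t["size"]]],
    b={"sym": lambda t: t["sym"]},
    a={"maxpx": lambda t: max(t["price"])},
)
```

Because every part of the query is an ordinary value (a list or a dictionary), column names and aggregates can be assembled at run time, which is the use case the slide highlights.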
Asof join
The asof join is so named because it is often used to join tables along time columns, but this is not a restriction.
In general, the triadic function aj can be used to join two tables along common columns. Significantly, there is no requirement for any of the join columns to be keys. The syntax of asof join is,
aj [c1...cn;t1;t2]
where c1...cn is a symbol list of common column names for the join and t1 and t2 are the tables to be joined. The result is a table containing records from the left outer join of t1 and t2 along the specified columns.
For each record in t1, the result has one record containing all the items in t1. If there is no record in t2 whose values in the specified columns match those in the corresponding columns of t1, there are no further items in the result record.
If there are matching records in t2, the items of the last (in row order) matching record are appended to those of the t1 record in the result.
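The matching rule can be sketched in Python. This is an illustrative analogue, not Q. It assumes the behaviour of the kdb+ `aj` function as commonly documented: all join columns except the last are matched exactly, and on the last column the match is the latest t2 record whose value is less than or equal to the t1 value. It also assumes that t2 is sorted on the last column within each exact key and that the two tables share no column names other than the join columns. The tables and values are made up.

```python
from bisect import bisect_right
from typing import Any, Dict, List, Tuple

Table = Dict[str, List[Any]]


def aj(cols: List[str], t1: Table, t2: Table) -> Table:
    """As-of join sketch: exact match on cols[:-1], as-of match on cols[-1]."""
    *exact, asof = cols
    # Index t2 by exact key: key -> (as-of values, row numbers), in row order.
    index: Dict[Tuple[Any, ...], Tuple[List[Any], List[int]]] = {}
    for j in range(len(t2[asof])):
        key = tuple(t2[c][j] for c in exact)
        values, rows = index.setdefault(key, ([], []))
        values.append(t2[asof][j])
        rows.append(j)
    extra = [c for c in t2 if c not in cols]
    result: Table = {c: list(v) for c, v in t1.items()}
    for c in extra:
        result[c] = []
    for i in range(len(t1[asof])):
        key = tuple(t1[c][i] for c in exact)
        values, rows = index.get(key, ([], []))
        # Number of t2 records at or before the t1 value; the last one is the match.
        pos = bisect_right(values, t1[asof][i])
        for c in extra:
            result[c].append(t2[c][rows[pos - 1]] if pos else None)
    return result


trades: Table = {
    "sym": ["DBK", "IBM", "DBK"],
    "time": [100, 105, 90],
    "price": [10.0, 20.0, 11.0],
}
quotes: Table = {
    "sym": ["DBK", "DBK", "IBM", "DBK"],
    "time": [95, 99, 106, 101],
    "bid": [9.8, 9.9, 19.5, 10.1],
}

joined = aj(["sym", "time"], trades, quotes)
```

Every trade appears exactly once in the output. The first trade picks up the DBK quote from time 99, while the other two have no quote at or before their time and get an empty (`None`) bid.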
Q weaknesses
• Single-threaded for most kinds of tasks
• No built-in traditional user access control functionality
• No genuine debugger
• Many database management functions have to be implemented yourself
• The same Q code is often incompatible across Q versions
Main KDB+ usage scenario in DB
• KDB+ is the base technology for Mercury, the global market data capture system
• The global data storage system is used simultaneously for real-time calculations and for historical analysis
• Almost 100% of the logic is written in Q
[Diagram: Mercury asset stack layout, serving user queries]
Market data capturing in numbers
• 4 global locations: LND, NY, HK, TK
• 30 billion incoming messages per day
• 2.5 million active market subscriptions
• 2 petabytes of disk space
• 3000 active processes working 24x7
• More than 200 active servers
• More than 70 million user queries per day
• More than 1000 users across DB (>100 downstream applications)
• The system environment consists of 4 clusters: PROD and active DR, UAT and DEV in all regions
• There are also several satellite systems built on top of Mercury
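For a sense of scale, the daily totals above can be converted into average per-second rates. The short Python calculation below uses only the figures from the slide and assumes the load is spread evenly over 24 hours, which real market traffic is not, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

messages_per_day = 30e9  # "30 bln. incoming messages per day"
queries_per_day = 70e6   # "more than 70 mln. user queries per day" (a lower bound)

# Average rates under the even-spread assumption.
avg_messages_per_second = messages_per_day / SECONDS_PER_DAY
avg_queries_per_second = queries_per_day / SECONDS_PER_DAY

print(f"{avg_messages_per_second:,.0f} messages/s, {avg_queries_per_second:,.0f} queries/s")
```

This works out to roughly 347 thousand incoming messages and about 810 user queries per second on average.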
KTS - DB’s KDB+ Development Framework
Over the past 7 years, the bank has gained substantial knowledge of KDB+ technology through various implementations. This experience is key for newer projects that will implement new solutions. Without a shared framework, teams building new KDB+ based applications would each have had to build the same foundational features, resulting in duplication and inconsistent adoption of the technology. KTS was architected to be generic enough to address most current use cases and extensible enough to address future needs.
KTS Highlights
• Pre-built / Pre-tested: Increases the reliability of new applications and reduces programming and testing effort and time to market
• Application / Framework Independence: Isolating application code from core kdb+ code decouples releases
• Data Capture functionality out of the box
• Command and Control: Supervisory process for management, state monitoring and administration. Control remote processes from a single point
• Data Flow Segregation: Data flows can be separated into stacks, each of which loads its own code, configuration and business logic. Capturing data in one stack and generating derived data in another increases performance
• Code Inheritance: Core functionality is inherited from the KTS base functions. These can be extended by the client as needed. Code can be inherited at the application, stack, region or cluster level
• Access Control: An application has the ability to limit access to a process, stack, cluster and even query
• Load balancing functionality and process replication
• Slice and Dice Data: Real-time and historical data can be joined and accessed through Gateways (both async and sync)
• Java API allows seamless publishing and subscription
KTS component overview
Utilities
• High Data Volume Management: Load Balancing, Data Clustering
• Application Operation Management: Process Management, Data Recovery & Replication
Developer Frameworks
• Code Management: Code Loader, Configuration Manager, Code Inheritance
• Event Processing: Event Engine, Event Scheduler
Operational Libraries
• ACL (Access Control)
• Logging
Data Access & Storage
• Real-time Database
• Historical Database
• Multi Source Publication/Subscription
• Mercury Plug-in
KaaMS (KDB as a managed service)
Mission: The Bank-standard managed service for realizing KDB+ application solutions.
Objectives
• Reduce the cost of implementing KDB+ solutions
• Enforce KTS Solution governance and best practices to simplify application support and stability
• Reduce time to market for KDB+ solutions
KaaMS Highlights
• Guided Engagement: End to end expert guidance for adoption of KTS libraries
• HW Advisory: Facilitate hardware capacity planning expertise
• KTS Training: To assist application teams with KTS component and framework usage
• KTS Advisory: Provide expert level KTS consulting / optimization / best practice review
• Specialist KDB Support: Provide experienced KDB Support with a global support model
KaaMS Client Onboarding Workflow
Initiation
• Problem Evaluation
• Solution Design Review
• Technology applicability
Hardware Advisory
• Capacity Planning Guidance & Recommendations
• RfS Request Guidance
Business Functionality
• Consulting Services
• Actual development is done by the application team
Support
• Flexible SLAs
• L1 Support Option
• L2 Support Option
• Geneos Monitoring
• KTS Library Support
Production Onboarding
• Implementation best practices
• Recommendations
KDB+ in High Frequency Trading system
TCA reporting
• The TCA client reporting system consumes and parses Algo order flow for real-time enrichment with pricing data provided by the main KDB storage (sources are EBS, Reuters and internal DB rates)
• The actual transaction cost analysis work includes customized TWAP, VWAP, market impact and internal-value benchmarking for Algo orders and fills, resulting in real-time graphical reporting for traders
• The TCA reporting system covers global FX/Listed Derivatives transactions made through the Algo platform
• Transaction cost analysis/business intelligence client reporting increases transparency and client trust, and adds value to the Algo trading platform
The TCA project is a real-time client reporting engine supporting the current FX/LD instrument line (FX spots and crosses, index and commodity futures, future spreads, future baskets, options).
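The two benchmarks named above can be sketched in Python using their textbook definitions. The slides describe the bank's versions as "customized" and do not give formulas, so the functions, input shapes and sample numbers below are assumptions for illustration only.

```python
from typing import List, Tuple


def vwap(fills: List[Tuple[float, float]]) -> float:
    """Volume-weighted average price over (price, quantity) pairs."""
    total_qty = sum(q for _, q in fills)
    return sum(p * q for p, q in fills) / total_qty


def twap(ticks: List[Tuple[float, float]], end_time: float) -> float:
    """Time-weighted average price over (time, price) ticks sorted by time.

    Each price is assumed to hold until the next tick, and the last one
    until end_time.
    """
    times = [t for t, _ in ticks] + [end_time]
    weighted = sum(p * (times[i + 1] - times[i]) for i, (_, p) in enumerate(ticks))
    return weighted / (end_time - ticks[0][0])


# Hypothetical fills of one order: (price, quantity).
order_fills = [(10.0, 100.0), (11.0, 300.0)]
# Hypothetical market mid prices during the order: (seconds, price).
market_ticks = [(0.0, 10.0), (30.0, 12.0)]

avg_fill_price = vwap(order_fills)
benchmark_twap = twap(market_ticks, end_time=60.0)
# Positive means the order paid more than the benchmark (for a buy order).
slippage = avg_fill_price - benchmark_twap
```

Comparing an order's average fill price with a market benchmark over the same interval is the basic step behind the per-order cost figures such a report shows.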
Real-time TCA work-flow
TCA event processor
[Diagram: processing pipeline, replicated across process 1, process 2, …]
Inputs
• FX Algo updates, LD Algo updates
• Market data requests/results via market data GWs
Threads
• Thread 1: collect new messages from TP (from HDB in case of restoration)
• Thread 2: match new orders and corresponding fills and form cache entries
• Thread 3: enrich next FX/LD order, publish when ready
• Thread 4: enrich group of pending FX/LD fills, publish when ready
Components
• In-memory buffer (MIG updates grouped over parent order ID)
• FX/LD order disk cache (persistent cache used to restore processing state after restart)
• Real-time order enriching state machine (stack of functions implementing the required business logic at order level)
• Real-time fill processor (business logic for individual executions)
Internal flows
• Updates for order entry
• Order entries and fill counts
• Updates on fill count
Outputs
• Send order message to publisher
• Send fill messages to EMS publisher/bus
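The buffering step, where updates are grouped by parent order ID and an order is published once its fills have arrived, might look roughly like the Python sketch below. The class, its methods and the readiness rule (all expected fills received) are hypothetical. The slides show only the boxes of the diagram, not the actual logic.

```python
from typing import Dict, List


class OrderBuffer:
    """In-memory buffer that groups fills under their parent order ID."""

    def __init__(self) -> None:
        self.expected_fills: Dict[str, int] = {}
        self.fills: Dict[str, List[dict]] = {}

    def on_order(self, order_id: str, fill_count: int) -> None:
        """Record or update an order entry and the number of fills it expects."""
        self.expected_fills[order_id] = fill_count
        self.fills.setdefault(order_id, [])

    def on_fill(self, parent_id: str, fill: dict) -> None:
        """Buffer a fill under its parent order, even if the order is not yet known."""
        self.fills.setdefault(parent_id, []).append(fill)

    def ready_orders(self) -> List[str]:
        """Orders whose expected number of fills has been reached."""
        return [
            oid
            for oid, expected in self.expected_fills.items()
            if len(self.fills.get(oid, [])) >= expected
        ]


buffer = OrderBuffer()
buffer.on_fill("ORD-1", {"price": 1.1050, "qty": 1_000_000})  # fill arrives first
buffer.on_order("ORD-1", fill_count=2)
before = buffer.ready_orders()
buffer.on_fill("ORD-1", {"price": 1.1052, "qty": 500_000})
after = buffer.ready_orders()
```

Keeping fills keyed by parent ID lets orders and fills arrive in any order, which matters when messages are replayed after a restart.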
HFT analytical platform
• Capture generated client pricing, client orders, executions, platform hedging activity, client pricing settings
• Generate enriched datasets that fall into one of the following categories: a. market impact b. trade valuation c. client reaction to pricing changes
• Provide tools for experimentation with parameters and functions used to generate derived datasets
• Provide a basic API to retrieve data as well as sophisticated wrapper functions for quantitative analysis
• Provide visual toolkits to study the data described above
KDB+ market maker engine
• Market maker application aligned to the European Government business
• The KDB+ application is the key market-making application, with backend and frontend sides: the backend sits on the KX side and the frontend on the trade engine side
• Analytical application: a native application that runs on the trader's desktop and is operated via an Excel spreadsheet
Q debugger (by Andrey Kozyrev)
http://code.kx.com/wsvn/code/contrib/akozyrev/debug/
Qpad - KDB+/Q client and IDE (by Oleg Zakharov)
http://www.qinsightpad.com/
Thank you!
Andrey Babanin, TCAKDB team/Mercury team