
A first look at kdb+

Kdb+ provides a full relational database management system with time-series analysis that handles data both in memory and on disk. We take a first look at it with a view to testing it on our analytic platform.

Kdb+ is the flagship product of Kx Systems, a company which has been developing a proprietary historical and real-time relational database since 1993. Kdb+ runs on many operating systems and is available for both 32-bit and 64-bit platforms. The 32-bit version can be downloaded for free for evaluation purposes only, which is great if one wants to start playing with it. And that is exactly what I did: I downloaded the 32-bit Linux version and extracted it into my home directory. I renamed the directory q and copied the executable in ~/q/l32 to /usr/local/bin/.

After that I went to the online manual they provide (you will need to enter Userid: anonymous and Password: anonymous to view the link).

Here are a few bullet points about their system:

  • Kdb+ embeds Kx's proprietary language, called q (not to be confused with another language called Q, the equational programming language).
  • q is a proprietary array processing language developed by Arthur Whitney. The language serves as the query language for kdb+ and it evolved from APL.

To get going fast, just type q in a shell to enter the "q console":

$ q
KDB+ 2.6 2009.09.15 Copyright (C) 1993-2009 Kx Systems
q)_

Follow their tutorial to start playing with the console. It is quite straightforward, especially if you are used to other scripting languages such as Python and R.

Data types in q

The backbone of the q language is formed by atoms, lists, dictionaries and tables:

  • Atoms are, as the name suggests, irreducible values with a specific data type. These basic data types correspond to those of SQL, with some additional date- and time-related types that facilitate time-series work.
  • Lists are ordered collections of atoms and other lists. The order of the items in the list is positional (i.e., left-to-right) and is part of its definition. The lists (1;2) and (2;1) are different. SQL is based on sets, which are inherently unordered. This distinction leads to some subtle differences between the results of queries on q tables versus the result sets from analogous SQL queries. The inherent ordering of lists makes time series processing natural and fast in q, while it is cumbersome and performs poorly in standard SQL.
  • Dictionaries are a generalisation of lists and provide the foundation for tables. They are ordered collections of key-value pairs, similar, but not equivalent, to a dict in Python or a map in C++.
  • Tables form the basis for kdb+. A table is a collection of named columns implemented as a dictionary. Consequently, q tables are column-oriented, in contrast to the row-oriented tables of conventional relational databases.
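
To make the list and table ideas above a little more concrete, here is a minimal sketch in plain Python. It is only an analogy: it is not q code, it does not use kdb+, and the table name, column names, values and helper functions (row, select_where, deltas) are all made up for illustration. It shows that ordered lists differ from sets, and how a table can be held as a dictionary of equal-length columns.

```python
# Python analogy only: this is NOT q code and does not talk to kdb+.

# Lists are ordered, so position is part of their identity; sets are not.
assert [1, 2] != [2, 1]
assert {1, 2} == {2, 1}

# A column-oriented table: a dictionary mapping column names to
# equal-length lists. All names and values here are hypothetical.
trades = {
    "time":  ["09:30:00", "09:30:01", "09:30:02"],
    "sym":   ["AAA", "BBB", "AAA"],
    "price": [10.0, 20.5, 10.25],
}


def row(table, i):
    """Return row i as a dictionary by indexing every column at position i."""
    return {name: column[i] for name, column in table.items()}


def select_where(table, column, value):
    """Return a new column-oriented table with the rows where table[column] == value."""
    keep = [i for i, v in enumerate(table[column]) if v == value]
    return {name: [col[i] for i in keep] for name, col in table.items()}


def deltas(values):
    """Differences between consecutive items; only meaningful because order is fixed."""
    return [b - a for a, b in zip(values, values[1:])]


aaa = select_where(trades, "sym", "AAA")
```

Reading a whole column (for example trades["price"]) touches a single list, whereas building one row has to visit every column. That trade-off is the usual reason column-oriented storage suits time-series analytics.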

Using kdb+ with other languages

Like any serious proprietary software, kdb+ provides native interfaces for C/C++, Java, C# and Python. We will look at the Python interface and try to use it to administer a simple kdb+ database.

The Python interface repository is open source, and an Eclipse plugin is also available.

Stay tuned