
Knowledge Base/Databases/Kdb

From Thalesians



Acknowledgements

We would like to thank Ed Bierly, Jeff Borror, Mahesh BP, Bernard Cena, Aaron Davies, Ben Delo, Alex Donohue, Mark Gardner, Daniel Lister, Manish Patel, Charlie Skelton, Attila Vrabecz, and of course Arthur Whitney for their help and guidance.

Kdb+ in a nutshell

This is a very, very brief introduction to kdb+. Paul Bilokon has previously delivered it as a presentation in under 20 minutes. In order to understand it properly, we recommend that you install kdb+. You can download the 32-bit version for free from

http://kx.com/Developers/FREE-DOWNLOAD.php

You can play around with the 32-bit version, but you are not permitted to use it in production. Also, 32-bit programs can only address up to 4 GB of Random Access Memory (RAM, i.e. "the" memory), whereas 64-bit programs can address far more. This is a pretty strong restriction: most production users of this in-memory database (IMDB) want to access more than 4 GB of RAM. Moreover, 64-bit programs offer increased performance.

I just called kdb+ an in-memory database, i.e. a database management system that primarily relies on main memory for computer data storage. Yes, kdb+ is an in-memory database. But is this the best description for kdb? What is it, really?

You will be pleased to know that kdb+ is really an interpreted programming language. You can call it with a script file to execute, like so

C:\q\q.exe foo.q

or use it as an interactive console

C:\q\q.exe

(Keep it running.)

In this sense it is no different from Python or Perl.

So we can compute things in kdb+. Simple things, like these:

q)5 + 3
8

(The q) above is the prompt, very much like the Unix shell prompt.)

Or these. Here we are concatenating strings:

q)"Hello ", "World!"
"Hello World!"

But more often than not you will deal with symbols (a pretty special concept in kdb+) rather than strings, because these are usually indexed and stored efficiently:

q)a : `foo
q)a
`foo
Oh, we have just declared (and defined) a variable without giving you any warning. Yes, a is a variable, and we didn't write "a = ..." but rather "a : ..." because K and Q like to be different. :-)

Hang on, we are talking about kdb+, so what are K and Q? These are the programming languages behind kdb+. K is a very terse programming language (and difficult for humans to read) invented by Arthur Whitney, and it became the foundation of kdb+. Q is essentially a wrapper around K which uses human words, such as "each", so it is much easier to understand. It is built on top of K. We will ignore the distinction between K and Q and refer to the language as "Q" in the sequel. It turns out that we have been programming in Q all along.

So kdb+ is really this Q programming language plus some philosophy. It's nice to know this fact. Now it is clear why the kdb+ executable is called q.exe (q on Unix). It is a Q interpreter.

What makes it an in-memory database?

Well, the fact that we can declare variables and store data in-memory, like so:

q)a : 3
q)b : 5
q)a + b
8

And these variables don't have to be scalars. They can be lists:

q)l1 : `foo`bar`baz
q)l1
`foo`bar`baz
q)l2 : 1 2 3 4 5 6 7 8
q)l2
1 2 3 4 5 6 7 8

Dictionaries:

q)d1 : `name`age!(`Methuselah`Jared`Noah; 969 962 950)
q)d1
name| Methuselah Jared Noah
age | 969        962   950

And, more importantly perhaps, tables:

q)t1 : ([] name:`symbol$(); age:`int$())
q)t1
name age
--------

where we can store data:

q)t1 ,: (`Methuselah, 969)
q)t1 ,: (`Jared, 962)
q)t1 ,: (`Noah, 950)
q)t1
name       age
--------------
Methuselah 969
Jared      962
Noah       950

Note that we could also define t1 by flipping a dictionary:

q)t1 : flip[d1]
q)t1
name       age
--------------
Methuselah 969
Jared      962
Noah       950

This table now "lives" in memory. It is in this sense that kdb+ is an in-memory database. By that logic, so are Python and Perl. But kdb+ comes with many useful pleasantries which Python and Perl don't have, and it has a lot of clever optimisation, such as indexing and caching, under the covers. These make it more deserving of the title than Python and Perl.

Notice that we can "put a port" on our kdb+ process:

q)\p 19123
q)

Then open a browser and go to http://localhost:19123/

We will see the names of the aforementioned variables (a, b, l1, l2, d1, t1) in the frame on the left. We can click on them, and in the right frame we'll see their values. E.g. for t1:

name       age
--------------
Methuselah 969
Jared      962
Noah       950

Great, so we got more than we bargained for. Kdb+ comes with a network interface that our browser can understand. We can even issue Q commands via the browser:

http://localhost:19123/?count t1

will give us the number of rows in t1, i.e.

3

We can use some SQL syntax in Q:

q)select age from t1 where name=`Noah
age
---
950

We can also save tables to binary files:

q)(`$":H:/myt1.dat") set t1
`:H:/myt1.dat

And load them from those binary files later on:

q)t2 : get (`$":H:/myt1.dat")
q)t2
name       age
--------------
Methuselah 969
Jared      962
Noah       950

We can hold the data in memory and we can persist it to disk. Conventional databases are in control of the persistence process; in kdb+ the user is in control. This may be taxing, but it enables the user to perform some clever optimisations.

Rather than accessing kdb+ remotely (or locally, on localhost) from the browser we can use TCP and another q.exe process. If you start q.exe in another terminal, you can do this:

q)h : hopen `:localhost:19123
q)h "select from t1"
name       age
--------------
Methuselah 969
Jared      962
Noah       950
q)hclose h

Kdb nomenclature: k, Q, etc.

When talking about Kdb, several terms are used, often interchangeably: k, Q, kdb, kdb+... What does each of these mean?

  • k is the language that Arthur Whitney wrote after leaving Morgan Stanley. It was first released in 1993 and was under exclusive contract to UBS until 1996, when it became generally available.
  • ksql, a layer on top of k, was added in 1998. It is sometimes regarded as part of k. ksql includes SQL-like constructs, such as select.
  • kdb is the in-memory database built on k.
  • kdb+ was released in June 2003. This was more or less a total rewrite of kdb for 64-bit systems. It uses k (version 4) to boot and has...
  • Q, which is believed to stand for Query language, a language layer on top of k, defined in terms of k. Equivalent Q and k expressions compile to the same byte code, which is executed by the kdb+ byte-code interpreter. Q is like a macro language layer, e.g. type in Q is the equivalent of @: in k. Most kdb+ developers just use Q (not k), as it is more readable.
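To make the last bullet point concrete, here is a small sketch in the q console of the same operations written in Q and in k, based on the type/@: correspondence mentioned above (the k) prefix asks the console to evaluate a line as k; exact type codes and formatting may vary between kdb+ versions):

q)type `foo`bar
11h
q)k)@:`foo`bar
11h
q)avg 1 2 3
2f
q)k)(+/1 2 3)%#1 2 3
2f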

Many thanks to Charlie Skelton for his explanation.

Key bindings on Linux

When you run a Windows version of Q, the arrow keys move you up and down the command history, just as they do on a properly set up Linux terminal.

When you try using these keys in a Linux version of Q, you'll see something like

q)^[[A

for "up", which is not particularly helpful.

You can fix this by running q from rlwrap:

/home/pbilokon/bin/rlwrap /home/pbilokon/q/l64/q

Perhaps put this inside a shell script, start_q.sh:

#!/bin/bash

/home/pbilokon/bin/rlwrap /home/pbilokon/q/l64/q "$@"

To find out what rlwrap does, see http://utopia.knoware.nl/~hlub/uck/rlwrap/ (in brief, it is a "readline wrapper").

Does Q have to be unreadable?

No.

Although Q's creator, Arthur Whitney, believes in expressing logic in code as tersely as possible, you don't have to emulate his style. The following example generates geometric Brownian motion paths:

// A list of differences between the elements of a given list, lst. The
// length of the result is one less than the length of lst. The result
// is
//
//     ((lst[1] - lst[0]) (lst[2] - lst[1]) ... (lst[N-1] - lst[N-2]))
//
// where N is count lst.
//
differences : { [lst]
        lst[0] -': lst[(til ((count lst) - 1)) + 1]
    }

// Generate a single random number sampled from Uniform(0, 1).
//
// Example:
//
//     uniformRandomVariate[]
//
// Result:
//
//     (nondeterministic)
//
//     0.06551717
//
uniformRandomVariate : { []
        // 1 ? 1f generates a single (1) random number drawn from [0f, 1f).
        // [0] is required since (1 ? 1f) returns a list.

        (1 ? 1f)[0]
    }

nextGaussianRandomVariateAvailable : 0;
nextGaussianRandomVariate : 0f;

// Generate a single random number sampled from Normal(0, 1).
//
// The implementation uses the polar form of the Box-Muller transform.
//
// Example:
//
//     gaussianRandomVariate[]
//
// Result:
//
//     (nondeterministic)
//
//     -1.397263
//
gaussianRandomVariate : { []
        // Box-Muller transform takes two Uniform(0, 1) random numbers and
        // produces two Normal(0, 1) random numbers.

        // Its polar form is both faster and more robust numerically.

        // Because we generate two Normal(0, 1) random numbers at once and
        // return only one, we only have to compute them every other time
        // the function is called. We cache the second random number (the
        // one we did not return) and return it next time the function is
        // called, without computing.

        // This will stay 1 only if we need to compute the numbers.
        compute : 1;

        // Do we already have a cached random number available?
        if [nextGaussianRandomVariateAvailable;
            // If so, we don't need to compute.
            compute : 0;
            result : nextGaussianRandomVariate;
            nextGaussianRandomVariateAvailable :: 0;
        ];

        // So do we have to compute the random numbers?
        if [compute;
            w : 1f;

            while [w >= 1f;
                x1 : ((2f * uniformRandomVariate[]) - 1f);
                x2 : ((2f * uniformRandomVariate[]) - 1f);

                w : ((x1 * x1) + (x2 * x2))
            ];

            w : sqrt ((-2f * log[w]) % w);

            y1 : x1 * w;
            y2 : x2 * w;

            nextGaussianRandomVariate :: y2;
            nextGaussianRandomVariateAvailable :: 1;

            result : y1;
        ];

        result
    }

// Generate a single random number sampled from Exponential(lambda),
// using inverse transform sampling: -ln(U) % lambda, where U is
// Uniform(0, 1).
//
exponentialRandomVariate : { [lambda]
        neg[log[uniformRandomVariate[]]] % lambda
    }

// Parameters:
//
//   drift: drift per unit time, which here is 1 day (the caller passes
//     values already converted from annual units)
//   volatility: per sqrt[unit time], which is sqrt[1 day]
//
geometricBrownianMotionFactor : { [drift; volatility; dateDifference]
        deterministicTerm : (drift - ((0.5 * (volatility xexp 2.0)))) * dateDifference;
        stochasticTerm : volatility * gaussianRandomVariate[] * sqrt[dateDifference];
        exp[deterministicTerm + stochasticTerm]
    }

// Simulate a geometric Brownian motion process (GBM), i.e.
//
//     dS_t = \mu S_t dt + \sigma S_t dW_t
//
// whence
//
//     S_t = S_0 e((\mu - \sigma^2 / 2) t + \sigma \sqrt{t} N_{0, 1})
//
// Parameters:
//
//   initialValue: the initial value, S_0
//   drift: drift per unit time, which is 1 year (converted to per-day
//     units inside the function)
//   volatility: per sqrt[unit time], which is sqrt[1 year] (likewise
//     converted)
//   dateTimes: a list of datetimes or dates for which the values will be
//     generated
//
// Example:
//
//     dateRange : (2008.01.01 2008.01.02 2008.01.03 2008.01.04 2008.01.05)
//     geometricBrownianMotion[1.5490; 0.003; 0.12; dateRange]
//
// Result:
//
//     (nondeterministic)
//
//     1.549 1.5431551 1.5468368 1.5343732 1.5195453
//
geometricBrownianMotion : { [initialValue; drift; volatility; dateTimes]
        drift %: 365.25;
        volatility %: sqrt[365.25];
        dateDifferences : differences[dateTimes];

        f : geometricBrownianMotionFactor[drift; volatility];

        factors : f each dateDifferences;

        (initialValue), initialValue *\ factors
    }

// Example:
//
//     exponentiallyDistributedDateTimes[5; 2010.06.16T00:00:00.000; 2010.06.18T00:00:00.000]
//
// Result:
//
//     (nondeterministic)
//
//     2010.06.16 00:27:46.087  2010.06.16 14:44:15.869  2010.06.17 00:40:05.673
//     2010.06.17 07:42:27.479  2010.06.17 13:32:56.693
//
exponentiallyDistributedDateTimes : { [ticksPerDay; beginDateTime; endDateTime]
        dateTimes : `datetime$();

        lastDateTime : beginDateTime + exponentialRandomVariate[ticksPerDay];

        while[lastDateTime <= endDateTime;
            dateTimes ,: lastDateTime;
            lastDateTime : lastDateTime + exponentialRandomVariate[ticksPerDay];
        ];

        dateTimes
    }

// Example:
//
//     geometricBrownianMotionTimeSeries[1.5490; 0.003; 0.15; 5; 2010.06.16T00:00:00.000; 2010.06.18T00:00:00.000]
//
// Result:
//
//     (nondeterministic)
//
//     dateTime                 dataPoint
//     ----------------------------------
//     2010.06.16 12:36:42.888  1.549
//     2010.06.16 20:18:18.691  1.5622148
//     2010.06.16 21:20:50.682  1.561131
//     2010.06.16 23:10:47.157  1.560705
//     2010.06.17 00:07:32.780  1.5647236
//     2010.06.17 00:53:56.429  1.5667019
//     2010.06.17 01:23:21.514  1.5637483
//     2010.06.17 04:24:04.107  1.5599954
//     2010.06.17 07:04:39.818  1.564712
//     2010.06.17 17:48:14.832  1.5824289
//     2010.06.17 18:07:24.626  1.582003
//
// Example:
//
//     geometricBrownianMotionTimeSeries[1.5490; 0.003; 0.15; 10; 2010.06.16T00:00:00.000; 2011.06.18T00:00:00.000]
//     geometricBrownianMotionTimeSeries[1.5490; 0.003; 0.0003; 10; 2010.06.16T00:00:00.000; 2011.06.18T00:00:00.000]
//
geometricBrownianMotionTimeSeries : { [initialValue; drift; volatility; ticksPerDay; beginDateTime; endDateTime]
        dateTimes : exponentiallyDistributedDateTimes[ticksPerDay; beginDateTime; endDateTime];
        dataPoints : geometricBrownianMotion[initialValue; drift; volatility; dateTimes];

        flip [`dateTime`dataPoint!(dateTimes; dataPoints)]
    }

date_range : (2008.01.01 2008.01.02 2008.01.03 2008.01.04 2008.01.05)
geometricBrownianMotion[1.5490; 0.01; 0.12; date_range]

You can save it as random.q and then run

q random.q

You will see something like the following:

1.549 1.560322 1.574135 1.594816 1.622068

Timing commands

Often you need to know how long a particular command takes to run. The trick is to prepend \t to the command, e.g.

\t select from TestTable

Rather than returning the result of the select, this command will come back with the number of milliseconds spent, e.g.

735

for 735 milliseconds.

There is another way, which does not involve explicit calls to system commands.

startTime : .z.T

// ... Command to be timed ...

endTime : .z.T
duration : "t"$(endTime - startTime)
1 "The command took ", string[duration]

This will display something like

The command took 00:00:09.625

We shall now suggest a more powerful approach to timing. Suppose we have a function that we want to time.

The function may take very little time to run, so we call it runCount times in a loop and obtain an average run duration by dividing the total time taken for runCount runs by runCount.

In order to obtain more significant results, we shall repeat this procedure trialCount times, noting the average run duration and its standard deviation.

Putting this all together, we have

timeFunction : { [trialCount; runCount; function]
        trialDurations : ();

        do [ trialCount;
                startTime : .z.T;
                do [ runCount;
                    function[];
                ];
                trialDurations ,: ((.z.T - startTime) % runCount);
            ];

        trialDurationAverage : avg[trialDurations];
        trialDurationStandardDeviation : sqrt[var[trialDurations]];

        (trialDurationAverage, trialDurationStandardDeviation)
    }

Suppose we want to time something like this:

function : { [] (100000?10) * (100000?10) };

We run

trialCount : 10;
runCount : 100;
timeFunction[trialCount; runCount; function];

The result is, say,

3.563 0.06165225

This means that on average function takes 3.563 milliseconds to run, the standard deviation being 0.06165225 for the trials.

String concatenation

Happens naturally:

"hello", " ", "world"

The result is

"hello world"

Now suppose you want to get "hello123world" from this:

d: 123
"hello", d, "world"

You will see

"h"
"e"
"l"
"l"
"o"
123
"w"
"o"
"r"
"l"
"d"

This is because a string is really a list of characters. Joining or concatenating two lists of characters (strings), you get another list of characters (another string). Joining or concatenating a list of characters (string) and an integer, you get a list of characters and integers (not a string at all).
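We can see the distinction in the types. This is a sketch, assuming d: 123 from above; 10h is the type code of a character list, 0h that of a mixed list:

q)type "hello"
10h
q)type ("hello", d, "world")
0h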

Here is one solution:

"hello", string[d], "world"

String concatenation in query results

Suppose you have a table, say SampleTable that looks like this:

date       sym    time         bid     ask     location market
--------------------------------------------------------------
2010.04.26 FOOBAR 00:00:00.408 0.82587 0.82655 NY       ABC   
2010.04.26 FOOBAR 00:00:00.412 0.82603 0.8264  NY       DEF   
2010.04.26 FOOBAR 00:00:03.143 0.82592 0.8266  NY       ABC   
2010.04.26 FOOBAR 00:00:03.162 0.82608 0.82645 NY       DEF   
2010.04.26 FOOBAR 00:00:08.754 0.82597 0.82665 NY       ABC   
2010.04.26 FOOBAR 00:00:08.670 0.82613 0.8265  NY       DEF   
2010.04.26 FOOBAR 00:00:09.118 0.82599 0.82667 NY       ABC   
2010.04.26 FOOBAR 00:00:09.124 0.82615 0.82652 NY       DEF   
2010.04.26 FOOBAR 00:00:09.158 0.8262  0.82657 NY       DEF   
2010.04.26 FOOBAR 00:00:09.175 0.82604 0.82672 NY       ABC   

Suppose that you would like to select location and market as one field, contributor. Here is how you can achieve this:

select contributor: "_" sv/: flip string(location;market) from SampleTable

The results look as follows:

contributor
-----------
"NY_ABC"
"NY_DEF"
"NY_ABC"
"NY_DEF"
"NY_ABC"
"NY_DEF"
"NY_ABC"
"NY_DEF"
"NY_DEF"
"NY_ABC"

We would like to thank Attila Vrabecz for this tip.

Display more results on the screen

Suppose that you issue a query

cols SomeTable

You may see something like

`one`two`three`four`five`six`seven`eight`nine`ten`eleven`twelve`thirteen`fift..

Of course, instead of `one, `two, `three, ... you will see some column names. But the point is that the result is cut short by the "..".

Perhaps you want to see the rest of the column names. What can you do?

Try this:

\c

You will see something like

25 80

This means that the output is trimmed to 25 rows, 80 columns. You can override this:

\c 25 150

and re-issue the query

cols SomeTable

Now you should see more output.

Listing a table's columns

cols MarketDepth

where MarketDepth is the table name.

Connecting to a remote Kdb+ server

Suppose we are connecting to host myhost01, port 8011. Run the following in q:

h : hopen `:myhost01:8011

If you are connecting to localhost, you could use

h : hopen `::8011

You can now execute queries on h like this:

h "select distinct sym from MarketDepth where date=2006.06.01"

You should see some output:

sym
------
EURUSD
USDJPY
..

Once you are done, you can close the handle:

hclose h

Loading data from a remote server and saving it locally

Start q.

Check the current directory:

\cd

You may wish to change it to a pre-created output directory:

\cd /home/pbilokon/tmp

First define the date range we shall be working on. Say, it's 2008.06.01 to 2008.08.31 inclusive:

firstdate: 2008.06.01
lastdate: 2008.08.31
daterange: firstdate + til (lastdate - firstdate) + 1
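To see what the til construction produces, here is a sketch with a shorter, hypothetical three-day range:

q)2008.06.01 + til 3
2008.06.01 2008.06.02 2008.06.03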

Open a handle:

h: hopen`:myhost01:8012

We shall read and write the data day by day. Let's define the following two functions:

readdata:{[d] h(raze "`time xasc select from MarketDepth where date=", string d, ", sym in `GBPUSD`EURUSD")}
readandwritedata:{[d] MarketDepth::readdata[d]; .Q.dpft[`:.;d;`sym;`MarketDepth]}

readdata reads one day (specified by d) worth of data. readandwritedata calls readdata and saves the results to disk (local directory as a splayed table).

You can try applying this to a single day:

readandwritedata[2008.06.01]

Or process the entire daterange:

readandwritedata each daterange

Close the handle:

hclose h

Creating and populating a simple keyed table with time series identifiers (sym), dates (date) and values (val)

TestTable : ([sym:`symbol$(); date:`date$()] val:`float$())

You can inspect the resulting empty keyed table by simply issuing

TestTable

It looks like this:

sym date| val
--------| ---

We can now upsert our first value:

upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)

On issuing

TestTable

you will now see

sym    date      | val
-----------------| ------
EURUSD 2009.02.10| 1.2874

You can also

select from TestTable

and you will see the same

sym    date      | val
-----------------| ------
EURUSD 2009.02.10| 1.2874

Is it possible to upsert several rows at once? Yes, but you should be careful with the syntax. Note that there is no space in "`USDJPY`ARSRUB".

upsert [`TestTable] ([sym: `USDJPY`ARSRUB; date: 2009.02.10 2009.02.11] val: 1.2874 10.252969)

Saving and loading tables

The simplest thing to do is to save the table to a file:

`:H:/somefile.dat set TestTable

This will save the table TestTable to a binary file named H:/somefile.dat.

To load it into a new table, use

AnotherTestTable : get `:H:/somefile.dat

Truncating a table

Truncating a table means deleting all data from it, which is of course not the same as deleting the table itself.

delete from `TestTable

Dropping a table

Dropping a table amounts to deleting it from the namespace. Assuming that your table is in the root namespace (.), we can delete it as follows:

delete TestTable from `.

Turning a non-keyed table into a keyed table

Suppose that you have created a non-keyed table:

TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$())

And inserted/upserted some data:

upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
upsert [`TestTable] (`EURUSD; 2009.02.11; 1.2901)

So that you can check by entering

TestTable

that the table looks like this:

sym    date       val
------------------------
EURUSD 2009.02.10 1.2874
EURUSD 2009.02.10 1.2874
EURUSD 2009.02.11 1.2901

How can you turn it into a keyed table?

By entering

`sym`date xkey `TestTable

Then you can check by entering

TestTable

that the table looks like this:

sym    date      | val
-----------------| ------
EURUSD 2009.02.10| 1.2874
EURUSD 2009.02.10| 1.2874
EURUSD 2009.02.11| 1.2901

Note, however, that even though TestTable is now a keyed table, it contains duplicates, which would have to be removed manually.

Turning a keyed table into a non-keyed table

If TestTable is your keyed table, you can modify it to remove the key as follows:

() xkey `TestTable

Note that the ` symbol tells Kdb+ to modify the existing TestTable.

Alternatively, if you would like to leave TestTable as it is and create a non-keyed version TestTable1 in addition to it, you can use

TestTable1 : () xkey TestTable

Dealing with duplicates

First, let us create a table for our experiments:

TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$())
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
upsert [`TestTable] (`USDJPY; 2009.02.11; 1.2901)
upsert [`TestTable] (`FOOBAR; 2009.02.10; 2.3576)
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2875)
upsert [`TestTable] (`EURUSD; 2009.02.11; 1.3000)
upsert [`TestTable] (`USDJPY; 2009.02.11; 1.2902)

Try

show TestTable

to see what it looks like:

sym    date       val
------------------------
EURUSD 2009.02.10 1.2874
EURUSD 2009.02.10 1.2874
USDJPY 2009.02.11 1.2901
FOOBAR 2009.02.10 2.3576
EURUSD 2009.02.10 1.2875
EURUSD 2009.02.11 1.3
USDJPY 2009.02.11 1.2902

We define duplicate rows as those that have both sym and date values equal.

To count the duplicates, use

select count i by sym, date from TestTable

(Notice that here by sym, date expresses our definition of duplicates.)

Here is the result:

sym    date      | x
-----------------| -
EURUSD 2009.02.10| 3
EURUSD 2009.02.11| 1
FOOBAR 2009.02.10| 1
USDJPY 2009.02.11| 2

Well, the rows with x = 1 don't really correspond to duplicates. We can identify duplicates as follows:

select sym, date, x from (select count i by sym, date from TestTable) where x > 1

This will show

sym    date       x
-------------------
EURUSD 2009.02.10 3
USDJPY 2009.02.11 2

How can we remove the duplicates?

select by sym, date from TestTable

will produce

sym    date      | val
-----------------| ------
EURUSD 2009.02.10| 1.2875
EURUSD 2009.02.11| 1.3
FOOBAR 2009.02.10| 2.3576
USDJPY 2009.02.11| 1.2902

which is a keyed table with the duplicates removed (only the last row in each group survives, so the value for (`EURUSD; 2009.02.10) is, deterministically, 1.2875, not 1.2874).

We can save this result as, for example, TestTableWithoutDuplicates:

TestTableWithoutDuplicates : select by sym, date from TestTable

If we don't want a keyed table, we can remove the key:

() xkey `TestTableWithoutDuplicates
show TestTableWithoutDuplicates

now gives

sym    date       val
------------------------
EURUSD 2009.02.10 1.2875
EURUSD 2009.02.11 1.3
FOOBAR 2009.02.10 2.3576
USDJPY 2009.02.11 1.2902

Suppose we want to be a little bit more careful and work with indices of the duplicates. Moreover, suppose we wish to remove the duplicates from the original TestTable instead of creating a new TestTableWithoutDuplicates. We can proceed as follows.

select from (select duplicateIndices : i, lastDuplicateIndex : last i, duplicateCount : count i by sym, date from TestTable) where duplicateCount > 1

This will produce

sym    date      | duplicateIndices lastDuplicateIndex duplicateCount
-----------------| --------------------------------------------------
EURUSD 2009.02.10| 0 1 4            4                  3
USDJPY 2009.02.11| 2 6              6                  2

Notice that in the duplicateIndices column we have lists, not scalars. Of course, we can do this in k.

Now let's modify this slightly. We don't want to have the last index (which appears in lastDuplicateIndex anyway) under duplicateIndices. That's because we want to keep it when we finally come to removing duplicates.

We observe that for any list l

(count[l] - 1)#l

will return a copy of the list with the last element removed. (Note the brackets: count l - 1 would be parsed right to left as count[l - 1], since q evaluates expressions right to left.) Thus we can modify our query as follows:

select from (select duplicateIndices : (count[i] - 1)#i, lastDuplicateIndex : last i, duplicateCount : count i by sym, date from TestTable) where duplicateCount > 1

The result is now

sym    date      | duplicateIndices lastDuplicateIndex duplicateCount
-----------------| --------------------------------------------------
EURUSD 2009.02.10| 0 1              4                  3
USDJPY 2009.02.11| ,2               6                  2

just as required.

We want to remove the duplicates. We don't want the entire table above, so let's modify the query again:

select duplicateIndices from (select duplicateIndices : (count[i] - 1)#i, lastDuplicateIndex : last i, duplicateCount : count i by sym, date from TestTable) where duplicateCount > 1

This gives us a table of lists,

duplicateIndices
----------------
0 1
,2

But we don't need a table of lists. A list of lists will do, and we can obtain it simply by replacing select with exec:

indicesToRemove : (exec duplicateIndices from (select duplicateIndices : (count[i] - 1)#i, lastDuplicateIndex : last i, duplicateCount : count i by sym, date from TestTable) where duplicateCount > 1)

Now indicesToRemove is a list of lists:

q)indicesToRemove
0 1
,2

But we need a simple list of indices, (0 1 2). How can we obtain it?

q)() ,/ indicesToRemove
0 1 2

What exactly did we do? The over adverb (/) modifies a base dyadic function (in our case join (,)) so that the items of the second argument are applied iteratively to the first argument. (Note: There cannot be any whitespace between / and the function it modifies.)

Thus we effectively compute

((),(0 1)),(2)
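The over adverb works the same way with any other dyadic base function. For instance, with join (,) replaced by plus (+), it folds a list into its sum:

q)(+/) 1 2 3 4
10
q)0 +/ 1 2 3 4
10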

There is an alternative:

q)raze indicesToRemove
0 1 2

We shall overwrite indicesToRemove with this more convenient form:

q)indicesToRemove : raze indicesToRemove
q)indicesToRemove
0 1 2

Now we are ready to remove the duplicates from TestTable:

q)delete from `TestTable where i in indicesToRemove
`TestTable

(Note: Don't forget the "`" in delete from `TestTable or else TestTable won't be modified, a new table will be constructed.)

Done:

q)show TestTable
sym    date       val
------------------------
FOOBAR 2009.02.10 2.3576
EURUSD 2009.02.10 1.2875
EURUSD 2009.02.11 1.3
USDJPY 2009.02.11 1.2902

Creating queries dynamically

You can construct your query by concatenating strings and then execute it. Here is an example. Let's define our usual

TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$())

We shall now create a select query where we can vary the column to be selected. Let's set

columnToBeSelected : "sym"

and evaluate

value("select ", columnToBeSelected, " from TestTable")

This selects the (currently empty) sym column:

sym
---

What if we set columnToBeSelected to something else?

columnToBeSelected : "date"
value("select ", columnToBeSelected, " from TestTable")

gives

date
----

This is not quite satisfactory yet. We would like to abstract away the process of query creation and execution. For this purpose we define a function which takes two parameters, columnName and tableName:

selectColumn : {[columnName; tableName] value("select ", columnName, " from ", tableName) }

We can now execute

selectColumn["sym"; "TestTable"]

to select column sym from TestTable. We'll see

sym
---

Similarly we can run

selectColumn["date"; "TestTable"]

Et cetera.

Accessing command line arguments from scripts

Suppose we create a script called test.q which we shall call as

C:\q\q test.q Paul Bilokon 1982

We would like to access the command line arguments ("Paul", "Bilokon" and 1982) from the script.

Here's what we could put in test.q:

name : .z.x 0;
surname : .z.x 1;
dob : .z.x 2;

name
surname
"I"$dob

This will simply display our command line arguments:

"Paul"
"Bilokon"
1982

Note that "I"$dob is the syntax for parsing the dob string to an integer.

Transferring a table from a remote server to a local server

Connect to a remote server from the local server:

h:hopen`:someremotehost.com:19111

If you have to specify a username and password, use this instead:

h:hopen (`:someremotehost.com:19111:mylogin:mypassword;10000)

where 10000 stands for 10 second timeout.

Assuming that the remote table is named TestTable and you want to copy it to a local table TestTableCopy, you can then run

TestTableCopy : h "TestTable"

locally. If you only want the first 100000 rows of TestTable in TestTableCopy, you can do

TestTableCopy : h "100000#TestTable"

Oh, and don't forget to

hclose h

Converting a string to a symbol

b : "foo"
`$b

This gives

`foo
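The reverse conversion, from symbol back to string, uses string:

q)string `foo
"foo"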

Appending to a list of strings

We have a list of strings:

L : ("foo"; "bar"; "baz")

How do we append another string, say "abc", to it?

L ,: "abc"

This gives

"foo"
"bar"
"baz"
"abc"

Logging and debug output

It is often very useful to see the intermediate results of a function to get a good idea of what it does. Let us write a simple recursive factorial function to illustrate this:

factorial : { [n]
        if [n = 0;
            result : 1;
        ];

        if [n > 0;
            result : n * factorial[n - 1];
        ];

        result
    }

We can convince ourselves that the function is working as required:

q)factorial[0]
1
q)factorial[1]
1
q)factorial[2]
2
q)factorial[3]
6
q)factorial[4]
24
q)factorial[5]
120

Now, let's change this function to display the intermediate results:

factorial : { [n]
        if [n = 0;
            result : 1;
        ];

        if [n > 0;
            result : n * factorial[n - 1];
        ];

        0N!result;
        result
    }

Let's test this:

q)a : factorial[8]
1
1
2
6
24
120
720
5040
40320
q)a
40320
q)

The line that does the logging is

        0N!result;

This sends the value of result to STDOUT.

We could apply this to strings:

        0N!"hello world"

or more complex objects, such as lists:

        0N!(1;2;"foo")
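
Note that 0N! also returns its argument unchanged, so (as a small sketch) it can be dropped into the middle of an expression to trace an intermediate value without altering the result:

q)a : 2 * 0N!3 + 4
7
q)a
14

Here 7 is printed as a side effect and the computation proceeds as before.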

Returning tables from functions

We want to create, populate, and return a table from a function. Here is our first attempt (broken!):

fooBroken : { []
        t : ([] sym:`symbol$(); date:`date$(); val:`float$());
        upsert [`t] (`yo; 2009.01.01; 3.30);
        t
    }

When we try calling fooBroken we get this:

q)fooBroken[]
sym date val
------------

Somehow our upsert didn't work.

The reason it didn't work is this: you cannot refer to locals by name in K, so the `t in this upsert does not resolve to the local variable t.

What did it do? Well, it referred to a global variable t, whereas the local one, the one we returned, remained unchanged. The upsert worked on that global variable, which you can inspect as follows:

q)t
`yo
2009.01.01
3.3

This t was created in the default namespace as a side effect:

q)\v
,`t

This is not quite what we wanted. There are several options. One is to use global variables:

fooUsingGlobals : { []
        `t set ([] sym:`symbol$(); date:`date$(); val:`float$());
        upsert[`t] (`yo; 2009.01.01; 3.30);
        :t
    }

This does work:

q)fooUsingGlobals[]
sym date       val
------------------
yo  2009.01.01 3.3

But it also has a side effect (it creates a global t):

q)\v
,`t

There is another solution. We cannot refer to a local by name, but we can assign to it:

fooUsingLocals : { []
        t : ([] sym:`symbol$(); date:`date$(); val:`float$());
        t:upsert [t] (`yo; 2009.01.01; 3.30);
        :t
    }

This works:

q)fooUsingLocals[]
sym date       val
------------------
yo  2009.01.01 3.3

without the aforementioned side effect:

q)\v
`symbol$()

Finally, we don't have to re-assign the local t. We could change fooUsingLocals (let's call the new version fooUsingLocals1) to use ,: instead: upsert with `t doesn't work on locals, whereas ,: does:

fooUsingLocals1 : { []
        t : ([] sym:`symbol$(); date:`date$(); val:`float$());
        t ,: (`yo; 2009.01.01; 3.30);
        :t
    }

How to capture the result of the system commands \t, \v, etc., into a variable

\t, \v, etc., are semantically "special".

For example, this won't work:

timeTaken : \t select Foo from SomeTable

There is a workaround: use value:

timeTaken : value "\\t select Foo from SomeTable"

Notice that there are two slashes above, not one.

timeTaken is now set to the number of milliseconds it has taken to execute select Foo from SomeTable, as required.

Loops with indices

You may miss the conventional

for (int i = 0; i < 5; ++i)
{
    // Iterated action involving i
    cout << i << endl;
}

in Q. One way to achieve the above is

i : 0;
do [5;
        // Iterated action involving i
        0N!i;
    
        i : i + 1
    ];

The code in both languages will result in

0
1
2
3
4

on the display.
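
An arguably more idiomatic way to express the same loop (a sketch using only the built-ins each and til, in the same each[f] style used elsewhere in this article) is:

each[0N!] til 5

til 5 generates the list 0 1 2 3 4 and 0N! is applied to each element, displaying 0 through 4 as before.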

Passing a function as an argument to another function

First, let us define our function:

myFunctor : { [x]
        2 * x
    }

Next, we shall define another function, which takes a function as its argument and applies it to 5:

myEvaluator : { [func]
        func[5]
    }

The result of

myEvaluator[myFunctor]

is

10

as expected.

Projections of a function of one argument; deferred function execution

Suppose that you have a function of a single argument:

someFunction : { [n]
        (n ? 10) * (n ? 10)
    }

It may make sense to project it on a given argument. Even though the resulting function will have zero free arguments, and the values of all the arguments will be known, you may still want to use it as a functor and defer its execution.

Suppose that we want to time the execution of someFunction using our timing routine:

timeFunction : { [trialCount; runCount; function]
        trialDurations : ();

        do [ trialCount;
                startTime : .z.T;
                do [ runCount;
                    function[];
                ];
                trialDurations ,: ((.z.T - startTime) % runCount);
            ];

        trialDurationAverage : avg[trialDurations];
        trialDurationStandardDeviation : sqrt[var[trialDurations]];

        (trialDurationAverage, trialDurationStandardDeviation)
    }

Notice that timeFunction expects its argument function to take zero arguments, as it executes function[].

How can we use timeFunction to time the execution of someFunction?

Well, if someFunction took at least two arguments, we could project:

q)someFunction : { [n; dummy] (n ? 10) * (n ? 10) }
q)someFunction[10000]
{[n; dummy] (n ? 10) * (n ? 10) }[10000]

This is one solution, then: amend someFunction to take a second dummy argument.

We can then pass it to timeFunction as follows:

q)timeFunction[10; 100; someFunction[10000]]
0.344 0.06069596

But what if we don't want to (or can't) amend someFunction? There is another solution. Keep someFunction as it was,

someFunction : { [n]
        (n ? 10) * (n ? 10)
    }

but introduce the following helper:

project1 : { [function; x; dummy]
        function[x]
    }

Now we obtain the following projection:

q)project1[someFunction; 10000]
{[function; x; dummy]
        function[x]
    }[{[n] (n ? 10) * (n ? 10) };10000]

Which we can pass to timeFunction:

q)timeFunction[10; 100; project1[someFunction; 10000]]
0.343 0.06116371

Sampling with and without replacement

This is straightforward. If we want 5 random indices from 0, 1, 2, ..., 9, with replacement, we do this:

q)5?10
2 2 8 1 8

Notice the duplicates. If, on the other hand, we would like to sample without replacement, we do this:

q)-5?10
1 3 6 4 7

Now there are no duplicates. Notice the minus sign above.

The results of the sampling come out of order. We can easily sort them by applying asc:

q)asc[-5?10]
`s#3 4 6 7 8

Notice that the attribute s# has been applied, which indicates that the list is sorted.

We could put these random indices to good use. For example, we could use them to index another list:

q)foo : (100 200 300 400 500 600 700 800 900 1000)
q)foo[asc[-5?10]]
100 400 800 900 1000
q)foo[asc[-2?10]] : (123 456)
q)foo
123 200 300 400 500 456 700 800 900 1000

Unary -

In K, the arithmetic operators are always dyadic, so - does not act as a unary function:

q)foo : 5
q)-foo
'-

You should use neg:

q)neg foo
-5
q)foo
5

(neg does not change foo itself.)

However, -5 is fine as a literal:

q)foo : -5
q)foo
-5

Setting multiple elements of a list to the same value

q)l : (100 200 300 400 500 600 700 800 900 1000)
q)l
100 200 300 400 500 600 700 800 900 1000
q)l[1 2] : 5
q)l
100 5 5 400 500 600 700 800 900 1000
q)l[(4 + (til 3)), (8 9)] : 8
q)l
100 5 5 400 8 8 8 800 8 8

Pre-initialising lists

If you know the size of your list you can preinitialise it like so:

q)l : 10#0N
q)l
0N 0N 0N 0N 0N 0N 0N 0N 0N 0N

We have created a list of 10 elements, each set to 0N (the null integer).

We could have set them to 1, for example:

q)l : 10#1j
q)l
1 1 1 1 1 1 1 1 1 1j

Null

There is a "null" in K/Q. It is 0N:

q)foo : 0N
q)foo
0N
q)foo * 5
0N

Copy or reference?

q)foo : (1 2 3)
q)bar : foo
q)bar[0] : 357
q)foo
1 2 3
q)bar
357 2 3

bar changed, foo didn't.

Let's investigate further.

q)foo : (1 2 3)
q)changeFoo : { [someFoo] someFoo[0] : 357 }
q)changeFoo[foo]
q)foo
1 2 3

So foo didn't change.

Telling Q to exit when the script execution is over

If you load a script

q myscript.q

q will not exit by default when it's done. It will hang in there.

The trick is to add the exit command

\\

to the end of your script.

This is very useful if you are chaining scripts, etc.

Applying a function to each element of a list

Suppose that we have

q)l : (1*1; 2*2; 3*3; 4*4; 5*5; 6*6; 7*7; 8*8)

In other words,

q)l
1 4 9 16 25 36 49 64

What if we want to take the square root of each element?

q)sqrt l
1 2 3 4 5 6 7 8f

This works. We gave sqrt a list to work on, and the result is a list of square roots.

Can we take type of each element in a similar manner?

q)type l
6h

No, we got the type of l itself, not of its elements.

We can use each:

q)each[type] l
-6 -6 -6 -6 -6 -6 -6 -6h

That worked.

Of course, we can also use

q)each[sqrt] l
1 2 3 4 5 6 7 8f

to take the square root of each element. This is more explicit and leads to fewer surprises.

Exploring \v

\v will give us all the variables in the default namespace:

q)\v
`ProfilingResults`a`gapProbability`l`lastDate`names`nextStandardGaussianRando..

Some of these may be symbols, others integers, still others functions... How can we tell them apart?

We need to use type. But first we need to capture \v into something we can work with — a list — using value:

q)v : value "\\v"
q)v
`ProfilingResults`a`gapProbability`l`lastDate`names`nextStandardGaussianRando..

We can now access all of these...

q)v[0]
`ProfilingResults

But what are they, what are their types?

q)type v[0]
-11h

Symbols. Well, ProfilingResults is actually a table; `ProfilingResults is a symbol.

How can we get at the type of ProfilingResults?

Use value again:

q)type[value[v[0]]]
98h

And if we want the types of everything in v:

q)each[type] each[value] v
98 -11 -9 6 -14 11 -9 -6 -6 11 -6 -6 -6 99h

Displaying a table's schema

First let us create and populate a table:

q)SomeTable : ([] foo:`float$(); bar:`float$())
q)SomeTable ,: (0.3 0.5)

How can we figure out what SomeTable's schema is given SomeTable?

q)0N!0#SomeTable;
+`foo`bar!(`float$();`float$())

0#t gives you the schema. The above will print it in k format so you can see all the types.

Working with lists

Let's create an example list. We are going to play with it now.

q)l : (10 20 30 40 50)

How many elements does it have?

q)count l
5

What is the first element?

q)first l
10

What is the last element?

q)last l
50

What is the first element again? (Notice that the following is very similar to first l, except that a list containing a single element is returned, not the element itself.)

q)1#l
,10

What is the last element again? (Notice that the following is very similar to last l, except that a list containing a single element is returned, not the element itself.)

q)-1#l
,50

What are the first two elements?

q)2#l
10 20

What are the first three elements?

q)3#l
10 20 30

What are the last two elements?

q)-2#l
40 50

What are the last three elements?

q)-3#l
30 40 50

A list of all l's elements, excluding the last one:

q)(count[l] - 1)#l
10 20 30 40

A list of all l's elements, excluding the last two:

q)(count[l] - 2)#l
10 20 30
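
As an aside, the drop operator _ offers a more concise way to express the same thing (both forms are standard q):

q)1 _ l
20 30 40 50
q)-1 _ l
10 20 30 40
q)-2 _ l
10 20 30

A positive left argument drops elements from the front, a negative one from the back.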

A list made up of l's first, second, and fourth elements:

q)l[0 1 3]
10 20 40

Or

q)indexList : (0 1 3)
q)l[indexList]
10 20 40

Let's create a copy of the list but exclude all occurrences of 20:

q)l except 20
10 30 40 50

Or perhaps we want to exclude all occurrences of 20 and 30:

q)l except 20 30
10 40 50

What is the sum of the elements? There are several ways to compute this:

q)sum l
150

Or

q)0 +/ l
150

Here the over adverb (/) modifies a base dyadic function plus (+) so that the items of the second argument are applied iteratively to the first argument. Thus we are computing

((((0 + 10) + 20) + 30) + 40) + 50

Along the same lines we can compute the product of the elements:

q)1 */ l
12000000

which amounts to

((((1 * 10) * 20) * 30) * 40) * 50

Say we want to add two lists elementwise:

q)l1 : (50 40 30 20 10)
q)l + l1
60 60 60 60 60

Similarly we can multiply two lists elementwise:

q)l * l1
500 800 900 800 500

Subtract:

q)l - l1
-40 -20 0 20 40

And divide:

q)l % l1
0.2 0.5 1 2 5

What if we want to obtain a list of all elements of l negated?

q)each[neg] l
-10 -20 -30 -40 -50

Or perhaps we want a list of square roots of l's elements:

q)each[sqrt] l
3.162278 4.472136 5.477226 6.324555 7.071068

Sorting tables

Consider the following

q)TestTable : ([] foo:`symbol$(); bar:`symbol$())
q)insert [`TestTable] (`k; `a)
q)insert [`TestTable] (`k; `z)
q)insert [`TestTable] (`k; `l)
q)insert [`TestTable] (`a; `a)
q)insert [`TestTable] (`a; `b)
q)insert [`TestTable] (`a; `a)
q)show TestTable

So we have

foo bar
-------
k   a
k   z
k   l
a   a
a   b
a   a

If we want to sort this table first by foo, then by bar (i.e. the table will be sorted by foo, and within each foo value the bar values will be sorted), we can use

q)SortedTestTable : `foo`bar xasc TestTable
q)show SortedTestTable

This will produce

foo bar
-------
a   a
a   a
a   b
k   a
k   l
k   z

But the original TestTable hasn't changed:

q)show TestTable
foo bar
-------
k   a
k   z
k   l
a   a
a   b
a   a

In order to sort it in place, use

q)`foo`bar xasc `TestTable
`TestTable
q)show TestTable
foo bar
-------
a   a
a   a
a   b
k   a
k   l
k   z

Of course, the order `foo`bar versus `bar`foo matters:

q)`bar`foo xasc `TestTable
`TestTable
q)show TestTable
foo bar
-------
a   a
a   a
k   a
a   b
k   l
k   z

Types of list

Let's declare some lists:

q)K : ()
q)L : `int$()
q)M : `symbol$()
q)N : `char$()

And have a look at them:

q)K
q)L
`int$()
q)M
`symbol$()
q)N
""

And their types:

q)type K
0h
q)type L
6h
q)type M
11h
q)type N
10h

K is special in the sense that its type hasn't been pre-declared. Currently its type is 0h. It can become a list of any type. That will happen when the first element is added:

q)K ,: 357
q)K
,357
q)type K
6h

Now the type of K is 6h (same as that of L). It is now a list of integers (L has been a list of integers from the start). We can't add anything but integers to K (from now on):

q)K ,: `foo
'type

It's like trying to add a non-integer to L or a non-symbol to M, etc. We get the 'type error:

q)L ,: `foo
'type
q)M ,: 357
'type

But of course we could have declared K as we did and then added a symbol to it:

q)K : ()
q)K ,: `foo
q)type K
11h
q)K
,`foo
q)K ,: 357
'type

In this case K becomes a list of symbols (on adding `foo, the first element) and we can't add anything but symbols to it.

Byte representation

It is possible to obtain the byte representation of K data structures:

q)a : 1 2 3 4 5
q)-8!a
0x01000000220000000600050000000100000002000000030000000400000005000000

Thus the variable a is 34 bytes of data (byte 1: 0x01, byte 2: 0x00, byte 3: 0x00, byte 4: 0x00, byte 5: 0x22, ...).

In this example a is a list, but -8!... works for many other things:

q)TestTable : ([sym:`symbol$(); date:`date$()] val:`float$())
q)upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
`TestTable
q)-8!TestTable
0x0100000059000000636200630b000200000073796d0064617465000000020000000b0001000..

These are the actual bytes that would be piped through a channel if you were to get the data from Kdb+ remotely.

However, this does not necessarily represent the way these data structures are stored in memory by kdb+.

Determining how much memory a given data structure occupies

We have already discussed byte representations, which you can obtain by using -8!.... They also provide a way of obtaining the size of a data structure in memory:

q)TestTable : ([sym:`symbol$(); date:`date$()] val:`float$())
q)upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
`TestTable
q)(count -8!TestTable) % 1.5
59.33333

We divide the result of count -8!TestTable by 1.5 because this empirically gives a better estimate of the number of bytes used by kdb+ to represent the table in memory. (Thanks to Manish Patel for this finding.) It is still only an estimate.

Another, more exact, way of determining how much memory a data structure occupies is by using

q)\w
118048 67634176 0 0j

We can compare the values before and after:

q)\w
117376 67634176 0 0j
q)TestTable : ([sym:`symbol$(); date:`date$()] val:`float$())
q)upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
`TestTable
q)\w
117984 67634176 0 0j
q)upsert [`TestTable] (`USDJPY; 2009.02.10; 1.2874)
`TestTable
q)\w
117984 67634176 0 0j
q)upsert [`TestTable] (`GBPUSD; 2009.02.10; 1.2874)
`TestTable
q)\w
118048 67634176 0 0j

This value is exact. We can see that a table with a single entry has resulted in 608 bytes of memory being occupied, which is significantly different from our estimate above. This is probably due to the meta-data and key, which occupy a lot of memory in relation to the table's size.

Joining strings with a given separator

Suppose you have a list of strings:

q)l : ("foo";"bar";"baz")

And you want to turn it into a comma-separated string:

q)"," sv l
"foo,bar,baz"

sv is a built-in function; the name is usually read as "scalar from vector", since it consolidates a vector of strings into a single string.
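
Any separator string works, not just a comma:

q)"; " sv l
"foo; bar; baz"

The counterpart vs ("vector from scalar") splits a string on a separator, reversing the operation:

q)"," vs "foo,bar,baz"
"foo"
"bar"
"baz"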

Profiler

The following is a simple profiler. wrapFunction takes the name of a function, renames the original, and replaces it with a wrapper that records a timestamped row in the ProfilingResults table before and after each call. First, some example functions to profile:

nop : { []
    }

one : { []
        1
    }

double : { [x]
        2 * x
    }

add : { [x; y]
        x + y
    }
// \d .profiler

preCall : { [functionName]
        ProfilingResults ,: (`CALL, functionName, .z.T);
    }

postCall : { [functionName]
        ProfilingResults ,: (`RETURN, functionName, .z.T);
    }

getFunctionParameters : { [functionName]
        value[value[functionName]][1]
    }

getFunctionParametersAsString : { [functionName]
        ";" sv string[getFunctionParameters[functionName]]
    }

wrapFunction : { [functionName]
        functionNameString : string[functionName];
        wrappedFunctionNameString : functionNameString, "_profiled";
        parametersString : getFunctionParametersAsString[functionName];
        renameFunctionCode : wrappedFunctionNameString, " : ", functionNameString;
        value renameFunctionCode;
        preCallProfilingCode : "preCall[`", functionNameString, "]";
        postCallProfilingCode : "postCall[`", functionNameString, "]";
        wrapFunctionCode : functionNameString, " : { [", parametersString, "] ",  preCallProfilingCode, "; result : ", wrappedFunctionNameString, "[", parametersString, "]; ", postCallProfilingCode, "; result }";
        value wrapFunctionCode;
        wrapFunctionCode
    }

// \d .

// Retrieve the function names from the global namespace
functionNames : value "\\f"

ProfilingResults : ([] action:`symbol$(); functionName:`symbol$(); time:`time$() )

wrapFunction each (`one`double`add`nop);
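
Once the functions are wrapped, calling them transparently records profiling rows (a sketch; the exact rows depend on the preCall and postCall implementations above):

q)add[2; 3]
5
q)select from ProfilingResults where functionName = `add

Each profiled call appends one row on entry and one on exit, so the elapsed time of a call can be recovered from the time column.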

Sorting one list by the corresponding elements of another list

Suppose that we have two lists:

L1 : 0.45 0.34 0.12 0.43 0.32
L2 : 3    3    2    1    8

We would like to sort L1 by the corresponding elements of L2. In other words, we want to get the result

0.43 0.12 0.45 0.34 0.32

because the corresponding elements of L2 would then appear in ascending order:

1    2    3    3    8

iasc achieves precisely that:

q)L1@iasc L2
0.43 0.12 0.45 0.34 0.32

How does this work? iasc sorts L2 but returns the original indices of its elements instead of the elements themselves:

q)iasc L2
3 2 0 1 4

Then @ produces the elements of L1 at these indices:

q)L1@(3 2 0 1 4)
0.43 0.12 0.45 0.34 0.32

Zeroing out the seconds and milliseconds in a datetime

Use

getDateMinute : { [dateTime]
        dateTime - ((`time$dateTime) mod 60000)
    }

For example:

getDateMinute[2010.04.23T01:02:03.456]

produces

2010.04.23T01:02:00.000

Left join

Suppose we would like to filter one table by another table. This is best illustrated by a toy example:

PeopleTable : ( [] name:`symbol$(); surname:`symbol$() )
PeopleTable ,: (`John; `vonNeumann)
PeopleTable ,: (`Stephen; `Hawking)
PeopleTable ,: (`Daniel; `Bernoulli)
MonthsOfBirthTable : ( [] name:`symbol$(); surname:`symbol$(); month:`symbol$() )
MonthsOfBirthTable ,: (`John; `Nash; `June)
MonthsOfBirthTable ,: (`John; `vonNeumann; `December)
MonthsOfBirthTable ,: (`Stephen; `Smale; `July)
MonthsOfBirthTable ,: (`Stephen; `Hawking; `January)
MonthsOfBirthTable ,: (`Daniel; `Bernoulli; `February)
MonthsOfBirthTable ,: (`Jacob; `Bernoulli; `December)

Now we would like to select from MonthsOfBirthTable all the entries which correspond to people in PeopleTable.

The following works:

PeopleTable lj (`name`surname xkey MonthsOfBirthTable)

producing the following result:

name    surname    month
---------------------------
John    vonNeumann December
Stephen Hawking    January
Daniel  Bernoulli  February

Note that we had to create a key for the right-hand table (MonthsOfBirthTable) with xkey. If our MonthsOfBirthTable were a keyed table in the first place, i.e.

MonthsOfBirthTable : ( [name:`symbol$(); surname:`symbol$()] month:`symbol$() )

we could get away with

PeopleTable lj MonthsOfBirthTable

If the right-hand table is in memory, key creation is fairly inexpensive. If it is stored on disk (partitioned and splayed, thus not keyed) then it is wise to avoid referencing the entire table in one go. It is better to partition the join appropriately (e.g. by date, as is usually the case) so each partition is mapped into memory.

What happens if we have multiple rows per person in the right-hand table? Consider the following example. As before,

PeopleTable : ( [] name:`symbol$(); surname:`symbol$() )
PeopleTable ,: (`John; `vonNeumann)
PeopleTable ,: (`Stephen; `Hawking)
PeopleTable ,: (`Daniel; `Bernoulli)

Now we would like to select all rows corresponding to those in PeopleTable from the following table:

ContributionsTable : ( [] name:`symbol$(); surname:`symbol$(); contribution:`symbol$() )
ContributionsTable ,: (`John; `Nash; `EquilibriumPointsInNPersonGames)
ContributionsTable ,: (`John; `Nash; `TwoPersonCooperativeGames)
ContributionsTable ,: (`John; `Nash; `RealAlgebraicManifolds)
ContributionsTable ,: (`John; `vonNeumann; `GameTheory)
ContributionsTable ,: (`John; `vonNeumann; `CellularAutomata)
ContributionsTable ,: (`John; `vonNeumann; `UniversalConstructor)
ContributionsTable ,: (`Stephen; `Smale; `SphereEversion)
ContributionsTable ,: (`Stephen; `Smale; `PoincareConjectureForDimensionsGreaterThanOrEqualToFive)
ContributionsTable ,: (`Stephen; `Smale; `SmaleHorseshoe)
ContributionsTable ,: (`Stephen; `Hawking; `SingularitiesInGeneralRelativity)
ContributionsTable ,: (`Stephen; `Hawking; `HawkingRadiation)
ContributionsTable ,: (`Daniel; `Bernoulli; `CompoundMotions)
ContributionsTable ,: (`Daniel; `Bernoulli; `Hydrodynamique)
ContributionsTable ,: (`Jacob; `Bernoulli; `TranscendentalCurves)
ContributionsTable ,: (`Jacob; `Bernoulli; `Isoperimetry)
ContributionsTable ,: (`Jacob; `Bernoulli; `SeparableDifferentialEquations)

If we try, as before,

PeopleTable lj (`name`surname xkey ContributionsTable)

this will only return the first row from ContributionsTable for John von Neumann, Stephen Hawking, and Daniel Bernoulli:

name    surname    contribution
---------------------------------------------------
John    vonNeumann GameTheory
Stephen Hawking    SingularitiesInGeneralRelativity
Daniel  Bernoulli  CompoundMotions

Whereas we want all the rows. One way to achieve this is to xgroup ContributionsTable by name and surname first:

`name`surname xgroup ContributionsTable
name    surname   | contribution
------------------| --------------------------------------------------------------------------------------
John    Nash      | `EquilibriumPointsInNPersonGames`TwoPersonCooperativeGames`RealAlgebraicManifolds
John    vonNeumann| `GameTheory`CellularAutomata`UniversalConstructor
Stephen Smale     | `SphereEversion`PoincareConjectureForDimensionsGreaterThanOrEqualToFive`SmaleHorseshoe
Stephen Hawking   | `SingularitiesInGeneralRelativity`HawkingRadiation
Daniel  Bernoulli | `CompoundMotions`Hydrodynamique
Jacob   Bernoulli | `TranscendentalCurves`Isoperimetry`SeparableDifferentialEquations

Then do the left join. Then ungroup it back. All in one go:

ungroup (PeopleTable lj (`name`surname xgroup ContributionsTable))

Which gives us

name    surname    contribution
---------------------------------------------------
John    vonNeumann GameTheory
John    vonNeumann CellularAutomata
John    vonNeumann UniversalConstructor
Stephen Hawking    SingularitiesInGeneralRelativity
Stephen Hawking    HawkingRadiation
Daniel  Bernoulli  CompoundMotions
Daniel  Bernoulli  Hydrodynamique

as required.

We would like to thank Daniel Lister for his comments.

Functional select

You can read about the functional select here: https://code.kx.com/trac/wiki/QforMortals2/queries_q_sql#Functionalselect

Here we shall give some examples.

Remember that the functional select is

?[t;c;b;a]

where t is the table, c the constraints, b the by-phrase, and a the aggregations.

Example 0

?[QuotesTable; enlist 1b; 0b; cols[QuotesTable]!cols[QuotesTable]]

or the simpler

?[QuotesTable; enlist 1b; 0b; ()]

is equivalent to

select from QuotesTable

Example 1

select from QuotesTable where sym=`AUDCAD

is equivalent to

?[QuotesTable;(enlist (=; `sym; enlist `AUDCAD)); 0b; ()]

If the type of the column is `symbol$(), then sym=`AUDCAD corresponds to (enlist (=; `sym; enlist `AUDCAD)) above. Otherwise, we would use something like (enlist (=; `bid; 1.55)) (without the second enlist).

Example 2

This is a small variation on the previous example. We still want to

select from QuotesTable where sym=`AUDCAD

but we want to be more flexible. In this case, the table name (`QuotesTable) and "where" column name (`sym) are stored in variables:

tableName : `QuotesTable;
columnName : `sym;

and our functional select becomes

?[tableName;(enlist (=; columnName; enlist `AUDCAD)); 0b; ()]

This works, just as before.

Example 3

select from QuotesTable where bid > 1.5

is equivalent to

?[QuotesTable;(enlist (>; `bid; 1.5)); 0b; ()]

Example 4

select from QuotesTable where bid <> 0n

is equivalent to

?[QuotesTable;(enlist (<>; `bid; 0n)); 0b; ()]

Example 5

select bid from QuotesTable where sym=`AUDCAD

is equivalent to

?[QuotesTable;(enlist (=; `sym; enlist `AUDCAD)); 0b; (enlist `bid)!(enlist `bid)]

Comment

One of the most powerful features of this syntax is the ability to dynamically select a column, for example:

col : `bid
?[QuotesTable;(enlist (=; `sym; enlist `AUDCAD)); 0b; (enlist `dataPoint)!(enlist col)]

Example 6

select bid, ask from QuotesTable where sym=`AUDCAD

is equivalent to

?[QuotesTable;(enlist (=; `sym; enlist `AUDCAD)); 0b; `bid`ask!`bid`ask]

Example 7

select date, time, bid, offer from QuotesTable where date = 2012.03.08, market = `Equities, sym = `FTSE

is equivalent to

?[QuotesTable; ((=; `date; 2012.03.08); (=; `market; enlist `Equities); (=; `sym; enlist `FTSE)); 0b; `date`time`bid`offer!`date`time`bid`offer]

Listing variables (including tables) and functions defined on an instance

\v

lists the variables (including tables) in the global namespace (`.).

\f

lists the functions in the global namespace.

Deleting variables (including tables) and functions defined on an instance

To delete a variable, use

delete MyTableName from `.

and to delete a function, use

delete myFunctionName from `.

You can check that the variable (or function) was deleted by running \v (or \f).

Concatenating compound columns on selecting from tables

Suppose that we define

ExampleTable1 : ([] sym:`symbol$(); list1:(); list2:());
ExampleTable1 ,: (`foo; (0.12 0.32 0.43); (0.32 0.23 0.43));
ExampleTable1 ,: (`foo; (0.32 0.33 0.54); (0.65 0.23 0.32));
ExampleTable1 ,: (`foo; (0.93 0.92 0.90); (0.90 0.89 0.54));

i.e. we have

sym list1          list2         
---------------------------------
foo 0.12 0.32 0.43 0.32 0.23 0.43
foo 0.32 0.33 0.54 0.65 0.23 0.32
foo 0.93 0.92 0.9  0.9  0.89 0.54

and we want to get

sym combinedList                 
---------------------------------
foo 0.12 0.32 0.43 0.32 0.23 0.43
foo 0.32 0.33 0.54 0.65 0.23 0.32
foo 0.93 0.92 0.9  0.9  0.89 0.54

This is achieved as follows:

select sym, combinedList: (list1,'list2) from ExampleTable1

Applying a function to each date in a given list

It is very often useful to partition queries by dates, especially if this reflects the way in which your tables are stored.

Here is a simple toy example to illustrate the syntax.

displayIncrementedDate : { [message; date; increment]
        0N!message, string[date + increment];
        1b
    }

displayIncrementedDate["Date: ";;3] each (2010.05.11T13:06:00.000 + til 10)

This returns the result, 1111111111b, which is a list of booleans, and also displays

"Date: 2010.05.14T13:06:00.000"
"Date: 2010.05.15T13:06:00.000"
"Date: 2010.05.16T13:06:00.000"
"Date: 2010.05.17T13:06:00.000"
"Date: 2010.05.18T13:06:00.000"
"Date: 2010.05.19T13:06:00.000"
"Date: 2010.05.20T13:06:00.000"
"Date: 2010.05.21T13:06:00.000"
"Date: 2010.05.22T13:06:00.000"
"Date: 2010.05.23T13:06:00.000"

Partitioning a time interval by hour

The following function does the trick:

partitionTimeIntervalByHour : { [beginTime; endTime]
        (beginTime + ((til (ceiling[(endTime - beginTime) % 01:00:00.000])) * 01:00:00.000)), endTime
    }

For example:

beginTime : 01:00:00.000;
endTime : 03:00:00.000;
partitionTimeIntervalByHour[beginTime; endTime]
// Result: 01:00:00.000 02:00:00.000 03:00:00.000

beginTime : 01:00:00.000;
endTime : 01:00:00.000;
partitionTimeIntervalByHour[beginTime; endTime]
// Result: enlist 01:00:00.000

beginTime : 01:30:55.123;
endTime : 05:15:35.012;
partitionTimeIntervalByHour[beginTime; endTime]
// Result: 01:30:55.123 02:30:55.123 03:30:55.123 04:30:55.123 05:15:35.012

Processing the resulting intervals

This section is a sequel to the previous one. Suppose that we have partitioned our interval.

As before,

beginTime : 01:30:55.123;
endTime : 05:15:35.012;
times : partitionTimeIntervalByHour[beginTime; endTime];

Thus times has been set to

01:30:55.123 02:30:55.123 03:30:55.123 04:30:55.123 05:15:35.012

Now we want to process the resulting intervals, interval by interval, by applying, say,

processTimeInterval : { [beginTime; endTime]
        0N!"from ", string[beginTime], " to ", string[endTime]
    }

For example,

processTimeInterval[01:30:55.123;02:30:55.123]

will display the string "from 01:30:55.123 to 02:30:55.123" and also return it. Of course processTimeInterval could have been a lot smarter and more useful but this is not the point. The point is that we would like to apply processTimeInterval to each interval in our partition, i.e. we want "something" that will result in the following function calls:

processTimeInterval[01:30:55.123;02:30:55.123];
processTimeInterval[02:30:55.123;03:30:55.123];
processTimeInterval[03:30:55.123;04:30:55.123];
processTimeInterval[04:30:55.123;05:15:35.012];

Our first step is to produce the time intervals from times:

timeIntervals : -1_2#/:(0_til count times)_\:times;

Thus the variable timeIntervals has been set to:

(01:30:55.123 02:30:55.123;02:30:55.123 03:30:55.123;03:30:55.123 04:30:55.123;04:30:55.123 05:15:35.012)
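
The cut-based expression above is somewhat cryptic. An equivalent, arguably simpler, formulation (a sketch using only the built-ins _ and flip) pairs each element with its successor:

timeIntervals : flip (-1 _ times; 1 _ times);

-1 _ times is the list of interval start times, 1 _ times the list of end times, and flip turns the two parallel lists into a list of (begin; end) pairs.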

Our next step is to apply processTimeInterval to each of the resulting time intervals:

results : (processTimeInterval .) each timeIntervals;

which produces the following results:

("from 01:30:55.123 to 02:30:55.123";"from 02:30:55.123 to 03:30:55.123";"from 03:30:55.123 to 04:30:55.123";"from 04:30:55.123 to 05:15:35.012")

We would like to thank Mahesh BP for his advice.

Supplying function arguments in a list

Suppose that you have the following function:

addUp : { [x; y] x + y }

You can, of course, call it as follows:

addUp[3; 5]

The result is 8. But what if you have

L : 3 5

And you want to apply addUp to its elements. You could write

addUp[L[0]; L[1]]

but a more elegant way is

addUp . L

How to produce a string representation of a list of symbols

Suppose that we have

L : `foo`bar`baz

i.e. a list of symbols, and we would like to have

"`foo`bar`baz"

a string. This is particularly useful when one is constructing a query to be sent to a remote server. We can achieve this as follows:

raze "`", ' string L

Checking if a date is a weekend

isWeekend : { [date]
        weekOffset : date - `week$date;
        (weekOffset = 5) or (weekOffset = 6)
    }
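
To see the function in action (2010.05.22 fell on a Saturday and 2010.05.24 on a Monday):

q)isWeekend[2010.05.22]
1b
q)isWeekend[2010.05.24]
0b

`week$date rounds a date down to the Monday of its week, so weekOffset runs from 0 for Monday to 6 for Sunday, and a value of 5 or 6 indicates a weekend.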

Union join versus asof-join

Suppose that we have the following three tables:

Table1 : ([] dateTime:`date$(); dataPoint1:`float$());
Table1 ,: (2010.05.18; 3.57);
Table1 ,: (2010.05.20; 3.58);
Table1 ,: (2010.05.21; 3.58);
Table1 ,: (2010.05.22; 3.59);
Table1 ,: (2010.05.23; 3.60);

Table2 : ([] dateTime:`date$(); dataPoint2:`float$());
Table2 ,: (2010.05.19; 1.23);
Table2 ,: (2010.05.20; 1.24);
Table2 ,: (2010.05.21; 1.25);
Table2 ,: (2010.05.24; 1.26);
Table2 ,: (2010.05.25; 1.25);

Table3 : ([] dateTime:`date$(); dataPoint3:`float$());
Table3 ,: (2010.05.17; 3.23);
Table3 ,: (2010.05.18; 3.24);
Table3 ,: (2010.05.21; 3.25);

We are interested in aligning the time series. There are several ways to achieve this involving different kinds of joins. We are going to use this example to demonstrate the semantics of these joins.

asof-join

First, the asof-join:

aj[`dateTime;aj[`dateTime;Table1;Table2];Table3]

This produces the following result:

dateTime    dataPoint1  dataPoint2  dataPoint3
2010.05.18  3.57        0n          3.24
2010.05.20  3.58        1.24        3.24
2010.05.21  3.58        1.25        3.25
2010.05.22  3.59        1.25        3.25
2010.05.23  3.6         1.25        3.25

For some purposes this might be what we need. Observe that the dateTimes are sorted and they are exactly those of Table1. We have lost some date-times — the ones that are in Table2 or Table3 but not in Table1.

Union join

This brings us to our next item: the union join. Consider this:

(uj/)1!'(Table1;Table2;Table3)

The result is:

dateTime   | dataPoint1  dataPoint2  dataPoint3
-----------------------------------------------
2010.05.18 | 3.57        0n          3.24
2010.05.20 | 3.58        1.24        0n
2010.05.21 | 3.58        1.25        3.25
2010.05.22 | 3.59        0n          0n
2010.05.23 | 3.6         0n          0n
2010.05.19 | 0n          1.23        0n
2010.05.24 | 0n          1.26        0n
2010.05.25 | 0n          1.25        0n
2010.05.17 | 0n          0n          3.23

This is not quite what we wanted. True, we have a union of the dateTimes from Table1, Table2, and Table3, but they are no longer sorted and the null values are not filled forward. We need to do some more work:

fills `dateTime xasc (uj/)1!'(Table1;Table2;Table3)

In order to get:

dateTime   | dataPoint1  dataPoint2  dataPoint3
-----------------------------------------------
2010.05.17 | 0n          0n          3.23
2010.05.18 | 3.57        0n          3.24
2010.05.19 | 3.57        1.23        3.24
2010.05.20 | 3.58        1.24        3.24
2010.05.21 | 3.58        1.25        3.25
2010.05.22 | 3.59        1.25        3.25
2010.05.23 | 3.6         1.25        3.25
2010.05.24 | 3.6         1.26        3.25
2010.05.25 | 3.6         1.25        3.25

How efficient is this? An iterative merge could achieve the same result in O(N1 + N2 + N3) operations, whereas this query proceeds in three passes: (1) uj, (2) xasc, (3) fills. It therefore looks suboptimal. We would be interested in hearing a better solution!

Many thanks to Alex Donohue and Attila Vrabecz for their comments.

Multiplying out the columns of a table

Let us again consider the table we have just created:

Table1 : ([] dateTime:`date$(); dataPoint1:`float$());
Table1 ,: (2010.05.18; 3.57);
Table1 ,: (2010.05.20; 3.58);
Table1 ,: (2010.05.21; 3.58);
Table1 ,: (2010.05.22; 3.59);
Table1 ,: (2010.05.23; 3.60);

Table2 : ([] dateTime:`date$(); dataPoint2:`float$());
Table2 ,: (2010.05.19; 1.23);
Table2 ,: (2010.05.20; 1.24);
Table2 ,: (2010.05.21; 1.25);
Table2 ,: (2010.05.24; 1.26);
Table2 ,: (2010.05.25; 1.25);

Table3 : ([] dateTime:`date$(); dataPoint3:`float$());
Table3 ,: (2010.05.17; 3.23);
Table3 ,: (2010.05.18; 3.24);
Table3 ,: (2010.05.21; 3.25);

Table : fills `dateTime xasc (uj/)1!'(Table1;Table2;Table3)

i.e.

dateTime   | dataPoint1  dataPoint2  dataPoint3
-----------------------------------------------
2010.05.17 | 0n          0n          3.23
2010.05.18 | 3.57        0n          3.24
2010.05.19 | 3.57        1.23        3.24
2010.05.20 | 3.58        1.24        3.24
2010.05.21 | 3.58        1.25        3.25
2010.05.22 | 3.59        1.25        3.25
2010.05.23 | 3.6         1.25        3.25
2010.05.24 | 3.6         1.26        3.25
2010.05.25 | 3.6         1.25        3.25

Our next goal is to multiply out the columns dataPoint1, dataPoint2, and dataPoint3.

Solution 1

One way to achieve this is:

NonKeyedTable : () xkey Table;
Result : NonKeyedTable[`dataPoint1] * NonKeyedTable[`dataPoint2] * NonKeyedTable[`dataPoint3];
ResultTable : flip[`dateTime`dataPoint!(NonKeyedTable[`dateTime]; Result)]

and ResultTable becomes

dateTime    dataPoint
---------------------
2010.05.17  0n
2010.05.18  0n
2010.05.19  14.227164
2010.05.20  14.383008
2010.05.21  14.54375
2010.05.22  14.584375
2010.05.23  14.625
2010.05.24  14.742
2010.05.25  14.625

Now suppose that we don't know the names of the columns to be multiplied in advance. We just want to multiply together all the columns save for the first one. Here is one way to do this:

NonKeyedTable : () xkey Table;
ColumnsToMultiply : NonKeyedTable[1_(cols NonKeyedTable)];
ResultTable : flip[`dateTime`dataPoint!(NonKeyedTable[`dateTime]; 1*/ColumnsToMultiply)]

The result is exactly the same. This time we didn't have to enumerate the columns explicitly.

Solution 2

This can be achieved far more succinctly:

select dateTime, prd 1_flip 0!Table from Table

Courtesy of Attila Vrabecz.

Select with filtering

Suppose that you have a table TimeSeriesTable like this:

dateTime                 dataPoint
----------------------------------
2010.05.19 00:00:00.061  1.23235
2010.05.19 00:00:00.063  1.23235
2010.05.19 00:00:00.067  1.23236
2010.05.19 00:00:00.087  1.23235
2010.05.19 00:00:00.143  1.23235
2010.05.19 00:00:00.144  1.23235
2010.05.19 00:00:00.147  1.23236
2010.05.19 00:00:00.238  1.23236
2010.05.19 00:00:00.341  1.21446
2010.05.19 00:00:00.345  1.21446
2010.05.19 00:00:00.346  1.21447
2010.05.19 00:00:00.347  1.21448
...

You want to enable the user to provide a custom filter for selecting data from this table.

You can provide the following function:

getTimeSeries : { [filter]
        select from TimeSeriesTable where filter[dateTime; dataPoint]
    }

You can then prepare a useful filter:

timeIntervalTimeSeriesFilter : { [beginTime; endTime; dateTime; dataPoint]
        (`time$dateTime) within (beginTime; endTime)
    }

which the user can specialise and use as follows:

fltr : timeIntervalTimeSeriesFilter[00:30:00.000; 01:00:00.000]
getTimeSeries[fltr]

Conjunction and disjunction of filters

Let us prototype this first.

isOdd : { [x] mod[x; 2] }
isSmallPrime : { [x] (x = 2) or (x = 3) or (x = 5) or (x = 7) }

Let's test the following.

a : 2;

As expected,

1b and/ ({ x[a] } each (isOdd; isSmallPrime))

returns 0b because 2 isn't odd (although it is a small prime) and

0b or/ ({ x[a] } each (isOdd; isSmallPrime))

returns 1b because it is a small prime (even though it isn't odd).

Similarly,

a : 3
1b and/ ({ x[a] } each (isOdd; isSmallPrime))

returns true because 3 is both odd and a small prime and

0b or/ ({ x[a] } each (isOdd; isSmallPrime))

returns true for the very same reason.

We are now ready to develop further our previous example and introduce conjunction and disjunction of filters:

andFilter : { [filters; dateTime; dataPoint]
        1b and/ ({ [dt; dp; x] x[dt; dp] }[dateTime; dataPoint] each filters)
    }

orFilter : { [filters; dateTime; dataPoint]
        0b or/ ({ [dt; dp; x] x[dt; dp] }[dateTime; dataPoint] each filters)
    }

Now we want the data points that are either in the interval [00:30:00.000; 00:31:00.000] or in the interval [00:01:00.000; 00:02:00.000]. Since the two intervals are disjoint, a conjunction would select nothing, so we construct a disjunction:

fltr : orFilter[(timeIntervalTimeSeriesFilter[00:30:00.000; 00:31:00.000]; timeIntervalTimeSeriesFilter[00:01:00.000; 00:02:00.000])]

And we can apply it as before

getTimeSeries[fltr]

Selecting N values preceding or following a given date-time

Suppose you have the following table:

TestTable : ( [] dateTime:`datetime$(); dataPoint:`float$() );
TestTable ,: (2010.05.28T13:32:02.000; 1.32);
TestTable ,: (2010.05.28T13:33:01.000; 1.52);
TestTable ,: (2010.05.28T13:33:02.000; 1.66);
TestTable ,: (2010.05.28T13:36:01.000; 1.73);
TestTable ,: (2010.05.28T13:40:02.000; 1.74);
TestTable ,: (2010.05.28T13:45:03.000; 1.75);

i.e.

dateTime                 dataPoint
----------------------------------
2010.05.28 13:32:02.000  1.32
2010.05.28 13:33:01.000  1.52
2010.05.28 13:33:02.000  1.66
2010.05.28 13:36:01.000  1.73
2010.05.28 13:40:02.000  1.74
2010.05.28 13:45:03.000  1.75

For a given list of datetimes

ListOfDateTimes : 2010.05.28T13:32:00.000 2010.05.28T13:33:00.000 2010.05.28T13:34:00.000 2010.05.28T13:35:00.000 2010.05.28T13:36:00.000 2010.05.28T13:37:00.000 2010.05.28T13:38:00.000;

we would like to select the three (more generally, N) data points that precede each date-time and the three (more generally, N) data points that follow that date-time.

Here is how this can be achieved:

select ListOfDateTimes, dataPoint(dateTime bin ListOfDateTimes)-\:reverse til 3 from TestTable

which gives

ListOfDateTimes          x
---------------------------------------
2010.05.28 13:32:00.000  0n 0n 0n
2010.05.28 13:33:00.000  0n 0n 1.32
2010.05.28 13:34:00.000  1.32 1.52 1.66
2010.05.28 13:35:00.000  1.32 1.52 1.66
2010.05.28 13:36:00.000  1.32 1.52 1.66
2010.05.28 13:37:00.000  1.52 1.66 1.73
2010.05.28 13:38:00.000  1.52 1.66 1.73

and

select ListOfDateTimes, dataPoint(dateTime bin ListOfDateTimes)+\:1+til 3 from TestTable

which gives

ListOfDateTimes          x
---------------------------------------
2010.05.28 13:32:00.000  1.32 1.52 1.66
2010.05.28 13:33:00.000  1.52 1.66 1.73
2010.05.28 13:34:00.000  1.73 1.74 1.75
2010.05.28 13:35:00.000  1.73 1.74 1.75
2010.05.28 13:36:00.000  1.73 1.74 1.75
2010.05.28 13:37:00.000  1.74 1.75 0n
2010.05.28 13:38:00.000  1.74 1.75 0n

Many thanks to Attila Vrabecz and Ed Bierly.

Functions with more than 8 parameters

There is a restriction in q: a function can have at most eight parameters. This restriction is dictated by the bytecode.

Suppose we try this:

h : { [arg1;arg2;arg3;arg4;arg5;arg6;arg7;arg8;arg9]
        "It worked!"
    }

Actually, this doesn't work; we get:

An error occurred during execution of the query.
The server sent the response:
params
Studio Hint: Possibly this error refers to too many parameters (8 max)

But there is a way around it: composing the function with plist (in recent versions of kdb+, where plist has been removed, enlist plays the same role):

f : (')[ { [args]
        count args
    };plist]

This trivial function simply returns the number of arguments, but it could of course do something smarter.

It works:

q)f[1]
1

q)f[1;2]
2

q)f[1;2;3;4;5;6;7;8;9;10]
10
Now we can redefine h as something like this:

h : (')[ { [args]
        arg1 : args[0];
        arg2 : args[1];
        arg3 : args[2];
        arg4 : args[3];
        arg5 : args[4];
        arg6 : args[5];
        arg7 : args[6];
        arg8 : args[7];
        arg9 : args[8];

        // Do stuff
    };plist]

Many thanks to Ben Delo.

Interrupting a query without killing the q instance

Suppose you have a long-running query that you want to interrupt without killing the server. This can be done by issuing a SIGINT:

kill -2 12345

where 12345 is the relevant process ID, which you can find using ps or top. top often comes in particularly useful here: chances are the q instance in question is maxing out its CPU, so it will appear right at the top of top's output.

You may have to wait for a minute or so until the signal is picked up. If several queries are queuing, you may have to repeat the command several times.

This requires that you have shell access to the relevant q server, which may not be the case. If indeed this is not the case, you may want to ask your system administrator to provide some infrastructure that would automate this task and enable you to access that infrastructure.

Adding a column to a table

Suppose you have your new column data in a list, myColumnData. The following will add it to the end of myTable:

myNewTable : update myColumn:myColumnData from myTable

To the beginning:

myNewTable : `myColumn xcols update myColumn:myColumnData from myTable

After N columns:

myNewTable : ((N sublist cols myTable), `myColumn) xcols update myColumn:myColumnData from myTable

Many thanks to Charlie Skelton for this contribution.

Applying a function to each value in a list

Suppose we want to multiply every element of a list by a constant factor. We can partially apply a dyadic function to obtain a monadic one and then iterate it over the list:

multiplyByFactor : { [factor; val] factor * val }
f : multiplyByFactor[3f];
f each (1f; 2f; 3f; 4f; 5f)

/ f':[(1f; 2f; 3f; 4f; 5f)] gives the same result: ': applied to a monadic function is each-parallel

The result is:

3 6 9 12 15f

Updating tables

For example,

TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$());
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874);
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2875);
upsert [`TestTable] (`EURUSD; 2009.02.11; 1.2901);

creates

sym     date        val
--------------------------
EURUSD  2009.02.10  1.2874
EURUSD  2009.02.10  1.2875
EURUSD  2009.02.11  1.2901

Now we can scale the val by 10 for date = 2009.02.10:

update val:val*10 from `TestTable where date = 2009.02.10

Since in the above update we used `TestTable rather than just TestTable, the table got modified in place rather than copied, and we can check that

TestTable

is now

sym     date        val
--------------------------
EURUSD  2009.02.10  12.874
EURUSD  2009.02.10  12.875
EURUSD  2009.02.11  1.2901
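
By contrast, had we passed the table by value rather than by name, update would have returned a modified copy and left TestTable itself unchanged (ScaledTable here is just an illustrative variable name):

ScaledTable : update val:val*10 from TestTable where date = 2009.02.10;
/ TestTable is unchanged; ScaledTable holds the scaled copy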

Signalling errors and protected evaluation

It is straightforward to signal an error. Here is an example of a function that will always signal '`ouch:

dodgyFunction : { []
        '`ouch
    }

Indeed, when we try calling it from Studio,

dodgyFunction[]

we get

An error occurred during execution of the query.
The server sent the response:
ouch

There is an equivalent of try-catch in q. It is called protected evaluation. We shall illustrate it with examples.

niceFunction : { []
        123
    }

dodgyFunction : { []
        '`ouch
    }

niceFunction[]
// Result: 123

dodgyFunction[]
// Result: ouch signal

niceResult : @[niceFunction; 0n; "failure"];
// Result: niceResult = 123

dodgyResult : @[dodgyFunction; 0n; "failure"];
// Result: dodgyResult = "failure"

For functions taking a single parameter (i.e. monadic functions) the syntax is identical:

monadicNiceFunction : { [x]
        3 * x
    }

monadicDodgyFunction : { [x]
        '`ouch
    }

monadicNiceFunction[5]
// Result: 15

monadicDodgyFunction[5]
// Result: ouch signal

niceResult : @[monadicNiceFunction; 5; "failure"];
// Result: niceResult = 15

dodgyResult : @[monadicDodgyFunction; 5; "failure"];
// Result: dodgyResult = "failure"

But for multivalent functions (dyadic, triadic, etc., i.e. taking more than a single parameter), we need to use a slightly different syntax for protected evaluation, passing the arguments in a list:

diadicNiceFunction : { [x; y]
        x + y
    }

diadicDodgyFunction : { [x; y]
        '`ouch
    }

diadicNiceFunction[3; 5]
// Result: 8

diadicDodgyFunction[3; 5]
// Result: ouch signal

niceResult : .[diadicNiceFunction; (3; 5); "failure"];
// Result: niceResult = 8

dodgyResult : .[diadicDodgyFunction; (3; 5); "failure"];
// Result: dodgyResult = "failure"
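
If, instead of a constant, we supply a monadic function as the last argument of the trap, it receives the error text, which is useful for logging or rethrowing:

dodgyResult : .[diadicDodgyFunction; (3; 5); { [err] "caught: ", err }];
// Result: dodgyResult = "caught: ouch"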

Setting multiple elements of a list in one go

Suppose we have

zzz : 10#0n

i.e.

0n 0n 0n 0n 0n 0n 0n 0n 0n 0n

We can set the first five elements to 1.0 as follows:

zzz[til[5]] : 5#1f

We can also set zzz[6] to 2.0 and zzz[8] to 3.0 in one go:

zzz[(6 8)] : (2 3f)

The end result is

1 1 1 1 1 0n 2 0n 3 0n

The syntax for specifying full path names in `:

It is

`:C:/my/path/and/file/name.csv

on Windows and

`:/my/path/and/file/name.csv

on *nix.

If your paths contain any non-kosher characters, such as dashes or spaces, you may wish to do this:

`":C:/my/path/and-file/name.csv"

However, when you come to run

`":C:/my/path/and-file/name.csv" set MyTable

you may see a "type" error. This is because of the order of evaluation (from right to left). You can correct it by bracketing:

(`":C:/my/path/and-file/name.csv") set MyTable
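
Alternatively, when the path is built programmatically as a string, hsym can be used to convert a symbol into a file handle (a sketch; path and MyTable are placeholder names):

path : "C:/my/path/and-file/name.csv";
(hsym `$path) set MyTable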

Unexplained examples

{x|y} . (1010110b; 0100010b)
/ 1110110b

(or) . (1010110b; 0100010b)
/ 1110110b

/ NOT
/ or . (1010110b; 0100010b)
/ (or) is a noun whereas or is a verb.

(or) . each[("aaa"; "bbb"; "abc"; "foo"; "bar") like] ("a*"; "b*")
/ 11101b
/ This only works with two arguments because or is dyadic

each["pre" ,] ("fix";"view";"lude")
/ ("prefix";"preview";"prelude")

("enjoy";"invest";"postpone") ,\: "ment"
/ ("enjoyment";"investment";"postponement")

00000b(or)/(01000b;01010b;01001b)
/ 01011b

0b(or)/(01000b;01010b;01001b)
/ 01011b

enlist[`ColumnName] _ MyTable
/ ColumnName deleted from MyTable

l: `foo`bar`baz`bar`baz
l _ l?`bar
/ Deletes the first occurrence of bar

9 (1+)\ 2011.01.01
/ 2011.01.01 2011.01.02 2011.01.03 2011.01.04 2011.01.05 2011.01.06 2011.01.07 2011.01.08 2011.01.09 2011.01.10

{ 0N!x } each 9 (1+)\ 2011.01.01

firstDate: 2011.01.01;
lastDate: 2011.01.05;
(lastDate-firstDate) (1+)\ firstDate

`foo in (`foo`bar`baz;`foo;`bar`baz)
// 0b
`foo in '(`foo`bar`baz;`foo;`bar`baz)
// 110b

.z.t   / UTC (GMT) time
.z.T   / local (server) time
.z.t - .z.T

// Casting the column of a table
TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$());
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874);
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2875);
upsert [`TestTable] (`EURUSD; 2009.02.11; 1.2901);
meta TestTable
// val is of type f
TestTable[`val] : `real$TestTable[`val]
meta TestTable
// val is now of type e

enlist[`minute] _ () xkey (select first date, first sym, first time, first bid, first ask by minute: (date + time.minute) from quotes where date within (firstDate; lastDate), sym=`XYZ)

// Concatenate multiple lists or dictionaries into one
() ,/ ((`a`b`c!`A`B`C); (`d`e`f!`D`E`F); (`g`h!`G`H))
// `a`b`c`d`e`f`g`h!`A`B`C`D`E`F`G`H

// Merging multiple columns into one
select time, Sizes: (Size0,'Size1,'Size2,'Size3,'Size4) from Bids where date = 2012.09.25, sym = cusip

// We can then do something like this:
select time, sum each Sizes from
select time, Sizes: (Size0,'Size1,'Size2,'Size3,'Size4) from Bids where date = 2012.09.25, sym = cusip
  • This page was last modified on 11 August 2015, at 17:25.