# Knowledge Base/Databases/Kdb

## Acknowledgements

We would like to thank Ed Bierly, Jeff Borror, Mahesh BP, Bernard Cena, Aaron Davies, Ben Delo, Alex Donohue, Mark Gardner, Daniel Lister, Manish Patel, Charlie Skelton, Attila Vrabecz, and of course Arthur Whitney for their help and guidance.

## Kdb+ in a nutshell

This is a very, very brief introduction to kdb+. Paul Bilokon has previously delivered it as a presentation in under 20 minutes. In order to understand it properly, we recommend that you install kdb+. You can download the 32-bit version for free from Kx Systems' website.

You can play around with the 32-bit version, but you are not permitted to use it in production. Also, 32-bit programs can only access up to 4 GB of random access memory (RAM, i.e. "the" memory), whereas 64-bit programs can access far more. This is a pretty strong restriction; most production users of this in-memory database (IMDB) would want to access more than 4 GB of RAM. Moreover, 64-bit programs offer increased performance.

I just called kdb+ an in-memory database, i.e. a database management system that primarily relies on main memory for computer data storage. Yes, kdb+ is an in-memory database. But is this the best description for kdb? What is it, really?

You will be pleased to know that kdb+ is really an interpreted programming language. You can call it with a script file to execute, like so

C:\q\q.exe foo.q


or use it as an interactive console

C:\q\q.exe


(Keep it running.)

In this sense it is no different from Python or Perl.

So we can compute things in kdb+. Simple things, like these:

q)5 + 3
8


(Here q) is the prompt, very much like the Unix prompt.)

Or these. Here we are concatenating strings:

q)"Hello ", "World!"
"Hello World!"


But more often than not you will deal with symbols (a pretty special concept in kdb+) rather than strings, because these are usually indexed and stored efficiently:

q)a : `foo
q)a
`foo

Oh, we have just declared (and defined) a variable without giving you any warning. Yes, a is a variable, and we didn't use "a = ..." but rather "a : ..." because K and Q like to be different. :-)

Hang on, we are talking about kdb+, what are K and Q? These are the programming languages behind kdb+. K is a very terse (and difficult to read, for humans) programming language invented by Arthur Whitney and it became the foundation of kdb+. Q is essentially a wrapper around K which uses human words, such as "each" so it is much easier to understand. It's built on top of K. We will ignore the distinction between K and Q and refer to the language as "Q" in the sequel. It turns out that we have been programming in Q all along.
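For a flavour of the difference, note that the q console also accepts k if you prefix the line with k). The pair below (our own illustration, not from the original presentation) computes the same thing both ways:

```q
q)count each (1 2 3; 4 5)    / Q: readable words
3 2
q)k)#:'(1 2 3;4 5)
3 2
```

In k, #: is count and ' is the each adverb, so #:' is Q's count each.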

So kdb+ is really this Q programming language plus some philosophy. It's nice to know this fact. Now it is clear why the kdb+ executable is called q.exe (q on Unix). It is a Q interpreter.

What makes it an in-memory database?

Well, the fact that we can declare variables and store data in-memory, like so:

q)a : 3
q)b : 5
q)a + b
8


And these variables don't have to be scalars. They can be lists:

q)l1 : `foo`bar`baz
q)l1
`foo`bar`baz
q)l2 : 1 2 3 4 5 6 7 8
q)l2
1 2 3 4 5 6 7 8


Dictionaries:

q)d1 : `name`age!(`Methuselah`Jared`Noah; 969 962 950)
q)d1
name| Methuselah Jared Noah
age | 969        962   950
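A dictionary maps keys to values, so we can index d1 by key (a quick sketch against the d1 defined above):

```q
q)d1[`name]
`Methuselah`Jared`Noah
q)d1[`age]
969 962 950
q)key d1
`name`age
```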


And, more importantly perhaps, tables:

q)t1 : ([] name:`symbol$(); age:`int$())
q)t1
name age
--------


where we can store data:

q)t1 ,: (`Methuselah; 969)
q)t1 ,: (`Jared; 962)
q)t1 ,: (`Noah; 950)
q)t1
name       age
--------------
Methuselah 969
Jared      962
Noah       950


Note that we could also define t1 by flipping a dictionary:

q)t1 : flip[d1]
q)t1
name       age
--------------
Methuselah 969
Jared      962
Noah       950
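Once the table exists, its columns and rows can be pulled out directly by indexing (our own illustration, assuming the t1 above):

```q
q)t1[`name]          / a column, by name
`Methuselah`Jared`Noah
q)t1[0]              / a row, by index: comes back as a dictionary
name| `Methuselah
age | 969
```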


This table now "lives" in memory. It is in this sense that kdb+ is an in-memory database. And so are Python and Perl. But kdb+ comes with many useful pleasantries which Python and Perl don't have. And it has a lot of clever optimisation, such as indexing and caching, under the covers. These make it more deserving of the title than the aforementioned Python and Perl.

Notice that we can "put a port" on our kdb+ process:

q)\p 19123
q)


Then open a browser and go to http://localhost:19123/

We will see the aforementioned variables a, b, l1, l2, d1, t1 in the frame on the left. We can click on them, and in the right frame we'll see their values. E.g. for t1:

name       age
--------------
Methuselah 969
Jared      962
Noah       950


Great, so we got more than we bargained for. Kdb+ comes with a network interface that our browser can understand. We can even issue Q commands via the browser:

http://localhost:19123/?count t1


will give us the number of rows in t1, i.e.

3


We can use some SQL syntax in Q:

q)select age from t1 where name=`Noah
age
---
950


We can also save tables to binary files:

q)(`$":H:/myt1.dat") set t1
`:H:/myt1.dat


And load them from those binary files later on:

q)t2 : get (`$":H:/myt1.dat")
q)t2
name       age
--------------
Methuselah 969
Jared      962
Noah       950


We can hold the data in memory and we can persist it to disk. Conventional databases are in control of the persistence process. In kdb+ the user is in control. This may be taxing, but it enables the user to perform some clever optimisations.

Rather than accessing kdb+ remotely (or locally, on localhost) from the browser we can use TCP and another q.exe process. If you start q.exe in another terminal, you can do this:

q)h : hopen `:localhost:19123
q)h "select from t1"
name       age
--------------
Methuselah 969
Jared      962
Noah       950
q)hclose h


## Kdb nomenclature: k, Q, etc.

When talking about Kdb, several terms are used, often interchangeably: k, Q, kdb, kdb+... What does each of these mean?

• k is the language that Arthur Whitney wrote after leaving Morgan Stanley. It was first released in 1993 and was under exclusive contract to UBS until 1996, when it became generally available.
• ksql, added in 1998, was a layer on top of k and is sometimes regarded as part of k. ksql includes SQL-like constructs, such as select.
• kdb is the in-memory database built on k.
• kdb+ was released in June 2003. This was more or less a total rewrite of kdb for 64 bit systems. It uses k (version 4) to boot and has...
• Q, which is believed to stand for Query language, is a language layer on top of k, defined in terms of k. Equivalent Q and k compile to the same byte code, which is executed by the kdb+ byte-code interpreter. Q is like a macro language layer; e.g. type in Q is the equivalent of @: in k. Most kdb+ developers just use Q (not k), as it is more readable.

Many thanks to Charlie Skelton for his explanation.

## Key bindings on Linux

When you run a Windows version of Q, the arrow keys move you up and down the command history, just as they do on a properly set up Linux terminal.

When you try using these keys in a Linux version of Q, you'll see something like

q)^[[A


for "up", which is not particularly helpful.

You can fix this by running q from rlwrap:

/home/pbilokon/bin/rlwrap /home/pbilokon/q/l64/q


Perhaps put this inside a shell script, start_q.sh:

#!/bin/bash

/home/pbilokon/bin/rlwrap /home/pbilokon/q/l64/q $@


To find out what rlwrap does, see http://utopia.knoware.nl/~hlub/uck/rlwrap/ (in brief, it is a "readline wrapper").

## Does Q have to be unreadable?

No. Although Q's creator, Arthur Whitney, believes in expressing logic in code as tersely as possible, you don't have to emulate his style. The following example generates geometric Brownian motion paths:

// A list of differences between the elements of a given list, lst. The
// length of the result is one less than the length of lst. The result
// is
//
//     ((lst[1] - lst[0]) (lst[2] - lst[1]) ... (lst[N-1] - lst[N-2]))
//
// where N is count lst.
//
differences : { [lst]
lst[0] -': lst[(til ((count lst) - 1)) + 1]
}

// Generate a single random number sampled from Uniform(0, 1).
//
// Example:
//
//     uniformRandomVariate[]
//
// Result:
//
//     (nondeterministic)
//
//     0.06551717
//
uniformRandomVariate : { []
// 1 ? 1f generates a single (1) random number drawn from [0f, 1f].
// [0] is required since (1 ? 1f) returns a list.
(1 ? 1f)[0]
}

nextGaussianRandomVariateAvailable : 0;
nextGaussianRandomVariate : 0f;

// Generate a single random number sampled from Normal(0, 1).
//
// The implementation uses the polar form of the Box-Muller transform.
//
// Example:
//
//     gaussianRandomVariate[]
//
// Result:
//
//     (nondeterministic)
//
//     -1.397263
//
gaussianRandomVariate : { []
// The Box-Muller transform takes two Uniform(0, 1) random numbers and
// produces two Normal(0, 1) random numbers. Its polar form is both
// faster and more robust numerically. Because we generate two
// Normal(0, 1) random numbers at once and return only one, we only
// have to compute them every other time the function is called. We
// cache the second random number (the one we did not return) and
// return it next time the function is called, without computing.

// This will stay 1 only if we need to compute the numbers.
compute : 1;

// Do we already have a cached random number available?
if [nextGaussianRandomVariateAvailable;
// If so, we don't need to compute.
compute : 0;
result : nextGaussianRandomVariate;
nextGaussianRandomVariateAvailable :: 0;
];

// So do we have to compute the random numbers?
if [compute;
w : 1f;
while [w >= 1f;
x1 : ((2f * uniformRandomVariate[]) - 1f);
x2 : ((2f * uniformRandomVariate[]) - 1f);
w : ((x1 * x1) + (x2 * x2))
];
w : sqrt ((-2f * log[w]) % w);
y1 : x1 * w;
y2 : x2 * w;
nextGaussianRandomVariate :: y2;
nextGaussianRandomVariateAvailable :: 1;
result : y1;
];

result
}

exponentialRandomVariate : { [lambda]
neg[log[uniformRandomVariate[]]] % lambda
}

// Parameters:
//
//     drift: drift per unit time, which is 1 year
//     volatility: per sqrt[unit time], which is sqrt[1 year]
//
geometricBrownianMotionFactor : { [drift; volatility; dateDifference]
deterministicTerm : (drift - ((0.5 * (volatility xexp 2.0)))) * dateDifference;
stochasticTerm : volatility * gaussianRandomVariate[] * sqrt[dateDifference];
exp[deterministicTerm + stochasticTerm]
}

// Simulate a geometric Brownian motion process (GBM), i.e.
//
//     dS_t = \mu S_t dt + \sigma S_t dW_t
//
// whence
//
//     S_t = S_0 e^((\mu - \sigma^2 / 2) t + \sigma \sqrt{t} N_{0, 1})
//
// Parameters:
//
//     initialValue: the initial value, S_0
//     drift: drift per unit time, which is 1 day (!)
//     volatility: per sqrt[unit time], which is sqrt[1 day]
//     dateTimes: a list of datetimes or dates for which the values will
//         be generated
//
// Example:
//
//     dateRange : (2008.01.01 2008.01.02 2008.01.03 2008.01.04 2008.01.05)
//     geometricBrownianMotion[1.5490; 0.003; 0.12; dateRange]
//
// Result:
//
//     (nondeterministic)
//
//     1.549 1.5431551 1.5468368 1.5343732 1.5195453
//
geometricBrownianMotion : { [initialValue; drift; volatility; dateTimes]
drift %: 365.25;
volatility %: sqrt[365.25];
dateDifferences : differences[dateTimes];
f : geometricBrownianMotionFactor[drift; volatility];
factors : f'[dateDifferences];
(initialValue), initialValue *\ factors
}

// Example:
//
//     exponentiallyDistributedDateTimes[5; 2010.06.16T00:00:00.000; 2010.06.18T00:00:00.000]
//
// Result:
//
//     (nondeterministic)
//
//     2010.06.16 00:27:46.087 2010.06.16 14:44:15.869 2010.06.17 00:40:05.673
//     2010.06.17 07:42:27.479 2010.06.17 13:32:56.693
//
exponentiallyDistributedDateTimes : { [ticksPerDay; beginDateTime; endDateTime]
dateTimes : `datetime$();

lastDateTime : beginDateTime + exponentialRandomVariate[ticksPerDay];

while[lastDateTime <= endDateTime;
dateTimes ,: lastDateTime;
lastDateTime : lastDateTime + exponentialRandomVariate[ticksPerDay];
];

dateTimes
}

// Example:
//
//     geometricBrownianMotionTimeSeries[1.5490; 0.003; 0.15; 5; 2010.06.16T00:00:00.000; 2010.06.18T00:00:00.000]
//
// Result:
//
//     (nondeterministic)
//
//     dateTime                 dataPoint
//     ----------------------------------
//     2010.06.16 12:36:42.888  1.549
//     2010.06.16 20:18:18.691  1.5622148
//     2010.06.16 21:20:50.682  1.561131
//     2010.06.16 23:10:47.157  1.560705
//     2010.06.17 00:07:32.780  1.5647236
//     2010.06.17 00:53:56.429  1.5667019
//     2010.06.17 01:23:21.514  1.5637483
//     2010.06.17 04:24:04.107  1.5599954
//     2010.06.17 07:04:39.818  1.564712
//     2010.06.17 17:48:14.832  1.5824289
//     2010.06.17 18:07:24.626  1.582003
//
// Example:
//
//     geometricBrownianMotionTimeSeries[1.5490; 0.003; 0.15; 10; 2010.06.16T00:00:00.000; 2011.06.18T00:00:00.000]
//     geometricBrownianMotionTimeSeries[1.5490; 0.003; 0.0003; 10; 2010.06.16T00:00:00.000; 2011.06.18T00:00:00.000]
//
geometricBrownianMotionTimeSeries : { [initialValue; drift; volatility; ticksPerDay; beginDateTime; endDateTime]
dateTimes : exponentiallyDistributedDateTimes[ticksPerDay; beginDateTime; endDateTime];
dataPoints : geometricBrownianMotion[initialValue; drift; volatility; dateTimes];

flip [`dateTime`dataPoint!(dateTimes; dataPoints)]
}

date_range : (2008.01.01 2008.01.02 2008.01.03 2008.01.04 2008.01.05)
geometricBrownianMotion[1.5490; 0.01; 0.12; date_range]


You can save it as random.q and then run

q random.q


You will see something like the following:

1.549 1.560322 1.574135 1.594816 1.622068


## Timing commands

Often you need to know how long a particular command takes to run. The trick is to prepend \t to the command, e.g.

\t select from TestTable


Rather than returning the result of the select, this command will come back with the number of milliseconds spent, e.g.

735


for 735 milliseconds.
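A related system command, \ts, reports both time and space: it returns two numbers, the elapsed milliseconds and the bytes of memory used by the expression. A sketch, against the same hypothetical table:

```q
\ts select from TestTable
```

This is handy when you suspect a query is slow because of memory allocation rather than computation.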

There is another way, which does not involve explicit calls to system commands.

startTime : .z.T

// ... Command to be timed ...

endTime : .z.T
duration : "t"$(endTime - startTime)
1 "The command took ", string[duration]


This will display something like

The command took 00:00:09.6251


We shall now suggest a more powerful approach to timing. Suppose we have a function that we want to time. The function may take very little time to run, so we call it runCount times in a loop and obtain an average run duration by dividing the total time taken for runCount runs by runCount. In order to obtain more significant results, we shall repeat this procedure trialCount times, noting the average run duration and its standard deviation. Putting this all together, we have

timeFunction : { [trialCount; runCount; function]
trialDurations : ();
do [trialCount;
startTime : .z.T;
do [runCount;
function[];
];
trialDurations ,: ((.z.T - startTime) % runCount);
];
trialDurationAverage : avg[trialDurations];
trialDurationStandardDeviation : sqrt[var[trialDurations]];
(trialDurationAverage, trialDurationStandardDeviation)
}


Suppose we want to time something like this:

function : { [] (100000?10) * (100000?10) };


We run

trialCount : 10;
runCount : 100;
timeFunction[trialCount; runCount; function];


The result is, say,

3.563 0.06165225


This means that on average function takes 3.563 milliseconds to run, the standard deviation being 0.06165225 for the trials.

## String concatenation

Happens naturally:

"hello", " ", "world"


The result is

"hello world"


Now suppose you want to get "hello123world" from this:

d : 123
"hello", d, "world"


You will see

"h"
"e"
"l"
"l"
"o"
123
"w"
"o"
"r"
"l"
"d"


This is because a string is really a list of characters. Joining or concatenating two lists of characters (strings), you get another list of characters (another string). Joining or concatenating a list of characters (a string) and an integer, you get a list of characters and integers (not a string at all).
Here is one solution:

"hello", string[d], "world"


## String concatenation in query results

Suppose you have a table, say SampleTable, that looks like this:

date       sym    time         bid     ask     location market
--------------------------------------------------------------
2010.04.26 FOOBAR 00:00:00.408 0.82587 0.82655 NY       ABC
2010.04.26 FOOBAR 00:00:00.412 0.82603 0.8264  NY       DEF
2010.04.26 FOOBAR 00:00:03.143 0.82592 0.8266  NY       ABC
2010.04.26 FOOBAR 00:00:03.162 0.82608 0.82645 NY       DEF
2010.04.26 FOOBAR 00:00:08.754 0.82597 0.82665 NY       ABC
2010.04.26 FOOBAR 00:00:08.670 0.82613 0.8265  NY       DEF
2010.04.26 FOOBAR 00:00:09.118 0.82599 0.82667 NY       ABC
2010.04.26 FOOBAR 00:00:09.124 0.82615 0.82652 NY       DEF
2010.04.26 FOOBAR 00:00:09.158 0.8262  0.82657 NY       DEF
2010.04.26 FOOBAR 00:00:09.175 0.82604 0.82672 NY       ABC


Suppose that you would like to select location and market as one field, contributor. Here is how you can achieve this:

select contributor: "_" sv/: flip string(location;market) from SampleTable


The results look as follows:

contributor
-----------
"NY_ABC"
"NY_DEF"
"NY_ABC"
"NY_DEF"
"NY_ABC"
"NY_DEF"
"NY_ABC"
"NY_DEF"
"NY_DEF"
"NY_ABC"


We would like to thank Attila Vrabecz for this tip.

## Display more results on the screen

Suppose that you issue a query

cols SomeTable


You may see something like

`one`two`three`four`five`six`seven`eight`nine`ten`eleven`twelve`thirteen`fift..


Of course, instead of one, two, three, ... you will see some column names. But the point is that the result is cut short by the "..". Perhaps you want to see the rest of the column names. What can you do? Try this:

\c


You will see something like

25 80


This means that the output is trimmed to 25 rows, 80 columns. You can override this:

\c 25 150


and re-issue the query

cols SomeTable


Now you should see more output.

## Listing a table's columns

cols MarketDepth


where MarketDepth is the table name.

## Connecting to a remote Kdb+ server

Suppose we are connecting to host myhost01, port 8011.
Run the following in q:

h : hopen `:myhost01:8011


If you are connecting to localhost, you could use

h : hopen `::8011


You can now execute queries on h like this:

h "select distinct sym from MarketDepth where date=2006.06.01"


You should see some output:

sym
------
EURUSD
USDJPY
..


Once you are done, you can close the handle:

hclose h


## Loading data from a remote server and saving it locally

Start q. Check the current directory:

\cd


You may wish to change it to a pre-created output directory:

\cd /home/pbilokon/tmp


First define the date range we shall be working on. Say, it's 2008.06.01 to 2008.08.31 inclusive:

firstdate: 2008.06.01
lastdate: 2008.08.31
daterange: firstdate + til (lastdate - firstdate) + 1


Open a handle:

h: hopen `:myhost01:8012


We shall read and write the data day by day. Let's define the following two functions:

readdata:{[d] h(raze "time xasc select from MarketDepth where date=", string d, ", sym in `GBPUSD`EURUSD")}
readandwritedata:{[d] MarketDepth::readdata[d]; .Q.dpft[`:.;d;`sym;`MarketDepth]}


readdata reads one day (specified by d) worth of data. readandwritedata calls readdata and saves the results to disk (in the local directory, as a splayed table). You can try applying this to a single day:

readandwritedata[2008.06.01]


Or process the entire daterange:

readandwritedata each daterange


Close the handle:

hclose h


## Creating and populating a simple keyed table with time series identifiers (sym), dates (date) and values (val)

TestTable : ([sym:`symbol$(); date:`date$()] val:`float$())


You can inspect the resulting empty keyed table by simply issuing

TestTable


It looks like this:

sym date| val
--------| ---


We can now upsert our first value:

upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)


On issuing

TestTable


you will now see

sym    date      | val
-----------------| ------
EURUSD 2009.02.10| 1.2874


You can also

select from TestTable


and you will see the same

sym    date      | val
-----------------| ------
EURUSD 2009.02.10| 1.2874


Is it possible to upsert several rows at once? Yes, but you should be careful with the syntax. Note that there is no space in "`USDJPY`ARSRUB".

upsert [`TestTable] ([sym: `USDJPY`ARSRUB; date: 2009.02.10 2009.02.11] val: 1.2874 10.252969)


The simplest thing to do is to save the table to a file:

`:H:/somefile.dat set TestTable


This will save the table TestTable to a binary file named H:/somefile.dat.

To load it into a new table, use

AnotherTestTable : get `:H:/somefile.dat


## Truncating a table

Truncating a table means deleting all data from it, which is of course not the same as deleting the table itself.

delete from `TestTable


## Dropping a table

Dropping a table amounts to deleting it from the namespace. Assuming that your table is in the root namespace (.), we can delete it as follows:

delete TestTable from `.


## Turning a non-keyed table into a keyed table

Suppose that you have created a non-keyed table:

TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$())


And inserted/upserted some data:

upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
upsert [`TestTable] (`EURUSD; 2009.02.11; 1.2901)


So that you can check by entering

TestTable


that the table looks like this:

sym    date       val
------------------------
EURUSD 2009.02.10 1.2874
EURUSD 2009.02.10 1.2874
EURUSD 2009.02.11 1.2901


How can you turn it into a keyed table? By entering

`sym`date xkey `TestTable


Then you can check by entering

TestTable


that the table looks like this:

sym    date      | val
-----------------| ------
EURUSD 2009.02.10| 1.2874
EURUSD 2009.02.10| 1.2874
EURUSD 2009.02.11| 1.2901


Note, however, that even though TestTable is now a keyed table, it contains duplicates, which would have to be removed manually.

## Turning a keyed table into a non-keyed table

If TestTable is your keyed table, you can modify it to remove the key as follows:

() xkey `TestTable


Note that the "`" symbol tells Kdb+ to modify the existing TestTable. Alternatively, if you would like to leave TestTable as it is and create a non-keyed version TestTable1 in addition to it, you can use

TestTable1 : () xkey TestTable


## Dealing with duplicates

First, let us create a table for our experiments:

TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$())
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
upsert [`TestTable] (`USDJPY; 2009.02.11; 1.2901)
upsert [`TestTable] (`FOOBAR; 2009.02.10; 2.3576)
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2875)
upsert [`TestTable] (`EURUSD; 2009.02.11; 1.3000)
upsert [`TestTable] (`USDJPY; 2009.02.11; 1.2902)


Try

show TestTable


to see what it looks like:

sym    date       val
------------------------
EURUSD 2009.02.10 1.2874
EURUSD 2009.02.10 1.2874
USDJPY 2009.02.11 1.2901
FOOBAR 2009.02.10 2.3576
EURUSD 2009.02.10 1.2875
EURUSD 2009.02.11 1.3
USDJPY 2009.02.11 1.2902


We define duplicate rows as those that have both sym and date values equal.

To count the duplicates, use

select count i by sym, date from TestTable


(Notice that here by sym, date expresses our definition of duplicates.)

Here is the result:

sym    date      | x
-----------------| -
EURUSD 2009.02.10| 3
EURUSD 2009.02.11| 1
FOOBAR 2009.02.10| 1
USDJPY 2009.02.11| 2


Well, the rows with x = 1 don't really correspond to duplicates. We can identify duplicates as follows:

select sym, date, x from (select count i by sym, date from TestTable) where x > 1


This will show

sym    date       x
-------------------
EURUSD 2009.02.10 3
USDJPY 2009.02.11 2


How can we remove the duplicates?

select by sym, date from TestTable


will produce

sym    date      | val
-----------------| ------
EURUSD 2009.02.10| 1.2875
EURUSD 2009.02.11| 1.3
FOOBAR 2009.02.10| 2.3576
USDJPY 2009.02.11| 1.2902


which is a keyed table with the duplicates removed (only the last row in each group survives, so the value for (`EURUSD; 2009.02.10) is, deterministically, 1.2875, not 1.2874).
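If by "duplicate" we instead mean rows identical in every column (not just sym and date), the built-in distinct keyword handles it directly, keeping the first occurrence of each row. A sketch, against the TestTable above:

```q
q)distinct TestTable
sym    date       val
------------------------
EURUSD 2009.02.10 1.2874
USDJPY 2009.02.11 1.2901
FOOBAR 2009.02.10 2.3576
EURUSD 2009.02.10 1.2875
EURUSD 2009.02.11 1.3
USDJPY 2009.02.11 1.2902
```

Only the second (EURUSD; 2009.02.10; 1.2874) row is dropped, since the 1.2875 row differs in val.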

We can save this result as, for example, TestTableWithoutDuplicates:

TestTableWithoutDuplicates : select by sym, date from TestTable


If we don't want a keyed table, we can remove the key:

() xkey TestTableWithoutDuplicates

show TestTableWithoutDuplicates


now gives

sym    date       val
------------------------
EURUSD 2009.02.10 1.2875
EURUSD 2009.02.11 1.3
FOOBAR 2009.02.10 2.3576
USDJPY 2009.02.11 1.2902


Suppose we want to be a little bit more careful and work with indices of the duplicates. Moreover, suppose we wish to remove the duplicates from the original TestTable instead of creating a new TestTableWithoutDuplicates. We can proceed as follows.

select from (select duplicateIndices : i, lastDuplicateIndex : last i, duplicateCount : count i by sym, date from TestTable) where duplicateCount > 1


This will produce

sym    date      | duplicateIndices lastDuplicateIndex duplicateCount
-----------------| --------------------------------------------------
EURUSD 2009.02.10| 0 1 4            4                  3
USDJPY 2009.02.11| 2 6              6                  2


Notice that in the duplicateIndices column we have lists, not scalars. Of course, we can do this in k.

Now let's modify this slightly. We don't want to have the last index (which appears in lastDuplicateIndex anyway) under duplicateIndices. That's because we want to keep it when we finally come to removing duplicates.

We observe that for any list l

((count l) - 1)#l


will return a copy of the list with the last element removed. Thus we can modify our query as follows:

select from (select duplicateIndices : (count[i] - 1)#i, lastDuplicateIndex : last i, duplicateCount : count i by sym, date from TestTable) where duplicateCount > 1


The result is now

sym    date      | duplicateIndices lastDuplicateIndex duplicateCount
-----------------| --------------------------------------------------
EURUSD 2009.02.10| 0 1              4                  3
USDJPY 2009.02.11| ,2               6                  2


just as required.

We want to remove the duplicates. We don't want the entire table above, so let's modify the query again:

select duplicateIndices from (select duplicateIndices : (count[i] - 1)#i, lastDuplicateIndex : last i, duplicateCount : count i by sym, date from TestTable) where duplicateCount > 1


This gives us a table of lists,

duplicateIndices
----------------
0 1
,2


But we don't need a table of lists. A list of lists will do, and we can obtain it simply by replacing select with exec:

indicesToRemove : (exec duplicateIndices from (select duplicateIndices : (count[i] - 1)#i, lastDuplicateIndex : last i, duplicateCount : count i by sym, date from TestTable) where duplicateCount > 1)


Now indicesToRemove is a list of lists:

q)indicesToRemove
0 1
,2


But we need a simple list of indices, (0 1 2). How can we obtain it?

q)() ,/ indicesToRemove
0 1 2


What exactly did we do? The over adverb (/) modifies a base dyadic function (in our case join (,)) so that the items of the second argument are applied iteratively to the first argument. (Note: There cannot be any whitespace between / and the function it modifies.)

Thus we effectively compute

((),(0 1)),(2)


There is an alternative:

q)raze indicesToRemove
0 1 2


We shall overwrite indicesToRemove with this more convenient form:

q)indicesToRemove : raze indicesToRemove
q)indicesToRemove
0 1 2


Now we are ready to remove the duplicates from TestTable:

q)delete from `TestTable where i in indicesToRemove
`TestTable


(Note: Don't forget the "`" in delete from `TestTable, or else TestTable won't be modified; a new table will be constructed instead.)

Done:

q)show TestTable
sym    date       val
------------------------
FOOBAR 2009.02.10 2.3576
EURUSD 2009.02.10 1.2875
EURUSD 2009.02.11 1.3
USDJPY 2009.02.11 1.2902


## Creating queries dynamically

You can construct your query by concatenating strings and then execute it. Here is an example. Let's define our usual

TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$())


We shall now create a select query where we can vary the column to be selected. Let's set

columnToBeSelected : "sym"


and evaluate

value("select ", columnToBeSelected, " from TestTable")


This selects the (currently empty) sym column:

sym
---


What if we set columnToBeSelected to something else?

columnToBeSelected : "date"
value("select ", columnToBeSelected, " from TestTable")


gives

date
----


This is not quite satisfactory yet. We would like to abstract away the process of query creation and execution. For this purpose we define a function which takes two parameters, columnName and tableName:

selectColumn : {[columnName; tableName]
value("select ", columnName, " from ", tableName)
}


We can now execute

selectColumn["sym"; "TestTable"]


to select column sym from TestTable. We'll see

sym
---


Similarly we can run

selectColumn["date"; "TestTable"]


Et cetera.

## Accessing command line arguments from scripts

Suppose we create a script called test.q which we shall call as

C:\q\q test.q Paul Bilokon 1982


We would like to access the command line arguments ("Paul", "Bilokon" and 1982) from the script. Here's what we could put in test.q:

name : .z.x 0;
surname : .z.x 1;
dob : .z.x 2;

name
surname
"I"$dob


This will simply display our command line arguments:

"Paul"
"Bilokon"
1982


Note that "I"$dob is the syntax for parsing the dob string to an integer.

## Transferring a table from a remote server to a local server

Connect to a remote server from the local server:

h:hopen `:someremotehost.com:19111


If you have to specify a username and password, use this instead:

h:hopen (`:someremotehost.com:19111:mylogin:mypassword; 10000)


where 10000 stands for a 10 second timeout. Assuming that the remote table is named TestTable and you want to copy it to a local table TestTableCopy, you can then run

TestTableCopy : h "TestTable"


locally. If you only want the first 100000 rows of TestTable in TestTableCopy, you can do

TestTableCopy : h "100000#TestTable"


Oh, and don't forget to

hclose h


## Converting a string to a symbol

b : "foo"
`$b


This gives

`foo
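The conversion is reversible: string turns a symbol back into a list of characters. A quick round trip (our own illustration):

```q
q)b : "foo"
q)s : `$b
q)s
`foo
q)string s
"foo"
```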


## Appending to a list of strings

We have a list of strings:

L : ("foo"; "bar"; "baz")


How do we append another string, say "abc", to it? We must enlist it first; otherwise the characters of "abc" would be joined to L one by one:

L ,: enlist "abc"


This gives

"foo"
"bar"
"baz"
"abc"


## Logging and debug output

It is often very useful to see the intermediate results of a function to get a good idea of what it does. Let us write a simple recursive factorial function to illustrate this:

factorial : { [n]
if [n = 0;
result : 1;
];

if [n > 0;
result : n * factorial[n - 1];
];

result
}


We can convince ourselves that the function is working as required:

q)factorial[0]
1
q)factorial[1]
1
q)factorial[2]
2
q)factorial[3]
6
q)factorial[4]
24
q)factorial[5]
120
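Incidentally, recursion is not the only way: q's list primitives give a one-liner (a sketch; factorial2 is our own name, and prd of the empty list is 1, so factorial2[0] is 1 as required):

```q
factorial2 : { [n] prd 1 + til n }
```

For example, factorial2[5] is prd 1 2 3 4 5, i.e. 120.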


Now, let's change this function to display the intermediate results:

factorial : { [n]
if [n = 0;
result : 1;
];

if [n > 0;
result : n * factorial[n - 1];
];

0N!result;
result
}


Let's test this:

q)a : factorial[8]
1
1
2
6
24
120
720
5040
40320
q)a
40320
q)


The line that does the logging is

        0N!result;


This sends the value of result to STDOUT.

We could apply this to strings:

        0N!"hello world"


or more complex objects, such as lists:

        0N!(1;2;"foo")

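Related to 0N!, the integer handles 1 and -1 write strings to stdout (and 2 and -2 to stderr); the negative handles append a newline, which is usually what you want for logging:

```q
q)-1 "hello from q";
hello from q
```

The trailing semicolon suppresses the echo of the handle's return value.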

## Returning tables from functions

We want to create, populate, and return a table from a function. Here is our first attempt (broken!):

fooBroken : { []
t : ([] sym:`symbol$(); date:`date$(); val:`float$());
upsert [`t] (`yo; 2009.01.01; 3.30);
t
}


When we try calling fooBroken we get this:

q)fooBroken[]
sym date val
------------


Somehow our upsert didn't work. The reason why it didn't work is this. You cannot refer to locals by name in K, so `t in this upsert wouldn't work. What did it do? Well, it referred to a global variable t, whereas the local one, the one we returned, remained unchanged. The upsert worked on that global variable, which you can inspect as follows:

q)t
sym date       val
------------------
yo  2009.01.01 3.3


This t was created in the default namespace as a side effect:

q)\v
,`t


This is not quite what we wanted. There are several options. One is to use global variables:

fooUsingGlobals : { []
`t set ([] sym:`symbol$(); date:`date$(); val:`float$());
upsert[`t] (`yo; 2009.01.01; 3.30);
:t
}


This does work:

q)fooUsingGlobals[]
sym date       val
------------------
yo  2009.01.01 3.3


But it also has a side effect (it creates a global t):

q)\v
,`t


There is another solution. We cannot refer to a local by name, but we can assign to it:

fooUsingLocals : { []
t : ([] sym:`symbol$(); date:`date$(); val:`float$());
t : upsert [t] (`yo; 2009.01.01; 3.30);
:t }


This works:

q)fooUsingLocals[]
sym date       val
------------------
yo  2009.01.01 3.3


without the aforementioned side effect:

q)\v
`symbol$()


Finally, we don't have to re-assign the local t. We could change fooUsingLocals (let's call the new version fooUsingLocals1) to use ,: instead: referring to a local by name, as upsert requires, doesn't work, whereas ,: does:

fooUsingLocals1 : { []
t : ([] sym:`symbol$(); date:`date$(); val:`float$());
t ,: (`yo; 2009.01.01; 3.30);
:t }


## How to capture the result of the system commands \t, \v, etc., into a variable

\t, \v, etc., are semantically "special". For example, this won't work:

timeTaken : \t select Foo from SomeTable


There is a workaround: use value:

timeTaken : value "\\t select Foo from SomeTable"


Notice that there are two slashes above, not one. timeTaken is now set to the number of milliseconds it took to execute select Foo from SomeTable, as required.

## Loops with indices

You may miss the conventional

for (int i = 0; i < 5; ++i) {
    // Iterated action involving i
    cout << i << endl;
}


in Q. One way to achieve the above is

i : 0;
do [5;
    // Iterated action involving i
    0N!i;
    i : i + 1
];


The code in both languages will result in

0
1
2
3
4


on the display.

## Passing a function as an argument to another function

First, let us define our function:

myFunctor : { [x] 2 * x }


Next, we shall define another function, which takes a function as its argument and applies it to 5:

myEvaluator : { [func] func[5] }


The result of

myEvaluator[myFunctor]


is

10


as expected.

## Projections of a function of one argument; deferred function execution

Suppose that you have a function of a single argument:

someFunction : { [n] (n ? 10) * (n ? 10) }


It may make sense to project it on a given argument. Even though the resulting function will have zero arguments, and the values of all the arguments will be known, you may still want to use it as a functor: you may want to defer its execution.
Suppose that we want to time the execution of someFunction using our timing routine:

timeFunction : { [trialCount; runCount; function]
trialDurations : ();
do [trialCount;
    startTime : .z.T;
    do [runCount;
        function[];
    ];
    trialDurations ,: ((.z.T - startTime) % runCount);
];
trialDurationAverage : avg[trialDurations];
trialDurationStandardDeviation : sqrt[var[trialDurations]];
(trialDurationAverage, trialDurationStandardDeviation)
}


Notice that timeFunction expects its argument function to take zero arguments, as it executes function[]. How can we use timeFunction to time the execution of someFunction? Well, if someFunction took at least two arguments, we could project:

q)someFunction : { [n; dummy] (n ? 10) * (n ? 10) }
q)someFunction[10000]
{[n; dummy] (n ? 10) * (n ? 10) }[10000]


This is one solution, then: amend someFunction to take a second dummy argument. We can then pass it to timeFunction as follows:

q)timeFunction[10; 100; someFunction[10000]]
0.344 0.06069596


But what if we don't want to (or can't) amend someFunction? There is another solution. Keep someFunction as it was,

someFunction : { [n] (n ? 10) * (n ? 10) }


but introduce the following helper:

project1 : { [function; x; dummy] function[x] }


Now we obtain the following projection:

q)project1[someFunction; 10000]
{[function; x; dummy] function[x] }[{[n] (n ? 10) * (n ? 10) };10000]


Which we can pass to timeFunction:

q)timeFunction[10; 100; project1[someFunction; 10000]]
0.343 0.06116371


## Sampling with and without replacement

This is straightforward. If we want 5 random indices from 0, 1, 2, ..., 9, with replacement, we do this:

q)5?10
2 2 8 1 8


Notice the duplicates. If, on the other hand, we would like to sample without replacement, we do this:

q)-5?10
1 3 6 4 7


Now there are no duplicates. Notice the minus sign above. The results of the sampling come out of order.
We can easily sort them by applying asc:

q)asc[-5?10]
`s#3 4 6 7 8


Notice that the attribute `s# has been applied, which indicates that the list is sorted. We could put these random indices to good use. For example, we could use them to index another list:

q)foo : (100 200 300 400 500 600 700 800 900 1000)
q)foo[asc[-5?10]]
100 400 800 900 1000
q)foo[asc[-2?10]] : (123 456)
q)foo
123 200 300 400 500 456 700 800 900 1000


## Unary -

In K, the arithmetic operators are always dyadic, so - does not act as a unary function:

q)foo : 5
q)-foo
'-


You should use neg:

q)neg foo
-5
q)foo
5


(neg does not change foo itself.) However, -5 is fine as a literal:

q)foo : -5
q)foo
-5


## Setting multiple elements of a list to the same value

q)l : (100 200 300 400 500 600 700 800 900 1000)
q)l
100 200 300 400 500 600 700 800 900 1000
q)l[1 2] : 5
q)l
100 5 5 400 500 600 700 800 900 1000
q)l[(4 + (til 3)), (8 9)] : 8
q)l
100 5 5 400 8 8 8 800 8 8


## Pre-initialising lists

If you know the size of your list you can preinitialise it like so:

q)l : 10#0N
q)l
0N 0N 0N 0N 0N 0N 0N 0N 0N 0N


We have created a list of 10 elements, each set to 0N (the null integer). We could have set them to 1, for example:

q)l : 10#1j
q)l
1 1 1 1 1 1 1 1 1 1j


## Null

There is a "null" in K/Q. It is 0N:

q)foo : 0N
q)foo
0N
q)foo * 5
0N


## Copy or reference?

q)foo : (1 2 3)
q)bar : foo
q)bar[0] : 357
q)foo
1 2 3
q)bar
357 2 3


bar changed, foo didn't. Let's investigate further.

q)foo : (1 2 3)
q)changeFoo : { [someFoo] someFoo[0] : 357 }
q)changeFoo[foo]
q)foo
1 2 3


So foo didn't change.

## Telling Q to exit when the script execution is over

If you load a script

q myscript.q


q will not exit by default when it's done. It will hang in there. The trick is to add the exit command

\\


to the end of your script. This is very useful if you are chaining scripts, etc.
## Applying a function to each element of a list

Suppose that we have

q)l : (1*1; 2*2; 3*3; 4*4; 5*5; 6*6; 7*7; 8*8)


In other words,

q)l
1 4 9 16 25 36 49 64


What if we want to take the square root of each element?

q)sqrt l
1 2 3 4 5 6 7 8f


This works. We gave sqrt a list to work on, and the result is a list of square roots. Can we take the type of each element in a similar manner?

q)type l
6h


No, we got the type of l itself, not of its elements. We can use each:

q)each[type] l
-6 -6 -6 -6 -6 -6 -6 -6h


That worked. Of course, we can also use

q)each[sqrt] l
1 2 3 4 5 6 7 8f


to take the square root of each element. This is more explicit and leads to fewer surprises.

## Exploring \v

\v will give us all the variables in the default namespace:

q)\v
`ProfilingResults`agapProbability`l`lastDate`names`nextStandardGaussianRando..


Some of these may be symbols, others integers, still others functions... How can we tell them apart? We need to use type. But first we need to capture \v into something we can work with (a list) using value:

q)v : value "\\v"
q)v
`ProfilingResults`agapProbability`l`lastDate`names`nextStandardGaussianRando..


We can now access all of these...

q)v[0]
`ProfilingResults


But what are they, what are their types?

q)type v[0]
-11h


Symbols. Well, the entries of v are symbols; the variable named by v[0], ProfilingResults, is actually a table. How can we get at the type of ProfilingResults itself? Use value again:

q)type[value[v[0]]]
98h


And if we want the types of everything in v:

q)each[type] each[value] v
98 -11 -9 6 -14 11 -9 -6 -6 11 -6 -6 -6 99h


## Displaying a table's schema

First let us create and populate a table:

q)SomeTable : ([] foo:`float$(); bar:`float$())
q)SomeTable ,: (0.3 0.5)


How can we figure out what SomeTable's schema is given SomeTable?

q)0N!0#SomeTable;
+`foo`bar!(`float$();`float$())


0#t gives you the schema. The above will print it in k format so you can see all the types.

## Working with lists

Let's create an example list. We are going to play with it now.
q)l : (10 20 30 40 50)


How many elements does it have?

q)count l
5


What is the first element?

q)first l
10


What is the last element?

q)last l
50


What is the first element again? (Notice that the following is very similar to first l, only a list of a single element is returned, not the actual element.)

q)1#l
,10


What is the last element again? (Again, a list of a single element is returned, not the actual element.)

q)-1#l
,50


What are the first two elements?

q)2#l
10 20


What are the first three elements?

q)3#l
10 20 30


What are the last two elements?

q)-2#l
40 50


What are the last three elements?

q)-3#l
30 40 50


A list of all l's elements, excluding the last one:

q)(count[l] - 1)#l
10 20 30 40


A list of all l's elements, excluding the last two:

q)(count[l] - 2)#l
10 20 30


A list made up of l's first, second, and fourth elements:

q)l[0 1 3]
10 20 40


Or

q)indexList : (0 1 3)
q)l[indexList]
10 20 40


Let's create a copy of the list but exclude all occurrences of 20:

q)l except 20
10 30 40 50


Or perhaps we want to exclude all occurrences of 20 and 30:

q)l except 20 30
10 40 50


What is the sum of the elements? There are several ways to compute this:

q)sum l
150


Or

q)0 +/ l
150


Here the over adverb (/) modifies a base dyadic function plus (+) so that the items of the second argument are applied iteratively to the first argument. Thus we are computing

((((0 + 10) + 20) + 30) + 40) + 50


Along the same lines we can compute the product of the elements:

q)1 */ l
12000000


which amounts to

((((1 * 10) * 20) * 30) * 40) * 50


Say we want to add two lists elementwise:

q)l1 : (50 40 30 20 10)
q)l + l1
60 60 60 60 60


Similarly we can multiply two lists elementwise:

q)l * l1
500 800 900 800 500


Subtract:

q)l - l1
-40 -20 0 20 40


And divide:

q)l % l1
0.2 0.5 1 2 5


What if we want to obtain a list of all elements of l negated?
q)each[neg] l
-10 -20 -30 -40 -50


Or perhaps we want a list of square roots of l's elements:

q)each[sqrt] l
3.162278 4.472136 5.477226 6.324555 7.071068


## Sorting tables

Consider the following

q)TestTable : ([] foo:`symbol$(); bar:`symbol$())
q)insert [`TestTable] (`k; `a)
q)insert [`TestTable] (`k; `z)
q)insert [`TestTable] (`k; `l)
q)insert [`TestTable] (`a; `a)
q)insert [`TestTable] (`a; `b)
q)insert [`TestTable] (`a; `a)
q)show TestTable


So we have

foo bar
-------
k   a
k   z
k   l
a   a
a   b
a   a


If we want to sort this table first by foo, then by bar, i.e. the table will be sorted by foo and within each foo value bar will be sorted, we can use

q)SortedTestTable : `foo`bar xasc TestTable
q)show SortedTestTable


This will produce

foo bar
-------
a   a
a   a
a   b
k   a
k   l
k   z


But the original TestTable hasn't changed:

q)show TestTable
foo bar
-------
k   a
k   z
k   l
a   a
a   b
a   a


In order to sort it in place, use

q)`foo`bar xasc `TestTable
`TestTable
q)show TestTable
foo bar
-------
a   a
a   a
a   b
k   a
k   l
k   z


Of course, the order `foo`bar versus `bar`foo matters:

q)`bar`foo xasc `TestTable
`TestTable
q)show TestTable
foo bar
-------
a   a
a   a
k   a
a   b
k   l
k   z


## Types of list

Let's declare some lists:

q)K : ()
q)L : `int$()
q)M : `symbol$()
q)N : `char$()


And have a look at them:

q)K
q)L
`int$()
q)M
`symbol$()
q)N
""


And their types:

q)type K
0h
q)type L
6h
q)type M
11h
q)type N
10h


K is special in the sense that its type hasn't been pre-declared. Currently its type is 0h. It can become a list of any type. That will happen when the first element is added:

q)K ,: 357
q)K
,357
q)type K
6h


Now the type of K is 6h (same as that of L). It is now a list of integers (L has been a list of integers from the start). We can't add anything but integers to K (from now on):

q)K ,: `foo
'type


It's like trying to add a non-integer to L or a non-symbol to M, etc. We get the 'type error:

q)L ,: `foo
'type
q)M ,: 357
'type


But of course we could have declared K as we did and then added a symbol to it:

q)K : ()
q)K ,: `foo
q)type K
11h
q)K
,`foo
q)K ,: 357
'type


In this case K becomes a list of symbols (on adding `foo, the first element) and we can't add anything but symbols to it.

## Byte representation

It is possible to obtain the byte representation of K data structures:

q)a : 1 2 3 4 5
q)-8!a
0x01000000220000000600050000000100000002000000030000000400000005000000


Thus the serialised form of a occupies 34 bytes (byte 1: 0x01, byte 2: 0x00, byte 3: 0x00, byte 4: 0x00, byte 5: 0x22, ...).
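
A rough breakdown of these bytes, following the kdb+ IPC serialisation layout (this annotation is our reading of the output, not part of the original):

```q
/ 0x01           architecture flag (little-endian)
/ 0x000000       message type and padding
/ 0x22000000     total message length, little-endian: 0x22 = 34 bytes
/ 0x06           payload type: 6h, a vector of ints
/ 0x00           attributes (none)
/ 0x05000000     vector length: 5
/ 0x01000000 ... the five 4-byte little-endian ints 1 2 3 4 5
```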

In this example a is a list, but -8!... works for many other things:

q)TestTable : ([sym:`symbol$(); date:`date$()] val:`float$())
q)upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
`TestTable
q)-8!TestTable
0x0100000059000000636200630b000200000073796d0064617465000000020000000b0001000..


These are the actual bytes that would be piped through a channel if you were to get the data from Kdb+ remotely. However, this does not necessarily represent the way these data structures are stored in memory by Kdb.

## Determining how much memory a given data structure occupies

We have already discussed byte representations, which you can obtain by using -8!.... They also provide a way of estimating the size of a data structure in memory:

q)TestTable : ([sym:`symbol$(); date:`date$()] val:`float$())
q)upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
`TestTable
q)(count -8!TestTable) % 1.5
59.33333


We divide the result of count -8!TestTable by 1.5 because, empirically, this gives a better estimate of the number of bytes used by Kdb+ to represent the table in memory. (Thanks to Manish Patel for this empirical finding.) This is still only an estimate.
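
If this estimate is needed often, it can be wrapped in a small helper (objSize is a hypothetical name; the 1.5 divisor is the empirical factor discussed above):

```q
objSize : { [x] (count -8!x) % 1.5 }
```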

Another, more exact, way of determining how much memory a data structure occupies is by using

q)\w
118048 67634176 0 0j


We can compare the values before and after:

q)\w
117376 67634176 0 0j
q)TestTable : ([sym:`symbol$(); date:`date$()] val:`float$())
q)upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
`TestTable
q)\w
117984 67634176 0 0j
q)upsert [`TestTable] (`USDJPY; 2009.02.10; 1.2874)
`TestTable
q)\w
117984 67634176 0 0j
q)upsert [`TestTable] (`GBPUSD; 2009.02.10; 1.2874)
`TestTable
q)\w
118048 67634176 0 0j


This value is exact. We can see that a table with a single entry has resulted in 608 bytes of memory being occupied, which is significantly different from our estimate above. This is probably due to the meta-data and key, which occupy a lot of memory in relation to the table's size.

## Joining strings with a given separator

Suppose you have a list of strings:

q)l : ("foo";"bar";"baz")


And you want to turn it into a comma-separated string:

q)"," sv l
"foo,bar,baz"


sv is a built-in function; the name stands for "scalar from vector".

## Profiler

nop : { [] }
one : { [] 1 }
double : { [x] 2 * x }
add : { [x; y] x + y }

// \d .profiler

preCall : { [functionName]
ProfilingResults ,: (`CALL; functionName; .z.T);
}

postCall : { [functionName]
ProfilingResults ,: (`RETURN; functionName; .z.T);
}

getFunctionParameters : { [functionName]
value[value[functionName]][1]
}

getFunctionParametersAsString : { [functionName]
";" sv string[getFunctionParameters[functionName]]
}

wrapFunction : { [functionName]
functionNameString : string[functionName];
wrappedFunctionNameString : functionNameString, "_profiled";
parametersString : getFunctionParametersAsString[functionName];
renameFunctionCode : wrappedFunctionNameString, " : ", functionNameString;
value renameFunctionCode;
preCallProfilingCode : "preCall[`", functionNameString, "]";
postCallProfilingCode : "postCall[`", functionNameString, "]";
wrapFunctionCode : functionNameString, " : { [", parametersString, "] ", preCallProfilingCode, "; result : ", wrappedFunctionNameString, "[", parametersString, "]; ", postCallProfilingCode, "; result }";
value wrapFunctionCode;
wrapFunctionCode
}

// \d .
// Retrieve the function names from the global namespace
functionNames : value "\\f"

ProfilingResults : ([]
action:`symbol$();
functionName:`symbol$();
time:`time$()
)

wrapFunction each (`one`double`add`nop);
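
A sketch of how the wrapped functions might then be exercised, assuming the definitions above have been loaded:

```q
double[21]            / runs the wrapped version; a profiling row is appended as a side effect
add[1; 2]
show ProfilingResults / inspect the recorded events and timestamps
```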


## Sorting one list by the corresponding elements of another list

Suppose that we have two lists:

L1 : 0.45 0.34 0.12 0.43 0.32
L2 : 3    3    2    1    8


We would like to sort L1 by the corresponding elements of L2. In other words, we want to get the result

0.43 0.12 0.45 0.34 0.32


because the corresponding elements of L2 would then appear in ascending order:

1    2    3    3    8


iasc achieves precisely that:

q)L1@iasc L2
0.43 0.12 0.45 0.34 0.32


How does this work? iasc sorts L2 but returns the original indices of its elements instead of the elements themselves:

q)iasc L2
3 2 0 1 4


Then @ produces the elements of L1 at these indices:

q)L1@(3 2 0 1 4)
0.43 0.12 0.45 0.34 0.32


## Zeroing out the seconds and milliseconds in a date

Use

getDateMinute : { [dateTime]
dateTime - ((`time$dateTime) mod 60000)
}


For example:

getDateMinute[2010.04.23T01:02:03.456]


produces

2010.04.23T01:02:00.000


## Left join

Suppose we would like to filter one table by another table. This is best illustrated by a toy example:

PeopleTable : ([] name:`symbol$(); surname:`symbol$())
PeopleTable ,: (`John; `vonNeumann)
PeopleTable ,: (`Stephen; `Hawking)
PeopleTable ,: (`Daniel; `Bernoulli)

MonthsOfBirthTable : ([] name:`symbol$(); surname:`symbol$(); month:`symbol$())
MonthsOfBirthTable ,: (`John; `Nash; `June)
MonthsOfBirthTable ,: (`John; `vonNeumann; `December)
MonthsOfBirthTable ,: (`Stephen; `Smale; `July)
MonthsOfBirthTable ,: (`Stephen; `Hawking; `January)
MonthsOfBirthTable ,: (`Daniel; `Bernoulli; `February)
MonthsOfBirthTable ,: (`Jacob; `Bernoulli; `December)


Now we would like to select from MonthsOfBirthTable all the entries which correspond to people in PeopleTable.

The following works:

PeopleTable lj (`name`surname xkey MonthsOfBirthTable)


producing the following result:

name    surname    month
---------------------------
John    vonNeumann December
Stephen Hawking    January
Daniel  Bernoulli  February


Note that we had to create a key for the right-hand table (MonthsOfBirthTable) with xkey. If our MonthsOfBirthTable were a keyed table in the first place, i.e.

MonthsOfBirthTable : ([name:`symbol$(); surname:`symbol$()] month:`symbol$())


we could get away with

PeopleTable lj MonthsOfBirthTable


If the right-hand table is in memory, key creation is fairly inexpensive. If it is stored on disk (partitioned and splayed, thus not keyed) then it is wise to avoid referencing the entire table in one go. It is better to partition the join appropriately (e.g. by date, as is usually the case) so each partition is mapped into memory.

What happens if we have multiple rows per person in the right-hand table? Consider the following example. As before,

PeopleTable : ([] name:`symbol$(); surname:`symbol$())
PeopleTable ,: (`John; `vonNeumann)
PeopleTable ,: (`Stephen; `Hawking)
PeopleTable ,: (`Daniel; `Bernoulli)


Now we would like to select all rows corresponding to those in PeopleTable from the following table:

ContributionsTable : ([] name:`symbol$(); surname:`symbol$(); contribution:`symbol$())
ContributionsTable ,: (`John; `Nash; `EquilibriumPointsInNPersonGames)
ContributionsTable ,: (`John; `Nash; `TwoPersonCooperativeGames)
ContributionsTable ,: (`John; `Nash; `RealAlgebraicManifolds)
ContributionsTable ,: (`John; `vonNeumann; `GameTheory)
ContributionsTable ,: (`John; `vonNeumann; `CellularAutomata)
ContributionsTable ,: (`John; `vonNeumann; `UniversalConstructor)
ContributionsTable ,: (`Stephen; `Smale; `SphereEversion)
ContributionsTable ,: (`Stephen; `Smale; `PoincareConjectureForDimensionsGreaterThanOrEqualToFive)
ContributionsTable ,: (`Stephen; `Smale; `SmaleHorseshoe)
ContributionsTable ,: (`Stephen; `Hawking; `SingularitiesInGeneralRelativity)
ContributionsTable ,: (`Stephen; `Hawking; `HawkingRadiation)
ContributionsTable ,: (`Daniel; `Bernoulli; `CompoundMotions)
ContributionsTable ,: (`Daniel; `Bernoulli; `Hydrodynamique)
ContributionsTable ,: (`Jacob; `Bernoulli; `TranscendentalCurves)
ContributionsTable ,: (`Jacob; `Bernoulli; `Isoperimetry)
ContributionsTable ,: (`Jacob; `Bernoulli; `SeparableDifferentialEquations)


Suppose we try, as before,

PeopleTable lj (`name`surname xkey ContributionsTable)


But this will only return the first row from ContributionsTable for John von Neumann, Stephen Hawking, and Daniel Bernoulli:

name    surname    contribution
---------------------------------------------------
John    vonNeumann GameTheory
Stephen Hawking    SingularitiesInGeneralRelativity
Daniel  Bernoulli  CompoundMotions


We, however, want all the rows. One way to achieve this is to xgroup ContributionsTable by name and surname first:

`name`surname xgroup ContributionsTable

name    surname   | contribution
------------------| --------------------------------------------------------------------------------------
John    Nash      | `EquilibriumPointsInNPersonGames`TwoPersonCooperativeGames`RealAlgebraicManifolds
John    vonNeumann| `GameTheory`CellularAutomata`UniversalConstructor
Stephen Smale     | `SphereEversion`PoincareConjectureForDimensionsGreaterThanOrEqualToFive`SmaleHorseshoe
Stephen Hawking   | `SingularitiesInGeneralRelativity`HawkingRadiation
Daniel  Bernoulli | `CompoundMotions`Hydrodynamique
Jacob   Bernoulli | `TranscendentalCurves`Isoperimetry`SeparableDifferentialEquations


Then do the left join. Then ungroup it back. All in one go:

ungroup (PeopleTable lj (`name`surname xgroup ContributionsTable))


Which gives us

name    surname    contribution
---------------------------------------------------
John    vonNeumann GameTheory
John    vonNeumann CellularAutomata
John    vonNeumann UniversalConstructor
Stephen Hawking    SingularitiesInGeneralRelativity
Daniel  Bernoulli  CompoundMotions
Daniel  Bernoulli  Hydrodynamique


as required.
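
Depending on your kdb+ version, the equi-join ej may offer a more direct route to the same result, since it returns all matching rows rather than only the first row per key. A sketch under that assumption:

```q
/ inner join on name and surname; every matching ContributionsTable row is kept
ej[`name`surname; PeopleTable; ContributionsTable]
```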

We would like to thank Daniel Lister for his comments.

## Functional select

Here we shall give some examples.

Remember that the functional select is

?[t;c;b;a]


where t is the table, c the constraints, b the by-phrase, and a the aggregations.

### Example 0

?[QuotesTable; enlist 1b; 0b; cols[QuotesTable]!cols[QuotesTable]]


or the simpler

?[QuotesTable; enlist 1b; 0b; ()]


is equivalent to

select from QuotesTable


### Example 1

select from QuotesTable where sym=`AUDCAD


is equivalent to

?[QuotesTable;(enlist (=; `sym; enlist `AUDCAD)); 0b; ()]


If the type of the column is `symbol$(), then sym=`AUDCAD corresponds to (enlist (=; `sym; enlist `AUDCAD)) above. Otherwise, we would use something like (enlist (=; `bid; 1.55)) (without the second enlist).

### Example 2

This is a small variation on the previous example. We still want to

select from QuotesTable where sym=`AUDCAD


but we want to be more flexible. In this case, the table name (QuotesTable) and "where" column name (sym) are stored in variables:

tableName : `QuotesTable;
columnName : `sym;


and our functional select becomes

?[tableName;(enlist (=; columnName; enlist `AUDCAD)); 0b; ()]


This works, just as before.

### Example 3

select from QuotesTable where bid > 1.5


is equivalent to

?[QuotesTable;(enlist (>; `bid; 1.5)); 0b; ()]


### Example 4

select from QuotesTable where bid <> 0n


is equivalent to

?[QuotesTable;(enlist (<>; `bid; 0n)); 0b; ()]


### Example 5

select bid from QuotesTable where sym=`AUDCAD


is equivalent to

?[QuotesTable;(enlist (=; `sym; enlist `AUDCAD)); 0b; (enlist `bid)!(enlist `bid)]


### Comment

One of the most powerful features of this syntax is the ability to dynamically select a column, for example:

col : `bid
?[QuotesTable;(enlist (=; `sym; enlist `AUDCAD)); 0b; (enlist `dataPoint)!(enlist col)]


### Example 6

select bid, ask from QuotesTable where sym=`AUDCAD


is equivalent to

?[QuotesTable;(enlist (=; `sym; enlist `AUDCAD)); 0b; `bid`ask!`bid`ask]


### Example 7

select date, time, bid, offer from QuotesTable where date = 2012.03.08, market = `Equities, sym = `FTSE


is equivalent to

?[QuotesTable; ((=; `date; 2012.03.08); (=; `market; enlist `Equities); (=; `sym; enlist `FTSE)); 0b; `date`time`bid`offer!`date`time`bid`offer]


## Listing variables (including tables) and functions defined on an instance

\v


lists the variables (including tables) in the global namespace (.).

\f


lists the functions in the global namespace.

## Deleting variables (including tables) and functions defined on an instance

To delete a variable, use

delete MyTableName from `.
and to delete a function, use

delete myFunctionName from `.


You can check that the variable (or function) was deleted by running \v (or \f).

## Concatenating compound columns on selecting from tables

Suppose that we define

ExampleTable1 : ([] sym:`symbol$(); list1:(); list2:());
ExampleTable1 ,: (`foo; (0.12 0.32 0.43); (0.32 0.23 0.43));
ExampleTable1 ,: (`foo; (0.32 0.33 0.54); (0.65 0.23 0.32));
ExampleTable1 ,: (`foo; (0.93 0.92 0.90); (0.90 0.89 0.54));


i.e. we have

sym list1          list2
---------------------------------
foo 0.12 0.32 0.43 0.32 0.23 0.43
foo 0.32 0.33 0.54 0.65 0.23 0.32
foo 0.93 0.92 0.9  0.9  0.89 0.54


and we want to get

sym combinedList
---------------------------------
foo 0.12 0.32 0.43 0.32 0.23 0.43
foo 0.32 0.33 0.54 0.65 0.23 0.32
foo 0.93 0.92 0.9  0.9  0.89 0.54


This is achieved as follows:

select sym, combinedList: (list1 ,' list2) from ExampleTable1
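
The ,' in the query is join applied each-both: it joins the items of the two list columns pairwise. A minimal standalone illustration:

```q
q)(1 2; 3 4) ,' (10 20; 30 40)
1 2 10 20
3 4 30 40
```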


## Applying a function to each date in a given list

It is very often useful to partition queries by dates, especially if this reflects the way in which your tables are stored.

Here is a simple toy example to illustrate the syntax.

displayIncrementedDate : { [message; date; increment]
0N!message, string[date + increment];
1b
}

displayIncrementedDate["Date: ";;3] each (2010.05.11T13:06:00.000 + til 10)


This returns the result, 1111111111b, which is a list of booleans, and also displays

"Date: 2010.05.14T13:06:00.000"
"Date: 2010.05.15T13:06:00.000"
"Date: 2010.05.16T13:06:00.000"
"Date: 2010.05.17T13:06:00.000"
"Date: 2010.05.18T13:06:00.000"
"Date: 2010.05.19T13:06:00.000"
"Date: 2010.05.20T13:06:00.000"
"Date: 2010.05.21T13:06:00.000"
"Date: 2010.05.22T13:06:00.000"
"Date: 2010.05.23T13:06:00.000"


## Partitioning a time interval by hour

The following function does the trick:

partitionTimeIntervalByHour : { [beginTime; endTime]
(beginTime + ((til (ceiling[(endTime - beginTime) % 01:00:00.000])) * 01:00:00.000)), endTime
}


For example:

beginTime : 01:00:00.000;
endTime : 03:00:00.000;
partitionTimeIntervalByHour[beginTime; endTime]
// Result: 01:00:00.000 02:00:00.000 03:00:00.000

beginTime : 01:00:00.000;
endTime : 01:00:00.000;
partitionTimeIntervalByHour[beginTime; endTime]
// Result: enlist 01:00:00.000

beginTime : 01:30:55.123;
endTime : 05:15:35.012;
partitionTimeIntervalByHour[beginTime; endTime]
// Result: 01:30:55.123 02:30:55.123 03:30:55.123 04:30:55.123 05:15:35.012


## Processing the resulting intervals

This section is a sequel of the previous one. Suppose that we have partitioned our interval.

As before,

beginTime : 01:30:55.123;
endTime : 05:15:35.012;
times : partitionTimeIntervalByHour[beginTime; endTime];


Thus times has been set to

01:30:55.123 02:30:55.123 03:30:55.123 04:30:55.123 05:15:35.012


Now we want to process the resulting intervals, interval by interval, by applying, say,

processTimeInterval : { [beginTime; endTime]
0N!"from ", string[beginTime], " to ", string[endTime]
}


For example,

processTimeInterval[01:30:55.123;02:30:55.123]


will display the string "from 01:30:55.123 to 02:30:55.123" and also return it. Of course processTimeInterval could have been a lot smarter and more useful but this is not the point. The point is that we would like to apply processTimeInterval to each interval in our partition, i.e. we want "something" that will result in the following function calls:

processTimeInterval[01:30:55.123;02:30:55.123];
processTimeInterval[02:30:55.123;03:30:55.123];
processTimeInterval[03:30:55.123;04:30:55.123];
processTimeInterval[04:30:55.123;05:15:35.012];


Our first step is to produce the time intervals from times:

timeIntervals : -1_2#/:(0_til count times)_\:times;


Thus the variable timeIntervals has been set to:

(01:30:55.123 02:30:55.123;02:30:55.123 03:30:55.123;03:30:55.123 04:30:55.123;04:30:55.123 05:15:35.012)
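
An equivalent, arguably more transparent way to build the list of adjacent pairs from times (a sketch; it produces the same timeIntervals as above):

```q
/ drop the last element for the interval starts, the first for the ends, then pair them up
timeIntervals : flip (-1 _ times; 1 _ times)
```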


Our next step is to apply processTimeInterval to each of the resulting time intervals:

results : (processTimeInterval .) each timeIntervals;


which produces the following results:

("from 01:30:55.123 to 02:30:55.123";"from 02:30:55.123 to 03:30:55.123";"from 03:30:55.123 to 04:30:55.123";"from 04:30:55.123 to 05:15:35.012")


We would like to thank Mahesh BP for his advice.

## Supplying function arguments in a list

Suppose that you have the following function:

addUp : { [x; y] x + y }


You can, of course, call it as follows:

addUp[3; 5]


The result is 8. But what if you have

L : 3 5


And you want to do

addUp[L[0]; L[1]]


A more elegant way to do this is:

addUp . L
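
The dot here is the apply operator, which spreads a list of arguments over a function of any arity. For instance (addUp3 is a hypothetical three-argument variant):

```q
addUp3 : { [x; y; z] x + y + z }
addUp3 . 1 2 3   / 6
```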


## How to produce a string representation of a list of symbols

Suppose that we have

L : `foo`bar`baz


i.e. a list of symbols, and we would like to have

"foobarbaz"


a string. This is particularly useful when one is constructing a query to be sent to a remote server. We can achieve this as follows:

raze string L


## Checking if a date is a weekend

isWeekend : { [date]
weekOffset : date - `week$date;
(weekOffset = 5) or (weekOffset = 6)
}


## Union join versus asof-join

Suppose that we have the following three tables:

Table1 : ([] dateTime:`date$(); dataPoint1:`float$());
Table1 ,: (2010.05.18; 3.57);
Table1 ,: (2010.05.20; 3.58);
Table1 ,: (2010.05.21; 3.58);
Table1 ,: (2010.05.22; 3.59);
Table1 ,: (2010.05.23; 3.60);

Table2 : ([] dateTime:`date$(); dataPoint2:`float$());
Table2 ,: (2010.05.19; 1.23);
Table2 ,: (2010.05.20; 1.24);
Table2 ,: (2010.05.21; 1.25);
Table2 ,: (2010.05.24; 1.26);
Table2 ,: (2010.05.25; 1.25);

Table3 : ([] dateTime:`date$(); dataPoint3:`float$());
Table3 ,: (2010.05.17; 3.23);
Table3 ,: (2010.05.18; 3.24);
Table3 ,: (2010.05.21; 3.25);


We are interested in aligning the time series. There are several ways to achieve this involving different kinds of joins. We are going to use this example to demonstrate the semantics of these joins.

### asof-join

First, the asof-join:

aj[`dateTime; aj[`dateTime; Table1; Table2]; Table3]


This produces the following result:

dateTime   dataPoint1 dataPoint2 dataPoint3
-------------------------------------------
2010.05.18 3.57       0n         3.24
2010.05.20 3.58       1.24       3.24
2010.05.21 3.58       1.25       3.25
2010.05.22 3.59       1.25       3.25
2010.05.23 3.6        1.25       3.25


For some purposes this might be what we need. Observe that the dateTimes are sorted and they are exactly those of Table1. We have lost some date-times, namely the ones that are in Table2 or Table3 but not in Table1.

### Union join

This brings us to our next item: the union join. Consider this:

(uj/)1!'(Table1;Table2;Table3)


The result is:

dateTime  | dataPoint1 dataPoint2 dataPoint3
----------| --------------------------------
2010.05.18| 3.57       0n         3.24
2010.05.20| 3.58       1.24       0n
2010.05.21| 3.58       1.25       3.25
2010.05.22| 3.59       0n         0n
2010.05.23| 3.6        0n         0n
2010.05.19| 0n         1.23       0n
2010.05.24| 0n         1.26       0n
2010.05.25| 0n         1.25       0n
2010.05.17| 0n         0n         3.23


This is not quite what we wanted.
Sure, we have a union of dateTimes from Table1, Table2, and Table3, but they are no longer sorted and they are not filled. We need to do some more work:

fills `dateTime xasc (uj/)1!'(Table1;Table2;Table3)


In order to get:

dateTime  | dataPoint1 dataPoint2 dataPoint3
----------| --------------------------------
2010.05.17| 0n         0n         3.23
2010.05.18| 3.57       0n         3.24
2010.05.19| 3.57       1.23       3.24
2010.05.20| 3.58       1.24       3.24
2010.05.21| 3.58       1.25       3.25
2010.05.22| 3.59       1.25       3.25
2010.05.23| 3.6        1.25       3.25
2010.05.24| 3.6        1.26       3.25
2010.05.25| 3.6        1.25       3.25


How efficient is this? Iteratively, we could achieve this in O(N1 + N2 + N3) operations. This query looks suboptimal, as it works in stages: (1) uj, (2) xasc, (3) fills. We would be interested in hearing a better solution! Many thanks to Alex Donohue and Attila Vrabecz for their comments.

## Multiplying out the columns of a table

Let us again consider the table we have just created:

Table1 : ([] dateTime:`date$(); dataPoint1:`float$());
Table1 ,: (2010.05.18; 3.57);
Table1 ,: (2010.05.20; 3.58);
Table1 ,: (2010.05.21; 3.58);
Table1 ,: (2010.05.22; 3.59);
Table1 ,: (2010.05.23; 3.60);

Table2 : ([] dateTime:`date$(); dataPoint2:`float$());
Table2 ,: (2010.05.19; 1.23);
Table2 ,: (2010.05.20; 1.24);
Table2 ,: (2010.05.21; 1.25);
Table2 ,: (2010.05.24; 1.26);
Table2 ,: (2010.05.25; 1.25);

Table3 : ([] dateTime:`date$(); dataPoint3:`float$());
Table3 ,: (2010.05.17; 3.23);
Table3 ,: (2010.05.18; 3.24);
Table3 ,: (2010.05.21; 3.25);

Table : fills `dateTime xasc (uj/)1!'(Table1;Table2;Table3)


i.e.

dateTime  | dataPoint1 dataPoint2 dataPoint3
----------| --------------------------------
2010.05.17| 0n         0n         3.23
2010.05.18| 3.57       0n         3.24
2010.05.19| 3.57       1.23       3.24
2010.05.20| 3.58       1.24       3.24
2010.05.21| 3.58       1.25       3.25
2010.05.22| 3.59       1.25       3.25
2010.05.23| 3.6        1.25       3.25
2010.05.24| 3.6        1.26       3.25
2010.05.25| 3.6        1.25       3.25


Our next goal is to multiply out the columns dataPoint1, dataPoint2, and dataPoint3.
### Solution 1

One way to achieve this is:

NonKeyedTable : () xkey Table;
Result : NonKeyedTable[`dataPoint1] * NonKeyedTable[`dataPoint2] * NonKeyedTable[`dataPoint3];
ResultTable : flip[`dateTime`dataPoint!(NonKeyedTable[`dateTime]; Result)]


and ResultTable becomes

dateTime   dataPoint
---------------------
2010.05.17 0n
2010.05.18 0n
2010.05.19 14.227164
2010.05.20 14.383008
2010.05.21 14.54375
2010.05.22 14.584375
2010.05.23 14.625
2010.05.24 14.742
2010.05.25 14.625


Now suppose that we don't know the names of the columns to be multiplied in advance. We just want to multiply together all the columns save for the first one. Here is one way to do this:

NonKeyedTable : () xkey Table;
ColumnsToMultiply : NonKeyedTable[1_(cols NonKeyedTable)];
ResultTable : flip[`dateTime`dataPoint!(NonKeyedTable[`dateTime]; 1*/ColumnsToMultiply)]


The result is exactly the same. This time we didn't have to enumerate the columns explicitly.

### Solution 2

This can be achieved a lot more succinctly:

select dateTime, prd 1_flip 0!Table from Table


Courtesy of Attila Vrabecz.

## Select with filtering

Suppose that you have a table TimeSeriesTable like this:

dateTime                dataPoint
---------------------------------
2010.05.19 00:00:00.061 1.23235
2010.05.19 00:00:00.063 1.23235
2010.05.19 00:00:00.067 1.23236
2010.05.19 00:00:00.087 1.23235
2010.05.19 00:00:00.143 1.23235
2010.05.19 00:00:00.144 1.23235
2010.05.19 00:00:00.147 1.23236
2010.05.19 00:00:00.238 1.23236
2010.05.19 00:00:00.341 1.21446
2010.05.19 00:00:00.345 1.21446
2010.05.19 00:00:00.346 1.21447
2010.05.19 00:00:00.347 1.21448
...


You want to enable the user to provide a custom filter for selecting data from this table. You can provide the following function:

getTimeSeries : { [filter]
select from TimeSeriesTable where filter[dateTime; dataPoint]
}


You can then prepare a useful filter:

timeIntervalTimeSeriesFilter : { [beginTime; endTime; dateTime; dataPoint]
(`time$dateTime) within (beginTime; endTime)
}


which the user can specialise and use as follows:

fltr : timeIntervalTimeSeriesFilter[00:30:00.000; 01:00:00.000]
getTimeSeries[fltr]


## Conjunction and disjunction of filters

Let us prototype this first.

isOdd : { [x] 1 = mod[x; 2] }
isSmallPrime : { [x] (x = 2) or (x = 3) or (x = 5) or (x = 7) }


Let's test the following.

a : 2;


As expected,

1b and/ ({ x[a] } each (isOdd; isSmallPrime))


returns 0b because 2 isn't odd (although it is a small prime) and

0b or/ ({ x[a] } each (isOdd; isSmallPrime))


returns 1b because it is a small prime (even though it isn't odd).

Similarly,

a : 3

1b and/ ({ x[a] } each (isOdd; isSmallPrime))


returns true because 3 is both odd and a small prime and

0b or/ ({ x[a] } each (isOdd; isSmallPrime))


returns true for the very same reason.

We are now ready to develop further our previous example and introduce conjunction and disjunction of filters:

andFilter : { [filters; dateTime; dataPoint]
1b and/ ({ [dt; dp; x] x[dt; dp] }[dateTime; dataPoint] each filters)
}

orFilter : { [filters; dateTime; dataPoint]
0b or/ ({ [dt; dp; x] x[dt; dp] }[dateTime; dataPoint] each filters)
}


Now we want the data points that are either in the interval [00:30:00.000; 00:31:00.000] or in the interval [00:01:00.000; 00:02:00.000]. So we construct the following filter:

fltr : orFilter[(timeIntervalTimeSeriesFilter[00:30:00.000; 00:31:00.000]; timeIntervalTimeSeriesFilter[00:01:00.000; 00:02:00.000])]


And we can apply it as before

getTimeSeries[fltr]


## Selecting N values preceding or following a given date-time

Suppose you have the following table:

TestTable : ( [] dateTime:`datetime$(); dataPoint:`float$() );
TestTable ,: (2010.05.28T13:32:02.000; 1.32);
TestTable ,: (2010.05.28T13:33:01.000; 1.52);
TestTable ,: (2010.05.28T13:33:02.000; 1.66);
TestTable ,: (2010.05.28T13:36:01.000; 1.73);
TestTable ,: (2010.05.28T13:40:02.000; 1.74);
TestTable ,: (2010.05.28T13:45:03.000; 1.75);


i.e.

dateTime                 dataPoint
----------------------------------
2010.05.28 13:32:02.000  1.32
2010.05.28 13:33:01.000  1.52
2010.05.28 13:33:02.000  1.66
2010.05.28 13:36:01.000  1.73
2010.05.28 13:40:02.000  1.74
2010.05.28 13:45:03.000  1.75


For a given list of datetimes

ListOfDateTimes : 2010.05.28T13:32:00.000 2010.05.28T13:33:00.000 2010.05.28T13:34:00.000 2010.05.28T13:35:00.000 2010.05.28T13:36:00.000 2010.05.28T13:37:00.000 2010.05.28T13:38:00.000;


we would like to select the three (more generally, N) data points that precede each date-time and the three (more generally, N) data points that follow that date-time.

Here is how this can be achieved:

select ListOfDateTimes, dataPoint(dateTime bin ListOfDateTimes)-\:reverse til 3 from TestTable


which gives

ListOfDateTimes          x
---------------------------------------
2010.05.28 13:32:00.000  0n 0n 0n
2010.05.28 13:33:00.000  0n 0n 1.32
2010.05.28 13:34:00.000  1.32 1.52 1.66
2010.05.28 13:35:00.000  1.32 1.52 1.66
2010.05.28 13:36:00.000  1.32 1.52 1.66
2010.05.28 13:37:00.000  1.52 1.66 1.73
2010.05.28 13:38:00.000  1.52 1.66 1.73


and

select ListOfDateTimes, dataPoint(dateTime bin ListOfDateTimes)+\:1+til 3 from TestTable


which gives

ListOfDateTimes          x
---------------------------------------
2010.05.28 13:32:00.000  1.32 1.52 1.66
2010.05.28 13:33:00.000  1.52 1.66 1.73
2010.05.28 13:34:00.000  1.73 1.74 1.75
2010.05.28 13:35:00.000  1.73 1.74 1.75
2010.05.28 13:36:00.000  1.73 1.74 1.75
2010.05.28 13:37:00.000  1.74 1.75 0n
2010.05.28 13:38:00.000  1.74 1.75 0n


Many thanks to Attila Vrabecz and Ed Bierly.

## Functions with more than 8 parameters

There is a restriction in q: a function can have at most eight parameters. This restriction is dictated by the bytecode.

Suppose we try this:

h : { [arg1;arg2;arg3;arg4;arg5;arg6;arg7;arg8;arg9]
"It worked!"
}


Actually, it didn't work, because we get:

An error occurred during execution of the query.
The server sent the response:
params
Studio Hint: Possibly this error refers to too many parameters (8 max)


But there is a way around it: composing the function with plist:

f : (')[ { [args]
count args
};plist]


This trivial function simply returns the number of arguments, but it could, of course, do something smarter.

It works:

q)f[1]
1

q)f[1;2]
2

q)f[1;2;3;4;5;6;7;8;9;10]
10

Now we can redefine h as something like this:

h : (')[ { [args]
arg1 : args[0];
arg2 : args[1];
arg3 : args[2];
arg4 : args[3];
arg5 : args[4];
arg6 : args[5];
arg7 : args[6];
arg8 : args[7];
arg9 : args[8];

// Do stuff
};plist]
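
A related workaround, not from the original text, is to collect the arguments into a single dictionary of named values; the function g and the argument names below are made up for illustration.

```q
/ Sketch: pass one dictionary instead of many parameters
/ (g and the argument names are illustrative)
g : { [args] args[`arg1] + args[`arg9] }

g[`arg1`arg9!(1; 9)]   / 10
```

This keeps the call sites self-documenting, since each argument is labelled by name.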


Many thanks to Ben Delo.

## Interrupting a query without killing the q instance

Suppose you have a long-running query that you want to interrupt without killing the server. This can be done by issuing a SIGINT:

kill -2 12345


where 12345 is the relevant process ID, which you can find using ps or top. top often comes in handy here: chances are the q instance in question is maxing out its CPU usage, so it will appear right at the top of top's output.

You may have to wait for a minute or so until the signal is picked up. If several queries are queuing, you may have to repeat the command several times.
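
As a small sketch (assuming a Unix shell with pgrep available; the process name "q" and the commented kill invocation are illustrative):

```shell
# Signal 2 is SIGINT, the same signal that Ctrl-C delivers:
kill -l 2    # prints INT

# Interrupt the most recently started process named "q"
# (illustrative; substitute the actual PID if you prefer):
# kill -2 "$(pgrep -nx q)"
```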

This requires that you have shell access to the relevant q server, which may not be the case. If indeed this is not the case, you may want to ask your system administrator to provide some infrastructure that would automate this task and enable you to access that infrastructure.

## Adding a column to a table

Suppose you have your new column data in a list, myColumnData. The following will add it to the end of myTable:

myNewTable : update myColumn:myColumnData from myTable


To the beginning:

myNewTable : `myColumn xcols update myColumn:myColumnData from myTable


After N columns:

myNewTable : ((N sublist cols myTable), `myColumn) xcols update myColumn:myColumnData from myTable


Many thanks to Charlie Skelton for this contribution.
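
As a quick reminder of what xcols does, here is a toy sketch (the table t and its columns are made up):

```q
/ Toy example: xcols reorders columns without changing the data
t : ([] a:1 2; b:3 4)
cols (`b`a xcols t)   / `b`a: column b now comes first
```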

## Applying a function to each value in a list

multiplyByFactor : { [factor; val] factor * val }
f : multiplyByFactor[3f];
f each (1f; 2f; 3f; 4f; 5f)


The result is:

3 6 9 12 15f


## Updating a table in place

For example,

TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$());
upsert[`TestTable] (`EURUSD; 2009.02.10; 1.2874);
upsert[`TestTable] (`EURUSD; 2009.02.10; 1.2875);
upsert[`TestTable] (`EURUSD; 2009.02.11; 1.2901);


creates

sym    date       val
-------------------------
EURUSD 2009.02.10 1.2874
EURUSD 2009.02.10 1.2875
EURUSD 2009.02.11 1.2901


Now we can scale the val by 10 for date = 2009.02.10:

update val:val*10 from `TestTable where date = 2009.02.10


Since in the above update we used `TestTable rather than just TestTable, the table got modified in place rather than copied, and we can check that

TestTable


is now

sym    date       val
-------------------------
EURUSD 2009.02.10 12.874
EURUSD 2009.02.10 12.875
EURUSD 2009.02.11 1.2901


## Signalling errors and protected evaluation

It is straightforward to signal an error. Here is an example of a function that will always signal 'ouch:

dodgyFunction : { [] 'ouch }


Indeed, when we try calling it from Studio,

dodgyFunction[]


we get

An error occurred during execution of the query.
The server sent the response:
ouch


There is an equivalent to try-catch in q. It is called protected evaluation. We shall illustrate it by examples.

niceFunction : { [] 123 }
dodgyFunction : { [] 'ouch }

niceFunction[]  // Result: 123
dodgyFunction[] // Result: ouch signal

niceResult : @[niceFunction; 0n; "failure"];   // Result: niceResult = 123
dodgyResult : @[dodgyFunction; 0n; "failure"]; // Result: dodgyResult = "failure"


For functions taking a single parameter (i.e. monadic functions) the syntax is identical:

monadicNiceFunction : { [x] 3 * x }
monadicDodgyFunction : { [x] 'ouch }

monadicNiceFunction[5]  // Result: 15
monadicDodgyFunction[5] // Result: ouch signal

niceResult : @[monadicNiceFunction; 5; "failure"];   // Result: niceResult = 15
dodgyResult : @[monadicDodgyFunction; 5; "failure"]; // Result: dodgyResult = "failure"


But for multivalent functions (dyadic, triadic, etc., i.e. taking more than a single parameter), we need to use a slightly different syntax for protected evaluation, passing the arguments in a list:

dyadicNiceFunction : { [x; y] x + y }
dyadicDodgyFunction : { [x; y] 'ouch }

dyadicNiceFunction[3; 5]  // Result: 8
dyadicDodgyFunction[3; 5] // Result: ouch signal

niceResult : .[dyadicNiceFunction; (3; 5); "failure"];   // Result: niceResult = 8
dodgyResult : .[dyadicDodgyFunction; (3; 5); "failure"]; // Result: dodgyResult = "failure"


## Setting multiple elements of a list in one go

Suppose we have

zzz : 10#0n


i.e.

0n 0n 0n 0n 0n 0n 0n 0n 0n 0n


We can set the first five elements to 1.0 as follows:

zzz[til[5]] : 5#1f


We can also set zzz[6] to 2.0 and zzz[8] to 3.0 in one go:

zzz[(6 8)] : (2 3f)


The end result is

1 1 1 1 1 0n 2 0n 3 0n


## The syntax for specifying full path names in q

It is

`:C:/my/path/and/file/name.csv


on Windows and

`:/my/path/and/file/name.csv


on *nix. If your paths contain any non-kosher characters, such as dashes or spaces, you may wish to do this:

`$":C:/my/path/and-file/name.csv"


However, when you come to run

`$":C:/my/path/and-file/name.csv" set MyTable


you may see a "type" error. This is because of the order of evaluation (from right to left). You can correct it by bracketing:

(`$":C:/my/path/and-file/name.csv") set MyTable


## Unexplained examples

{x|y} . (1010110b; 0100010b) / 1110110b
(or) . (1010110b; 0100010b)  / 1110110b
/ NOT: or . (1010110b; 0100010b)
/ (or) is a noun whereas or is a verb.

(or) . each[("aaa"; "bbb"; "abc"; "foo"; "bar") like] ("a*"; "b*") / 11101b
/ This only works with two arguments because or is dyadic

each["pre" ,] ("fix";"view";"lude") / ("prefix";"preview";"prelude")
("enjoy";"invest";"postpone") ,\: "ment" / ("enjoyment";"investment";"postponement")

00000b(or)/(01000b;01010b;01001b) / 01011b
0b(or)/(01000b;01010b;01001b)     / 01011b

enlist[`ColumnName] _ MyTable / ColumnName deleted from MyTable

l: `foo`bar`baz`bar`baz
l _ l?`bar / Deletes the first occurrence of `bar

9 (1+)\ 2011.01.01
/ 2011.01.01 2011.01.02 2011.01.03 2011.01.04 2011.01.05 2011.01.06 2011.01.07 2011.01.08 2011.01.09 2011.01.10
{ 0N!x } each 9 (1+)\ 2011.01.01

firstDate: 2011.01.01; lastDate: 2011.01.05;
(lastDate-firstDate) (1+)\ firstDate

`foo in (`foo`bar`baz;`foo;`bar`baz)  // 0b
`foo in'(`foo`bar`baz;`foo;`bar`baz)  // 110b

.z.t <-- UTC (GMT)
.z.T <-- server time
.z.t - .z.T

// Casting the column of a table
TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$());
upsert[`TestTable] (`EURUSD; 2009.02.10; 1.2874);
upsert[`TestTable] (`EURUSD; 2009.02.10; 1.2875);
upsert[`TestTable] (`EURUSD; 2009.02.11; 1.2901);
meta TestTable
// val is of type f
TestTable[`val] : `real$TestTable[`val]
meta TestTable
// val is now of type e

enlist[`minute] _ () xkey (select first date, first sym, first time, first bid, first ask by minute: (date + time.minute) from quotes where date within (firstDate; lastDate), sym=`XYZ)

// Concatenate multiple lists or dictionaries into one
() ,/ ((`a`b`c!`A`B`C); (`d`e`f!`D`E`F); (`g`h!`G`H))
// `a`b`c`d`e`f`g`h!`A`B`C`D`E`F`G`H

// Merging multiple columns into one
select time, Sizes: (Size0,'Size1,'Size2,'Size3,'Size4) from Bids where date = 2012.09.25, sym = cusip

// We can then do something like this:
select time, sum each Sizes from
select time, Sizes: (Size0,'Size1,'Size2,'Size3,'Size4) from Bids where date = 2012.09.25, sym = cusip