Knowledge Base/Databases/Kdb
Acknowledgements
We would like to thank Ed Bierly, Jeff Borror, Mahesh BP, Bernard Cena, Aaron Davies, Ben Delo, Alex Donohue, Mark Gardner, Daniel Lister, Manish Patel, Charlie Skelton, Attila Vrabecz, and of course Arthur Whitney for their help and guidance.
Kdb+ in a nutshell
This is a very, very brief introduction to kdb+. Paul Bilokon has previously delivered it as a presentation in under 20 minutes. In order to understand it properly, we recommend that you install kdb+. You can download the 32-bit version for free from
http://kx.com/Developers/FREE-DOWNLOAD.php
You can play around with the 32-bit version but you are not permitted to use it in production. Also, 32-bit programs can only access up to 4 GB of Random Access Memory (RAM, i.e. "the" memory) whereas 64-bit programs can access more than 128 GB of RAM. This is a pretty strong restriction; most production users of this in-memory database (IMDB) would want to access more than 4 GB of RAM. Moreover, 64-bit programs offer increased performance.
I just called kdb+ an in-memory database, i.e. a database management system that primarily relies on main memory for computer data storage. Yes, kdb+ is an in-memory database. But is this the best description for kdb? What is it, really?
You will be pleased to know that kdb+ is really an interpreted programming language. You can call it with a script file to execute, like so
C:\q\q.exe foo.q
or use it as an interactive console
C:\q\q.exe
(Keep it running.)
In this sense it is no different from Python or Perl.
So we can compute things in kdb+. Simple things, like these:
q)5 + 3 8
(q) above is the prompt, very much like the Unix prompt.)
Or these. Here we are concatenating strings:
q)"Hello ", "World!" "Hello World!"
But more often than not you will deal with symbols (a pretty special concept in kdb+) rather than strings, because these are usually indexed and stored efficiently:
q)a : `foo q)a `fooOh, we have just declared (and defined) a variable without giving you any warning. Yes,
ais a variable and we didn't use "a = ..." but rather "a : ..." because K and Q likes to be different. :-)
Hang on, we are talking about kdb+, what are K and Q? These are the programming languages behind kdb+. K is a very terse (and difficult to read, for humans) programming language invented by Arthur Whitney and it became the foundation of kdb+. Q is essentially a wrapper around K which uses human words, such as "each" so it is much easier to understand. It's built on top of K. We will ignore the distinction between K and Q and refer to the language as "Q" in the sequel. It turns out that we have been programming in Q all along.
So kdb+ is really this Q programming language plus some philosophy. It's nice to know this fact. Now it is clear why the kdb+ executable is called q.exe (q on Unix). It is a Q interpreter.
What makes it an in-memory database?
Well, the fact that we can declare variables and store data in-memory, like so:
q)a : 3 q)b : 5 q)a + b 8
And these variables don't have to be scalars. They can be lists:
q)l1 : `foo`bar`baz q)l1 `foo`bar`baz q)l2 : 1 2 3 4 5 6 7 8 q)l2 1 2 3 4 5 6 7 8
Dictionaries:
q)d1 : `name`age!(`Methuselah`Jared`Noah; 969 962 950) q)d1 name| Methuselah Jared Noah age | 969 962 950
And, more importantly perhaps, tables:
q)t1 : ([] name:`symbol$(); age:`int$()) q)t1 name age --------
where we can store data:
q)t1 ,: (`Methuselah, 969) q)t1 ,: (`Jared, 962) q)t1 ,: (`Noah, 950) q)t1 name age -------------- Methuselah 969 Jared 962 Noah 950
Note that we could also define t1 by flipping a dictionary:
q)t1 : flip[d1] q)t1 name age -------------- Methuselah 969 Jared 962 Noah 950
This table now "lives" in memory. It is in this sense that kdb+ is an in-memory database. And so are Python and Perl. But kdb+ comes with many useful pleasantries which Python and Perl don't have. And it has a lot of clever optimisation, such as indexing and caching, under the covers. These make it more deserving of the title than the aforementioned Python and Perl.
Notice that we can "put a port" on our kdb+ process:
q)\p 19123 q)
Then open a browser and go to http://localhost:19123/
We will see the aforementioned variables names a, b, l1, l2, d1, t1 in the frame on the left. We can click on them and in the right frame we'll see their values. E.g. for t1:
name age -------------- Methuselah 969 Jared 962 Noah 950
Great, so we got more than we bargained for. Kdb+ comes with a network interface that our browser can understand. We can even issue Q commands via the browser:
http://localhost:19123/?count t1
will give us the number of rows in t1, i.e.
3
We can use some SQL syntax in Q:
q)select age from t1 where name=`Noah age --- 950
We can also save tables to binary files:
q)(`$":H:/myt1.dat") set t1 `:H:/myt1.dat
And load them from those binary files later on:
q)t2 : get (`$":H:/myt1.dat") q)t2 name age -------------- Methuselah 969 Jared 962 Noah 950
We can hold the data in memory and we can persist it to disk. Conventional databases are in control of the persistence process. In kdb+ the user is in control. This may be taxing but enables him/her to perform some clever optimisations.
Rather than accessing kdb+ remotely (or locally, on localhost) from the browser we can use TCP and another q.exe process. If you start q.exe in another terminal, you can do this:
q)h : hopen `:localhost:19123 q)h "select from t1" name age -------------- Methuselah 969 Jared 962 Noah 950 q)hclose h
Kdb nomenclature: k, Q, etc.
When talking about Kdb, several terms are used, often interchangeably: k, Q, kdb, kdb+... What does each of these mean?
- k is the language that Arthur Whitney wrote after leaving Morgan Stanley. It was first released in 1993, was under exclusive contract to UBS until 1996, when it became generally available.
- ksql was added in 1998 which was a layer on top of k. This is sometimes regarded to be part of k. ksql includes SQL-like constructs, such as select.
- kdb is the in-memory database built on k.
- kdb+ was released in June 2003. This was more or less a total rewrite of kdb for 64 bit systems. It uses k (version 4) to boot and has...
- Q, which is believed to stand for Query language, a language layer on top of k, which is defined in terms of k. Both equivalent Q and k compile to the same byte code that is executed in the kdb+ byte interpreter. Q is like a macro language layer, e.g. type in Q is the equivalent of @: in k. Most kdb+ developers just use Q (not k), as it is more readable.
Many thanks to Charlie Skelton for his explanation.
Key bindings on Linux
When you run a Windows version of Q, the arrow keys move you up and down the command history, just as they do on a properly set up Linux terminal.
When you try using these keys in a Linux version of Q, you'll see something like
q)^[[A
for "up", which is not particularly helpful.
You can fix this by running q from rlwrap:
/home/pbilokon/bin/rlwrap /home/pbilokon/q/l64/q
Perhaps put this inside a shell script, start_q.sh:
#!/bin/bash /home/pbilokon/bin/rlwrap /home/pbilokon/q/l64/q $@
To find out what rlwrap does, see http://utopia.knoware.nl/~hlub/uck/rlwrap/ (in brief, it is a "readline wrapper").
Does Q have to be unreadable?
No.
Although Q's creator, Arthur Whitney, believes in expressing logic in code as tersely as possible, you don't have to emulate his style. The following example generates geometric Brownian motion paths:
// A list of differences between the elements of a given list, lst. The
// length of the result is one less than the length of lst. The result
// is
//
// ((lst[1] - lst[0]) (lst[2] - lst[1]) ... (lst[N-1] - lst[N-2]))
//
// where N is count lst.
//
differences : { [lst]
lst[0] -': lst[(til ((count lst) - 1)) + 1]
}
// Generate a single random number sampled from Uniform(0, 1).
//
// Example:
//
// uniformRandomVariate[]
//
// Result:
//
// (nondeterministic)
//
// 0.06551717
//
uniformRandomVariate : { []
// 1 ? 1f generates a single (1) random number drawn from [0f, 1f]
// [0] is required since (1 ? 1f) returns a list.
(1 ? 1f)[0]
}
nextGaussianRandomVariateAvailable : 0;
nextGaussianRandomVariate : 0f;
// Generate a single random number sampled from Normal(0, 1).
//
// The implementation uses the polar form of the Box-Muller transform.
//
// Example:
//
// gaussianRandomVariate[]
//
// Result:
//
// (nondeterministic)
//
// -1.397263
//
gaussianRandomVariate : { []
// Box-Muller transform takes two Uniform(0, 1) random numbers and
// produces two Normal(0, 1) random numbers.
// Its polar form is both faster and more robust numerically.
// Because we generate two Normal(0, 1) random numbers at once and
// return only one, we only have to compute them every other time
// the function is called. We cache the second random number (the
// one we did not return) and return it next time the function is
// called, without computing.
// This will stay 1 only if we need to compute the numbers.
compute : 1;
// Do we already have a cached random number available?
if [nextGaussianRandomVariateAvailable;
// If so, we don't need to compute.
compute : 0;
result : nextGaussianRandomVariate;
nextGaussianRandomVariateAvailable :: 0;
];
// So do we have to compute the random numbers?
if [compute;
w : 1f;
while [w >= 1f;
x1 : ((2f * uniformRandomVariate[]) - 1f);
x2 : ((2f * uniformRandomVariate[]) - 1f);
w : ((x1 * x1) + (x2 * x2))
];
w : sqrt ((-2f * log[w]) % w);
y1 : x1 * w;
y2 : x2 * w;
nextGaussianRandomVariate :: y2;
nextGaussianRandomVariateAvailable :: 1;
result : y1;
];
result
}
exponentialRandomVariate : { [lambda]
neg[log[uniformRandomVariate[]]] % lambda
}
// Parameters:
//
// drift: drift per unit time, which is 1 year
// volatility: per sqrt[unit time], which is sqrt[1 year]
//
geometricBrownianMotionFactor : { [drift; volatility; dateDifference]
deterministicTerm : (drift - ((0.5 * (volatility xexp 2.0)))) * dateDifference;
stochasticTerm : volatility * gaussianRandomVariate[] * sqrt[dateDifference];
exp[deterministicTerm + stochasticTerm]
}
// Simulate a geometric Brownian motion process (GBM), i.e.
//
// dS_t = \mu S_t dt + \sigma S_t dW_t
//
// whence
//
// S_t = S_0 e((\mu - \sigma^2 / 2) t + \sigma \sqrt{t} N_{0, 1})
//
// Parameters:
//
// initialValue: the initial value, S_0
// drift: drift per unit time, which is 1 day (!)
// volatility: per sqrt[unit time], which is sqrt[1 day]
// dateTimes: a list of datetimes or dates for which the values will be
// generated
//
// Example:
//
// dateRange : (2008.01.01 2008.01.02 2008.01.03 2008.01.04 2008.01.05)
// geometricBrownianMotion[1.5490; 0.003; 0.12; dateRange]
//
// Result:
//
// (nondeterministic)
//
// 1.549 1.5431551 1.5468368 1.5343732 1.5195453
//
geometricBrownianMotion : { [initialValue; drift; volatility; dateTimes]
drift %: 365.25;
volatility %: sqrt[365.25];
dateDifferences : differences[dateTimes];
f : geometricBrownianMotionFactor[drift; volatility];
factors : f':[dateDifferences];
(initialValue), initialValue *\ factors
}
// Example:
//
// exponentiallyDistributedDateTimes[5; 2010.06.16T00:00:00.000; 2010.06.18T00:00:00.000]
//
// Result:
//
// (nondeterministic)
//
// 2010.06.16 00:27:46.087 2010.06.16 14:44:15.869 2010.06.17 00:40:05.673
// 2010.06.17 07:42:27.479 2010.06.17 13:32:56.693
//
exponentiallyDistributedDateTimes : { [ticksPerDay; beginDateTime; endDateTime]
dateTimes : `datetime$();
lastDateTime : beginDateTime + exponentialRandomVariate[ticksPerDay];
while[lastDateTime <= endDateTime;
dateTimes ,: lastDateTime;
lastDateTime : lastDateTime + exponentialRandomVariate[ticksPerDay];
];
dateTimes
}
// Example:
//
// geometricBrownianMotionTimeSeries[1.5490; 0.003; 0.15; 5; 2010.06.16T00:00:00.000; 2010.06.18T00:00:00.000]
//
// Result:
//
// (nondeterministic)
//
// dateTime dataPoint
// ----------------------------------
// 2010.06.16 12:36:42.888 1.549
// 2010.06.16 20:18:18.691 1.5622148
// 2010.06.16 21:20:50.682 1.561131
// 2010.06.16 23:10:47.157 1.560705
// 2010.06.17 00:07:32.780 1.5647236
// 2010.06.17 00:53:56.429 1.5667019
// 2010.06.17 01:23:21.514 1.5637483
// 2010.06.17 04:24:04.107 1.5599954
// 2010.06.17 07:04:39.818 1.564712
// 2010.06.17 17:48:14.832 1.5824289
// 2010.06.17 18:07:24.626 1.582003
//
// Example:
//
// geometricBrownianMotionTimeSeries[1.5490; 0.003; 0.15; 10; 2010.06.16T00:00:00.000; 2011.06.18T00:00:00.000]
// geometricBrownianMotionTimeSeries[1.5490; 0.003; 0.0003; 10; 2010.06.16T00:00:00.000; 2011.06.18T00:00:00.000]
//
geometricBrownianMotionTimeSeries : { [initialValue; drift; volatility; ticksPerDay; beginDateTime; endDateTime]
timeSeries : getEmptyTimeSeries[];
dateTimes : exponentiallyDistributedDateTimes[ticksPerDay; beginDateTime; endDateTime];
dataPoints : geometricBrownianMotion[initialValue; drift; volatility; dateTimes];
flip [`dateTime`dataPoint!(dateTimes; dataPoints)]
}
date_range : (2008.01.01 2008.01.02 2008.01.03 2008.01.04 2008.01.05)
geometric_brownian_motion[1.5490; 0.01; 0.12; date_range]
You can save it as random.q and then run
q random.q
You will see something like the following:
1.549 1.560322 1.574135 1.594816 1.622068
Timing commands
Often you need to know how long a particular command takes to run. The trick is to prepend \t to the command, e.g.
\t select from TestTable
Rather than returning the result of the select, this command will come back with the number of milliseconds spent, e.g.
735
for 735 milliseconds.
There is another way, which does not involve explicit calls to system commands.
startTime : .z.T // ... Command to be timed ... endTime : .z.T duration : "t"$(endTime - startTime) 1 "The command took ", string[duration]
This will display something like
The command took 00:00:09.6251
We shall now suggest a more powerful approach to timing. Suppose we have a function that we want to time.
The function may take very little time to run, so we call it runCount time in a loop and obtain an average run duration by dividing the total time taken for runCount runs by runCount.
In order to obtain more significant results, we shall repeat this procedure trialCount times, noting the average run duration and its standard deviation.
Putting this all together, we have
timeFunction : { [trialCount; runCount; function]
trialDurations : ();
do [ trialCount;
startTime : .z.T;
do [ runCount;
function[];
];
trialDurations ,: ((.z.T - startTime) % runCount);
];
trialDurationAverage : avg[trialDurations];
trialDurationStandardDeviation : sqrt[var[trialDurations]];
(trialDurationAverage, trialDurationStandardDeviation)
}
Suppose we want to time something like this:
function : { [] (100000?10) * (100000?10) };
We run
trialCount : 10; runCount : 100; timeFunction[trialCount; runCount; function];
The result it, say,
3.563 0.06165225
This means that on average function takes 3.563 milliseconds to run, the standard deviation being 0.06165225 for the trials.
String concatenation
Happens naturally:
"hello", " ", "world"
The result is
"hello world"
Now suppose you want to get "hello123world" from this:
d: 123 "hello", d, "world"
You will see
"h" "e" "l" "l" "o" 123 "w" "o" "r" "l" "d"
This is because a string is really a list of characters. Joining or concatenating two lists of characters (strings), you get another list of characters (another string). Joining or concatenating a list of characters (string) and an integer, you get a list of characters and integers (not a string at all).
Here is one solution:
"hello", string[d], "world"
String concatenation in query results
Suppose you have a table, say SampleTable that looks like this:
date sym time bid ask location market -------------------------------------------------------------- 2010.04.26 FOOBAR 00:00:00.408 0.82587 0.82655 NY ABC 2010.04.26 FOOBAR 00:00:00.412 0.82603 0.8264 NY DEF 2010.04.26 FOOBAR 00:00:03.143 0.82592 0.8266 NY ABC 2010.04.26 FOOBAR 00:00:03.162 0.82608 0.82645 NY DEF 2010.04.26 FOOBAR 00:00:08.754 0.82597 0.82665 NY ABC 2010.04.26 FOOBAR 00:00:08.670 0.82613 0.8265 NY DEF 2010.04.26 FOOBAR 00:00:09.118 0.82599 0.82667 NY ABC 2010.04.26 FOOBAR 00:00:09.124 0.82615 0.82652 NY DEF 2010.04.26 FOOBAR 00:00:09.158 0.8262 0.82657 NY DEF 2010.04.26 FOOBAR 00:00:09.175 0.82604 0.82672 NY ABC
Suppose that you would like to select location and market as one field, contributor. Here is how you can achieve this:
select contributor: "_" sv/: flip string(location;market) from SampleTable
The results look as follows:
contributor ----------- "NY_ABC" "NY_DEF" "NY_ABC" "NY_DEF" "NY_ABC" "NY_DEF" "NY_ABC" "NY_DEF" "NY_DEF" "NY_ABC"
We would like to thank Attila Vrabecz for this tip.
Display more results on the screen
Suppose that you issue a query
cols SomeTable
You may see something like
`one`two`three`four`five`six`seven`eight`nine`ten`eleven`twelve`thirteen`fift..
Of course, instead of `one, `two, `three, ... you will see some column names. But the point is that the result is cut short by the "..".
Perhaps you want to see the rest of the column names. What can you do?
Try this:
\c
You will see something like
25 80
This means that the output is trimmed to 25 rows, 80 columns. You can override this:
\c 25 150
and re-issue the query
cols TimeTable
Now you should see more output.
Listing a table's columns
cols MarketDepth
where MarketDepth is the table name.
Connecting to a remote Kdb+ server
Suppose we are connecting to host myhost01, port 8011. Run the following in q:
h : hopen `:myhost01:8011
If you are connecting to localhost, you could use
h : hopen `::8011
You can now execute queries on h like this:
h "select distinct sym from MarketDepth where date=2006.06.01"
You should see some output:
sym ------ EURUSD USDJPY ..
Once you are done, you can close the handle:
hclose h
Loading data from a remote server and saving it locally
Start q.
Check the current directory:
\cd
You may wish to change it to a pre-created output directory:
\cd /home/pbilokon/tmp
First define the date range we shall be working on. Say, it's 2008.06.01 to 2008.08.31 inclusive:
firstdate: 2008.06.01 lastdate: 2008.08.31 daterange: firstdate + til (lastdate - firstdate) + 1
Open a handle:
h: hopen`:myhost01:8012
We shall read and write the data day by day. Let's define the following two functions:
readdata:{[d] h(raze "`time xasc select from MarketDepth where date=", string d, ", sym in `GBPUSD`EURUSD")}
readandwritedata:{[d] MarketDepth::readdata[d]; .Q.dpft[`:.;d;`sym;`MarketDepth]}
readdata reads one day (specified by d) worth of data. readandwritedata calls readdata and saves the results to disk (local directory as a splayed table).
You can try applying this to a single day:
readandwritedata[2008.06.01]
Or process the entire daterange:
readandwritedata each daterange
Close the handle:
hclose h
Creating and populating a simple keyed table with time series identifiers (sym), dates (date) and values (val)
TestTable : ([sym:`symbol$(); date:`date$()] val:`float$())
You can inspect the resulting empty keyed table by simply issuing
TestTable
It looks like this:
sym date| val --------| ---
We can now upsert our first value:
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874)
On issueing
TestTable
you will now see
sym date | val -----------------| ------ EURUSD 2009.02.10| 1.2874
You can also
select from TestTable
and you will see the same
sym date | val -----------------| ------ EURUSD 2009.02.10| 1.2874
Is it possible to upsert several rows at once? Yes, but you should be careful with the syntax. Note that there is no space in "`USDJPY`ARSRUB".
upsert [`TestTable] ([sym: `USDJPY`ARSRUB; date: 2009.02.10 2009.02.11]; val: 1.2874 10.252969)
Saving and loading tables
The simplest thing to do is to save the table to a file:
`:H:/somefile.dat set TestTable
This will save the table TestTable to a binary file named H:/somefile.dat.
To load it into a new table, use
AnotherTestTable : get `:H:/somefile.dat
Truncating a table
Truncating a table means deleting all data from it, which is of course not the same as deleting the table itself.
delete from `TestTable
Dropping a table
Dropping a table amounts to deleting it from the namespace. Assuming that your table is in the root namespace (.), we can delete it as follows:
delete TestTable from `.
Turning a non-keyed table into a keyed table
Suppose that you have created a non-keyed table:
TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$())
And inserted/upserted some data:
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874) upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874) upsert [`TestTable] (`EURUSD; 2009.02.11; 1.2901)
So that you can check by entering
TestTable
that the table looks like this:
sym date val ------------------------ EURUSD 2009.02.10 1.2874 EURUSD 2009.02.10 1.2874 EURUSD 2009.02.11 1.2901
How can you turn it into a keyed table?
By entering
`sym`date xkey `TestTable
Then you can check by entering
TestTable
that the table looks like this:
sym date | val -----------------| ------ EURUSD 2009.02.10| 1.2874 EURUSD 2009.02.10| 1.2874 EURUSD 2009.02.11| 1.2901
Note, however, that even though TestTable is now a keyed table, it contains duplicates, which would have to be removed manually.
Turning a keyed table into a non-keyed table
If TestTable is your keyed table, you can modify it to remove the key as follows:
() xkey `TestTable
Note that the ` symbol tells Kdb+ to modify the existing TestTable.
Alternatively, if you would like to leave TestTable as it is and create a non-keyed version TestTable1 in addition to it, you can use
TestTable1 : () xkey TestTable
Dealing with duplicates
First, let us create a table for our experiments:
TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$()) upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874) upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874) upsert [`TestTable] (`USDJPY; 2009.02.11; 1.2901) upsert [`TestTable] (`FOOBAR; 2009.02.10; 2.3576) upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2875) upsert [`TestTable] (`EURUSD; 2009.02.11; 1.3000) upsert [`TestTable] (`USDJPY; 2009.02.11; 1.2902)
Try
show TestTable
to see what it looks like:
sym date val ------------------------ EURUSD 2009.02.10 1.2874 EURUSD 2009.02.10 1.2874 USDJPY 2009.02.11 1.2901 FOOBAR 2009.02.10 2.3576 EURUSD 2009.02.10 1.2875 EURUSD 2009.02.11 1.3 USDJPY 2009.02.11 1.2902
We define duplicate rows as those that have both sym and date values equal.
To count the duplicates, use
select count i by sym, date from TestTable
(Notice that here by sym, date expresses our definition of duplicates.)
Here is the result:
sym date | x -----------------| - EURUSD 2009.02.10| 3 EURUSD 2009.02.11| 1 FOOBAR 2009.02.10| 1 USDJPY 2009.02.11| 2
Well, the rows with x = 1 don't really correspond to duplicates. We can identify duplicates as follows:
select sym, date, x from (select count i by sym, date from TestTable) where x > 1
This will show
sym date x ------------------- EURUSD 2009.02.10 3 USDJPY 2009.02.11 2
How can we remove the duplicates?
select by sym, date from TestTable
will produce
sym date | val -----------------| ------ EURUSD 2009.02.10| 1.2875 EURUSD 2009.02.11| 1.3 FOOBAR 2009.02.10| 2.3576 USDJPY 2009.02.11| 1.2902
which is a keyed table with the duplicates removed (only the last row survives, so (`EURUSD, 2009.02.11) is, deterministically, 1.3, not 1.2874).
We can save this result as, for example, TestTableWithoutDuplicates:
TestTableWithoutDuplicates : select by sym, date from TestTable
If we don't want a keyed table, we can remove the key:
() xkey `TestTableWithoutDuplicates
show TestTableWithoutDuplicates
now gives
sym date val ------------------------ EURUSD 2009.02.10 1.2875 EURUSD 2009.02.11 1.3 FOOBAR 2009.02.10 2.3576 USDJPY 2009.02.11 1.2901
Suppose we want to be a little bit more careful and work with indices of the duplicates. Moreover, suppose we wish to remove the duplicates from the original TestTable instead of creating a new TestTableWithoutDuplicates. We can proceed as follows.
select from (select duplicateIndices : i, lastDuplicateIndex : last i, duplicateCount : count i by sym, date from TestTable) where duplicateCount > 1
This will produce
sym date | duplicateIndices lastDuplicateIndex duplicateCount -----------------| -------------------------------------------------- EURUSD 2009.02.10| 0 1 4 4 3 USDJPY 2009.02.11| 2 6 6 2
Notice that in the duplicateIndices column we have lists, not scalars. Of course, we can do this in k.
Now let's modify this slightly. We don't want to have the last index (which appears in lastDuplicateIndex anyway) under duplicateIndices. That's because we want to keep it when we finally come to removing duplicates.
We observe that for any list l
(count l - 1)#l
will return a copy of the list with the last element removed. Thus we can modify our query as follows:
select from (select duplicateIndices : (count[i] - 1)#i, lastDuplicateIndex : last i, duplicateCount : count i by sym, date from TestTable) where duplicateCount > 1
The result is now
sym date | duplicateIndices lastDuplicateIndex duplicateCount -----------------| -------------------------------------------------- EURUSD 2009.02.10| 0 1 4 3 USDJPY 2009.02.11| ,2 6 2
just as required.
We want to remove the duplicates. We don't want the entire table above, so let's modify the query again:
select duplicateIndices from (select duplicateIndices : (count[i] - 1)#i, lastDuplicateIndex : last i, duplicateCount : count i by sym, date from TestTable) where duplicateCount > 1
This gives us a table of lists,
duplicateIndices ---------------- 0 1 ,2
But we don't need a table of lists. A list of lists will do, and we can obtain it simply by replacing select with exec:
indicesToRemove : (exec duplicateIndices from (select duplicateIndices : (count[i] - 1)#i, lastDuplicateIndex : last i, duplicateCount : count i by sym, date from TestTable) where duplicateCount > 1)
Now indicesToRemove is a list of lists:
q)indicesToRemove 0 1 ,2
But we need a simple list of indices, (0 1 2). How can we obtain it?
q)() ,/ indicesToRemove 0 1 2
What exactly did we do? The over adverb (/) modifies a base dyadic function (in our case join (,)) so that the items of the second argument are applied iteratively to the first argument. (Note: There cannot be any whitespace between / and the function it modifies.)
Thus we effectively compute
((),(0 1)),(2)
There is an alternative:
q)raze indicesToRemove 0 1 2
We shall overwrite indicesToRemove with this more convenient form:
q)indicesToRemove : raze indicesToRemove q)indicesToRemove 0 1 2
Now we are ready to remove the duplicates from TestTable:
q)delete from `TestTable where i in indicesToRemove `TestTable
(Note: Don't forget the "`" in delete from `TestTable or else TestTable won't be modified, a new table will be constructed.)
Done:
q)show TestTable sym date val ------------------------ FOOBAR 2009.02.10 2.3576 EURUSD 2009.02.10 1.2875 EURUSD 2009.02.11 1.3 USDJPY 2009.02.11 1.2902
Creating queries dynamically
You can construct your query by concatenating strings and then execute it. Here is an example. Let's define our usual
TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$())
We shall now create a select query where we can vary the column to be selected. Let's set
columnToBeSelected : "sym"
and evaluate
value("select ", columnToBeSelected, " from TestTable")
This selects the (currently empty) sym column:
sym ---
What if we set columnToBeSelected to something else?
columnToBeSelected : "date"
value("select ", columnToBeSelected, " from TestTable")
gives
date ----
This is not quite satisfactory yet. We would like to abstract away the process of query creation and execution. For this purpose we define a function which takes two parameters, columnName and tableName:
selectColumn : {[columnName; tableName] value("select ", columnName, " from ", tableName) }
We can now execute
selectColumn["sym"; "TestTable"]
to select column sym from TestTable. We'll see
sym ---
Similarly we can run
selectColumn["date"; "TestTable"]
Et cetera.
Accessing command line arguments from scripts
Suppose we create a script called test.q which we shall call as
C:\q\q test.q Paul Bilokon 1982
We would like to access the command line arguments ("Paul", "Bilokon" and 1982) from the script.
Here's what we could put in test.q:
name : .z.x 0; surname : .z.x 1; dob : .z.x 2; name surname "I"$dob
This will simply display our command line arguments:
"Paul" "Bilokon" 1982
Note that "I"$dob is the syntax for parsing the dob string to an integer.
Transferring a table from a remote server to a local server
Connect to a remote server from the local server:
h:hopen`:someremotehost.com:19111
If you have to specify a username and password, use this instead:
h:hopen (`:someremotehost.com:19111:mylogin:mypassword;10000)
where 10000 stands for 10 second timeout.
Assuming that the remote table is named TestTable and you want to copy it to a local table TestTableCopy, you can then run
TestTableCopy : h "TestTable"
locally. If you only want the first 100000 rows of TestTable in TestTableCopy, you can do
TestTableCopy : h "100000#TestTable"
Oh, and don't forget to
hclose h
Converting a string to a symbol
b : "foo" `$b
This gives
`foo
Appending to a list of strings
We have a list of strings:
L : ("foo"; "bar"; "baz")
How do we append another string, say "abc", to it?
L ,: "abc"
This gives
"foo" "bar" "baz" "abc"
Logging and debug output
It is often very useful to see the intermediate results of a function to get a good idea of what it does. Let us write a simple recursive factorial function to illustrate this:
factorial : { [n]
if [n = 0;
result : 1;
];
if [n > 0;
result : n * factorial[n - 1];
];
result
}
We can convince ourselves that the function is working as required:
q)factorial[0] 1 q)factorial[1] 1 q)factorial[2] 2 q)factorial[3] 6 q)factorial[4] 24 q)factorial[5] 120
Now, let's change this function to display the intermediate results:
factorial : { [n]
if [n = 0;
result : 1;
];
if [n > 0;
result : n * factorial[n - 1];
];
0N!result;
result
}
Let's test this:
q)a : factorial[8] 1 1 2 6 24 120 720 5040 40320 q)a 40320 q)
The line that does the logging is
0N!result;
This sends the value of result to STDOUT.
We could apply this to strings:
0N!"hello world"
or more complex objects, such as lists:
0N!(1;2;"foo")
Returning tables from functions
We want to create, populate, and return a table from a function. Here is our first attempt (broken!):
fooBroken : { []
t : ([] sym:`symbol$(); date:`date$(); val:`float$());
upsert [`t] (`yo; 2009.01.01; 3.30);
t
}
When we try calling fooBroken we get this:
q)fooBroken[] sym date val ------------
Somehow our upsert didn't work.
The reason why it didn't work is this. You cannot refer to locals by name in K, so `t in this upsert wouldn't work.
What did it do? Well, it referred to a global variable t, whereas the local one, the one we returned, remained unchanged. The upsert worked on that global variable, which you can inspect as follows:
q)t `yo 2009.01.01 3.3
This t was created in the default namespace as a side effect:
q)\v ,`t
This is not quite what we wanted. There are several options. One is to use global variables:
fooUsingGlobals : { []
`t set ([] sym:`symbol$(); date:`date$(); val:`float$());
upsert[`t] (`yo; 2009.01.01; 3.30);
:t
}
This does work:
q)fooUsingGlobals[] sym date val ------------------ yo 2009.01.01 3.3
But it also has a side effect (it creates a global t):
q)\v ,`t
There is another solution. We cannot refer to a local by name, but we can assign to it:
fooUsingLocals : { []
t : ([] sym:`symbol$(); date:`date$(); val:`float$());
t:upsert [t] (`yo; 2009.01.01; 3.30);
:t
}
This works:
q)fooUsingLocals[] sym date val ------------------ yo 2009.01.01 3.3
without the aforementioned side effect:
q)\v `symbol$()
Finally, we don't have to re-assign the local t. We could change fooUsingLocals (let's call the new version fooUsingLocals1) to use ,: — `t doesn't work on locals, whereas ,: does:
fooUsingLocals1 : { []
t : ([] sym:`symbol$(); date:`date$(); val:`float$());
t ,: (`yo; 2009.01.01; 3.30);
:t
}
How to capture the result of the system commands \t, \v, etc., into a variable
\t, \v, etc., are semantically "special".
For example, this won't work:
timeTaken : \t select Foo from SomeTable
There is a workaround: use value:
timeTaken : value "\\t select Foo from SomeTable"
Notice that there are two slashes above, not one.
timeTaken is now set to the number of milliseconds it has taken to execute select Foo from SomeTable, as required.
Loops with indices
You may miss the conventional
for (int i = 0; i < 5; ++i) { // Iterated action involving i cout << i << endl; }
in Q. One way to achieve the above is
i : 0;
do [5;
// Iterated action involving i
0N!i;
i : i + 1
];
The code in both language will result in
0 1 2 3 4
on the display.
Passing a function as an argument to another function
First, let us define our function:
myFunctor : { []
2 * x
}
Next, we shall define another function, which takes a function as its argument and applies it to 5:
myEvaluator : { [func]
func[5]
}
The result of
myEvaluator[myFunctor]
is
10
as expected.
Projections of a function of one argument; deferred function execution
Suppose that you have a function of a single argument:
someFunction : { [n]
(n ? 10) * (n ? 10)
}
It may make sense to project it on a given argument. Even though the resulting function will have zero arguments, and the values of all the arguments will be known, you may still want to use it as a functor, you may want to defer its execution.
Suppose that we want to time the execution of someFunction using our timing routine:
timeFunction : { [trialCount; runCount; function]
trialDurations : ();
do [ trialCount;
startTime : .z.T;
do [ runCount;
function[];
];
trialDurations ,: ((.z.T - startTime) % runCount);
];
trialDurationAverage : avg[trialDurations];
trialDurationStandardDeviation : sqrt[var[trialDurations]];
(trialDurationAverage, trialDurationStandardDeviation)
}
Notice that timeFunction expects its argument function to take zero arguments, as it executes function[].
How can we use timeFunction to time the execution of someFunction?
Well, if someFunction took at least two arguments, we could project:
q)someFunction : { [n; dummy] (n ? 10) * (n ? 10) }
q)someFunction[10000]
{[n; dummy] (n ? 10) * (n ? 10) }[10000]
This is one solution, then: amend someFunction to take a second dummy argument.
We can then pass it to timeFunction as follows:
q)timeFunction[10; 100; someFunction[10000]] 0.344 0.06069596
But what if we don't want to (or can't) amend someFunction? There is another solution. Keep someFunction as it was,
someFunction : { [n]
(n ? 10) * (n ? 10)
}
but introduce the following helper:
project1 : { [function; x; dummy]
function[x]
}
Now we obtain the following projection:
q)project1[someFunction, 10000]
{[function; x; dummy]
function[x]
}[({[n] (n ? 10) * (n ? 10) };10000)]
Which we can pass to timeFunction:
q)timeFunction[10; 100; project1[someFunction; 10000]] 0.343 0.06116371
Sampling with and without replacement
This is straightforward. If we want 5 random indices from 0, 1, 2, 3, ..., 10, with replacement, we do this:
q)5?10 2 2 8 1 8
Notice the duplicates. If, on the other hand, we would like to sample without replacement, we do this:
q)-5?10 1 3 6 4 7
Now there are no duplicates. Notice the minus sign above.
The results of the sampling come out of order. We can easily sort them by applying asc:
q)asc[-5?10] `s#3 4 6 7 8
Notice that the attribute s# has been applied, which indicates that the list is sorted.
We could put these random indices to good use. For example, we could use them to index another list:
q)foo : (100 200 300 400 500 600 700 800 900 1000) q)foo[asc[-5?10]] 100 400 800 900 1000 q)foo[asc[-2?10]] : (123 456) q)foo 123 200 300 400 500 456 700 800 900 1000
Unary -
In K, the arithmetic operators are always dyadic, so - does not act as a unary function:
q)foo : 5 q)-foo '-
You should use neg:
q)neg foo -5 q)foo 5
(neg does not change foo itself.)
However, -5 is fine as a literal:
q)foo : -5 q)foo -5
Setting multiple elements of a list to the same value
q)l : (100 200 300 400 500 600 700 800 900 1000) q)l 100 8 8 400 500 600 700 800 900 1000 q)l[1 2] : 5 q)l 100 5 5 400 500 600 700 800 900 1000 q)l[(4 + (til 3)), (8 9)] : 8 q)l 100 5 5 400 8 8 8 800 8 8
Pre-initialising lists
If you know the size of your list you can preinitialise it like so:
q)l : 10#0N q)l 0N 0N 0N 0N 0N 0N 0N 0N 0N 0N
We have created a list of 10 elements, each set to 0N (the null integer).
We could have set them to 1, for example:
q)l : 10#1j q)l 1 1 1 1 1 1 1 1 1 1j
Null
There is a "null" in K/Q. It is 0N:
q)foo : 0N q)foo 0N q)foo * 5 0N
Copy or reference?
q)foo : (1 2 3) q)bar : foo q)bar[0] : 357 q)foo 1 2 3 q)bar 357 2 3
bar changed, foo didn't.
Let's investigate further.
q)foo : (1 2 3)
q)changeFoo : { [someFoo] someFoo[0] : 357 }
q)changeFoo[foo]
q)foo
1 2 3
So foo didn't change.
Telling Q to exit when the script execution is over
If you load a script
q myscript.q
q will not exit by default when it's done. It will hang in there.
The trick is to add the exit command
\\
to the end of your script.
This is very useful if you are chaining scripts, etc.
Applying a function to each element of a list
Suppose that we have
q)l : (1*1; 2*2; 3*3; 4*4; 5*5; 6*6; 7*7; 8*8)
In other words,
q)l 1 4 9 16 25 36 49 64
What if we want to take the square root of each element?
q)sqrt l 1 2 3 4 5 6 7 8f
This works. We gave sqrt a list to work on, and the result is a list of square roots.
Can we take type of each element in a similar manner?
q)type l 6h
No, we got the type of l itself. Not of its element.
We can use each:
q)each[type] l -6 -6 -6 -6 -6 -6 -6 -6h
That worked.
Of course, we can also use
q)each[sqrt] l 1 2 3 4 5 6 7 8f
to take the square root of each element. This is more explicit and leads to fewer surprises.
Exploring \v
\v will give us all the variables in the default namespace:
q)\v `ProfilingResults`a`gapProbability`l`lastDate`names`nextStandardGaussianRando..
Some of these may be symbols, others integers, still others functions... How can we tell them apart?
We need to use type. But first we need to capture \v into something we can work with — a list — using value:
q)v : value "\\v" q)v `ProfilingResults`a`gapProbability`l`lastDate`names`nextStandardGaussianRando..
We can now access all of these...
q)v[0] `ProfilingResults
But what are they, what are their types?
q)type v[0] -11h
Symbols. Well, ProfilingResults is actually a table; `ProfilingResults is a symbol.
How can we get at the type of ProfilingResults?
Use value again:
q)type[value[v[0]]] 98h
And if we want the types of everything in v:
q)each[type] each[value] v 98 -11 -9 6 -14 11 -9 -6 -6 11 -6 -6 -6 99h
Displaying a table's schema
First let us create and populate a table:
q)SomeTable : ([] foo:`float$(); bar:`float$()) q)SomeTable ,: (0.3 0.5)
How can we figure out what SomeTable's schema is given SomeTable?
q)0N!0#SomeTable; +`foo`bar!(`float$();`float$())
0#t gives you the schema. The above will print it in k format so you can see all the types.
Working with lists
Let's create an example list. We are going to play with it now.
q)l : (10 20 30 40 50)
How many elements does it have?
q)count l 5
What is the first element?
q)first l 10
What is the last element?
q)last l 50
What is the first element again? (Notice that the following is very similar to first l only a list of a single element is returned, not the actual element.)
q)1#l ,10
What is the last element again? (Notice that the following is very similar to last l only a list of a single element is returned, not the actual element.)
q)-1#l ,50
What are the first two elements?
q)2#l 10 20
What are the first three elements?
q)3#l 10 20 30
What are the last two elements?
q)-2#l 40 50
What are the last three elements?
q)-3#l 30 40 50
A list of all l's elements, excluding the last one:
q)(count[l] - 1)#l 10 20 30 40
A list of all l's elements, excluding the last two:
q)(count[l] - 2)#l 10 20 30
A list made up of l's first, second, and fourth elements:
q)l[0 1 3] 10 20 40
Or
q)indexList : (0 1 3) q)l[indexList] 10 20 40
Let's create a copy of the list but exclude all occurrences of 20:
q)l except 20 10 30 40 50
Or perhaps we want to exclude all occurrences of 20 and 30:
q)l except 20 30 10 40 50
What is the sum of the elements? There are several ways to compute this:
q)sum l 150
Or
q)0 +/ l 150
Here the over adverb (/) modifies a base dyadic function plus (+) so that the items of the second argument are applied iteratively to the first argument. Thus we are computing
((((0 + 10) + 20) + 30) + 40) + 50
Along the same lines we can compute the product of the elements:
q)1 */ l 12000000
which amounts to
((((1 * 10) * 20) * 30) * 40) * 50
Say we want to add two lists elementwise:
q)l1 : (50 40 30 20 10) q)l + l1 60 60 60 60 60
Similarly we can multiply two lists elementwise:
q)l * l1 500 800 900 800 500
Subtract:
q)l - l1 -40 -20 0 20 40
And divide:
q)l % l1 0.2 0.5 1 2 5
What if we want to obtain a list of all elements of l negated?
q)each[neg] l -10 -20 -30 -40 -50
Or perhaps we want a list of square roots of l's elements:
q)each[sqrt] l 3.162278 4.472136 5.477226 6.324555 7.071068
Sorting tables
Consider the following
q)TestTable : ([] foo:`symbol$(); bar:`symbol$()) q)insert [`TestTable] (`k; `a) q)insert [`TestTable] (`k; `z) q)insert [`TestTable] (`k; `l) q)insert [`TestTable] (`a; `a) q)insert [`TestTable] (`a; `b) q)insert [`TestTable] (`a; `a) q)show TestTable
So we have
foo bar ------- k a k z k l a a a b a a
If we want to sort this table first by bar, then by foo, i.e. the table will be sorted by foo and within each foo value bar will be sorted, we can use
q)SortedTestTable : `foo`bar xasc TestTable q)show SortedTestTable
This will produce
foo bar ------- a a a a a b k a k l k z
But the original TestTable hasn't changed:
q)show TestTable foo bar ------- k a k z k l a a a b a a
In order to sort it in place, use
q)`foo`bar xasc `TestTable `TestTable q)show TestTable foo bar ------- a a a a a b k a k l k z
Of course, the order `foo`bar versus `bar`foo matters:
q)`bar`foo xasc `TestTable `TestTable q)show TestTable foo bar ------- a a a a k a a b k l k z
Types of list
Let's declare some lists:
q)K : () q)L : `int$() q)M : `symbol$() q)N : `char$()
And have a look at them:
q)K q)L `int$() q)M `symbol$() q)N ""
And their types:
q)type K 0h q)type L 6h q)type M 11h q)type N 10h
K is special in the sense that its type hasn't been pre-declared. Currently its type is 0h. It can become a list of any type. That will happen when the first element is added:
q)K ,: 357 q)K ,357 q)type K 6h
Now the type of K is 6h (same as that of L). It is now a list of integers (L has been a list of integers from the start). We can't add anything but integers to K (from now on):
q)K ,: `foo 'type
It's like trying to add a non-integer to L or a non-symbol to M, etc. We get the 'type error:
q)L ,: `foo 'type q)M ,: 357 'type
But of course we could have declare K as we did and then add a symbol to it:
q)K : () q)K ,: `foo q)type K 11h q)K ,`foo q)K ,: 357 'type
In this case K becomes a list of symbols (on adding `foo, the first element) and we can't add anything but symbols to it.
Byte representation
It is possible to obtain the byte representation of K data structures:
q)a : 1 2 3 4 5 q)-8!a 0x01000000220000000600050000000100000002000000030000000400000005000000
Thus the variable a is 34 bytes of data (byte 1: 0x01, byte 2: 0x00, byte 3: 0x00, byte 4: 0x00, byte 5: 0x22, ...).
In this example a is a list, but -8!... works for many other things:
q)TestTable : ([sym:`symbol$(); date:`date$()] val:`float$()) q)upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874) `TestTable q)-8!TestTable 0x0100000059000000636200630b000200000073796d0064617465000000020000000b0001000..
These are the actual bytes that would be piped through a channel if you were to get the data from Kdb+ remotely.
However, this does not necessarily represent the way these data structure are stored in memory by Kdb.
Determining how much memory a given data structure occupies
We have already discussed byte representations, which you can obtain by using -8!.... They also provide a way of obtaining the size of a data structure in memory:
q)TestTable : ([sym:`symbol$(); date:`date$()] val:`float$()) q)upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874) `TestTable q)(count -8!TestTable) % 1.5 59.33333
We are dividing the result of count -8!TestTable by 1.5 as the result is a better estimate of the number of bytes used by Kdb+ to represent the data table in memory. (Thanks to Manish Patel for this empirical finding.) This is still an estimate.
Another, more exact, way of determining how much memory a data structure occupies is by using
q)\w 118048 67634176 0 0j
We can compare the values before and after:
q)\w 117376 67634176 0 0j q)TestTable : ([sym:`symbol$(); date:`date$()] val:`float$()) q)upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874) `TestTable q)\w 117984 67634176 0 0j q)upsert [`TestTable] (`USDJPY; 2009.02.10; 1.2874) `TestTable q)\w 117984 67634176 0 0j q)upsert [`TestTable] (`GBPUSD; 2009.02.10; 1.2874) `TestTable q)\w 118048 67634176 0 0j
This value is exact. We can see that a table with a single entry has resulted in 608 bytes of memory being occupied, which is significantly different from our estimate above. This is probably due to the meta-data and key, which occupy a lot of memory in relation to the table's size.
Joining strings with a given separator
Suppose you have a list of strings:
q)l : ("foo";"bar";"baz")
And you want to turn it into a comma-separated string:
q)"," sv l "foo,bar,baz"
sv is a built-in function; the name stands for "string from vector".
Profiler
nop : { []
}
one : { []
1
}
double : { [x]
2 * x
}
add : { [x; y]
x + y
}
// \d .profiler
preCall : { [functionName]
ProfilingResults ,: (`CALL, functionName, .z.T);
}
postCall : { [functionName]
ProfilingResults ,: (`CALL, functionName, .z.T);
}
getFunctionParameters : { [functionName]
value[value[functionName]][1]
}
getFunctionParametersAsString : { [functionName]
";" sv string[getFunctionParameters[functionName]]
}
wrapFunction : { [functionName]
functionNameString : string[functionName];
wrappedFunctionNameString : functionNameString, "_profiled";
parametersString : getFunctionParametersAsString[functionName];
renameFunctionCode : wrappedFunctionNameString, " : ", functionNameString;
value renameFunctionCode;
preCallProfilingCode : "preCall[`", functionNameString, "]";
postCallProfilingCode : "postCall[`", functionNameString, "]";
wrapFunctionCode : functionNameString, " : { [", parametersString, "] ", preCallProfilingCode, "; result : ", wrappedFunctionNameString, "[", parametersString, "]; ", postCallProfilingCode, "; result }";
value wrapFunctionCode;
wrapFunctionCode;
}
// \d .
// Retrieve the function names from the global namespace
functionNames : value "\\f"
ProfilingResults : ([] action:`symbol$(); functionName:`symbol$(); time:`time$() )
wrapFunction each (`one`double`add`nop);
Sorting one list by the corresponding elements of another list
Suppose that we have two lists:
L1 : 0.45 0.34 0.12 0.43 0.32 L2 : 3 3 2 1 8
We would like to sort L1 by the corresponding elements of L2. In other words, we want to get the result
0.43 0.12 0.45 0.34 0.32
because the corresponding elements of L2 would then appear in ascending order:
1 2 3 3 8
iasc achieves precisely that:
q)L1@iasc L2 0.43 0.12 0.45 0.34 0.32
How does this work? iasc sorts L2 but returns the original indices of its elements instead of the elements themselves:
q)iasc L2 3 2 0 1 4
Then @ produces the elements of L1 at these indices:
q)L1@(3 2 0 1 4) 0.43 0.12 0.45 0.34 0.32
Zeroing out the seconds and milliseconds in a date
Use
getDateMinute : { [dateTime]
dateTime - ((`time$dateTime) mod 60000)
}
For example:
getDateMinute[2010.04.23T01:02:03.456]
produces
2010.04.23 01:02:00.000
Left join
Suppose we would like to filter one table by another table. This is best illustrated by a toy example:
PeopleTable : ( [] name:`symbol$(); surname:`symbol$() ) PeopleTable ,: (`John; `vonNeumann) PeopleTable ,: (`Stephen; `Hawking) PeopleTable ,: (`Daniel; `Bernoulli)
MonthsOfBirthTable : ( [] name:`symbol$(); surname:`symbol$(); month:`symbol$() ) MonthsOfBirthTable ,: (`John; `Nash; `June) MonthsOfBirthTable ,: (`John; `vonNeumann; `December) MonthsOfBirthTable ,: (`Stephen; `Smale; `July) MonthsOfBirthTable ,: (`Stephen; `Hawking; `January) MonthsOfBirthTable ,: (`Daniel; `Bernoulli; `February) MonthsOfBirthTable ,: (`Jacob; `Bernoulli; `December)
Now we would like to select from MonthsOfBirthTable all the entries which correspond to people in PeopleTable.
The following works:
PeopleTable lj (`name`surname xkey MonthsOfBirthTable)
producing the following result:
name surname month --------------------------- John vonNeumann December Stephen Hawking January Daniel Bernoulli February
Note that we had to create a key for the right-hand table (MonthsOfBirthTable) with xkey. If our MonthsOfBirthTable were a keyed table in the first place, i.e.
MonthsOfBirthTable : ( [name:`symbol$(); surname:`symbol$()] month:`symbol$() )
we could get away with
PeopleTable lj MonthsOfBirthTable
If the right-hand table is in memory, key creation is fairly inexpensive. If it is stored on disk (partitioned and splayed, thus not keyed) then it is wise to avoid referencing the entire table in one go. It is better to partition the join appropriately (e.g. by date, as is usually the case) so each partition is mapped into memory.
What happense if we have multiple rows per person in the right-hand table? Consider the following example. As before,
PeopleTable : ( [] name:`symbol$(); surname:`symbol$() ) PeopleTable ,: (`John; `vonNeumann) PeopleTable ,: (`Stephen; `Hawking) PeopleTable ,: (`Daniel; `Bernoulli)
Now we would like to select all rows corresponding to those in PeopleTable from the following table:
ContributionsTable : ( [] name:`symbol$(); surname:`symbol$(); contribution:`symbol$() ) ContributionsTable ,: (`John; `Nash; `EquilibriumPointsInNPersonGames) ContributionsTable ,: (`John; `Nash; `TwoPersonCooperativeGames) ContributionsTable ,: (`John; `Nash; `RealAlgebraicManifolds) ContributionsTable ,: (`John; `vonNeumann; `GameTheory) ContributionsTable ,: (`John; `vonNeumann; `CellularAutomata) ContributionsTable ,: (`John; `vonNeumann; `UniversalConstructor) ContributionsTable ,: (`Stephen; `Smale; `SphereEversion) ContributionsTable ,: (`Stephen; `Smale; `PoincareConjectureForDimensionsGreaterThanOrEqualToFive) ContributionsTable ,: (`Stephen; `Smale; `SmaleHorseshoe) ContributionsTable ,: (`Stephen; `Hawking; `SingularitiesInGeneralRelativity) ContributionsTable ,: (`Stephen; `Hawking; `HawkingRadiation) ContributionsTable ,: (`Daniel; `Bernoulli; `CompoundMotions) ContributionsTable ,: (`Daniel; `Bernoulli; `Hydrodynamique) ContributionsTable ,: (`Jacob; `Bernoulli; `TranscendentalCurves) ContributionsTable ,: (`Jacob; `Bernoulli; `Isoperimetry) ContributionsTable ,: (`Jacob; `Bernoulli; `SeparableDifferentialEquations)
If we try, as before,
PeopleTable lj (`name`surname xkey ContributionsTable)
But this will only return the first row from ContributionsTable for John von Neumann, Stephen Hawking, and Daniel Bernoulli:
name surname contribution --------------------------------------------------- John vonNeumann GameTheory Stephen Hawking SingularitiesInGeneralRelativity Daniel Bernoulli CompoundMotions
Whereas we want all the rows. One way to achieve this is to xgroup ContributionsTable by name and surname first:
`name`surname xgroup ContributionsTable
name surname | contribution ------------------| -------------------------------------------------------------------------------------- John Nash | `EquilibriumPointsInNPersonGames`TwoPersonCooperativeGames`RealAlgebraicManifolds John vonNeumann| `GameTheory`CellularAutomata`UniversalConstructor Stephen Smale | `SphereEversion`PoincareConjectureForDimensionsGreaterThanOrEqualToFive`SmaleHorseshoe Stephen Hawking | `SingularitiesInGeneralRelativity`HawkingRadiation Daniel Bernoulli | `CompoundMotions`Hydrodynamique Jacob Bernoulli | `TranscendentalCurves`Isoperimetry`SeparableDifferentialEquations
Then do the left join. Then ungroup it back. All in one go:
ungroup (PeopleTable lj (`name`surname xgroup ContributionsTable))
Which gives us
name surname contribution --------------------------------------------------- John vonNeumann GameTheory John vonNeumann CellularAutomata John vonNeumann UniversalConstructor Stephen Hawking SingularitiesInGeneralRelativity Stephen Hawking HawkingRadiation Daniel Bernoulli CompoundMotions Daniel Bernoulli Hydrodynamique
as required.
We would like to thank Daniel Lister for his comments.
Functional select
You can read about the functional select here: https://code.kx.com/trac/wiki/QforMortals2/queries_q_sql#Functionalselect
Here we shall give some examples.
Remember that the functional select is
?[t;c;b;a]
where t is the table, c the contraints, b the by-phrase, and a the aggregations.
Example 0
?[QuotesTable; enlist 1b; 0b; cols[QuotesTable]!cols[QuotesTable]]
or the simpler
?[QuotesTable; enlist 1b; 0b; ()]
is equivalent to
select from QuotesTable
Example 1
select from QuotesTable where sym=`AUDCAD
is equivalent to
?[QuotesTable;(enlist (=; `sym; enlist `AUDCAD)); 0b; ()]
If the type of the column is `symbol$(), then sym=`AUDCAD corresponds to (enlist (=; `sym; enlist `AUDCAD)) above. Otherwise, we would use something like (enlist (=; `bid; 1.55)) (without the second enlist).
Example 2
This is a small variation on the previous example. We still want to
select from QuotesTable where sym=`AUDCAD
but we want to be more flexible. In this case, the table name (`QuotesTable) and "where" column name (`sym) are stored in variables:
tableName : `QuotesTable; columnName : `sym;
and our functional select becomes
?[tableName;(enlist (=; columnName; enlist `AUDCAD)); 0b; ()]
This works, just as before.
Example 3
select from QuotesTable where bid > 1.5
is equivalent to
?[QuotesTable;(enlist (>; `bid; 1.5)); 0b; ()]
Example 4
select from QuotesTable where bid <> 0n
is equivalent to
?[QuotesTable;(enlist (<>; `bid; 0n)); 0b; ()]
Example 5
select bid from QuotesTable where sym=`AUDCAD
is equivalent to
?[QuotesTable;(enlist (=; `sym; enlist `AUDCAD)); 0b; (enlist `bid)!(enlist `bid)]
Comment
One of the most powerful features of this syntax is the ability to dynamically select a column, for example:
col : `bid ?[QuotesTable;(enlist (=; `sym; enlist `AUDCAD)); 0b; (enlist `dataPoint)!(enlist col)]
Example 6
select bid, ask from QuotesTable where sym=`AUDCAD
is equivalent to
?[QuotesTable;(enlist (=; `sym; enlist `AUDCAD)); 0b; `bid`ask!`bid`ask]
Listing variables (including tables) and functions defined on an instance
\v
lists the variables (including tables) in the global namespace (`.).
\f
lists the functions in the global namespace.
Deleting variables (including tables) and functions defined on an instance
To delete a variable, use
delete MyTableName from `.
and to delete a function, use
delete myFunctionName from `.
do the trick. You can check that the variable (or function) was deleted by running \v (or \f).
Concatenating compound columns on selecting from tables
Suppose that we define
ExampleTable1 : ([] sym:`symbol$(); list1:(); list2:()); ExampleTable1 ,: (`foo; (0.12 0.32 0.43); (0.32 0.23 0.43)); ExampleTable1 ,: (`foo; (0.32 0.33 0.54); (0.65 0.23 0.32)); ExampleTable1 ,: (`foo; (0.93 0.92 0.90); (0.90 0.89 0.54));
i.e. we have
sym list1 list2 --------------------------------- foo 0.12 0.32 0.43 0.32 0.23 0.43 foo 0.32 0.33 0.54 0.65 0.23 0.32 foo 0.93 0.92 0.9 0.9 0.89 0.54
and we want to get
sym combinedList --------------------------------- foo 0.12 0.32 0.43 0.32 0.23 0.43 foo 0.32 0.33 0.54 0.65 0.23 0.32 foo 0.93 0.92 0.9 0.9 0.89 0.54
This is achieved as follows:
select sym, combinedList: (list1, 'list2) from ExampleTable1
Applying a function to each date in a given list
It is very often useful to partition queries by dates, especially if this reflects the way in which your tables are stores.
Here is a simple toy example to illustrate the syntax.
displayIncrementedDate : { [message; date; increment]
0N!message, string[date + increment];
1b
}
displayIncrementedDate["Date: ";;3] each (2010.05.11T13:06:00.000 + til 10)
This returns the result, 1111111111b, which is a list of booleans, and also displays
"Date: 2010.05.14T13:06:00.000" "Date: 2010.05.15T13:06:00.000" "Date: 2010.05.16T13:06:00.000" "Date: 2010.05.17T13:06:00.000" "Date: 2010.05.18T13:06:00.000" "Date: 2010.05.19T13:06:00.000" "Date: 2010.05.20T13:06:00.000" "Date: 2010.05.21T13:06:00.000" "Date: 2010.05.22T13:06:00.000" "Date: 2010.05.23T13:06:00.000"
Partitioning a time interval by hour
The following function does the trick:
partitionTimeIntervalByHour : { [beginTime; endTime]
(beginTime + ((til (ceiling[(endTime - beginTime) % 01:00:00.000])) * 01:00:00.000)), endTime
}
For example:
beginTime : 01:00:00.000; endTime : 03:00:00.000; partitionTimeIntervalByHour[beginTime; endTime] // Result: 01:00:00.000 02:00:00.000 03:00:00.000 beginTime : 01:00:00.000; endTime : 01:00:00.000; partitionTimeIntervalByHour[beginTime; endTime] // Result: enlist 01:00:00.000 beginTime : 01:30:55.123; endTime : 05:15:35.012; partitionTimeIntervalByHour[beginTime; endTime] // Result: 01:30:55.123 02:30:55.123 03:30:55.123 04:30:55.123 05:15:35.012
Processing the resulting intervals
This section is a sequel of the previous one. Suppose that we have partitioned our interval.
As before,
beginTime : 01:30:55.123; endTime : 05:15:35.012; times : partitionTimeIntervalByHour[beginTime; endTime];
Thus times has been set to
01:30:55.123 02:30:55.123 03:30:55.123 04:30:55.123 05:15:35.012
Now we want to process the resulting intervals, interval by interval, by applying, say,
processTimeInterval : { [beginTime; endTime]
0N!"from ", string[beginTime], " to ", string[endTime]
}
For example,
processTimeInterval[01:30:55.123;02:30:55.123]
will display the string "from 01:30:55.123 to 02:30:55.123" and also return it. Of course processTimeInterval could have been a lot smarter and more useful but this is not the point. The point is that we would like to apply processTimeInterval to each interval in our partition, i.e. we want "something" that will result in the following function calls:
processTimeInterval[01:30:55.123;02:30:55.123]; processTimeInterval[02:30:55.123;03:30:55.123]; processTimeInterval[03:30:55.123;04:30:55.123]; processTimeInterval[04:30:55.123;05:15:35.012];
Our first step is to produce the time intervals from times:
timeIntervals : -1_2#/:(0_til count times)_\:times;
Thus the variable timeIntervals has been set to:
(01:30:55.123 02:30:55.123;02:30:55.123 03:30:55.123;03:30:55.123 04:30:55.123;04:30:55.123 05:15:35.012)
Our next step is to apply processTimeInterval to each of the resulting time intervals:
results : (processTimeInterval .) each timeIntervals;
which produces the following results:
("from 01:30:55.123 to 02:30:55.123";"from 02:30:55.123 to 03:30:55.123";"from 03:30:55.123 to 04:30:55.123";"from 04:30:55.123 to 05:15:35.012")
We would like to thank Mahesh BP for his advice.
Supplying function arguments in a list
Suppose that you have the following function:
addUp : { [x; y] x + y }
You can, of course, call it as follows:
addUp[3; 5]
The result is 8. But what if you have
L : 3 5
And you want to do
addUp[L[0]; L[1]]
A more elegant way to do this is:
addUp . L
How to produce a string representation of a list of symbols
Suppose that we have
L : `foo`bar`baz
i.e. a list of symbols, and we would like to have
"`foo`bar`baz"
a string. This is particularly useful when one is constructing a query to be sent to a remote server. We can achieve this as follows:
raze "`", ' string L
Checking if a date is a weekend
isWeekend : { [date]
weekOffset : date - `week$date;
(weekOffset = 5) or (weekOffset = 6)
}
Upper join versus asof-join
Suppose that we have the following three tables:
Table1 : ([] dateTime:`date$(); dataPoint1:`float$()); Table1 ,: (2010.05.18; 3.57); Table1 ,: (2010.05.20; 3.58); Table1 ,: (2010.05.21; 3.58); Table1 ,: (2010.05.22; 3.59); Table1 ,: (2010.05.23; 3.60); Table2 : ([] dateTime:`date$(); dataPoint2:`float$()); Table2 ,: (2010.05.19; 1.23); Table2 ,: (2010.05.20; 1.24); Table2 ,: (2010.05.21; 1.25); Table2 ,: (2010.05.24; 1.26); Table2 ,: (2010.05.25; 1.25); Table3 : ([] dateTime:`date$(); dataPoint3:`float$()); Table3 ,: (2010.05.17; 3.23); Table3 ,: (2010.05.18; 3.24); Table3 ,: (2010.05.21; 3.25);
We are interested in aligning the time series. There are several ways to achieve this involving different kinds of joins. We are going to use this example to demonstrate the semantics of these joins.
asof-join
First, the asof-join:
aj[`dateTime;aj[`dateTime;Table1;Table2];Table3]
This produces the following result:
dateTime dataPoint1 dataPoint2 dataPoint3 2010.05.18 3.57 0n 3.24 2010.05.20 3.58 1.24 3.24 2010.05.21 3.58 1.25 3.25 2010.05.22 3.59 1.25 3.25 2010.05.23 3.6 1.25 3.25
For some purposes this might be what we need. Observe that the dateTimes are sorted and they are exactly those of Table1. We have lost some date-times — the ones that are in Table2 or Table3 but not in Table1.
Union join
This brings us to our next item: the union join. Consider this:
(uj/)1!'(Table1;Table2;Table3)
The result is:
dateTime | dataPoint1 dataPoint2 dataPoint3 ----------------------------------------------- 2010.05.18 | 3.57 0n 3.24 2010.05.20 | 3.58 1.24 0n 2010.05.21 | 3.58 1.25 3.25 2010.05.22 | 3.59 0n 0n 2010.05.23 | 3.6 0n 0n 2010.05.19 | 0n 1.23 0n 2010.05.24 | 0n 1.26 0n 2010.05.25 | 0n 1.25 0n 2010.05.17 | 0n 0n 3.23
This is not quite what we wanted. Sure, we have a union of dateTimes from Table1, Table2, and Table3, but they are no longer sorted and they are no longer filled. We need to do some more work:
fills `dateTime xasc (uj/)1!'(Table1;Table2;Table3)
In order to get:
dateTime | dataPoint1 dataPoint2 dataPoint3 ----------------------------------------------- 2010.05.17 | 0n 0n 3.23 2010.05.18 | 3.57 0n 3.24 2010.05.19 | 3.57 1.23 3.24 2010.05.20 | 3.58 1.24 3.24 2010.05.21 | 3.58 1.25 3.25 2010.05.22 | 3.59 1.25 3.25 2010.05.23 | 3.6 1.25 3.25 2010.05.24 | 3.6 1.26 3.25 2010.05.25 | 3.6 1.25 3.25
How efficient is this? Iteratively, we could achieve this in O(N1 + N2 + N3) operations. Looks like this query is suboptimal, as you do it in stages: (1) uj, (2) xasc, (3) fills. We would be interested in hearing a better solution!
Many thanks to Alex Donohue and Attila Vrabecz for their comments.
Multiplying out the columns of a table
Let us again consider the table we have just created:
Table1 : ([] dateTime:`date$(); dataPoint1:`float$()); Table1 ,: (2010.05.18; 3.57); Table1 ,: (2010.05.20; 3.58); Table1 ,: (2010.05.21; 3.58); Table1 ,: (2010.05.22; 3.59); Table1 ,: (2010.05.23; 3.60); Table2 : ([] dateTime:`date$(); dataPoint2:`float$()); Table2 ,: (2010.05.19; 1.23); Table2 ,: (2010.05.20; 1.24); Table2 ,: (2010.05.21; 1.25); Table2 ,: (2010.05.24; 1.26); Table2 ,: (2010.05.25; 1.25); Table3 : ([] dateTime:`date$(); dataPoint3:`float$()); Table3 ,: (2010.05.17; 3.23); Table3 ,: (2010.05.18; 3.24); Table3 ,: (2010.05.21; 3.25); Table : fills `dateTime xasc (uj/)1!'(Table1;Table2;Table3)
i.e.
dateTime | dataPoint1 dataPoint2 dataPoint3 ----------------------------------------------- 2010.05.17 | 0n 0n 3.23 2010.05.18 | 3.57 0n 3.24 2010.05.19 | 3.57 1.23 3.24 2010.05.20 | 3.58 1.24 3.24 2010.05.21 | 3.58 1.25 3.25 2010.05.22 | 3.59 1.25 3.25 2010.05.23 | 3.6 1.25 3.25 2010.05.24 | 3.6 1.26 3.25 2010.05.25 | 3.6 1.25 3.25
Our next goal is to multiply out the columns dataPoint1, dataPoint2, and dataPoint3.
Solution 1
One way to achieve this is:
NonKeyedTable : () xkey Table; Result : NonKeyedTable[`dataPoint1] * NonKeyedTable[`dataPoint2] * NonKeyedTable[`dataPoint3]; ResultTable : flip[`dateTime`dataPoint!(NonKeyedTable[`dateTime]; Result)]
and ResultTable becomes
dateTime dataPoint --------------------- 2010.05.17 0n 2010.05.18 0n 2010.05.19 14.227164 2010.05.20 14.383008 2010.05.21 14.54375 2010.05.22 14.584375 2010.05.23 14.625 2010.05.24 14.742 2010.05.25 14.625
Now suppose that we don't know the names of the columns to be added in advance. We just want to multiply together all the columns save for the first one. Here is one way to do this:
NonKeyedTable : () xkey Table; ColumnsToMultiply : NonKeyedTable[1_(cols NonKeyedTable)]; ResultTable : flip[`dateTime`dataPoint!(NonKeyedTable[`dateTime]; 1*/ColumnsToMultiply)]
The result is exactly the same. This time we didn't have to enumerate the columns explicitly.
Solution 2
This can be achieved a lot more succintly:
select dateTime, prd 1_flip 0!Table from Table
Courtesy of Attila Vrabecz.
Select with filtering
Suppose that you have a table TimeSeriesTable like this:
dateTime dataPoint ---------------------------------- 2010.05.19 00:00:00.061 1.23235 2010.05.19 00:00:00.063 1.23235 2010.05.19 00:00:00.067 1.23236 2010.05.19 00:00:00.087 1.23235 2010.05.19 00:00:00.143 1.23235 2010.05.19 00:00:00.144 1.23235 2010.05.19 00:00:00.147 1.23236 2010.05.19 00:00:00.238 1.23236 2010.05.19 00:00:00.341 1.21446 2010.05.19 00:00:00.345 1.21446 2010.05.19 00:00:00.346 1.21447 2010.05.19 00:00:00.347 1.21448 ...
You want to enable the user to provide a custom filter for selecting data from this table.
You can provide the following function:
getTimeSeries : { [filter]
select from TimeSeriesTable where filter[dateTime; dataPoint]
}
You can then prepare a useful filter:
timeIntervalTimeSeriesFilter : { [beginTime; endTime; dateTime; dataPoint]
(`time$dateTime) within (beginTime; endTime)
}
which the user can specialise and use as follows:
fltr : timeIntervalTimeSeriesFilter[00:30:00.000; 01:00:00.000] getTimeSeries[fltr]
Conjunction and disjunction of filters
Let us prototype this first.
isOdd : { [x] mod[x; 2] }
isSmallPrime : { [x] (x = 2) or (x = 3) or (x = 5) or (x = 7) }
Let's test the following.
a : 2;
As expected,
1b and/ ({ x[a] } each (isOdd; isSmallPrime))
returns 0b because 2 isn't odd (although it is a small prime) and
0b or/ ({ x[a] } each (isOdd; isSmallPrime))
returns 1b because it is a small prime (even though it isn't odd).
Similarly,
a : 3
1b and/ ({ x[a] } each (isOdd; isSmallPrime))
returns true because 3 is both odd and a small prime and
0b or/ ({ x[a] } each (isOdd; isSmallPrime))
returns true for the very same reason.
We are now ready to develop further our previous example and introduce conjunction and disjunction of filters:
andFilter : { [filters; dateTime; dataPoint]
1b and/ ({ [dt; dp; x] x[dt; dp] }[dateTime; dataPoint] each filters)
}
orFilter : { [filters; dateTime; dataPoint]
0b or/ ({ [dt; dp; x] x[dt; dp] }[dateTime; dataPoint] each filters)
}
Now we want the data points that are either in the interval [00:30:00.000; 00:31:00.000] or in the interval [00:01:00.000; 00:02:00.000]. So we construct the following filter:
fltr : andFilter[(timeIntervalTimeSeriesFilter[00:30:00.000; 00:31:00.000]; timeIntervalTimeSeriesFilter[00:01:00.000; 00:02:00.000])]
And we can apply it as before
getTimeSeries[fltr]
Selecting N values preceding or following a given date-time
Suppose you have the following table:
TestTable : ( [] dateTime:`datetime$(); dataPoint:`float$() ); TestTable ,: (2010.05.28T13:32:02.000; 1.32); TestTable ,: (2010.05.28T13:33:01.000; 1.52); TestTable ,: (2010.05.28T13:33:02.000; 1.66); TestTable ,: (2010.05.28T13:36:01.000; 1.73); TestTable ,: (2010.05.28T13:40:02.000; 1.74); TestTable ,: (2010.05.28T13:45:03.000; 1.75);
i.e.
dateTime dataPoint ---------------------------------- 2010.05.28 13:32:02.000 1.32 2010.05.28 13:33:01.000 1.52 2010.05.28 13:33:02.000 1.66 2010.05.28 13:36:01.000 1.73 2010.05.28 13:40:02.000 1.74 2010.05.28 13:45:03.000 1.75
For a given list of datetimes
ListOfDateTimes : 2010.05.28T13:32:00.000 2010.05.28T13:33:00.000 2010.05.28T13:34:00.000 2010.05.28T13:35:00.000 2010.05.28T13:36:00.000 2010.05.28T13:37:00.000 2010.05.28T13:38:00.000;
we would like to select the three (more generally, N) data points that precede each date-time and the three (more generally, N) data points that follow that date-time.
Here is how this can be achieved:
select ListOfDateTimes, dataPoint(dateTime bin ListOfDateTimes)-\:reverse til 3 from TestTable
which gives
ListOfDateTimes x --------------------------------------- 2010.05.28 13:32:00.000 0n 0n 0n 2010.05.28 13:33:00.000 0n 0n 1.32 2010.05.28 13:34:00.000 1.32 1.52 1.66 2010.05.28 13:35:00.000 1.32 1.52 1.66 2010.05.28 13:36:00.000 1.32 1.52 1.66 2010.05.28 13:37:00.000 1.52 1.66 1.73 2010.05.28 13:38:00.000 1.52 1.66 1.73
and
select ListOfDateTimes, dataPoint(dateTime bin ListOfDateTimes)+\:1+til 3 from TestTable
which gives
ListOfDateTimes x --------------------------------------- 2010.05.28 13:32:00.000 1.32 1.52 1.66 2010.05.28 13:33:00.000 1.52 1.66 1.73 2010.05.28 13:34:00.000 1.73 1.74 1.75 2010.05.28 13:35:00.000 1.73 1.74 1.75 2010.05.28 13:36:00.000 1.73 1.74 1.75 2010.05.28 13:37:00.000 1.74 1.75 0n 2010.05.28 13:38:00.000 1.74 1.75 0n
Many thanks to Attila Vrabecz and Ed Bierly.
Functions with more than 8 parameters
There is a restriction in q: a function can have at most eight parameters. This restriction is dictated by the bytecode.
Suppose we try this:
h : { [arg1;arg2;arg3;arg4;arg5;arg6;arg7;arg8;arg9]
"It worked!"
}
Actually, it didn't work, because we get:
An error occurred during execution of the query. The server sent the response: params Studio Hint: Possibly this error refers to too many parameters (8 max)
But there is a way around it: composing the function with plist:
f : (')[ { [args]
count args
};plist]
This is a very trivial function, of course, that simply returns the number of arguments. But it could do something smarter, of course.
It works:
q)f[1] 1 q)f[1;2] 2 q)f[1;2;3;4;5;6;7;8;9;10] 10Now we can redefine
has something like this:
h : (')[ { [args]
arg1 : args[0];
arg2 : args[1];
arg3 : args[2];
arg4 : args[3];
arg5 : args[4];
arg6 : args[5];
arg7 : args[6];
arg8 : args[7];
arg9 : args[8];
// Do stuff
};plist]
Many thanks to Ben Delo.
Interrupting a query without killing the q instance
Suppose you have a long-running query that you want to interrupt without killing the server. This can be done by issuing a SIGINT:
kill -2 12345
where 12345 is the relevant process ID which you can find out using ps or top. top very often comes in useful here as chances are the q instance in question will be maxing out on its CPU usage, so will appear right at the top of top.
You may have to wait for a minute or so until the signal is picked up. If several queries are queuing, you may have to repeat the command several times.
This requires that you have shell access to the relevant q server, which may not be the case. If indeed this is not the case, you may want to ask your system administrator to provide some infrastructure that would automate this task and enable you to access that infrastructure.
Adding a column to a table
Suppose you have your new column data in a list, myColumnData. The following will add it to the end of myTable:
myNewTable : update myColumn:myColumnData from myTable
To the beginning:
myNewTable : `myColumn xcols update myColumn:myColumnData from myTable
After N columns:
myNewTable : ((N sublist cols myTable), `myColumn) xcols update myColumn:myColumnData from myTable
Many thanks to Charlie Skelton for this contribution.
Applying a function to each value in a list
multiplyByFactor : { [factor; val] factor * val }
f : multiplyByFactor[3f];
f':[(1f; 2f; 3f; 4f; 5f)]
The result is:
3 6 9 12 15f
Updating tables
For example,
TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$()); upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874); upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2875); upsert [`TestTable] (`EURUSD; 2009.02.11; 1.2901);
creates
sym date val -------------------------- EURUSD 2009.02.10 1.2874 EURUSD 2009.02.10 1.2875 EURUSD 2009.02.11 1.2901
Now we can scale the val by 10 for date = 2009.02.10:
update val:val*10 from `TestTable where date = 2009.02.10
Since in the above update we used `TestTable rather than just TestTable, the table got modified in place rather than copied, and we can check that
TestTable
is now
sym date val -------------------------- EURUSD 2009.02.10 12.874 EURUSD 2009.02.10 12.875 EURUSD 2009.02.11 1.2901
Signalling errors and protected evaluation
It is straightforward to signal an error. Here is an example of a function that will always signal '`ouch:
dodgyFunction : { []
'`ouch
}
Indeed, when we try calling it from Studio,
dodgyFunction[]
we get
An error occurred during execution of the query. The server sent the response: ouch
There is an equivalent to try-catch in Q. It is called protected evaluation. We shall illustrate it by examples.
niceFunction : { []
123
}
dodgyFunction : { []
'`ouch
}
niceFunction[]
// Result: 123
dodgyFunction[]
// Result: ouch signal
niceResult : @[niceFunction; 0n; "failure"];
// Result: niceResult = 123
dodgyResult : @[dodgyFunction; 0n; "failure"];
// Result: dodgyResult = "failure"
For functions taking a single parameter (i.e. monadic functions) the syntax is identical:
monadicNiceFunction : { [x]
3 * x
}
monadicDodgyFunction : { [x]
'`ouch
}
monadicNiceFunction[5]
// Result: 15
monadicDodgyFunction[5]
// Result: ouch signal
niceResult : @[monadicNiceFunction; 5; "failure"];
// Result: niceResult = 15
dodgyResult : @[monadicDodgyFunction; 5; "failure"];
// Result: dodgyResult = "failure"
But for multivalent functions (diadic, triadic, etc., i.e. taking more than a single parameter), we need to use a slightly different syntax for protected evaluation, passing arguments in a list:
diadicNiceFunction : { [x; y]
x + y
}
diadicDodgyFunction : { [x; y]
'`ouch
}
diadicNiceFunction[3; 5]
// Result: 8
diadicDodgyFunction[3; 5]
// Result: ouch signal
niceResult : .[diadicNiceFunction; (3; 5); "failure"];
// Result: niceResult = 8
dodgyResult : .[diadicDodgyFunction; (3; 5); "failure"];
// Result: dodgyResult = "failure"
Setting multiple elements of a list in one go
Suppose we have
zzz : 10#0n
i.e.
0n 0n 0n 0n 0n 0n 0n 0n 0n 0n
We can set the first five elements to 1.0 as follows:
zzz[til[5]] : 5#1f
We can also set zzz[6] to 2.0 and zzz[8] to 3.0 in one go:
zzz[(6 8)] : (2 3f)
The end result is
1 1 1 1 1 0n 2 0n 3 0n
The syntax for specifying full path names in `:
It is
`:C:/my/path/and/file/name.csv
on Windows and
`:/my/path/and/file/name.csv
on *nix.
If your paths contain any non-kosher characters, such as dashes or spaces, you may wish to do this:
`":C:/my/path/and-file/name.csv"
However, when you come to run
`":C:/my/path/and-file/name.csv" set MyTable
you may see a "type" error. This is because of the order of evaluation (from right to left). You can correct it by bracketing:
(`":C:/my/path/and-file/name.csv") set MyTable
Unexplained examples
{x|y} . (1010110b; 0100010b)
/ 1110110b
(or) . (1010110b; 0100010b)
/ 1110110b
/ NOT
/ or . (1010110b; 0100010b)
/ <tt>(or)</tt> is a noun whereas <tt>or</tt> is a verb.
(or) . each[("aaa"; "bbb"; "abc"; "foo"; "bar") like] ("a*"; "b*")
/ 11101b
/ This only works with two arguments because or is diadic
each["pre" ,] ("fix";"view";"lude")
/ ("prefix";"preview";"prelude")
("enjoy";"invest";"postpone") ,\: "ment"
/ ("enjoyment";"investment";"postponement")
00000b(or)/(01000b;01010b;01001b)
/ 01011b
0b(or)/(01000b;01010b;01001b)
/ 01011b
enlist[`ColumnName] _ MyTable
/ ColumnName deleted from MyTable
l: `foo`bar`baz`bar`baz
l _ l?`bar
/ Deletes the first occurrence of bar
9 (1+)\ 2011.01.01
/ 2011.01.01 2011.01.02 2011.01.03 2011.01.04 2011.01.05 2011.01.06 2011.01.07 2011.01.08 2011.01.09 2011.01.10
{ 0N!x } each 9 (1+)\ 2011.01.01
firstDate: 2011.01.01;
lastDate: 2011.01.05;
(lastDate-firstDate) (1+)\ firstDate
`foo in (`foo`bar`baz;`foo;`bar`baz)
// 0b
`foo in '(`foo`bar`baz;`foo;`bar`baz)
// 110b
.z.t <-- UTC (GMT)
.z.T <-- server time
.z.t - .z.T
// Casting the column of a table
TestTable : ([] sym:`symbol$(); date:`date$(); val:`float$());
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2874);
upsert [`TestTable] (`EURUSD; 2009.02.10; 1.2875);
upsert [`TestTable] (`EURUSD; 2009.02.11; 1.2901);
meta TestTable
// val is of type f
TestTable[`val] : `real$TestTable[`val]
meta TestTable
// val is now of type e
enlist[`minute] _ () xkey (select first date, first sym, first time, first bid, first ask by minute: (date + time.minute) from quotes where date within (firstDate; lastDate), sym=`XYZ)