Querying intervals efficiently in SQL + Python

homeblogmastodonblueskythingiverse



Update 21/2/2009: See my new solution to this problem here.


I have a problem where I need to be able to efficiently[1] find all of the intervals that contain a particular (integer) value. There are data-structures to do this efficiently, for example the interval tree. However, I have quite a lot of intervals, and I don't want to have to code up a complicated disk-based data structure, so I am constrained to use a database.

The solution I came up with is to assign each interval a particular numerical code, such that a query of a small set of such codes will guarantee finding each interval that contains a given value.

I'm not sure if this is novel. It seems a fairly obvious thing to do, so probably not. If not, can anyone tell me what it is called?


The code is this: I produce a trinary number from the binary representation of the lower and upper bounds of the interval. Starting with the most significant digit I map 0->1 and 1->2 until the first digit that varies between the lower and upper bounds. From there on I just produce 0s.

An example may make this clear:

Lower  00010101010
Upper  00010101111

Output 00021212000

To perform a query, I similarly convert the binary query number to trinary. The query set is then this trinary number with each least significant digit converted to zero in turn.

An example:

Query  00010101100

Output 00021212211
       00021212210
       00021212200
       00021212000
       00021210000
       00021200000
       00021000000
       00020000000
       00000000000

Here's the source code:

The source code includes code to test this idea using SQLite.


Update: This is hideous. Sorry. I am working on a much nicer scheme, stay tuned.

[1] Clarification: By efficiently I mean faster than O(n).




[æ]