Thursday, March 12, 2009

Generating a sequence of numbers using plain SQL query

SQL queries are usually used to get information from one or more table(s). But there are times it is useful to simply generate a sequence of numbers. The numbers can be joined with some another table, to create a complete list.

For example, suppose we have a log table named “log” containing only 2 columns: the day of the year starting from 0 and some counted value for that day. The 2 columns are named: “day” and “count”.

For things to be easy, lets assume the days in the log table can be from “0” to “99”.

Note, that there may be days in the log table that are not counted at all. For example, the log may look like:

Day Count
0 345
1 434
3 255
8 345
13 445

And so on…

Now, suppose we need to output a report, showing the entire log table, including days that has no counts in the log. That means, we need somehow to fill the “gaps” in the log table in order to show a full list of days from “0” to “99”.

In order to produce the report, we need to create a table containing a sequence of numbers from “0” to “99” and than join that table with our “log” table.

Generating a sequence of numbers in plain SQL query is not a trivial task, but it can be done, by taking advantage of the SQL UNION command and using the SQL property: Cartesian join also known as Cross join.

The idea is to create 2 tables:

  • One table contains numbers from 0 to 9.
  • Second table contains numbers: 0, 10, 20, … 90.

Then, cross join the tables and sum the join products.

The cross join looks like:

Table 1 Table 2 Sum
0 0 0
1 0 1
9 0 9
0 10 10
1 10 11
9 10 19
0 20 20
1 20 21
9 20 29

And so on until 99…

The SQL query looks like:

select t1.x + t2.x
from
(select 0 as `x` union
select 1 union
select 2 union
select 3 union
select 4 union
select 5 union
select 6 union
select 7 union
select 8 union
select 9) as t1,
(select 0 as `x` union
select 10 union
select 20 union
select 30 union
select 40 union
select 50 union
select 60 union
select 70 union
select 80 union
select 90) as t2
order by 1

Now, in order to show complete list of days and log counts for the log table in the example:
select all_days.day, log.count from
(select t1.x + t2.x as 'day'
from
(select 0 as `x` union
select 1 union
select 2 union
select 3 union
select 4 union
select 5 union
select 6 union
select 7 union
select 8 union
select 9) as t1,
(select 0 as `x` union
select 10 union
select 20 union
select 30 union
select 40 union
select 50 union
select 60 union
select 70 union
select 80 union
select 90) as t2
order by 1) as all_days left join log on all_days.day = log.day
Suppose you want a sequence of numbers that is different from 0 to 99. For example, you want a range of numbers between 450 to 480. You can easily change the query (the first one) to do it, by using simple add operation and using the WHERE clause:
select 450 + x from
(select t1.x + t2.x as `x`
from
(select 0 as `x` union
select 1 union
select 2 union
select 3 union
select 4 union
select 5 union
select 6 union
select 7 union
select 8 union
select 9) as t1,
(select 0 as `x` union
select 10 union
select 20 union
select 30 union
select 40 union
select 50 union
select 60 union
select 70 union
select 80 union
select 90) as t2
order by 1) as `numbers_sequence` where 450 + x <= 480
You can use this trick also if you need to generate a range of dates. All you have to do is to use the specific function of your SQL engine (mySQL, SQLServer etc’) that allows adding number of days to a date.

It is important to mention, that there are more solutions that can achieve the same effect:

  • Add the missing data in a stored procedure code.
  • Add the missing data in a your programming language code (Java, PHP, etc’).

Every method has its own advantages and disadvantages: Stored procedures are not always available, but very efficient and allows you to “speak” directly in the database terminology. Adding the data in your programming language may be a bit more complex. You also don’t always have the privilege of manipulating the data in a programming language. For example, if you are using a generic reporting system, that automatically plots the data of a Resultset. From the other hand, as was said before: you don’t always have the option of using stored procedures. If both: stored procedures and programming language are not available, you can always use the query on this example. It is simple, fast and leaves the solution on the domain of the SQL engine.

No comments:

Post a Comment