This is a discussion on Performance of complex query within the MySQL Database forums, part of the Database Forums category; Hello, I'm fairly new to SQL, and was wondering if anyone could help me with the performance of a ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
Hello,
I'm fairly new to SQL, and was wondering if anyone could help me with the performance of a fairly complex query that I'm writing? I have some tables like this (please note, the details here are a simplification of my actual setup in order to make my question clearer, and as such they are written by hand without being checked for errors, so there may be the odd mistake): (Please view in fixed font) as +--------+-------------------+ | int id | varchar(255) name | +--------+-------------------+ bs +--------+----------+-------------------+ | int id | int id_a | varchar(255) name | +--------+----------+-------------------+ cs +--------+----------+-------------------+ | int id | int id_b | varchar(255) name | +--------+----------+-------------------+ ds +--------+----------+-------------+-------------+ | int id | int id_c | datetime t0 | datetime t1 | +--------+----------+-------------+-------------+ Such that there is a kind of tree structure A=>B=>C=>D. I want to create a query that will give me the number of instances of ds, given as.name, bs.name, cs.name, ds.t0, ds.t1, and to have this count spread across t0..t1 over intervals that may be specified in terms of years, months, days, hours, minutes and/or seconds, eg. procedure frequency_distribution(in a_name varchar(255), in b_name varchar(255), in c_name varchar(255), in t0 datetime, in t1 datetime, in year_interval int, in month_interval int, in day_interval int, in hour_interval int, in minute_interval int, in second_interval int) -> [datetime time, int number] eg, you could find the hourly distribution of ds for 3 hours starting from 2007-01-01: call frequency_distribution('a_name','b_name','c_name', '2007-01-01 00:00:00','2007-01-01 00:03:00',0,0,0,1,0,0) -> +---------------------+--------+ | time | number | +---------------------+--------+ | 2007-01-01 00:00:00 | 100 | | 2007-01-01 01:00:00 | 200 | | 2007-01-01 02:00:00 | 300 | +---------------------+--------+ So, I have created a procedure body like so: -- create temporary table t (t datetime, n int); declare d0 datetime; declare d1 datetime; set d0 = t0; while d0 < t1 do set d1 = adddate(d0,interval year_interval year); set d1 = adddate(d1,interval month_interval month); set d1 = adddate(d1,interval day_interval day); set d1 = adddate(d1,interval hour_interval hour); set d1 = adddate(d1,interval minute_interval minute); set d1 = adddate(d1,interval second_interval second); if d1 > t0 then set d1 = t1; end if; insert into t (t,n) select d0, count(*) from as a, bs b, cs c, ds d where a.name regexp a_name and b.a_id = a.id and b.name regexp b_name and c.b_id = b.id and c.name regexp c_name and d.c_id = c.id and d.t0 >= d0 and d.t1 < d1; set d0 = adddate(d0,interval year_interval year); set d0 = adddate(d0,interval month_interval month); set d0 = adddate(d0,interval day_interval day); set d0 = adddate(d0,interval hour_interval hour); set d0 = adddate(d0,interval minute_interval minute); set d0 = adddate(d0,interval second_interval second); end while; select t as 'time', n as 'number' from t; -- And this works correctly, however the problem is that it can be quite/ very slow, depending on the search criteria and the number of records matched. Typical tables sizes for tables as, bs and cs are around 10, whereas the table size for ds can be very large, say 5 million entries. Could anybody tell me if I am going about this the right way, or am I doing something very sub-optimal? Are there any ways I can optimise what I'm doing? Thanks a lot, Chris |