simple SQL select optimisation problem for sql wizzard ;-)

This is a discussion on simple SQL select optimisation problem for sql wizzard ;-) within the MySQL Database forums, part of the Database Forums category; Hello everybody, here is a simple SQL problem. Imagine a table which describes classes of students. -- -- Structure of the table `...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 03-06-2006
paul
 
Posts: n/a
Default simple SQL select optimisation problem for sql wizzard ;-)

Hello everybody,

here is a simple SQL problem.

Imagine a table which describes classes of students.

--
-- Structure of the table `students`
--

CREATE TABLE `students` (
`IdStudent` int(10) unsigned NOT NULL auto_increment,
`IdClass` int(10) unsigned NOT NULL default '0',
`FirstName` char(30) NOT NULL default '',
PRIMARY KEY (`IdStudent`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=11 ;

--
-- Content of the table `students`
--

INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (1, 1,
'Pierre');
INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (2, 1,
'Paul');
INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (3, 1,
'Jacques');
INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (4, 2,
'Yann');
INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (5, 3,
'Damien');
INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (6, 3,
'Sam');
INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (7, 3,
'George');
INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (8, 3,
'Robert');
INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (9, 4,
'Michel');
INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (10, 4,
'Toto');

The goal is to get by the simplest and fastest way a maximum of N distinct
students of each class.

For N=1, a simple solution is : SELECT * FROM students GROUP BY IdClass

IdStudent IdClass FirstName
1 1 Pierre
4 2 Yann
5 3 Damien
9 4 Michel


For N=2 with the same data, it would give :

IdStudent IdClass FirstName
1 1 Pierre
2 1 Paul
4 2 Yann
5 3 Damien
6 3 Sam
9 4 Michel
10 4 Toto


For N=3, we would get:

IdStudent IdClass FirstName
1 1 Pierre
2 1 Paul
3 1 Jacques
4 2 Yann
5 3 Damien
6 3 Sam
7 3 George
9 4 Michel
10 4 Toto


Question :

What are the best solutions for N between 1 and 10?

I would like to avoid to do:

SELECT DISTINCT IdClass FROM student ;
For each IdClass "id" found do the following query:
SELECT * FROM student WHERE IdClass="id" LIMIT N ;

Thank you in advance for your help.

Paul.


Reply With Quote
  #2 (permalink)  
Old 03-06-2006
Bill Karwin
 
Posts: n/a
Default Re: simple SQL select optimisation problem for sql wizzard ;-)

"paul" <zerawarez@free.fr> wrote in message
news:440c73bd$0$27055$626a54ce@news.free.fr...
> The goal is to get by the simplest and fastest way a maximum of N distinct
> students of each class.


Note that using a single SQL query to generate the result set is not always
the best or fastest solution to every problem. Sometimes it's better to use
a procedural language to issue several queries to generate the full result
set. It's certainly more straightforward to most programmers, thus it is
easier to find someone who can implement and maintain the software.

The "Top-N per group" problem is tricky. Oracle has a pseudocolumn called
ROWNUM (which is offensive to relational purists) that can be used to
achieve this. MS SQL Server has its "SELECT TOP n ..." proprietary
extension to the SQL language.

MySQL has neither of these extensions, so there's a classic
correlated-subquery method:

SELECT s.*
FROM students AS s
WHERE 2 >= (SELECT COUNT(DISTINCT IdStudent) FROM students AS s2 WHERE
s.IdClass = s2.IdClass AND s2.IdStudent >= s.IdStudent)

Note that you didn't state whether you wanted any specific top 2 students
per class, so I'm assuming it's based on IdStudent.

Regards,
Bill K.


Reply With Quote
  #3 (permalink)  
Old 03-07-2006
paul
 
Posts: n/a
Default Re: simple SQL select optimisation problem for sql wizzard ;-)


"Bill Karwin" <bill@karwin.com> wrote in message
news:dui51h0s2d@enews3.newsguy.com...
> "paul" <zerawarez@free.fr> wrote in message
> news:440c73bd$0$27055$626a54ce@news.free.fr...
> > The goal is to get by the simplest and fastest way a maximum of N

distinct
> > students of each class.

>
> Note that using a single SQL query to generate the result set is not

always
> the best or fastest solution to every problem. Sometimes it's better to

use
> a procedural language to issue several queries to generate the full result
> set. It's certainly more straightforward to most programmers, thus it is
> easier to find someone who can implement and maintain the software.



You are right, but in my case, I want to avoid to do "Number Of Class"+1
queries to get my result set.

SELECT DISTINCT IdClass FROM student ;
For each IdClass "id" found do the following query:
SELECT * FROM student WHERE IdClass="id" LIMIT N ;



> The "Top-N per group" problem is tricky. Oracle has a pseudocolumn called
> ROWNUM (which is offensive to relational purists) that can be used to
> achieve this. MS SQL Server has its "SELECT TOP n ..." proprietary
> extension to the SQL language.


You are right (again ;), I did not know this ROWNUM functionnality. Thank
you.
There is a work around with user variables in SQL to simultate Rownum.

SET @a=0;
SELECT @a:=@a+1 as rownum, s.* from students;

But my need is a little more specific. I would need an array of user
variables, but it does not seem possible to have this in MySQL.
So I would be able to write something like:

SELECT s.IdStudent, s.IdClass, s.FirstName, @myArray[ s.IdClass ]:=
@myArray[ s.IdClass ] + 1
FROM student s
WHERE @myArray[ s.IdClass ] < N

Do you see a way to write this?


> MySQL has neither of these extensions, so there's a classic
> correlated-subquery method:
>
> SELECT s.*
> FROM students AS s
> WHERE 2 >= (SELECT COUNT(DISTINCT IdStudent) FROM students AS s2 WHERE
> s.IdClass = s2.IdClass AND s2.IdStudent >= s.IdStudent)
>
> Note that you didn't state whether you wanted any specific top 2 students
> per class, so I'm assuming it's based on IdStudent.


Yes, congratulation, It works! You are a wizzard! :) I did not know such
conditions in subqueries were possible.
Indeed, I don't mind which students are returned in each class.

But, in terms of efficiency, the subselect of your query seems to be
evaluated for each row in the table, isn't it?
So I fear this solution is even worse in terms of speed and load than the
one I wanted to avoid...

SELECT DISTINCT IdClass FROM student ;
For each IdClass "id" found do the following query:
SELECT * FROM student WHERE IdClass="id" LIMIT N ;

What do you think?


Thanks a lot for your help!
Best regards,

paul.


Reply With Quote
  #4 (permalink)  
Old 03-07-2006
Bill Karwin
 
Posts: n/a
Default Re: simple SQL select optimisation problem for sql wizzard ;-)

"paul" <zerawarez@free.fr> wrote in message
news:440d9a02$0$12856$626a54ce@news.free.fr...
>> SELECT s.*
>> FROM students AS s
>> WHERE 2 >= (SELECT COUNT(DISTINCT IdStudent) FROM students AS s2 WHERE
>> s.IdClass = s2.IdClass AND s2.IdStudent >= s.IdStudent)

>
> Yes, congratulation, It works! You are a wizzard! :) I did not know such
> conditions in subqueries were possible.


Yep! That's called a correlated subquery.

> But, in terms of efficiency, the subselect of your query seems to be
> evaluated for each row in the table, isn't it?


Right. A non-correlated subquery can be factored out the subquery, executed
once, and its results used repeatedly, for the rows of the outer query. But
a correlated subquery must be evaluated for each matching row in the outer
query. Make sure the relevant columns (IdClass and IdStudent) are indexed
and there's a good chance of being pretty efficient.

> So I fear this solution is even worse in terms of speed and load than the
> one I wanted to avoid...


Not necessarily; try both methods, do some timing measurements, and see
which one works best.

And I agree with Peter Coffin's point. Don't get too hung up on optimizing
every query. Don't spend days working on a query that will be run
infrequently and with no strong requirement for speed.

Regards,
Bill K.


Reply With Quote
  #5 (permalink)  
Old 03-08-2006
paul
 
Posts: n/a
Default Re: simple SQL select optimisation problem for sql wizzard ;-)

> Why? I mean, sure, if you're going to be running this job several times
> per minute, then some attention may be necessary, but it looks like
> you're running a class list. Who cares if it takes 30 second and 200
> queries instead of four seconds and one query, when the job's only going
> to run a couple of times per year?


The problem I am exposing is conceptually the same as the one I am facing in
my developpements. I reassure you, it has nothing to do with students or
classes... It was just a way to make the problem look easier ;)
Thanks.

paul.


Reply With Quote
  #6 (permalink)  
Old 03-08-2006
paul
 
Posts: n/a
Default Re: simple SQL select optimisation problem for sql wizzard ;-)


"Bill Karwin" <bill@karwin.com> wrote in message
news:dukimd0id0@enews1.newsguy.com...
> And I agree with Peter Coffin's point. Don't get too hung up on

optimizing
> every query. Don't spend days working on a query that will be run
> infrequently and with no strong requirement for speed.


Unfortunately, I do have strong requirement for speed. And this query might
be triggered often ;)
Do you have an idea of work around to implement a dynamic array of MySQL
user variables as described here below?

There is a work around with user variables in MySQL to simultate Rownum.

SET @a=0;
SELECT @a:=@a+1 as rownum, s.* from students;

But my need is a little more specific. I would need an array of user
variables, but it does not seem possible to have this in MySQL.
So I would be able to write something like:

SELECT s.IdStudent, s.IdClass, s.FirstName, @myArray[ s.IdClass ]:=
@myArray[ s.IdClass ] + 1
FROM student s
WHERE @myArray[ s.IdClass ] < N

Do you see a way to write this?

Thanks a lot for your kind help.
Regards,

paul.


Reply With Quote
  #7 (permalink)  
Old 03-08-2006
Bill Karwin
 
Posts: n/a
Default Re: simple SQL select optimisation problem for sql wizzard ;-)

"paul" <zerawarez@free.fr> wrote in message
news:440eb90b$0$11675$626a54ce@news.free.fr...
> There is a work around with user variables in MySQL to simultate Rownum.
> Do you have an idea of work around to implement a dynamic array of
> MySQL user variables ...?


There's often a way to use pre-initialized data to achieve performance.

You could create a new column in the table, and make sure it contains the
ordinal number of the student in his/her class; this is the equivalent of a
rownum per group.

ALTER TABLE students ADD COLUMN ordinal INTEGER UNSIGNED NOT NULL;
SET @a = 0; UPDATE students SET ordinal = @a := @a + 1 WHERE IdClass = 1;
SET @a = 0; UPDATE students SET ordinal = @a := @a + 1 WHERE IdClass = 2;
SET @a = 0; UPDATE students SET ordinal = @a := @a + 1 WHERE IdClass = 3;
SET @a = 0; UPDATE students SET ordinal = @a := @a + 1 WHERE IdClass = 4;
... repeat for each class to initialize the ordinals ...

Now you can get the top N per group very easily:

SELECT * FROM students WHERE ordinal <= 2;

Update this sequence of numbers anytime you insert or delete records from a
group.
For instance, after adding or removing a student in class 3, re-initialize
the ordinals for that class:

SET @a = 0; UPDATE students SET ordinal = @a := @a + 1 WHERE IdClass = 3;

Regards,
Bill K.


Reply With Quote
  #8 (permalink)  
Old 03-09-2006
paul
 
Posts: n/a
Default Re: simple SQL select optimisation problem for sql wizzard ;-)


"Bill Karwin" <bill@karwin.com> wrote in message
news:dung3k01m9g@enews4.newsguy.com...
> There's often a way to use pre-initialized data to achieve performance.
> [...]


Your solution is great. It fits well to my needs. Thanks a lot Bill.
You are very good at finding solutions! Congratulation!
Best regards,

paul.



Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 06:07 AM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0