This is a discussion on simple SQL select optimisation problem for sql wizzard ;-) within the MySQL Database forums, part of the Database Forums category; Hello everybody, here is a simple SQL problem. Imagine a table which describes classes of students. -- -- Structure of the table `...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
Hello everybody,
here is a simple SQL problem. Imagine a table which describes classes of students. -- -- Structure of the table `students` -- CREATE TABLE `students` ( `IdStudent` int(10) unsigned NOT NULL auto_increment, `IdClass` int(10) unsigned NOT NULL default '0', `FirstName` char(30) NOT NULL default '', PRIMARY KEY (`IdStudent`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=11 ; -- -- Content of the table `students` -- INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (1, 1, 'Pierre'); INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (2, 1, 'Paul'); INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (3, 1, 'Jacques'); INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (4, 2, 'Yann'); INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (5, 3, 'Damien'); INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (6, 3, 'Sam'); INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (7, 3, 'George'); INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (8, 3, 'Robert'); INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (9, 4, 'Michel'); INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (10, 4, 'Toto'); The goal is to get by the simplest and fastest way a maximum of N distinct students of each class. For N=1, a simple solution is : SELECT * FROM students GROUP BY IdClass IdStudent IdClass FirstName 1 1 Pierre 4 2 Yann 5 3 Damien 9 4 Michel For N=2 with the same data, it would give : IdStudent IdClass FirstName 1 1 Pierre 2 1 Paul 4 2 Yann 5 3 Damien 6 3 Sam 9 4 Michel 10 4 Toto For N=3, we would get: IdStudent IdClass FirstName 1 1 Pierre 2 1 Paul 3 1 Jacques 4 2 Yann 5 3 Damien 6 3 Sam 7 3 George 9 4 Michel 10 4 Toto Question : What are the best solutions for N between 1 and 10? I would like to avoid to do: SELECT DISTINCT IdClass FROM student ; For each IdClass "id" found do the following query: SELECT * FROM student WHERE IdClass="id" LIMIT N ; Thank you in advance for your help. Paul. |
|
|||
|
"paul" <zerawarez@free.fr> wrote in message
news:440c73bd$0$27055$626a54ce@news.free.fr... > The goal is to get by the simplest and fastest way a maximum of N distinct > students of each class. Note that using a single SQL query to generate the result set is not always the best or fastest solution to every problem. Sometimes it's better to use a procedural language to issue several queries to generate the full result set. It's certainly more straightforward to most programmers, thus it is easier to find someone who can implement and maintain the software. The "Top-N per group" problem is tricky. Oracle has a pseudocolumn called ROWNUM (which is offensive to relational purists) that can be used to achieve this. MS SQL Server has its "SELECT TOP n ..." proprietary extension to the SQL language. MySQL has neither of these extensions, so there's a classic correlated-subquery method: SELECT s.* FROM students AS s WHERE 2 >= (SELECT COUNT(DISTINCT IdStudent) FROM students AS s2 WHERE s.IdClass = s2.IdClass AND s2.IdStudent >= s.IdStudent) Note that you didn't state whether you wanted any specific top 2 students per class, so I'm assuming it's based on IdStudent. Regards, Bill K. |
|
|||
|
"Bill Karwin" <bill@karwin.com> wrote in message news:dui51h0s2d@enews3.newsguy.com... > "paul" <zerawarez@free.fr> wrote in message > news:440c73bd$0$27055$626a54ce@news.free.fr... > > The goal is to get by the simplest and fastest way a maximum of N distinct > > students of each class. > > Note that using a single SQL query to generate the result set is not always > the best or fastest solution to every problem. Sometimes it's better to use > a procedural language to issue several queries to generate the full result > set. It's certainly more straightforward to most programmers, thus it is > easier to find someone who can implement and maintain the software. You are right, but in my case, I want to avoid to do "Number Of Class"+1 queries to get my result set. SELECT DISTINCT IdClass FROM student ; For each IdClass "id" found do the following query: SELECT * FROM student WHERE IdClass="id" LIMIT N ; > The "Top-N per group" problem is tricky. Oracle has a pseudocolumn called > ROWNUM (which is offensive to relational purists) that can be used to > achieve this. MS SQL Server has its "SELECT TOP n ..." proprietary > extension to the SQL language. You are right (again ;), I did not know this ROWNUM functionnality. Thank you. There is a work around with user variables in SQL to simultate Rownum. SET @a=0; SELECT @a:=@a+1 as rownum, s.* from students; But my need is a little more specific. I would need an array of user variables, but it does not seem possible to have this in MySQL. So I would be able to write something like: SELECT s.IdStudent, s.IdClass, s.FirstName, @myArray[ s.IdClass ]:= @myArray[ s.IdClass ] + 1 FROM student s WHERE @myArray[ s.IdClass ] < N Do you see a way to write this? > MySQL has neither of these extensions, so there's a classic > correlated-subquery method: > > SELECT s.* > FROM students AS s > WHERE 2 >= (SELECT COUNT(DISTINCT IdStudent) FROM students AS s2 WHERE > s.IdClass = s2.IdClass AND s2.IdStudent >= s.IdStudent) > > Note that you didn't state whether you wanted any specific top 2 students > per class, so I'm assuming it's based on IdStudent. Yes, congratulation, It works! You are a wizzard! :) I did not know such conditions in subqueries were possible. Indeed, I don't mind which students are returned in each class. But, in terms of efficiency, the subselect of your query seems to be evaluated for each row in the table, isn't it? So I fear this solution is even worse in terms of speed and load than the one I wanted to avoid... SELECT DISTINCT IdClass FROM student ; For each IdClass "id" found do the following query: SELECT * FROM student WHERE IdClass="id" LIMIT N ; What do you think? Thanks a lot for your help! Best regards, paul. |
|
|||
|
"paul" <zerawarez@free.fr> wrote in message
news:440d9a02$0$12856$626a54ce@news.free.fr... >> SELECT s.* >> FROM students AS s >> WHERE 2 >= (SELECT COUNT(DISTINCT IdStudent) FROM students AS s2 WHERE >> s.IdClass = s2.IdClass AND s2.IdStudent >= s.IdStudent) > > Yes, congratulation, It works! You are a wizzard! :) I did not know such > conditions in subqueries were possible. Yep! That's called a correlated subquery. > But, in terms of efficiency, the subselect of your query seems to be > evaluated for each row in the table, isn't it? Right. A non-correlated subquery can be factored out the subquery, executed once, and its results used repeatedly, for the rows of the outer query. But a correlated subquery must be evaluated for each matching row in the outer query. Make sure the relevant columns (IdClass and IdStudent) are indexed and there's a good chance of being pretty efficient. > So I fear this solution is even worse in terms of speed and load than the > one I wanted to avoid... Not necessarily; try both methods, do some timing measurements, and see which one works best. And I agree with Peter Coffin's point. Don't get too hung up on optimizing every query. Don't spend days working on a query that will be run infrequently and with no strong requirement for speed. Regards, Bill K. |
|
|||
|
> Why? I mean, sure, if you're going to be running this job several times
> per minute, then some attention may be necessary, but it looks like > you're running a class list. Who cares if it takes 30 second and 200 > queries instead of four seconds and one query, when the job's only going > to run a couple of times per year? The problem I am exposing is conceptually the same as the one I am facing in my developpements. I reassure you, it has nothing to do with students or classes... It was just a way to make the problem look easier ;) Thanks. paul. |
|
|||
|
"Bill Karwin" <bill@karwin.com> wrote in message news:dukimd0id0@enews1.newsguy.com... > And I agree with Peter Coffin's point. Don't get too hung up on optimizing > every query. Don't spend days working on a query that will be run > infrequently and with no strong requirement for speed. Unfortunately, I do have strong requirement for speed. And this query might be triggered often ;) Do you have an idea of work around to implement a dynamic array of MySQL user variables as described here below? There is a work around with user variables in MySQL to simultate Rownum. SET @a=0; SELECT @a:=@a+1 as rownum, s.* from students; But my need is a little more specific. I would need an array of user variables, but it does not seem possible to have this in MySQL. So I would be able to write something like: SELECT s.IdStudent, s.IdClass, s.FirstName, @myArray[ s.IdClass ]:= @myArray[ s.IdClass ] + 1 FROM student s WHERE @myArray[ s.IdClass ] < N Do you see a way to write this? Thanks a lot for your kind help. Regards, paul. |
|
|||
|
"paul" <zerawarez@free.fr> wrote in message
news:440eb90b$0$11675$626a54ce@news.free.fr... > There is a work around with user variables in MySQL to simultate Rownum. > Do you have an idea of work around to implement a dynamic array of > MySQL user variables ...? There's often a way to use pre-initialized data to achieve performance. You could create a new column in the table, and make sure it contains the ordinal number of the student in his/her class; this is the equivalent of a rownum per group. ALTER TABLE students ADD COLUMN ordinal INTEGER UNSIGNED NOT NULL; SET @a = 0; UPDATE students SET ordinal = @a := @a + 1 WHERE IdClass = 1; SET @a = 0; UPDATE students SET ordinal = @a := @a + 1 WHERE IdClass = 2; SET @a = 0; UPDATE students SET ordinal = @a := @a + 1 WHERE IdClass = 3; SET @a = 0; UPDATE students SET ordinal = @a := @a + 1 WHERE IdClass = 4; ... repeat for each class to initialize the ordinals ... Now you can get the top N per group very easily: SELECT * FROM students WHERE ordinal <= 2; Update this sequence of numbers anytime you insert or delete records from a group. For instance, after adding or removing a student in class 3, re-initialize the ordinals for that class: SET @a = 0; UPDATE students SET ordinal = @a := @a + 1 WHERE IdClass = 3; Regards, Bill K. |
|
|||
|
"Bill Karwin" <bill@karwin.com> wrote in message news:dung3k01m9g@enews4.newsguy.com... > There's often a way to use pre-initialized data to achieve performance. > [...] Your solution is great. It fits well to my needs. Thanks a lot Bill. You are very good at finding solutions! Congratulation! Best regards, paul. |