Help writing SQL statement in PHP script

#21
05-15-2008
vkayute@gmail.com
Re: Help writing SQL statement in PHP script

On May 15, 12:27 am, Corey Jansen <ccj9...@gmail.com> wrote:
> Jerry's approach results in a "cartesian explosion."


Which is exactly the problem database normalization is designed to
prevent.

If only Mr. Stuckle had listened to what 10 people told him already.
Obstinacy is his best policy it seems :)

Yet another lesson in "Why You Should Use Proper Database Design."

#22
05-15-2008
The Natural Philosopher
Re: Help writing SQL statement in PHP script

Corey Jansen wrote:
> I tried both queries, and the result is Jerry's method produces very
> strange results. The normalized approach posted by petersprc does give
> the expected result though.
>
> For a table containing a few thousand records with duplicates, Jerry's
> query returned 200 million rows (yes 200 million) after running for
> about 2 minutes. That's more rows than there were in the original table.
>


Those that can, do. Those that can't, teach. ;-)

If I had a tenner for every 'theoretically correct' approach that has
resulted in huge software size, or machine overhead, or just plain not
working..



> I copied the query directly into a test case.
>
> DROP PROCEDURE IF EXISTS setup;
>
> DELIMITER //
>
> CREATE PROCEDURE setup ()
> BEGIN
> DECLARE i INT DEFAULT 0;
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (entry_id int,
> cat_id int);
> WHILE i < 10000 DO
> INSERT INTO test VALUES (2, 30),
> (2, 35), (3, 30), (3, 35);
> SET i = i + 1;
> END WHILE;
> END;
>
> //
>
> DELIMITER ;
>
> CALL setup();
>
> DROP TABLE IF EXISTS result;
>
> CREATE TABLE result AS
> SELECT a.entry_id
> FROM test a
> INNER JOIN test b
> ON a.entry_id = b.entry_id
> WHERE a.cat_id = 30
> AND b.cat_id = 35;
>
> The output is:
>
> Query OK, 200000000 rows affected (2 min 6.35 sec)
> Records: 200000000 Duplicates: 0 Warnings: 0
>
> Jerry's approach results in a "cartesian explosion."


I'll remember that phrase...
>

#23
05-15-2008
vkayute@gmail.com
Re: Help writing SQL statement in PHP script

On May 14, 7:21 pm, Mike Lahey <mikey6...@yahoo.com> wrote:
> Jerry Stuckle wrote:
>
> > No argument.

>
> > But that was an additional condition the poster required - not the
> > original op. And that's what makes it incorrect.

>
> Uniqueness is a consequence of the relationship the OP wanted to model.
> Best practice is to create an index, which is the correct solution, as
> has been pointed out several times.
>
> You should properly normalize your DB instead of working around a broken
> design as you're arguing for.


Amen. Any proposed solution that skips this step is incomplete. One
shouldn't rely on a broken data model and expect to get good results.

> The OP wanted to indicate membership in a group. A membership relation
> does not contain duplicates.


Yes, by definition, a membership set has no dups. To take another
example, it wouldn't be proper for a student to belong to the same
class twice. (He could repeat the course, but that wouldn't be the
same class, would it?)

Using a flawed db design creates all sorts of inconsistencies which
are better to avoid when developing robust systems.

Jerry's suggested query blows up when faced with duplicates, so you
can see how easy it is to fall into this trap.

#24
05-15-2008
Jerry Stuckle
Re: Help writing SQL statement in PHP script

Corey Jansen wrote:
> I tried both queries, and the result is Jerry's method produces very
> strange results. The normalized approach posted by petersprc does give
> the expected result though.
>
> For a table containing a few thousand records with duplicates, Jerry's
> query returned 200 million rows (yes 200 million) after running for
> about 2 minutes. That's more rows than there were in the original table.
>
> I copied the query directly into a test case.
>
> DROP PROCEDURE IF EXISTS setup;
>
> DELIMITER //
>
> CREATE PROCEDURE setup ()
> BEGIN
> DECLARE i INT DEFAULT 0;
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (entry_id int,
> cat_id int);
> WHILE i < 10000 DO
> INSERT INTO test VALUES (2, 30),
> (2, 35), (3, 30), (3, 35);
> SET i = i + 1;
> END WHILE;
> END;
>
> //
>
> DELIMITER ;
>
> CALL setup();
>
> DROP TABLE IF EXISTS result;
>
> CREATE TABLE result AS
> SELECT a.entry_id
> FROM test a
> INNER JOIN test b
> ON a.entry_id = b.entry_id
> WHERE a.cat_id = 30
> AND b.cat_id = 35;
>
> The output is:
>
> Query OK, 200000000 rows affected (2 min 6.35 sec)
> Records: 200000000 Duplicates: 0 Warnings: 0
>
> Jerry's approach results in a "cartesian explosion."
>


Then you have a broken database server. You need to report that as a
bug to MySQL ASAP. A lot of people depend on self-join queries like this!

This works fine (sorry about the line wraps):

<?php

$link = mysql_connect('localhost', 'root', 'vps11131') or die("Can't
connect: " . mysql_error());
$db = mysql_select_db('test');

// Clear table if it existed
mysql_query('DROP TABLE IF EXISTS test');
mysql_query('CREATE TABLE test (groupid INT NOT NULL, ' .
'userid INT NOT NULL, PRIMARY KEY(groupid, userid))');

// Insert 10K rows of data
for ($i = 1; $i <= 100; $i++)
    for ($j = 1; $j <= 100; $j++)
        mysql_query("INSERT INTO test(groupid, userid) VALUES($i, $j)");

// Now let's get rid of some of the data so we have meaningful results
mysql_query('DELETE FROM test WHERE groupid=32 AND MOD(userid, 3) > 0');
mysql_query('DELETE FROM test WHERE groupid=38 AND MOD(userid, 4) > 0');

// Pull the matching data from the table

$result = mysql_query('SELECT a.userid AS userid ' .
                      'FROM test a ' .
                      'INNER JOIN test b ' .
                      'ON a.userid = b.userid ' .
                      'WHERE a.groupid = 32 ' .
                      'AND b.groupid = 35');
echo 'Rows found: ' . mysql_num_rows($result) . "\n";
while ($data = mysql_fetch_array($result))
    echo $data['userid'] . " ";
mysql_close();
?>

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

#25
05-15-2008
Jerry Stuckle
Re: Help writing SQL statement in PHP script

vkayute@gmail.com wrote:
> On May 15, 12:27 am, Corey Jansen <ccj9...@gmail.com> wrote:
>> Jerry's approach results in a "cartesian explosion."

>
> Which is exactly the problem database normalization is designed to
> prevent.
>
> If only Mr. Stuckle had listened to what 10 people told him already.
> Obstinacy is his best policy it seems :)
>
> Yet another lesson in "Why You Should Use Proper Database Design."
>


I'm not arguing about proper database design. My only comment is it is
IMPOSSIBLE to determine if the database is normalized or not from the
given information. There could be one or more additional columns to
determine uniqueness, for instance.

And people wonder why I send folks to comp.databases.mysql for MySQL
questions - that's where the REAL experts hang out.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

#26
05-15-2008
Jerry Stuckle
Re: Help writing SQL statement in PHP script

The Natural Philosopher wrote:
> Corey Jansen wrote:
>> I tried both queries, and the result is Jerry's method produces very
>> strange results. The normalized approach posted by petersprc does give
>> the expected result though.
>>
>> For a table containing a few thousand records with duplicates,
>> Jerry's query returned 200 million rows (yes 200 million) after
>> running for about 2 minutes. That's more rows than there were in the
>> original table.
>>

>
> Those that can, do. Those that can't, teach. ;-)
>


And those who can't teach become philosophers.

> If I had a tenner for every 'theoretically correct' approach that has
> resulted in huge software size, or machine overhead, or just plain not
> working..
>
>


If I had a tenner for every good comment you made, I'd be broke.
However, if I had ten cents for every stupid remark you made, I could
retire.

>
>> I copied the query directly into a test case.
>>
>> DROP PROCEDURE IF EXISTS setup;
>>
>> DELIMITER //
>>
>> CREATE PROCEDURE setup ()
>> BEGIN
>> DECLARE i INT DEFAULT 0;
>> DROP TABLE IF EXISTS test;
>> CREATE TABLE test (entry_id int,
>> cat_id int);
>> WHILE i < 10000 DO
>> INSERT INTO test VALUES (2, 30),
>> (2, 35), (3, 30), (3, 35);
>> SET i = i + 1;
>> END WHILE;
>> END;
>>
>> //
>>
>> DELIMITER ;
>>
>> CALL setup();
>>
>> DROP TABLE IF EXISTS result;
>>
>> CREATE TABLE result AS
>> SELECT a.entry_id
>> FROM test a
>> INNER JOIN test b
>> ON a.entry_id = b.entry_id
>> WHERE a.cat_id = 30
>> AND b.cat_id = 35;
>>
>> The output is:
>>
>> Query OK, 200000000 rows affected (2 min 6.35 sec)
>> Records: 200000000 Duplicates: 0 Warnings: 0
>>
>> Jerry's approach results in a "cartesian explosion."

>
> I'll remember that phrase...
>>

>


ROFLMAO. Never heard of a cartesian product?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

#27
05-15-2008
Jerry Stuckle
Re: Help writing SQL statement in PHP script

vkayute@gmail.com wrote:
> On May 14, 7:21 pm, Mike Lahey <mikey6...@yahoo.com> wrote:
>> Jerry Stuckle wrote:
>>
>>> No argument.
>>> But that was an additional condition the poster required - not the
>>> original op. And that's what makes it incorrect.

>> Uniqueness is a consequence of the relationship the OP wanted to model.
>> Best practice is to create an index, which is the correct solution, as
>> has been pointed out several times.
>>
>> You should properly normalize your DB instead of working around a broken
>> design as you're arguing for.

>
> Amen. Any proposed solution that skips this step is incomplete. One
> shouldn't rely on a broken data model and expect to get good results.
>


No arguments. But based on the information given, we cannot say the
database was not normalized.

>> The OP wanted to indicate membership in a group. A membership relation
>> does not contain duplicates.

>
> Yes, by definition, a membership set has no dups. To take another
> example, it wouldn't be proper for a student to belong to the same
> class twice. (He could repeat the course, but that wouldn't be the
> same class, would it?)
>


It depends. For instance, you could have an additional column -
privileges. Things like "read", "post", "upload" to determine the
rights the user has.

> Using a flawed db design creates all sorts of inconsistencies which
> are better to avoid when developing robust systems.
>
> Jerry's suggested query blows up when faced with duplicates, so you
> can see how easy it is to fall into this trap.
>


My query does not blow up when there are duplicates. It works perfectly
well. But Peter's fails in that case.

And people wonder why I refer MySQL questions to comp.databases.mysql -
where the real experts hang out.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

#28
05-16-2008
Mitch Sherman
Re: Help writing SQL statement in PHP script

On Thu, 15 May 2008 11:55:29 -0400, Jerry Stuckle
<jstucklex@attglobal.net> wrote:
>vkayute@gmail.com wrote:
>> On May 15, 12:27 am, Corey Jansen <ccj9...@gmail.com> wrote:
>>> Jerry's approach results in a "cartesian explosion."

>>
>> Which is exactly the problem database normalization is designed to
>> prevent.
>>
>> If only Mr. Stuckle had listened to what 10 people told him already.
>> Obstinacy is his best policy it seems :)
>>
>> Yet another lesson in "Why You Should Use Proper Database Design."
>>

>
>I'm not arguing about proper database design. My only comment is it is
>IMPOSSIBLE to determine if the database is normalized or not from the
>given information.


That doesn't mean that the relation can't be normalized first. That
seems to be the critical point you're missing.

You seem to be arguing that it's better to build on a potentially flawed
database design rather than get it right first, which is terrible
advice.

> There could be one or more additional columns to determine uniqueness, for instance.
>
>And people wonder why I send folks to comp.databases.mysql for MySQL
>questions - that's where the REAL experts hang out.


This is a pointless hypothetical. If you have N columns, you can still
maintain uniqueness across those columns. That doesn't require
duplicate rows any more than the original problem which had only 2
columns.
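
As a sketch of what uniqueness across N columns could look like, the snippet below adds a hypothetical third column, `permission`, along the lines of the privileges column suggested earlier in the thread. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL; that choice, the table name `membership`, and the helper `try_insert` are assumptions made only to keep the example self-contained. Rows that differ in any of the three columns are accepted, while an exact duplicate is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-column membership table; the names are illustrative only.
conn.execute(
    "CREATE TABLE membership ("
    "userid INT NOT NULL, groupid INT NOT NULL, permission TEXT NOT NULL, "
    "UNIQUE (userid, groupid, permission))"
)

def try_insert(row):
    """Return True if the row was stored, False if the UNIQUE constraint refused it."""
    try:
        conn.execute("INSERT INTO membership VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert((1, 1, "read")))   # True
print(try_insert((1, 1, "post")))   # True: differs in the permission column
print(try_insert((1, 1, "read")))   # False: exact duplicate of the first row
```

The same user can appear in the same group several times, yet no two rows are identical, so the constraint and the extra column coexist.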

Mitch

#29
05-16-2008
Mitch Sherman
Re: Help writing SQL statement in PHP script

On Thu, 15 May 2008 11:50:57 -0400, Jerry Stuckle
<jstucklex@attglobal.net> wrote:
>Corey Jansen wrote:
>>
>> Jerry's approach results in a "cartesian explosion."

>
>Then you have a broken database server. You need to report that as a
>bug to MySQL ASAP. A lot of people depend on self-join queries like this!


Not at all, this is a bug in your query. It produced the same result
here. MySQL did exactly what you told it to do. You seem desperate to
avoid acknowledging this, resorting even to making up fictitious MySQL
bug reports.

The problem is you are self-joining using a condition that isn't
unique and lacks a primary key reference. Sometimes this is what you
want, but that is not the case in the original problem.

Let me spell it out for you. Let's say you have rows A through F that
contain the following values:

A: (2, 30)
B: (2, 35)
C: (2, 30)
D: (2, 35)
E: (2, 30)
F: (2, 35)

There are only 6 rows in the table. Your query, however, will produce
more than 6 matches. This is because each of rows A, C, and E pairs
with each of rows B, D, and F. The result of the inner join is:

(A, B), (A, D), (A, F)
(C, B), (C, D), (C, F)
(E, B), (E, D), (E, F)

Now, here's how it looks in SQL:

-- Create the table with 6 rows --

DROP TABLE IF EXISTS test;
CREATE TABLE test (entry_id int, cat_id int);
INSERT INTO test (entry_id, cat_id) values
(2, 30), (2, 35), (2, 30), (2, 35), (2, 30),
(2, 35);

-- Run the query --

SELECT a.entry_id FROM test a INNER JOIN test b
ON a.entry_id = b.entry_id WHERE a.entry_id =
b.entry_id AND a.cat_id = 30 AND b.cat_id = 35;

The result of your query is:

9 rows in set (0.00 sec)

This gets worse as your table gets bigger. You end up with the
"cartesian explosion" in the test case that you are denying exists.
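
The counting argument above can be checked with a short script. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database rather than MySQL (an assumption, made so the example runs anywhere), and the helper name `count_self_join` is hypothetical. For each entry_id the self-join yields (rows with cat_id 30) x (rows with cat_id 35) matches, which gives 3 x 3 = 9 for the six-row table and 2 x 10000 x 10000 = 200,000,000 for the looped test case.

```python
import sqlite3

# Same join shape as the query under discussion, counting rows instead of listing them.
SELF_JOIN = """
    SELECT COUNT(*)
    FROM test a
    INNER JOIN test b ON a.entry_id = b.entry_id
    WHERE a.cat_id = 30 AND b.cat_id = 35
"""

def count_self_join(rows):
    """Load rows into a key-less table and count the self-join matches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (entry_id INT, cat_id INT)")
    conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
    (count,) = conn.execute(SELF_JOIN).fetchone()
    conn.close()
    return count

# The six-row table: three (2, 30) rows and three (2, 35) rows.
six_rows = [(2, 30), (2, 35)] * 3
print(count_self_join(six_rows))   # 9

# A scaled-down version of the looped test case: 100 iterations instead of 10000.
n = 100
scaled = [(2, 30), (2, 35), (3, 30), (3, 35)] * n
print(count_self_join(scaled))     # 20000, i.e. 2 * n * n
```

The result grows with the square of the number of duplicates, which is why 40,000 input rows turn into 200 million output rows.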

>
>This works fine (sorry about the line wraps):
>
><?php
>
>$link = mysql_connect('localhost', 'root', 'vps11131') or die("Can't
>connect: " . mysql_error());
>$db = mysql_select_db('test');
>
>// Clear table if it existed
>mysql_query('DROP TABLE IF EXISTS test');
>mysql_query('CREATE TABLE test (groupid INT NOT NULL, ' .
> 'userid INT NOT NULL, PRIMARY KEY(groupid, userid))');


Your script doesn't test the same scenario at all. The table you
created is guaranteed not to have any duplicates because you defined a
PRIMARY KEY. This is exactly what you've been arguing against doing
all this time, so you've basically demonstrated why uniqueness is a
good thing.
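
To illustrate the effect of the PRIMARY KEY, here is a hedged sketch that again uses Python's sqlite3 with an in-memory database as a stand-in for MySQL (an assumption; the quoted script targets MySQL). The table definition mirrors the one in the quoted script, and the three sample rows are made up for the example. With the composite key in place a second copy of a row is rejected, so the self-join returns at most one row per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (groupid INT NOT NULL, userid INT NOT NULL, "
    "PRIMARY KEY (groupid, userid))"
)
# Illustrative data: user 1 is in groups 32 and 35, user 2 only in group 32.
conn.executemany(
    "INSERT INTO test (groupid, userid) VALUES (?, ?)",
    [(32, 1), (35, 1), (32, 2)],
)

# A duplicate membership row violates the composite primary key.
try:
    conn.execute("INSERT INTO test (groupid, userid) VALUES (32, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Users in both group 32 and group 35: one row per user, no fan-out.
members = conn.execute(
    "SELECT a.userid FROM test a "
    "INNER JOIN test b ON a.userid = b.userid "
    "WHERE a.groupid = 32 AND b.groupid = 35"
).fetchall()

print(duplicate_rejected)  # True
print(members)             # [(1,)]
```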

Mitch

#30
05-16-2008
Jerry Stuckle
Re: Help writing SQL statement in PHP script

Mitch Sherman wrote:
> On Thu, 15 May 2008 11:55:29 -0400, Jerry Stuckle
> <jstucklex@attglobal.net> wrote:
>> vkayute@gmail.com wrote:
>>> On May 15, 12:27 am, Corey Jansen <ccj9...@gmail.com> wrote:
>>>> Jerry's approach results in a "cartesian explosion."
>>> Which is exactly the problem database normalization is designed to
>>> prevent.
>>>
>>> If only Mr. Stuckle had listened to what 10 people told him already.
>>> Obstinacy is his best policy it seems :)
>>>
>>> Yet another lesson in "Why You Should Use Proper Database Design."
>>>

>> I'm not arguing about proper database design. My only comment is it is
>> IMPOSSIBLE to determine if the database is normalized or not from the
>> given information.

>
> That doesn't mean that the relation can't be normalized first. That
> seems to be the critical point you're missing.
>


No, the critical point YOU'RE MISSING is that the table may be
normalized - AND STILL HAVE DUPLICATES IN THESE COLUMNS.

That is the critical point!

> You seem to be arguing that it's better to build on a potentially flawed
> database design rather than get it right first, which is terrible
> advice.
>


No, I'm not. There is nothing flawed about a design which has three
columns (of which these are only two) determining the primary key (or
other unique value).

>> There could be one or more additional columns to determine uniqueness, for instance.
>>
>> And people wonder why I send folks to comp.databases.mysql for MySQL
>> questions - that's where the REAL experts hang out.

>
> This is a pointless hypothetical. If you have N columns, you can still
> maintain uniqueness across those columns. That doesn't require
> duplicate rows any more than the original problem which had only 2
> columns.
>
> Mitch
>


No, it is not pointlessly hypothetical. It is very germane to this
situation. We do not have all of the information - the complete
database design, usage, etc.

The other column(s) may not be germane to the problem, so the original
op did not list them. That is quite common - and correct - as it does
not confuse the issue at hand with irrelevant data. There may very well
have been 2 columns - or 20 columns or even 200 columns. You don't know
which is correct.

For instance, here's a table which could very well be the case:

userid  groupid  permission
1       1        read
1       1        write
1       1        delete
1       2        read
1       3        read

This is a commonly used design. The permission column is not pertinent
to the original op's question - so it wouldn't be listed. But Peter's
query will fail if it looks for someone who is a member of groups 1 and
2. The correct query works in this case just fine.
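
As a rough check of the table above, the sketch below loads those five rows into an in-memory SQLite database (a stand-in for MySQL, chosen only to keep the example self-contained) and runs the self-join form used throughout this thread for groups 1 and 2. The `DISTINCT` variant is an addition for illustration, not something proposed in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (userid INT, groupid INT, permission TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, 1, "read"), (1, 1, "write"), (1, 1, "delete"),
     (1, 2, "read"), (1, 3, "read")],
)

# Self-join looking for users who belong to both group 1 and group 2.
JOIN = (
    "FROM test a INNER JOIN test b ON a.userid = b.userid "
    "WHERE a.groupid = 1 AND b.groupid = 2"
)

plain = conn.execute("SELECT a.userid " + JOIN).fetchall()
distinct = conn.execute("SELECT DISTINCT a.userid " + JOIN).fetchall()

print(plain)     # [(1,), (1,), (1,)]  one row per permission held in group 1
print(distinct)  # [(1,)]
```

The user is found either way; without `DISTINCT` the match is repeated once for each group 1 permission row.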

My God, I've never seen someone so insistent about making false
assumptions about someone else's code - and so stubborn about sticking
to a bad suggestion.

I really suggest you learn some more advanced sql - actually, the
correct answer isn't even advanced level. I'm not sure it even makes
intermediate level.

The correct query works 100% of the time - whether there are duplicates
or not.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
