This is a discussion on "Help writing SQL statement in PHP script" within the PHP Language forums, part of the PHP Programming Forums category.
Jerry Stuckle wrote:
> Chuck Cheeze wrote:
>> This might be in the wrong group, but...
>>
>> Here is an example of my data:
>>
>> entry_id  cat_id
>> 1         20
>> 2         25
>> 3         30
>> 4         25
>> 5         35
>> 6         25
>> 2         30
>> 2         35
>> 3         35
>>
>> As you can see, entry_id's 2 and 3 both belong to cat_id 30 and 35
>>
>> I have captured the cat_id's 30 and 35 with my script, so I need all
>> entry_id's that belong to BOTH cat_id 30 and 35.
>>
>> I tried "Select entry_id from myTable where cat_id = '30' and cat_id =
>> '35'" but obviously that is incorrect.
>>
>> Can someone help? Thanks...
>>
>
> This isn't a PHP question - it's a database question. You need a group
> for your database (e.g. if it's MySQL, comp.databases.mysql).
>
> You'll get good answers there because that's where the SQL experts hang
> out. The answers posted here so far are incorrect.
>

That of course includes Jerry's.

I can't remember which paradox he is quoting. Anyway, the classic one is "Everything I say is false." Is this statement true or false?

When dealing with Jerry, it is not a hypothetical question.
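The original `cat_id = '30' AND cat_id = '35'` query returns nothing because the WHERE clause is tested one row at a time, and no single row can hold two different cat_id values. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so it runs anywhere); it loads the OP's sample rows into `myTable` and compares the failing query with two common fixes: a self-join (the form benchmarked at the end of this thread) and GROUP BY with HAVING. The helper `run` is hypothetical.

```python
import sqlite3

# The OP's sample rows (entry_id, cat_id).
ROWS = [(1, 20), (2, 25), (3, 30), (4, 25), (5, 35),
        (6, 25), (2, 30), (2, 35), (3, 35)]


def run(sql):
    """Load ROWS into an in-memory SQLite table and return sorted entry_ids."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE myTable (entry_id INTEGER, cat_id INTEGER)")
    con.executemany("INSERT INTO myTable VALUES (?, ?)", ROWS)
    result = sorted(r[0] for r in con.execute(sql))
    con.close()
    return result


# Fails: each row is tested on its own, and no row has two cat_ids at once.
NAIVE = "SELECT entry_id FROM myTable WHERE cat_id = 30 AND cat_id = 35"

# Self-join: pair each cat 30 row with a cat 35 row for the same entry.
SELF_JOIN = """
    SELECT a.entry_id
    FROM myTable a
    INNER JOIN myTable b ON a.entry_id = b.entry_id
    WHERE a.cat_id = 30 AND b.cat_id = 35
"""

# Grouping: keep entries that have rows for both categories.
GROUPED = """
    SELECT entry_id
    FROM myTable
    WHERE cat_id IN (30, 35)
    GROUP BY entry_id
    HAVING COUNT(DISTINCT cat_id) = 2
"""

print(run(NAIVE))      # []
print(run(SELF_JOIN))  # [2, 3]
print(run(GROUPED))    # [2, 3]
```

On the OP's data, which has no duplicate pairs, both fixes agree; the rest of the thread is about what happens when duplicates are allowed.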
Captain Paralytic wrote:
> On 12 May, 02:36, Mike Lahey <mikey6...@yahoo.com> wrote:
>> Jerry Stuckle wrote:
>>> You'll get good answers there because that's where the SQL experts hang
>>> out. The answers posted here so far are incorrect.
>> Pay attention to the posts. Peter's solution will work.
>
> It may work, but that does not make it the "correct" way to do it.

The only correct solution is the one that is issued with a certificate of Papal infallibility: the one that comes from the self-appointed God/pope of comp.lang.PHP, namely Jerry. For the rest of us, what works is good enough.
>
> Not necessarily. What happens if he has two entries with (2,30)? It
> will fail.
>
> Nothing in the description of the problem prohibits such an occurrence.
>

That was addressed in the response, but it looks like you missed it.

You make the mistake of assuming the data is not normalized. I think the point of Peter's post was to ensure that the data was indeed normalized.
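The duplicate-row objection can be checked directly. This sketch assumes the grouping query tests the row count with `HAVING COUNT(*) = 2`, a common way of writing it; the thread never quotes the exact query under debate, so treat that form as hypothetical, along with the helper `matches` and the extra entry 7. With a duplicated (7, 30) row and no (7, 35) row at all, the row count still reaches 2, so entry 7 is wrongly reported, while entry 2 (three rows) is missed. Counting distinct categories avoids both errors.

```python
import sqlite3


def matches(rows, having):
    """Return entry_ids in both cat 30 and 35, using the given HAVING test."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE myTable (entry_id INTEGER, cat_id INTEGER)")
    con.executemany("INSERT INTO myTable VALUES (?, ?)", rows)
    sql = ("SELECT entry_id FROM myTable WHERE cat_id IN (30, 35) "
           "GROUP BY entry_id HAVING " + having)
    result = sorted(r[0] for r in con.execute(sql))
    con.close()
    return result


# Entry 2 is in both categories (with a duplicated (2, 30) row);
# entry 7 is only in category 30, but that row appears twice.
ROWS = [(2, 30), (2, 30), (2, 35), (7, 30), (7, 30)]

# Counting rows: entry 2 has 3 rows (missed), entry 7 has 2 (false match).
print(matches(ROWS, "COUNT(*) = 2"))                # [7]
# Counting distinct categories gives the intended answer.
print(matches(ROWS, "COUNT(DISTINCT cat_id) = 2"))  # [2]
```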
Mike Lahey wrote:
>
>>
>> Not necessarily. What happens if he has two entries with (2,30)? It
>> will fail.
>>
>> Nothing in the description of the problem prohibits such an occurrence.
>>
>
> That was addressed in the response, but it looks like you missed it.
>
> You make the mistake of assuming the data is not normalized. I think the
> point of Peter's post was to ensure that the data was indeed normalized.
>

Which, if it is not normalized, will not work.

As I said - there was nothing in the original problem description to prohibit it. And the correct answer doesn't require unique entries.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
On Tue, 13 May 2008 12:02:59 -0400, Jerry Stuckle
<jstucklex@attglobal.net> wrote:

>Mike Lahey wrote:
>> That was addressed in the response, but it looks like you missed it.
>>
>> You make the mistake of assuming the data is not normalized. I think the
>> point of Peter's post was to ensure that the data was indeed normalized.
>
>Which, if it is not normalized, will not work.
>
>As I said - there was nothing in the original problem description to
>prohibit it. And the correct answer doesn't require unique entries.

That's impossible if you have a unique index as the poster stated.

In a classic one-to-many (1:N) relationship, you shouldn't have any duplicates. The recommendation to normalize your data is sound advice which you seem to be disagreeing with.

The query you're suggesting will incorrectly overstate the number of unique entries matched if there are duplicates. You should use "SELECT DISTINCT" in that case.

Mitch
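Mitch's point about overstated matches can be seen with the self-join: every duplicate row multiplies the output rows for that entry. A minimal sketch with sqlite3 (standing in for MySQL; the table contents are illustrative): entry 2 has two (2, 30) rows and one (2, 35) row, so the plain join returns entry 2 twice, and SELECT DISTINCT collapses the repeats.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE myTable (entry_id INTEGER, cat_id INTEGER)")
# (2, 30) appears twice; entry 3 has no duplicates.
con.executemany("INSERT INTO myTable VALUES (?, ?)",
                [(2, 30), (2, 30), (2, 35), (3, 30), (3, 35)])

# The same self-join with and without DISTINCT.
JOIN = """
    SELECT {distinct} a.entry_id
    FROM myTable a
    INNER JOIN myTable b ON a.entry_id = b.entry_id
    WHERE a.cat_id = 30 AND b.cat_id = 35
    ORDER BY a.entry_id
"""

plain = [r[0] for r in con.execute(JOIN.format(distinct=""))]
unique = [r[0] for r in con.execute(JOIN.format(distinct="DISTINCT"))]
print(plain)   # [2, 2, 3]
print(unique)  # [2, 3]
```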
On May 14, 3:32 am, Mitch Sherman <mitch.sher...@hush.ai> wrote:
> That's impossible if you have a unique index as the poster stated.
>
> In a classic one-to-many (1:N) relationship, you shouldn't have any
> duplicates. The recommendation to normalize your data is sound advice
> which you seem to be disagreeing with.
>
> The query you're suggesting will incorrectly overstate the number of
> unique entries matched if there are duplicates. You should use "SELECT
> DISTINCT" in that case.
>
> Mitch

I have to agree. It's always easier to get the most out of your database when you've normalized it properly. For those who are interested, MySQL has a good introduction to this on their development portal: http://dev.mysql.com/tech-resources/...alization.html
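The unique-index advice amounts to declaring the (entry_id, cat_id) pair as a key, so the duplicates debated above cannot be stored at all. In MySQL this would be a PRIMARY KEY or UNIQUE index on the two columns; the sketch below shows the same idea in SQLite, where a duplicate insert raises `sqlite3.IntegrityError`. The table name `entry_category` is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The composite primary key forbids storing the same membership twice.
con.execute("""
    CREATE TABLE entry_category (
        entry_id INTEGER NOT NULL,
        cat_id   INTEGER NOT NULL,
        PRIMARY KEY (entry_id, cat_id)
    )
""")
con.execute("INSERT INTO entry_category VALUES (2, 30)")

# A second (2, 30) row violates the key and is rejected.
try:
    con.execute("INSERT INTO entry_category VALUES (2, 30)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = con.execute("SELECT COUNT(*) FROM entry_category").fetchone()[0]
print(rejected)  # True
print(count)     # 1
```

With such a key in place, a row-count test and a distinct-count test give the same answer, which is the premise of the "normalize first" side of the argument.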
Mitch Sherman wrote:
> On Tue, 13 May 2008 12:02:59 -0400, Jerry Stuckle
> <jstucklex@attglobal.net> wrote:
>> Mike Lahey wrote:
>>> That was addressed in the response, but it looks like you missed it.
>>>
>>> You make the mistake of assuming the data is not normalized. I think the
>>> point of Peter's post was to ensure that the data was indeed normalized.
>> Which, if it is not normalized, will not work.
>>
>> As I said - there was nothing in the original problem description to
>> prohibit it. And the correct answer doesn't require unique entries.
>
> That's impossible if you have a unique index as the poster stated.
>
> In a classic one-to-many (1:N) relationship, you shouldn't have any
> duplicates. The recommendation to normalize your data is sound advice
> which you seem to be disagreeing with.
>
> The query you're suggesting will incorrectly overstate the number of
> unique entries matched if there are duplicates. You should use "SELECT
> DISTINCT" in that case.
>
> Mitch
>

No argument.

But that was an additional condition the poster required - not the original op. And that's what makes it incorrect.

It may be very possible to have a perfectly normalized database but duplicate columns here - there could be a third column which is also part of the primary key, but not pertinent to this question, so was not asked.

For instance, you might have:

Game  HomeTeam  VisitingTeam
1     5         6
2     7         8
3     5         8
4     3         4
5     5         6

You want to know which teams played both team 6 and team 8. The Game column is not pertinent to the question - but is critical to the game design, because teams often play each other more than once.

In this case your query would fail. The one in comp.databases.mysql would work correctly. That is why I directed him there.
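Jerry's schedule example can be run the same way. This sketch reads "played" as home team versus visiting team only, ignoring games in which a team was the visitor, and the table name `games` is an assumption. Team 5 hosted team 6 twice (games 1 and 5) and team 8 once (game 3), so it has three qualifying rows: a row-count test of 2 misses it, while counting distinct opponents, or a DISTINCT self-join, finds it.

```python
import sqlite3

# (game, home_team, visiting_team) from the example table.
GAMES = [(1, 5, 6), (2, 7, 8), (3, 5, 8), (4, 3, 4), (5, 5, 6)]

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE games (
    game INTEGER PRIMARY KEY, home_team INTEGER, visiting_team INTEGER)""")
con.executemany("INSERT INTO games VALUES (?, ?, ?)", GAMES)


def home_teams(having):
    """Home teams that hosted both 6 and 8, using the given HAVING test."""
    sql = ("SELECT home_team FROM games WHERE visiting_team IN (6, 8) "
           "GROUP BY home_team HAVING " + having + " ORDER BY home_team")
    return [r[0] for r in con.execute(sql)]


# Team 5 has three qualifying rows (6, 8, 6), so a row count of 2 misses it.
print(home_teams("COUNT(*) = 2"))                       # []
# Counting distinct opponents finds it.
print(home_teams("COUNT(DISTINCT visiting_team) = 2"))  # [5]

# Self-join: the repeated 5-vs-6 game duplicates the match unless DISTINCT.
SELF_JOIN = """
    SELECT DISTINCT a.home_team
    FROM games a
    INNER JOIN games b ON a.home_team = b.home_team
    WHERE a.visiting_team = 6 AND b.visiting_team = 8
"""
print([r[0] for r in con.execute(SELF_JOIN)])  # [5]
```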
Jerry Stuckle wrote:
>
> No argument.
>
> But that was an additional condition the poster required - not the
> original op. And that's what makes it incorrect.

Uniqueness is a consequence of the relationship the OP wanted to model. Best practice is to create an index, which is the correct solution, as has been pointed out several times.

You should properly normalize your DB instead of working around a broken design as you're arguing for.

> It may be very possible to have a perfectly normalized database but
> duplicate columns here - there could be a third column which is also
> part of the primary key, but not pertinent to this question, so was not
> asked.
>
> For instance, you might have:
>
> Game  HomeTeam  VisitingTeam
> 1     5         6
> 2     7         8
> 3     5         8
> 4     3         4
> 5     5         6
>
> You want to know which teams played both team 6 and team 8. The Game
> column is not pertinent to the question - but is critical to the game
> design, because teams often play each other more than once.

This is not the same relationship. Teams can play each other more than once, but a well-formed set cannot contain the same element twice.

Pay attention to the problem description, which stated that entry_id's belong to cat_id's. In your example, however, HomeTeam and VisitingTeam have parity instead. Re-read the original question:

Chuck Cheeze wrote:
> Here is an example of my data:
>
> entry_id  cat_id
> 1         20
> 2         25
> 3         30
> 4         25
> 5         35
> 6         25
> 2         30
> 2         35
> 3         35
>
> As you can see, entry_id's 2 and 3 both belong to cat_id 30 and 35
>
> I have captured the cat_id's 30 and 35 with my script, so I need all
> entry_id's that belong to BOTH cat_id 30 and 35.

The OP wanted to indicate membership in a group. A membership relation does not contain duplicates. Your query won't work in your example because it doesn't report which visitors played those teams, since each team can either be at home or away.

You've changed the problem, hence you need a new query.
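Mike's objection, that a team can be either at home or away, means the query has to look at both columns. One way, sketched here as an illustrative rewrite rather than a query posted in the thread, is to unfold each game into two (team, opponent) rows with UNION ALL and then group on that derived table. The table name `games` is an assumption, and games 6 and 7 are hypothetical additions so that team 9 meets team 6 as the visitor and team 8 as the host; a home-only query misses team 9.

```python
import sqlite3

# Example schedule plus two hypothetical games (6 and 7) involving team 9.
GAMES = [(1, 5, 6), (2, 7, 8), (3, 5, 8), (4, 3, 4), (5, 5, 6),
         (6, 6, 9), (7, 9, 8)]

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE games (
    game INTEGER PRIMARY KEY, home_team INTEGER, visiting_team INTEGER)""")
con.executemany("INSERT INTO games VALUES (?, ?, ?)", GAMES)

# Only looks at the home side of each game.
HOME_ONLY = """
    SELECT home_team FROM games
    WHERE visiting_team IN (6, 8)
    GROUP BY home_team
    HAVING COUNT(DISTINCT visiting_team) = 2
    ORDER BY home_team
"""

# Each game yields one row per side, so home and away are treated alike.
BOTH_SIDES = """
    SELECT team
    FROM (
        SELECT home_team AS team, visiting_team AS opponent FROM games
        UNION ALL
        SELECT visiting_team, home_team FROM games
    ) AS matchups
    WHERE opponent IN (6, 8)
    GROUP BY team
    HAVING COUNT(DISTINCT opponent) = 2
    ORDER BY team
"""

home_only = [r[0] for r in con.execute(HOME_ONLY)]
both_sides = [r[0] for r in con.execute(BOTH_SIDES)]
print(home_only)   # [5]
print(both_sides)  # [5, 9]
```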
Mike Lahey wrote:
> Jerry Stuckle wrote:
>>
>> No argument.
>>
>> But that was an additional condition the poster required - not the
>> original op. And that's what makes it incorrect.
>
> Uniqueness is a consequence of the relationship the OP wanted to model.
> Best practice is to create an index, which is the correct solution, as
> has been pointed out several times.
>
> You should properly normalize your DB instead of working around a broken
> design as you're arguing for.
>

Believe me - I know all about database design. I've been doing it for over 20 years.

>> It may be very possible to have a perfectly normalized database but
>> duplicate columns here - there could be a third column which is also
>> part of the primary key, but not pertinent to this question, so was
>> not asked.
>>
>> For instance, you might have:
>>
>> Game  HomeTeam  VisitingTeam
>> 1     5         6
>> 2     7         8
>> 3     5         8
>> 4     3         4
>> 5     5         6
>>
>> You want to know which teams played both team 6 and team 8. The Game
>> column is not pertinent to the question - but is critical to the game
>> design, because teams often play each other more than once.
>
> This is not the same relationship. Teams can play each other more than
> once, but a well-formed set cannot contain the same element twice.
>

Ah, but it is. There is nothing in the OP's statement that prohibits such a construct. It is completely normalized and correct.

> Pay attention to the problem description which stated that entry_id's
> belong to cat_id's. In your example however, HomeTeam and VisitingTeam
> have parity instead. Re-read the original question:
>

I am paying attention to the problem description. But you're reading more into it than exists.
> Chuck Cheeze wrote:
> > Here is an example of my data:
> >
> > entry_id  cat_id
> > 1         20
> > 2         25
> > 3         30
> > 4         25
> > 5         35
> > 6         25
> > 2         30
> > 2         35
> > 3         35
> >
> > As you can see, entry_id's 2 and 3 both belong to cat_id 30 and 35
> >
> > I have captured the cat_id's 30 and 35 with my script, so I need all
> > entry_id's that belong to BOTH cat_id 30 and 35.
>
> The OP wanted to indicate membership in a group. A membership relation
> does not contain duplicates. Your query won't work in your example
> because it doesn't report which visitors played those teams, since each
> team can either be at home or away.
>
> You've changed the problem, hence you need a new query.
>

That depends on a lot of things. For instance, what if that third column represents a site the group belongs to (to keep it on the web, anyway)?

And the query from comp.databases.mysql works correctly - in both cases. Yours may work - but it isn't the best.
I tried both queries, and Jerry's method produces very strange results. The normalized approach posted by petersprc does give the expected result, though.

For a table of 40,000 records with duplicates, Jerry's query returned 200 million rows (yes, 200 million) after running for about 2 minutes. That's 5,000 times more rows than there were in the original table. I copied the query directly into a test case:

    DROP PROCEDURE IF EXISTS setup;

    DELIMITER //

    CREATE PROCEDURE setup ()
    BEGIN
        DECLARE i INT DEFAULT 0;

        DROP TABLE IF EXISTS test;
        CREATE TABLE test (entry_id int, cat_id int);

        -- 10,000 iterations x 4 rows = 40,000 rows, heavily duplicated
        WHILE i < 10000 DO
            INSERT INTO test VALUES (2, 30), (2, 35), (3, 30), (3, 35);
            SET i = i + 1;
        END WHILE;
    END;
    //

    DELIMITER ;

    CALL setup();

    DROP TABLE IF EXISTS result;

    CREATE TABLE result AS
    SELECT a.entry_id
    FROM test a
    INNER JOIN test b ON a.entry_id = b.entry_id
    WHERE a.cat_id = 30 AND b.cat_id = 35;

The output is:

    Query OK, 200000000 rows affected (2 min 6.35 sec)
    Records: 200000000  Duplicates: 0  Warnings: 0

Jerry's approach results in a "Cartesian explosion."

--
Corey Jansen
ccj9100@gmail.com
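The 200 million figure follows from the test data: after 10,000 loop iterations, entries 2 and 3 each have 10,000 rows in category 30 and 10,000 in category 35, and the self-join pairs every one with every other, giving 2 × 10,000 × 10,000 = 200,000,000 rows. The sketch below reproduces this at a smaller scale in SQLite (n = 100 instead of 10,000, chosen only to keep it fast; the helper `explode` is hypothetical). It shows the quadratic growth and that DISTINCT reduces the output to the two expected entries, although the database will typically still have to work through the intermediate pairs.

```python
import sqlite3


def explode(n):
    """Rebuild the test table with n copies of each row; return join sizes."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (entry_id INTEGER, cat_id INTEGER)")
    # Same four rows as the MySQL procedure, repeated n times.
    con.executemany("INSERT INTO test VALUES (?, ?)",
                    [(2, 30), (2, 35), (3, 30), (3, 35)] * n)
    join = """
        SELECT {d} a.entry_id
        FROM test a
        INNER JOIN test b ON a.entry_id = b.entry_id
        WHERE a.cat_id = 30 AND b.cat_id = 35
    """
    total = len(con.execute(join.format(d="")).fetchall())
    distinct = sorted(r[0] for r in con.execute(join.format(d="DISTINCT")))
    con.close()
    return total, distinct


total, distinct = explode(100)
print(total)     # 20000, i.e. 2 * 100 * 100
print(distinct)  # [2, 3]
```

Scaling n from 100 to 10,000 multiplies the join output by 10,000, which matches the 200,000,000 rows reported above.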