Select unique based on 4 field values

This is a discussion on Select unique based on 4 field values within the MySQL Database forums, part of the Database Forums category; I have 'inherited' a database that has a lot of redundant info. The (only) table consists of 10 char(1) ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 06-26-2007
semi_evil@inbox.com
 
Posts: n/a
Default Select unique based on 4 field values

I have 'inherited' a database that has a lot of redundant info. The
(only) table consists of 10 char(1) fields followed by 4 unsigned
floats. There's over 3 million records in it and a quick study learned
that while there are no full doubles in it several of the char
combinations have the same related float values. Average selections
take between 2 and 5 seconds at the moment, which leads to user
complaints.

The char fields can be reduced to a more comprehensive form. Before I
remove the data I have to assert my theory is correct.

How can I make a selection that leaves those records for which all of
the 4 floats are equal to those in other records out?

Maybe I am not explaining this very clearly, I am not that familiar
with this material. Perhaps an example helps explain better:

row1 a b c d e f g h i j 0.123 0.456 0.78 0.90
row2 k l m n o p q r s t 0.123 0.456 0.78 0.90 <-- leave out

The table is MyISAM, no indices, no primary key. There are no rows for
which all 10 char fields are equal. I am aware their are additional
ways to improve performance, I'd first like to get rid of (what I
think amounts to) 70 to 80% redundancy. The floats have up to 6
decimals, they all are in the range 0.0 - 1.0
Thanks for your kind assistance.

Reply With Quote
  #2 (permalink)  
Old 06-26-2007
lark
 
Posts: n/a
Default Re: Select unique based on 4 field values

== Quote from semi_evil ( semi_evil@inbox.com)'s article
> I have 'inherited' a database that has a lot of redundant info. The
> (only) table consists of 10 char(1) fields followed by 4 unsigned
> floats. There's over 3 million records in it and a quick study learned
> that while there are no full doubles in it several of the char
> combinations have the same related float values. Average selections
> take between 2 and 5 seconds at the moment, which leads to user
> complaints.
> The char fields can be reduced to a more comprehensive form. Before I
> remove the data I have to assert my theory is correct.
> How can I make a selection that leaves those records for which all of
> the 4 floats are equal to those in other records out?
> Maybe I am not explaining this very clearly, I am not that familiar
> with this material. Perhaps an example helps explain better:
> row1 a b c d e f g h i j 0.123 0.456 0.78 0.90
> row2 k l m n o p q r s t 0.123 0.456 0.78 0.90 <-- leave out
> The table is MyISAM, no indices, no primary key. There are no rows for
> which all 10 char fields are equal. I am aware their are additional
> ways to improve performance, I'd first like to get rid of (what I
> think amounts to) 70 to 80% redundancy. The floats have up to 6
> decimals, they all are in the range 0.0 - 1.0
> Thanks for your kind assistance.



what do you want to do with them? delete them or leave them out of your result sets!!!
--
POST BY: lark with PHP News Reader
Reply With Quote
  #3 (permalink)  
Old 06-26-2007
Paul Lautman
 
Posts: n/a
Default Re: Select unique based on 4 field values

semi_evil@inbox.com wrote:
> I have 'inherited' a database that has a lot of redundant info. The
> (only) table consists of 10 char(1) fields followed by 4 unsigned
> floats. There's over 3 million records in it and a quick study learned
> that while there are no full doubles in it several of the char
> combinations have the same related float values. Average selections
> take between 2 and 5 seconds at the moment, which leads to user
> complaints.
>
> The char fields can be reduced to a more comprehensive form. Before I
> remove the data I have to assert my theory is correct.
>
> How can I make a selection that leaves those records for which all of
> the 4 floats are equal to those in other records out?
>
> Maybe I am not explaining this very clearly, I am not that familiar
> with this material. Perhaps an example helps explain better:
>
> row1 a b c d e f g h i j 0.123 0.456 0.78 0.90
> row2 k l m n o p q r s t 0.123 0.456 0.78 0.90 <-- leave out
>
> The table is MyISAM, no indices, no primary key. There are no rows for
> which all 10 char fields are equal. I am aware their are additional
> ways to improve performance, I'd first like to get rid of (what I
> think amounts to) 70 to 80% redundancy. The floats have up to 6
> decimals, they all are in the range 0.0 - 1.0
> Thanks for your kind assistance.


Can you add a primary key? If so add an auto-increment primary key column
and then you can use the Strawberry Query to do what you want. You haven't
given any names for the columns so I am going to assume that the table is
called bad_table the float columns are f1, f2, f3, f4 and the primary key
you will add is called id:

SELECT
`b2`.`*`
FROM `bad_table` `b1`
LEFT JOIN `bad_table` `b2` ON `b1`.`f`1` = `b2`.`f`1` AND `b1`.`f`2` =
`b2`.`f`2` AND `b1`.`f`3` = `b2`.`f`3` AND `b1`.`f`4` = `b2`.`f`4` AND
`b1`.`id` < `b2`.`id`
WHERE `b1`.`id` IS NULL


Reply With Quote
  #4 (permalink)  
Old 06-28-2007
semi_evil@inbox.com
 
Posts: n/a
Default Re: Select unique based on 4 field values

On Jun 26, 10:41 pm, "Paul Lautman" <paul.laut...@btinternet.com>
wrote:

>
> Can you add a primary key? If so add an auto-increment primary key column
> and then you can use the Strawberry Query to do what you want. You haven't
> given any names for the columns so I am going to assume that the table is
> called bad_table the float columns are f1, f2, f3, f4 and the primary key
> you will add is called id:
>
> SELECT
> `b2`.`*`
> FROM `bad_table` `b1`
> LEFT JOIN `bad_table` `b2` ON `b1`.`f`1` = `b2`.`f`1` AND `b1`.`f`2` =
> `b2`.`f`2` AND `b1`.`f`3` = `b2`.`f`3` AND `b1`.`f`4` = `b2`.`f`4` AND
> `b1`.`id` < `b2`.`id`
> WHERE `b1`.`id` IS NULL


That worked! Funny name. The table shrunk about 75%, I've combined all
the char fields into a single field now and added some PHP logic to
map user selections to a corresponding db row. Performance is
acceptable now.

Thanks for the help.

Reply With Quote
  #5 (permalink)  
Old 06-28-2007
strawberry
 
Posts: n/a
Default Re: Select unique based on 4 field values

On Jun 28, 6:36 am, semi_e...@inbox.com wrote:
> On Jun 26, 10:41 pm, "Paul Lautman" <paul.laut...@btinternet.com>
> wrote:
>
>
>
> > Can you add a primary key? If so add an auto-increment primary key column
> > and then you can use the Strawberry Query to do what you want. You haven't
> > given any names for the columns so I am going to assume that the table is
> > called bad_table the float columns are f1, f2, f3, f4 and the primary key
> > you will add is called id:

>
> > SELECT
> > `b2`.`*`
> > FROM `bad_table` `b1`
> > LEFT JOIN `bad_table` `b2` ON `b1`.`f`1` = `b2`.`f`1` AND `b1`.`f`2` =
> > `b2`.`f`2` AND `b1`.`f`3` = `b2`.`f`3` AND `b1`.`f`4` = `b2`.`f`4` AND
> > `b1`.`id` < `b2`.`id`
> > WHERE `b1`.`id` IS NULL

>
> That worked! Funny name. The table shrunk about 75%, I've combined all
> the char fields into a single field now and added some PHP logic to
> map user selections to a corresponding db row. Performance is
> acceptable now.
>
> Thanks for the help.



What's 'funny' about it? ;-)

Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 09:19 AM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0