This is a discussion on Select unique based on 4 field values within the MySQL Database forums, part of the Database Forums category; I have 'inherited' a database that has a lot of redundant info. The (only) table consists of 10 char(1) ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I have 'inherited' a database that has a lot of redundant info. The
(only) table consists of 10 char(1) fields followed by 4 unsigned floats. There's over 3 million records in it and a quick study learned that while there are no full doubles in it several of the char combinations have the same related float values. Average selections take between 2 and 5 seconds at the moment, which leads to user complaints. The char fields can be reduced to a more comprehensive form. Before I remove the data I have to assert my theory is correct. How can I make a selection that leaves those records for which all of the 4 floats are equal to those in other records out? Maybe I am not explaining this very clearly, I am not that familiar with this material. Perhaps an example helps explain better: row1 a b c d e f g h i j 0.123 0.456 0.78 0.90 row2 k l m n o p q r s t 0.123 0.456 0.78 0.90 <-- leave out The table is MyISAM, no indices, no primary key. There are no rows for which all 10 char fields are equal. I am aware their are additional ways to improve performance, I'd first like to get rid of (what I think amounts to) 70 to 80% redundancy. The floats have up to 6 decimals, they all are in the range 0.0 - 1.0 Thanks for your kind assistance. |
|
|||
|
== Quote from semi_evil ( semi_evil@inbox.com)'s article
> I have 'inherited' a database that has a lot of redundant info. The > (only) table consists of 10 char(1) fields followed by 4 unsigned > floats. There's over 3 million records in it and a quick study learned > that while there are no full doubles in it several of the char > combinations have the same related float values. Average selections > take between 2 and 5 seconds at the moment, which leads to user > complaints. > The char fields can be reduced to a more comprehensive form. Before I > remove the data I have to assert my theory is correct. > How can I make a selection that leaves those records for which all of > the 4 floats are equal to those in other records out? > Maybe I am not explaining this very clearly, I am not that familiar > with this material. Perhaps an example helps explain better: > row1 a b c d e f g h i j 0.123 0.456 0.78 0.90 > row2 k l m n o p q r s t 0.123 0.456 0.78 0.90 <-- leave out > The table is MyISAM, no indices, no primary key. There are no rows for > which all 10 char fields are equal. I am aware their are additional > ways to improve performance, I'd first like to get rid of (what I > think amounts to) 70 to 80% redundancy. The floats have up to 6 > decimals, they all are in the range 0.0 - 1.0 > Thanks for your kind assistance. what do you want to do with them? delete them or leave them out of your result sets!!! -- POST BY: lark with PHP News Reader |
|
|||
|
semi_evil@inbox.com wrote:
> I have 'inherited' a database that has a lot of redundant info. The > (only) table consists of 10 char(1) fields followed by 4 unsigned > floats. There's over 3 million records in it and a quick study learned > that while there are no full doubles in it several of the char > combinations have the same related float values. Average selections > take between 2 and 5 seconds at the moment, which leads to user > complaints. > > The char fields can be reduced to a more comprehensive form. Before I > remove the data I have to assert my theory is correct. > > How can I make a selection that leaves those records for which all of > the 4 floats are equal to those in other records out? > > Maybe I am not explaining this very clearly, I am not that familiar > with this material. Perhaps an example helps explain better: > > row1 a b c d e f g h i j 0.123 0.456 0.78 0.90 > row2 k l m n o p q r s t 0.123 0.456 0.78 0.90 <-- leave out > > The table is MyISAM, no indices, no primary key. There are no rows for > which all 10 char fields are equal. I am aware their are additional > ways to improve performance, I'd first like to get rid of (what I > think amounts to) 70 to 80% redundancy. The floats have up to 6 > decimals, they all are in the range 0.0 - 1.0 > Thanks for your kind assistance. Can you add a primary key? If so add an auto-increment primary key column and then you can use the Strawberry Query to do what you want. You haven't given any names for the columns so I am going to assume that the table is called bad_table the float columns are f1, f2, f3, f4 and the primary key you will add is called id: SELECT `b2`.`*` FROM `bad_table` `b1` LEFT JOIN `bad_table` `b2` ON `b1`.`f`1` = `b2`.`f`1` AND `b1`.`f`2` = `b2`.`f`2` AND `b1`.`f`3` = `b2`.`f`3` AND `b1`.`f`4` = `b2`.`f`4` AND `b1`.`id` < `b2`.`id` WHERE `b1`.`id` IS NULL |
|
|||
|
On Jun 26, 10:41 pm, "Paul Lautman" <paul.laut...@btinternet.com>
wrote: > > Can you add a primary key? If so add an auto-increment primary key column > and then you can use the Strawberry Query to do what you want. You haven't > given any names for the columns so I am going to assume that the table is > called bad_table the float columns are f1, f2, f3, f4 and the primary key > you will add is called id: > > SELECT > `b2`.`*` > FROM `bad_table` `b1` > LEFT JOIN `bad_table` `b2` ON `b1`.`f`1` = `b2`.`f`1` AND `b1`.`f`2` = > `b2`.`f`2` AND `b1`.`f`3` = `b2`.`f`3` AND `b1`.`f`4` = `b2`.`f`4` AND > `b1`.`id` < `b2`.`id` > WHERE `b1`.`id` IS NULL That worked! Funny name. The table shrunk about 75%, I've combined all the char fields into a single field now and added some PHP logic to map user selections to a corresponding db row. Performance is acceptable now. Thanks for the help. |
|
|||
|
On Jun 28, 6:36 am, semi_e...@inbox.com wrote:
> On Jun 26, 10:41 pm, "Paul Lautman" <paul.laut...@btinternet.com> > wrote: > > > > > Can you add a primary key? If so add an auto-increment primary key column > > and then you can use the Strawberry Query to do what you want. You haven't > > given any names for the columns so I am going to assume that the table is > > called bad_table the float columns are f1, f2, f3, f4 and the primary key > > you will add is called id: > > > SELECT > > `b2`.`*` > > FROM `bad_table` `b1` > > LEFT JOIN `bad_table` `b2` ON `b1`.`f`1` = `b2`.`f`1` AND `b1`.`f`2` = > > `b2`.`f`2` AND `b1`.`f`3` = `b2`.`f`3` AND `b1`.`f`4` = `b2`.`f`4` AND > > `b1`.`id` < `b2`.`id` > > WHERE `b1`.`id` IS NULL > > That worked! Funny name. The table shrunk about 75%, I've combined all > the char fields into a single field now and added some PHP logic to > map user selections to a corresponding db row. Performance is > acceptable now. > > Thanks for the help. What's 'funny' about it? ;-) |