This is a discussion on Optimization problem within the MySQL Database forums, part of the Database Forums category; I'm moving from MS SQL Server to MySQL (which I've never used before) and I'm running into ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I'm moving from MS SQL Server to MySQL (which I've never used before)
and I'm running into some problems with a few of my stored procedures. (I'm not a DBA or an expert on data-mining, so please excuse me if I get any terminology wrong.) I have a number of queries that are used for data-mining, basically just counting how often one tag is associated with another. Matching two tags together takes 4 seconds to run in SQL Server and 27 seconds in MySQL, for the same query on the same development machine. Matching three tags (two antecedent and one consequent) takes 8 seconds in SQL Server and 13 minutes in MySQL. Clearly SQL Server is optimizing my query for me: it doesn't grow exponentially when I add more factors, but with MySQL it does. I've looked in the Execution Plan in SQL Server but can't make sense of it. This is the slowest portion of my actual query (the simplest one: one antecedent tag and one consequent). The iq table is a temp table that matches every tag (as ant_tag) with every other tag (con_tag). I need to find out how often a film with each ant_tag also has each con_tag (lnk_film_tag is where they're linked). SELECT iq.ant_tag, iq.con_tag, COUNT(*) as prob_count FROM iq INNER JOIN film_operational.lnk_film_tag film_list_1 ON iq.ant_tag = film_list_1.tag_ID INNER JOIN film_operational.lnk_film_tag film_list_2 ON iq.con_tag = film_list_2.tag_ID AND film_list_1.film_ID = film_list_2.film_ID GROUP BY iq.ant_tag, iq.con_tag; Running EXPLAIN tells me that a full index scan is run on film_list_1 each time, for 107824 rows. Is there a way to avoid this from happening or is it unavoidable for this type of query? If it is unavoidable, why is it that the runtime doesn't grow exponentially when I add factors (more antecedent tags) in MS SQL Server and it does in MySQL? It seems like MS SQL Server is optimizing the queries for me. Any other tips on how to improve performance? Thanks! |
|
|||
|
If anyone's interested, I found the problem: MySQL handles self-joins
really badly. Copying the table to a temp table and joining that back to the original table cuts the runtime exponentially. dan@rottytooth.com wrote: > I'm moving from MS SQL Server to MySQL (which I've never used before) > and I'm running into some problems with a few of my stored procedures. > > (I'm not a DBA or an expert on data-mining, so please excuse me if I > get any terminology wrong.) > > I have a number of queries that are used for data-mining, basically > just counting how often one tag is associated with another. Matching > two tags together takes 4 seconds to run in SQL Server and 27 seconds > in MySQL, for the same query on the same development machine. Matching > three tags (two antecedent and one consequent) takes 8 seconds in SQL > Server and 13 minutes in MySQL. Clearly SQL Server is optimizing my > query for me: it doesn't grow exponentially when I add more factors, > but with MySQL it does. I've looked in the Execution Plan in SQL Server > but can't make sense of it. > > This is the slowest portion of my actual query (the simplest one: one > antecedent tag and one consequent). > > The iq table is a temp table that matches every tag (as ant_tag) with > every other tag (con_tag). I need to find out how often a film with > each ant_tag also has each con_tag (lnk_film_tag is where they're > linked). > > SELECT iq.ant_tag, iq.con_tag, COUNT(*) as prob_count > FROM iq > INNER JOIN film_operational.lnk_film_tag film_list_1 > ON iq.ant_tag = film_list_1.tag_ID > INNER JOIN film_operational.lnk_film_tag film_list_2 > ON iq.con_tag = film_list_2.tag_ID > AND film_list_1.film_ID = film_list_2.film_ID > GROUP BY iq.ant_tag, iq.con_tag; > > Running EXPLAIN tells me that a full index scan is run on film_list_1 > each time, for 107824 rows. Is there a way to avoid this from happening > or is it unavoidable for this type of query? > > If it is unavoidable, why is it that the runtime doesn't grow > exponentially when I add factors (more antecedent tags) in MS SQL > Server and it does in MySQL? It seems like MS SQL Server is optimizing > the queries for me. > > Any other tips on how to improve performance? > > Thanks! |