Bluehost.com Web Hosting $6.95

Optimization problem

This is a discussion on Optimization problem within the MySQL Database forums, part of the Database Forums category; I'm moving from MS SQL Server to MySQL (which I've never used before) and I'm running into ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 12-17-2006
dan@rottytooth.com
 
Posts: n/a
Default Optimization problem

I'm moving from MS SQL Server to MySQL (which I've never used before)
and I'm running into some problems with a few of my stored procedures.

(I'm not a DBA or an expert on data-mining, so please excuse me if I
get any terminology wrong.)

I have a number of queries that are used for data-mining, basically
just counting how often one tag is associated with another. Matching
two tags together takes 4 seconds to run in SQL Server and 27 seconds
in MySQL, for the same query on the same development machine. Matching
three tags (two antecedent and one consequent) takes 8 seconds in SQL
Server and 13 minutes in MySQL. Clearly SQL Server is optimizing my
query for me: it doesn't grow exponentially when I add more factors,
but with MySQL it does. I've looked in the Execution Plan in SQL Server
but can't make sense of it.

This is the slowest portion of my actual query (the simplest one: one
antecedent tag and one consequent).

The iq table is a temp table that matches every tag (as ant_tag) with
every other tag (con_tag). I need to find out how often a film with
each ant_tag also has each con_tag (lnk_film_tag is where they're
linked).

SELECT iq.ant_tag, iq.con_tag, COUNT(*) as prob_count
FROM iq
INNER JOIN film_operational.lnk_film_tag film_list_1
ON iq.ant_tag = film_list_1.tag_ID
INNER JOIN film_operational.lnk_film_tag film_list_2
ON iq.con_tag = film_list_2.tag_ID
AND film_list_1.film_ID = film_list_2.film_ID
GROUP BY iq.ant_tag, iq.con_tag;

Running EXPLAIN tells me that a full index scan is run on film_list_1
each time, for 107824 rows. Is there a way to avoid this from happening
or is it unavoidable for this type of query?

If it is unavoidable, why is it that the runtime doesn't grow
exponentially when I add factors (more antecedent tags) in MS SQL
Server and it does in MySQL? It seems like MS SQL Server is optimizing
the queries for me.

Any other tips on how to improve performance?

Thanks!

Reply With Quote
  #2 (permalink)  
Old 12-27-2006
dan@rottytooth.com
 
Posts: n/a
Default Re: Optimization problem

If anyone's interested, I found the problem: MySQL handles self-joins
really badly. Copying the table to a temp table and joining that back
to the original table cuts the runtime exponentially.

dan@rottytooth.com wrote:
> I'm moving from MS SQL Server to MySQL (which I've never used before)
> and I'm running into some problems with a few of my stored procedures.
>
> (I'm not a DBA or an expert on data-mining, so please excuse me if I
> get any terminology wrong.)
>
> I have a number of queries that are used for data-mining, basically
> just counting how often one tag is associated with another. Matching
> two tags together takes 4 seconds to run in SQL Server and 27 seconds
> in MySQL, for the same query on the same development machine. Matching
> three tags (two antecedent and one consequent) takes 8 seconds in SQL
> Server and 13 minutes in MySQL. Clearly SQL Server is optimizing my
> query for me: it doesn't grow exponentially when I add more factors,
> but with MySQL it does. I've looked in the Execution Plan in SQL Server
> but can't make sense of it.
>
> This is the slowest portion of my actual query (the simplest one: one
> antecedent tag and one consequent).
>
> The iq table is a temp table that matches every tag (as ant_tag) with
> every other tag (con_tag). I need to find out how often a film with
> each ant_tag also has each con_tag (lnk_film_tag is where they're
> linked).
>
> SELECT iq.ant_tag, iq.con_tag, COUNT(*) as prob_count
> FROM iq
> INNER JOIN film_operational.lnk_film_tag film_list_1
> ON iq.ant_tag = film_list_1.tag_ID
> INNER JOIN film_operational.lnk_film_tag film_list_2
> ON iq.con_tag = film_list_2.tag_ID
> AND film_list_1.film_ID = film_list_2.film_ID
> GROUP BY iq.ant_tag, iq.con_tag;
>
> Running EXPLAIN tells me that a full index scan is run on film_list_1
> each time, for 107824 rows. Is there a way to avoid this from happening
> or is it unavoidable for this type of query?
>
> If it is unavoidable, why is it that the runtime doesn't grow
> exponentially when I add factors (more antecedent tags) in MS SQL
> Server and it does in MySQL? It seems like MS SQL Server is optimizing
> the queries for me.
>
> Any other tips on how to improve performance?
>
> Thanks!


Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On




All times are GMT +1. The time now is 03:19 PM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0