This is a discussion on Settle a debate - to index or not to index? within the MySQL Database forums, part of the Database Forums category; I have 2 tables: mysql> desc a; +---------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------+------------------+------+-----+---------+-------+ | item_id | int(10) unsigned | NO | | 0 | | +---------+------------------+------+-----+---------+-------+ ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I have 2 tables:
mysql> desc a; +---------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------+------------------+------+-----+---------+-------+ | item_id | int(10) unsigned | NO | | 0 | | +---------+------------------+------+-----+---------+-------+ 1 row in set (0.00 sec) mysql> select count(*) from a; +----------+ | count(*) | +----------+ | 38364373 | +----------+ 1 row in set (0.00 sec) mysql> desc b; +---------+---------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +---------+---------------------+------+-----+---------+----------------+ | item_id | int(10) unsigned | NO | PRI | NULL | auto_increment | | col2 | int(10) unsigned | NO | MUL | | | +---------+---------------------+------+-----+---------+----------------+ 5 rows in set (0.01 sec) mysql> show index from b; +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | b | 0 | PRIMARY | 1 | item_id | A | 60688657 | NULL | NULL | | BTREE | | | b | 1 | col2 | 1 | col2 | A | NULL | NULL | NULL | | BTREE | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 3 rows in set (0.00 sec) mysql> select count(*) from b; +----------+ | count(*) | +----------+ | 60688657 | +----------+ 1 row in set (0.00 sec) There are no duplicates in table a. I want to run the following query: SELECT col2 FROM a JOIN b USING(`item_id`) I am arguing that an index (a primary key) on the only column in table a will not make any difference whatsoever. Our DBA is telling me otherwise, claiming that w/o an index "it will go thru the whole table every time it can't find a matching row." It is not clear to me what he is trying to say; I disagree that it will be any slower to run the query w/o an index on table a. I see mysql doing the following: going thru each row in table a using natural row order and joining table b on it. On the join it uses the index on `item_id` in table b to see if there is a match and selects col2 if found. I do not see how an index on table a could optimize this process. The DBA also claims that because it would do lookups out of order, it would have to "seek" more. I don't think one can assume which way it would be faster - it all depends on how the data is physically laid out on disk, which we don't know, so no assumptions can be made either way. He also mentioned having to move the row pointer more; I do not know enough about the innards of mysql to make a comment on this. EXPLAIN seems to support my reasoning: I created a copy of table a and added a primary key on it; EXPLAIN says it will scan the same number of rows in both cases - all of them: mysql> EXPLAIN SELECT col2 FROM a JOIN b USING(`item_id`); +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+ | 1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 38364373 | | | 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | a.item_id | 1 | Using where | +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+ 2 rows in set (0.00 sec) mysql> EXPLAIN SELECT col2 FROM a_indexed JOIN b USING(`item_id`); +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+ | 1 | SIMPLE | a_indexed | index | PRIMARY | PRIMARY | 4 | NULL | 38364373 | Using index | | 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | a_indexed.item_id | 1 | Using where | +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+ 2 rows in set (0.01 sec) The DBA said that EXPLAIN is just an approximation. While I agree with this statement, I don't see how it applies to this problem. Am I wrong? Am I right? Please enlighten me (with a detailed technical explanation). P.S. The names of the tables & columns have been changed for privacy. |
|
|||
|
Duke Ionescu wrote:
> I have 2 tables: > > mysql> desc a; > +---------+------------------+------+-----+---------+-------+ > | Field | Type | Null | Key | Default | Extra | > +---------+------------------+------+-----+---------+-------+ > | item_id | int(10) unsigned | NO | | 0 | | > +---------+------------------+------+-----+---------+-------+ > 1 row in set (0.00 sec) > > mysql> select count(*) from a; > +----------+ > | count(*) | > +----------+ > | 38364373 | > +----------+ > 1 row in set (0.00 sec) > > mysql> desc b; > +---------+---------------------+------+-----+---------+----------------+ > | Field | Type | Null | Key | Default | Extra > | > +---------+---------------------+------+-----+---------+----------------+ > | item_id | int(10) unsigned | NO | PRI | NULL | auto_increment > | > | col2 | int(10) unsigned | NO | MUL | | > | > +---------+---------------------+------+-----+---------+----------------+ > 5 rows in set (0.01 sec) > > mysql> show index from b; > +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ > | Table | Non_unique | Key_name | Seq_in_index | Column_name | > Collation | Cardinality | Sub_part | Packed | Null | Index_type | > Comment | > +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ > | b | 0 | PRIMARY | 1 | item_id | A > | 60688657 | NULL | NULL | | BTREE | | > | b | 1 | col2 | 1 | col2 | A > | NULL | NULL | NULL | | BTREE | | > +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ > 3 rows in set (0.00 sec) > > mysql> select count(*) from b; > +----------+ > | count(*) | > +----------+ > | 60688657 | > +----------+ > 1 row in set (0.00 sec) > > There are no duplicates in table a. I want to run the following query: > SELECT col2 FROM a JOIN b USING(`item_id`) > > I am arguing that an index (a primary key) on the only column in table > a will not make any difference whatsoever. Our DBA is telling me > otherwise, claiming that w/o an index "it will go thru the whole table > every time it can't find a matching row." It is not clear to me what > he is trying to say; I disagree that it will be any slower to run the > query w/o an index on table a. > > I see mysql doing the following: going thru each row in table a using > natural row order and joining table b on it. On the join it uses the > index on `item_id` in table b to see if there is a match and selects > col2 if found. I do not see how an index on table a could optimize > this process. > > The DBA also claims that because it would do lookups out of order, it > would have to "seek" more. I don't think one can assume which way it > would be faster - it all depends on how the data is physically laid out > on disk, which we don't know, so no assumptions can be made either way. > He also mentioned having to move the row pointer more; I do not know > enough about the innards of mysql to make a comment on this. > > EXPLAIN seems to support my reasoning: I created a copy of table a and > added a primary key on it; EXPLAIN says it will scan the same number of > rows in both cases - all of them: > mysql> EXPLAIN SELECT col2 FROM a JOIN b USING(`item_id`); > +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+ > | id | select_type | table | type | possible_keys | key | key_len > | ref | rows | Extra | > +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+ > | 1 | SIMPLE | a | ALL | NULL | NULL | NULL > | NULL | 38364373 | | > | 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 > | a.item_id | 1 | Using where | > +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+ > 2 rows in set (0.00 sec) > > mysql> EXPLAIN SELECT col2 FROM a_indexed JOIN b USING(`item_id`); > +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+ > | id | select_type | table | type | possible_keys | key | > key_len | ref | rows | Extra | > +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+ > | 1 | SIMPLE | a_indexed | index | PRIMARY | PRIMARY | 4 > | NULL | 38364373 | Using index | > | 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 > | a_indexed.item_id | 1 | Using where | > +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+ > 2 rows in set (0.01 sec) > > The DBA said that EXPLAIN is just an approximation. While I agree with > this statement, I don't see how it applies to this problem. > > Am I wrong? Am I right? Please enlighten me (with a detailed technical > explanation). > > > P.S. The names of the tables & columns have been changed for privacy. > I would argue that your DBA is correct, When you select from a_indexed, an index would allow you to select in numeric order. As the join values would now be in numeric order, there would potentially be less I/O on table b. For instance - with an index it could select in order of 1,2,3,4, etc. But without the index it could select in the order of 25694, 30125, 81320, 301976, etc. The first would be able to use many of the rows already buffered by the index in b. The second would have to select several different blocks from the index for b. Now - if your buffers are big enough to hold all the indexes, than probably MySQLr would eventually find everything in the buffer (you'd only have to read each block of the index once). But if your buffers are too small, the second may have to re-read the same block several times. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
![]() |
| Thread Tools | |
| Display Modes | |
|
|