This is a discussion on SELECT...LIMIT that does a partial select? within the MySQL Database forums, part of the Database Forums category; I am in big trouble with MySQL 5. I have a huge table, about 25 GIGABYTES, that is a cache ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I am in big trouble with MySQL 5.
I have a huge table, about 25 GIGABYTES, that is a cache for hard to compute items. Each row has two timestamps, last updated and last retrieved timestamp, some metadata, key and a BLOB. I would like to clean that table based on timestamps. I used to clean it periodically, but doing a big DELETE...WHERE query locks it up for HOURS and that means downtime for my website, lost money, pissed off tutors, etc. So, I thought, I would write a process that would do SELECT id FROM bincache WHERE ... LIMIT 100 and that would run quickly to give me 100 IDs, that would then DELETE at my leisure without locking up the table. Repeat ad infinitum. Well, I just learned that LIMIT 100 is applied after ALL select work is done, which means that it would have to search gigabytes of data and return a many gigabyte result, only to trim it to 100 rows at the last moment. Even SELECT..LIMIT 1 takes forever and I had to kill it. How can I get around this awful problem. i |
|
|||
|
Ignoramus24559 wrote:
> I am in big trouble with MySQL 5. > > I have a huge table, about 25 GIGABYTES, that is a cache for hard to > compute items. Each row has two timestamps, last updated and last > retrieved timestamp, some metadata, key and a BLOB. > > I would like to clean that table based on timestamps. I used to clean > it periodically, but doing a big DELETE...WHERE query locks it up for > HOURS and that means downtime for my website, lost money, pissed off > tutors, etc. > > So, I thought, I would write a process that would do > > SELECT id FROM bincache WHERE ... LIMIT 100 > > and that would run quickly to give me 100 IDs, that would then DELETE > at my leisure without locking up the table. Repeat ad infinitum. > > Well, I just learned that LIMIT 100 is applied after ALL select work > is done, which means that it would have to search gigabytes of data > and return a many gigabyte result, only to trim it to 100 rows at the > last moment. > > Even SELECT..LIMIT 1 takes forever and I had to kill it. > > How can I get around this awful problem. If you ORDER BY something that is indexed then it will read the table in the order of the index and stop after it has match 100 rows, it should be very fast. -- Brian Wakem Email: http://homepage.ntlworld.com/b.wakem/myemail.png |
|
|||
|
On Fri, 15 Dec 2006 19:25:22 +0000, Brian Wakem <no@email.com> wrote:
> Ignoramus24559 wrote: > >> I am in big trouble with MySQL 5. >> >> I have a huge table, about 25 GIGABYTES, that is a cache for hard to >> compute items. Each row has two timestamps, last updated and last >> retrieved timestamp, some metadata, key and a BLOB. >> >> I would like to clean that table based on timestamps. I used to clean >> it periodically, but doing a big DELETE...WHERE query locks it up for >> HOURS and that means downtime for my website, lost money, pissed off >> tutors, etc. >> >> So, I thought, I would write a process that would do >> >> SELECT id FROM bincache WHERE ... LIMIT 100 >> >> and that would run quickly to give me 100 IDs, that would then DELETE >> at my leisure without locking up the table. Repeat ad infinitum. >> >> Well, I just learned that LIMIT 100 is applied after ALL select work >> is done, which means that it would have to search gigabytes of data >> and return a many gigabyte result, only to trim it to 100 rows at the >> last moment. >> >> Even SELECT..LIMIT 1 takes forever and I had to kill it. >> >> How can I get around this awful problem. > > > If you ORDER BY something that is indexed then it will read the table in the > order of the index and stop after it has match 100 rows, it should be very > fast. > > Brian, this is beautiful and seems to work. At least when I am making my selects without having a lot of entries cleaned in the beginning of the index. I am a little concerned that this select...order by would become slower if it goes through the index, first scrolling through thousands of entries that already are devoid of deletable items. Anyway, I will see what I can do... i |
|
|||
|
Ignoramus24559 <ignoramus24559@NOSPAM.24559.invalid> wrote:
> I am in big trouble with MySQL 5. > > I have a huge table, about 25 GIGABYTES, that is a cache for hard to > compute items. Each row has two timestamps, last updated and last > retrieved timestamp, some metadata, key and a BLOB. > > I would like to clean that table based on timestamps. I used to clean > it periodically, but doing a big DELETE...WHERE query locks it up for > HOURS and that means downtime for my website, lost money, pissed off > tutors, etc. First: this design does not seem to be very clever. You have 25GB worth of "hard to compute" binary stuff? Why do you put it in the database anyway? Wouldn't it be better to keep that in plain old files? On the web server(s)? That should scale much better. Plus, filesystems are quite good at dealing with binary data. Better than most database engines. Next: you use a MyISAM table for that. Not good. With variable length records and heavy fluctuation you will experience tablespace fragmentation very soon. You would be better off with an InnoDB table. I can also imagine a set of MyISAM tables, bundled by the MERGE engine. If you partition your data by timestamp, cleaning out old entries would become as simple as CREATE fresh_table, ALTER merge_table, DROP oldest_table. > So, I thought, I would write a process that would do > > SELECT id FROM bincache WHERE ... LIMIT 100 > > and that would run quickly to give me 100 IDs, that would then DELETE > at my leisure without locking up the table. Locking is a MyISAM only problem. Also you can do DELETE ... LIMIT. No need to break that in two steps. However, deleting from a big table is expensive: 1. the rows to be deleted must be found 2. rows must be deleted from tablespace (this may cause tablespace reorganisation) 3. indexes must be updated If you delete big part of a table (say: 80% of the records) you will be better off with the following approach: 1. create a new table with the same layout 2. populate the new table with INSERT .... SELECT with all records you plan to keep 3. rename tables (this is an atomic operation) 4. drop the old table > Well, I just learned that LIMIT 100 is applied after ALL select work > is done, This clearly depends on your query. If MySQL does not need to materialize the result set (i.e. for sorting), it stops just after it processed the requested number of rows. XL -- Axel Schwenke, Senior Software Developer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |
|
|||
|
Axel Schwenke wrote: > Ignoramus24559 <ignoramus24559@NOSPAM.24559.invalid> wrote: > > I am in big trouble with MySQL 5. > > > > I have a huge table, about 25 GIGABYTES, that is a cache for hard to > > compute items. Each row has two timestamps, last updated and last > > retrieved timestamp, some metadata, key and a BLOB. > > > > I would like to clean that table based on timestamps. I used to clean > > it periodically, but doing a big DELETE...WHERE query locks it up for > > HOURS and that means downtime for my website, lost money, pissed off > > tutors, etc. > > First: this design does not seem to be very clever. You have 25GB > worth of "hard to compute" binary stuff? Why do you put it in the > database anyway? Wouldn't it be better to keep that in plain old > files? On the web server(s)? That should scale much better. Plus, > filesystems are quite good at dealing with binary data. Better than > most database engines. > > Next: you use a MyISAM table for that. Not good. With variable length > records and heavy fluctuation you will experience tablespace > fragmentation very soon. You would be better off with an InnoDB table. > I can also imagine a set of MyISAM tables, bundled by the MERGE engine. > If you partition your data by timestamp, cleaning out old entries would > become as simple as CREATE fresh_table, ALTER merge_table, DROP > oldest_table. > > > So, I thought, I would write a process that would do > > > > SELECT id FROM bincache WHERE ... LIMIT 100 > > > > and that would run quickly to give me 100 IDs, that would then DELETE > > at my leisure without locking up the table. > > Locking is a MyISAM only problem. > > Also you can do DELETE ... LIMIT. No need to break that in two steps. > However, deleting from a big table is expensive: > > 1. the rows to be deleted must be found > 2. rows must be deleted from tablespace (this may cause tablespace > reorganisation) > 3. indexes must be updated > > If you delete big part of a table (say: 80% of the records) you will be > better off with the following approach: > > 1. create a new table with the same layout > 2. populate the new table with INSERT .... SELECT with all records > you plan to keep > 3. rename tables (this is an atomic operation) > 4. drop the old table > > > Well, I just learned that LIMIT 100 is applied after ALL select work > > is done, > > This clearly depends on your query. If MySQL does not need to > materialize the result set (i.e. for sorting), it stops just after > it processed the requested number of rows. > > > XL > -- > Axel Schwenke, Senior Software Developer, MySQL AB > > Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ > MySQL User Forums: http://forums.mysql.com/ IMNSHO, using partitioned tables would be a much better alternative. I have administered real databases in the multi-terabyte range with this approach. create tablespace p0... create tablespace p1 etc... CREATE TABLE t1 ( id INT, year_col INT ) PARTITION BY RANGE (year_col) ( PARTITION p0 VALUES LESS THAN (1991), PARTITION p1 VALUES LESS THAN (1995), PARTITION p2 VALUES LESS THAN (1999) ); You can add a new partition p3 to this table for storing values less then 2002 as follows: ALTER TABLE t1 ADD PARTITION (PARTITION p3 VALUES LESS THAN (2002)); Then look at how to drop the older partitions. One thing you need to do is to also partition your indexes (it if is allowed --I have not exhausted the docs on this topic...) such that if you drop the oldest partition - you do so on the indexes as well. Someone once said in this forum that in todays storage array technology it is not necessary to seperate data and indexes onto seperate filesystems (or disk drives depending on your platform). I would have to disagree because the problem is not necessarily on the storage array, but the shear volume of I/O's the OS can queue up on a single logical device. No OS handles hundreds or thousands of I/O's per second to a single-LVM very well unless your "SYSTEM" is designed properly. You must continue to "DESIGN" your system to match the requirements - don't just throw it together based on random pieces of comments (including mine) and expect it to perform optimally. If someone says `it "should" perform if you....`, RUN!!! Unless they have designed it before in a proven production environment, then they are just guessing. It may be a very good guess, but it is a guess nevertheless... Database System designers, IMHO, should have DBA, SA, Network and Storage Administration skills, because to design these types of systems, you must have an intimate, working knowlege of all of these disciplines and MUST work very closely with their counterparts in these areas. Database Consultant. |
|
|||
|
"onedbguru" <onedbguru@yahoo.com> wrote:
> > Axel Schwenke wrote: [Fullquote, including signature] Please don't fullquote! > IMNSHO, using partitioned tables would be a much better alternative. I > have administered real databases in the multi-terabyte range with this > approach. > > create tablespace p0... > create tablespace p1 etc... > > CREATE TABLE t1 ( > id INT, > year_col INT > ) > PARTITION BY RANGE (year_col) ( > PARTITION p0 VALUES LESS THAN (1991), > PARTITION p1 VALUES LESS THAN (1995), > PARTITION p2 VALUES LESS THAN (1999) > ); Did you notice the word "mysql" in this groups name? Your suggestion does not work with MySQL. In fact MySQL 5.1 does support partitions - but it is currently BETA and the syntax is slightly different. MERGE tables (as suggested by me) are the poor mans approximation of partitions, available in stable MySQL editions. However, the OPs problem is, that DELETE on a large table a) may take a long time and b) locks that table His original approach (deleting small chunks) tackles a). Using an InnoDB table would eliminate b). Partitions won't solve any of those in the first place - only if DELETE is implemented as DROP PARTITION. XL -- Axel Schwenke, Senior Software Developer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |
|
|||
|
Axel Schwenke wrote:
> "onedbguru" <onedbguru@yahoo.com> wrote: > > > > IMNSHO, using partitioned tables would be a much better alternative. > > I have administered real databases in the multi-terabyte range with > > this approach. > > > > create tablespace p0... > > create tablespace p1 etc... > > > > CREATE TABLE t1 ( > > id INT, > > year_col INT > > ) > > PARTITION BY RANGE (year_col) ( > > PARTITION p0 VALUES LESS THAN (1991), > > PARTITION p1 VALUES LESS THAN (1995), > > PARTITION p2 VALUES LESS THAN (1999) > > ); > > Did you notice the word "mysql" in this groups name? Your suggestion > does not work with MySQL. In fact MySQL 5.1 does support partitions - > but it is currently BETA and the syntax is slightly different. > MERGE tables (as suggested by me) are the poor mans approximation of > partitions, available in stable MySQL editions. UM, no. I've been using them since 3.x, and they work great for huge amounts of data. |