SELECT...LIMIT that does a partial select?

This is a discussion on SELECT...LIMIT that does a partial select? within the MySQL Database forums, part of the Database Forums category; I am in big trouble with MySQL 5. I have a huge table, about 25 GIGABYTES, that is a cache ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 12-15-2006
Ignoramus24559
 
Posts: n/a
Default SELECT...LIMIT that does a partial select?

I am in big trouble with MySQL 5.

I have a huge table, about 25 GIGABYTES, that is a cache for hard to
compute items. Each row has two timestamps, last updated and last
retrieved timestamp, some metadata, key and a BLOB.

I would like to clean that table based on timestamps. I used to clean
it periodically, but doing a big DELETE...WHERE query locks it up for
HOURS and that means downtime for my website, lost money, pissed off
tutors, etc.

So, I thought, I would write a process that would do

SELECT id FROM bincache WHERE ... LIMIT 100

and that would run quickly to give me 100 IDs, that would then DELETE
at my leisure without locking up the table. Repeat ad infinitum.

Well, I just learned that LIMIT 100 is applied after ALL select work
is done, which means that it would have to search gigabytes of data
and return a many gigabyte result, only to trim it to 100 rows at the
last moment.

Even SELECT..LIMIT 1 takes forever and I had to kill it.

How can I get around this awful problem.

i
Reply With Quote
  #2 (permalink)  
Old 12-15-2006
Brian Wakem
 
Posts: n/a
Default Re: SELECT...LIMIT that does a partial select?

Ignoramus24559 wrote:

> I am in big trouble with MySQL 5.
>
> I have a huge table, about 25 GIGABYTES, that is a cache for hard to
> compute items. Each row has two timestamps, last updated and last
> retrieved timestamp, some metadata, key and a BLOB.
>
> I would like to clean that table based on timestamps. I used to clean
> it periodically, but doing a big DELETE...WHERE query locks it up for
> HOURS and that means downtime for my website, lost money, pissed off
> tutors, etc.
>
> So, I thought, I would write a process that would do
>
> SELECT id FROM bincache WHERE ... LIMIT 100
>
> and that would run quickly to give me 100 IDs, that would then DELETE
> at my leisure without locking up the table. Repeat ad infinitum.
>
> Well, I just learned that LIMIT 100 is applied after ALL select work
> is done, which means that it would have to search gigabytes of data
> and return a many gigabyte result, only to trim it to 100 rows at the
> last moment.
>
> Even SELECT..LIMIT 1 takes forever and I had to kill it.
>
> How can I get around this awful problem.



If you ORDER BY something that is indexed then it will read the table in the
order of the index and stop after it has match 100 rows, it should be very
fast.


--
Brian Wakem
Email: http://homepage.ntlworld.com/b.wakem/myemail.png
Reply With Quote
  #3 (permalink)  
Old 12-15-2006
Ignoramus24559
 
Posts: n/a
Default Re: SELECT...LIMIT that does a partial select?

On Fri, 15 Dec 2006 19:25:22 +0000, Brian Wakem <no@email.com> wrote:
> Ignoramus24559 wrote:
>
>> I am in big trouble with MySQL 5.
>>
>> I have a huge table, about 25 GIGABYTES, that is a cache for hard to
>> compute items. Each row has two timestamps, last updated and last
>> retrieved timestamp, some metadata, key and a BLOB.
>>
>> I would like to clean that table based on timestamps. I used to clean
>> it periodically, but doing a big DELETE...WHERE query locks it up for
>> HOURS and that means downtime for my website, lost money, pissed off
>> tutors, etc.
>>
>> So, I thought, I would write a process that would do
>>
>> SELECT id FROM bincache WHERE ... LIMIT 100
>>
>> and that would run quickly to give me 100 IDs, that would then DELETE
>> at my leisure without locking up the table. Repeat ad infinitum.
>>
>> Well, I just learned that LIMIT 100 is applied after ALL select work
>> is done, which means that it would have to search gigabytes of data
>> and return a many gigabyte result, only to trim it to 100 rows at the
>> last moment.
>>
>> Even SELECT..LIMIT 1 takes forever and I had to kill it.
>>
>> How can I get around this awful problem.

>
>
> If you ORDER BY something that is indexed then it will read the table in the
> order of the index and stop after it has match 100 rows, it should be very
> fast.
>
>


Brian, this is beautiful and seems to work. At least when I am making
my selects without having a lot of entries cleaned in the beginning of
the index. I am a little concerned that this select...order by would
become slower if it goes through the index, first scrolling through
thousands of entries that already are devoid of deletable items.

Anyway, I will see what I can do...

i
Reply With Quote
  #4 (permalink)  
Old 12-16-2006
Axel Schwenke
 
Posts: n/a
Default Re: SELECT...LIMIT that does a partial select?

Ignoramus24559 <ignoramus24559@NOSPAM.24559.invalid> wrote:
> I am in big trouble with MySQL 5.
>
> I have a huge table, about 25 GIGABYTES, that is a cache for hard to
> compute items. Each row has two timestamps, last updated and last
> retrieved timestamp, some metadata, key and a BLOB.
>
> I would like to clean that table based on timestamps. I used to clean
> it periodically, but doing a big DELETE...WHERE query locks it up for
> HOURS and that means downtime for my website, lost money, pissed off
> tutors, etc.


First: this design does not seem to be very clever. You have 25GB
worth of "hard to compute" binary stuff? Why do you put it in the
database anyway? Wouldn't it be better to keep that in plain old
files? On the web server(s)? That should scale much better. Plus,
filesystems are quite good at dealing with binary data. Better than
most database engines.

Next: you use a MyISAM table for that. Not good. With variable length
records and heavy fluctuation you will experience tablespace
fragmentation very soon. You would be better off with an InnoDB table.
I can also imagine a set of MyISAM tables, bundled by the MERGE engine.
If you partition your data by timestamp, cleaning out old entries would
become as simple as CREATE fresh_table, ALTER merge_table, DROP
oldest_table.

> So, I thought, I would write a process that would do
>
> SELECT id FROM bincache WHERE ... LIMIT 100
>
> and that would run quickly to give me 100 IDs, that would then DELETE
> at my leisure without locking up the table.


Locking is a MyISAM only problem.

Also you can do DELETE ... LIMIT. No need to break that in two steps.
However, deleting from a big table is expensive:

1. the rows to be deleted must be found
2. rows must be deleted from tablespace (this may cause tablespace
reorganisation)
3. indexes must be updated

If you delete big part of a table (say: 80% of the records) you will be
better off with the following approach:

1. create a new table with the same layout
2. populate the new table with INSERT .... SELECT with all records
you plan to keep
3. rename tables (this is an atomic operation)
4. drop the old table

> Well, I just learned that LIMIT 100 is applied after ALL select work
> is done,


This clearly depends on your query. If MySQL does not need to
materialize the result set (i.e. for sorting), it stops just after
it processed the requested number of rows.


XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/
Reply With Quote
  #5 (permalink)  
Old 12-17-2006
onedbguru
 
Posts: n/a
Default Re: SELECT...LIMIT that does a partial select?


Axel Schwenke wrote:
> Ignoramus24559 <ignoramus24559@NOSPAM.24559.invalid> wrote:
> > I am in big trouble with MySQL 5.
> >
> > I have a huge table, about 25 GIGABYTES, that is a cache for hard to
> > compute items. Each row has two timestamps, last updated and last
> > retrieved timestamp, some metadata, key and a BLOB.
> >
> > I would like to clean that table based on timestamps. I used to clean
> > it periodically, but doing a big DELETE...WHERE query locks it up for
> > HOURS and that means downtime for my website, lost money, pissed off
> > tutors, etc.

>
> First: this design does not seem to be very clever. You have 25GB
> worth of "hard to compute" binary stuff? Why do you put it in the
> database anyway? Wouldn't it be better to keep that in plain old
> files? On the web server(s)? That should scale much better. Plus,
> filesystems are quite good at dealing with binary data. Better than
> most database engines.
>
> Next: you use a MyISAM table for that. Not good. With variable length
> records and heavy fluctuation you will experience tablespace
> fragmentation very soon. You would be better off with an InnoDB table.
> I can also imagine a set of MyISAM tables, bundled by the MERGE engine.
> If you partition your data by timestamp, cleaning out old entries would
> become as simple as CREATE fresh_table, ALTER merge_table, DROP
> oldest_table.
>
> > So, I thought, I would write a process that would do
> >
> > SELECT id FROM bincache WHERE ... LIMIT 100
> >
> > and that would run quickly to give me 100 IDs, that would then DELETE
> > at my leisure without locking up the table.

>
> Locking is a MyISAM only problem.
>
> Also you can do DELETE ... LIMIT. No need to break that in two steps.
> However, deleting from a big table is expensive:
>
> 1. the rows to be deleted must be found
> 2. rows must be deleted from tablespace (this may cause tablespace
> reorganisation)
> 3. indexes must be updated
>
> If you delete big part of a table (say: 80% of the records) you will be
> better off with the following approach:
>
> 1. create a new table with the same layout
> 2. populate the new table with INSERT .... SELECT with all records
> you plan to keep
> 3. rename tables (this is an atomic operation)
> 4. drop the old table
>
> > Well, I just learned that LIMIT 100 is applied after ALL select work
> > is done,

>
> This clearly depends on your query. If MySQL does not need to
> materialize the result set (i.e. for sorting), it stops just after
> it processed the requested number of rows.
>
>
> XL
> --
> Axel Schwenke, Senior Software Developer, MySQL AB
>
> Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
> MySQL User Forums: http://forums.mysql.com/


IMNSHO, using partitioned tables would be a much better alternative. I
have administered real databases in the multi-terabyte range with this
approach.

create tablespace p0...
create tablespace p1 etc...

CREATE TABLE t1 (
id INT,
year_col INT
)
PARTITION BY RANGE (year_col) (
PARTITION p0 VALUES LESS THAN (1991),
PARTITION p1 VALUES LESS THAN (1995),
PARTITION p2 VALUES LESS THAN (1999)
);
You can add a new partition p3 to this table for storing values less
then 2002 as follows:

ALTER TABLE t1 ADD PARTITION (PARTITION p3 VALUES LESS THAN (2002));

Then look at how to drop the older partitions.

One thing you need to do is to also partition your indexes (it if is
allowed --I have not exhausted the docs on this topic...) such that if
you drop the oldest partition - you do so on the indexes as well.

Someone once said in this forum that in todays storage array technology
it is not necessary to seperate data and indexes onto seperate
filesystems (or disk drives depending on your platform). I would have
to disagree because the problem is not necessarily on the storage
array, but the shear volume of I/O's the OS can queue up on a single
logical device. No OS handles hundreds or thousands of I/O's per
second to a single-LVM very well unless your "SYSTEM" is designed
properly. You must continue to "DESIGN" your system to match the
requirements - don't just throw it together based on random pieces of
comments (including mine) and expect it to perform optimally.

If someone says `it "should" perform if you....`, RUN!!! Unless they
have designed it before in a proven production environment, then they
are just guessing. It may be a very good guess, but it is a guess
nevertheless...

Database System designers, IMHO, should have DBA, SA, Network and
Storage Administration skills, because to design these types of
systems, you must have an intimate, working knowlege of all of these
disciplines and MUST work very closely with their counterparts in these
areas.

Database Consultant.

Reply With Quote
  #6 (permalink)  
Old 12-17-2006
Axel Schwenke
 
Posts: n/a
Default Re: SELECT...LIMIT that does a partial select?

"onedbguru" <onedbguru@yahoo.com> wrote:
>
> Axel Schwenke wrote:


[Fullquote, including signature]

Please don't fullquote!

> IMNSHO, using partitioned tables would be a much better alternative. I
> have administered real databases in the multi-terabyte range with this
> approach.
>
> create tablespace p0...
> create tablespace p1 etc...
>
> CREATE TABLE t1 (
> id INT,
> year_col INT
> )
> PARTITION BY RANGE (year_col) (
> PARTITION p0 VALUES LESS THAN (1991),
> PARTITION p1 VALUES LESS THAN (1995),
> PARTITION p2 VALUES LESS THAN (1999)
> );


Did you notice the word "mysql" in this groups name? Your suggestion
does not work with MySQL. In fact MySQL 5.1 does support partitions -
but it is currently BETA and the syntax is slightly different.
MERGE tables (as suggested by me) are the poor mans approximation of
partitions, available in stable MySQL editions.

However, the OPs problem is, that DELETE on a large table

a) may take a long time and
b) locks that table

His original approach (deleting small chunks) tackles a). Using an
InnoDB table would eliminate b). Partitions won't solve any of those
in the first place - only if DELETE is implemented as DROP PARTITION.


XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/
Reply With Quote
  #7 (permalink)  
Old 12-17-2006
Wayne M. Poe
 
Posts: n/a
Default Re: SELECT...LIMIT that does a partial select?

Axel Schwenke wrote:
> "onedbguru" <onedbguru@yahoo.com> wrote:
> >
> > IMNSHO, using partitioned tables would be a much better alternative.
> > I have administered real databases in the multi-terabyte range with
> > this approach.
> >
> > create tablespace p0...
> > create tablespace p1 etc...
> >
> > CREATE TABLE t1 (
> > id INT,
> > year_col INT
> > )
> > PARTITION BY RANGE (year_col) (
> > PARTITION p0 VALUES LESS THAN (1991),
> > PARTITION p1 VALUES LESS THAN (1995),
> > PARTITION p2 VALUES LESS THAN (1999)
> > );

>
> Did you notice the word "mysql" in this groups name? Your suggestion
> does not work with MySQL. In fact MySQL 5.1 does support partitions -
> but it is currently BETA and the syntax is slightly different.
> MERGE tables (as suggested by me) are the poor mans approximation of
> partitions, available in stable MySQL editions.


UM, no. I've been using them since 3.x, and they work great for huge
amounts of data.


Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 09:17 AM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0