This is a discussion on Weird performance bottle-neck with large tables. within the MySQL Database forums, part of the Database Forums category; I am running into something really weird with MySQL. Bare in mind that the following is not a real-world ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I am running into something really weird with MySQL. Bare in mind that
the following is not a real-world application but a test environment that I have set up which is meant to only imitate another real-world application. So, please don't respond with recommendations of how to do this same thing differently. :) I have created 10 tables each with 1 million to 10 million rows. Each table has 1 million more than the previous. Each table contains two 64-bit integers which together form the primary key and a data field which is of type TEXT. I then have a program that spawns 50 connections to the first table. Each thread then sends an equal distribution of INSERTs, SELECTs and UPDATEs. I then monitor the average query time of each type of query. After finishing with the first table, I then move to the next table and so on until all tables have been tested. On my machine, I am seeing query times in the range of .6 ms/query. There is little change in the values until I reach the larger tables. Suddenly, at a point which changes each time, the query time jumps up in the range of 30 ms/query. Sometimes this occurs at the 7 million table, sometimes higher or lower. There are times when it never happens at all and the .6 ms/query timing continues even in the 10 million row table. It never happens on the smaller tables though. I then setup a test case where once it popped, it re-ran the tests on the smaller tables. Once the query time has jumped to the 20-30 ms point, queries on the small tables also take 20-30 ms. So, it has to be something that is happening in mysql, not just the given table. Once I let it run for a while, eventually the query times drop down low again. When I run "SHOW PROCESSLIST" while the test is running, I see only a couple of queries running/waiting. Once the timing has popped, the process list shows lots of queries, most of which are waiting. Generally the waiting reason is either "Locked" or "System Lock". I am running 3.23.51 on Linux 2.4.32. The tables are MyISAM. There is plenty of memory. In fact, during the test my swap is completely unused. So it can't be a swapping issue. At least, it can't be a system swapping issue. It could very well be a swapping issue inside of mysql. If anyone could provide any light on this situation, I would greatly appreciate it. Thank you. |
|
|||
|
jinxidoru@byu.net wrote:
> I am running into something really weird with MySQL. Bare in mind that > the following is not a real-world application but a test environment > that I have set up which is meant to only imitate another real-world > application. So, please don't respond with recommendations of how to > do this same thing differently. :) > > I have created 10 tables each with 1 million to 10 million rows. Each > table has 1 million more than the previous. Each table contains two > 64-bit integers which together form the primary key and a data field > which is of type TEXT. I then have a program that spawns 50 > connections to the first table. Each thread then sends an equal > distribution of INSERTs, SELECTs and UPDATEs. I then monitor the > average query time of each type of query. After finishing with the > first table, I then move to the next table and so on until all tables > have been tested. > > On my machine, I am seeing query times in the range of .6 ms/query. > There is little change in the values until I reach the larger tables. > Suddenly, at a point which changes each time, the query time jumps up > in the range of 30 ms/query. Sometimes this occurs at the 7 million > table, sometimes higher or lower. There are times when it never > happens at all and the .6 ms/query timing continues even in the 10 > million row table. It never happens on the smaller tables though. > > I then setup a test case where once it popped, it re-ran the tests on > the smaller tables. Once the query time has jumped to the 20-30 ms > point, queries on the small tables also take 20-30 ms. So, it has to > be something that is happening in mysql, not just the given table. > Once I let it run for a while, eventually the query times drop down low > again. > > When I run "SHOW PROCESSLIST" while the test is running, I see only a > couple of queries running/waiting. Once the timing has popped, the > process list shows lots of queries, most of which are waiting. > Generally the waiting reason is either "Locked" or "System Lock". > > I am running 3.23.51 That's a very old version of MySQL. It might not happen in 4.0, 4.1 or 5.0. > on Linux 2.4.32. That's old too, I haven't used a 2.4 kernel for a while. > The tables are MyISAM. There is > plenty of memory. In fact, during the test my swap is completely > unused. So it can't be a swapping issue. At least, it can't be a > system swapping issue. It could very well be a swapping issue inside > of mysql. But do all the indexes of all the tables fit in the key buffer? I suggest you first check this, raise it if necessary, then if the problem persists, first upgrade MySQL, then your kernel. -- Brian Wakem Email: http://homepage.ntlworld.com/b.wakem/myemail.png |
|
|||
|
> That's a very old version of MySQL. It might not happen in 4.0, 4.1 or 5.0. Yeah, I'm in the process of trying the same test on 5.0 > That's old too, I haven't used a 2.4 kernel for a while. I agree, but because of certain project requirements, this is not changeable. > But do all the indexes of all the tables fit in the key buffer? > I suggest you first check this, raise it if necessary, then if the problem > persists, first upgrade MySQL, then your kernel. I've tried increasing the key buffer, and it had no noticeable effect. Also, if that were the case then I probably wouldn't see the timing increase on the smaller tables when I returned to them. |
|
|||
|
jinxidoru wrote:
> >> That's a very old version of MySQL. It might not happen in 4.0, 4.1 or >> 5.0. > > Yeah, I'm in the process of trying the same test on 5.0 > >> That's old too, I haven't used a 2.4 kernel for a while. > > I agree, but because of certain project requirements, this is not > changeable. > >> But do all the indexes of all the tables fit in the key buffer? >> I suggest you first check this, raise it if necessary, then if the >> problem persists, first upgrade MySQL, then your kernel. > > I've tried increasing the key buffer, and it had no noticeable effect. > Also, if that were the case then I probably wouldn't see the timing > increase on the smaller tables when I returned to them. Yes you would as the key buffer would contain the indexes of the larger tables as the smaller tables' indexes would have been pushed out of the buffer. -- Brian Wakem Email: http://homepage.ntlworld.com/b.wakem/myemail.png |
|
|||
|
> Yes you would as the key buffer would contain the indexes of the larger
> tables as the smaller tables' indexes would have been pushed out of the > buffer. But, once a couple queries have been run on the smaller tables, wouldn't the index for the smaller tables be put back into the cache. I could see this being a problem when we're talking about just a few individual queries, but my test case involves thousands of queries. |
|
|||
|
jinxidoru wrote:
>> Yes you would as the key buffer would contain the indexes of the larger >> tables as the smaller tables' indexes would have been pushed out of the >> buffer. > > But, once a couple queries have been run on the smaller tables, > wouldn't the index for the smaller tables be put back into the cache. > I could see this being a problem when we're talking about just a few > individual queries, but my test case involves thousands of queries. As I understand it, only the index blocks that are required for the queries you execute would get cached in the key buffer, so if you keep querying different parts of the table you might not get any buffer hits. With MySQL 5.0 you can use 'LOAD INDEX INTO CACHE' (http://dev.mysql.com/doc/refman/5.0/en/load-index.html) to make sure all the index is cached beforehand, but of course you still need a key buffer big enough to fit them all in. -- Brian Wakem Email: http://homepage.ntlworld.com/b.wakem/myemail.png |
|
|||
|
jinxidoru@byu.net wrote:
> I have created 10 tables each with 1 million to 10 million rows. Each > table has 1 million more than the previous. Each table contains two > 64-bit integers which together form the primary key and a data field > which is of type TEXT. I then have a program that spawns 50 > connections to the first table. Each thread then sends an equal > distribution of INSERTs, SELECTs and UPDATEs. I then monitor the > average query time of each type of query. After finishing with the > first table, I then move to the next table and so on until all tables > have been tested. So this is kind of "Performance vs. Data Set Size" test. How do you access the rows? Random access, uniform distribution? This will not simulate real workload then. Real data has hotspots and generally a nonuniform access pattern. > On my machine, I am seeing query times in the range of .6 ms/query. > There is little change in the values until I reach the larger tables. > Suddenly, at a point which changes each time, the query time jumps up > in the range of 30 ms/query. Sometimes this occurs at the 7 million > table, sometimes higher or lower. There are times when it never > happens at all and the .6 ms/query timing continues even in the 10 > million row table. It never happens on the smaller tables though. Looks like some cache gets too small. Since you are using MyISAM, there is only the key_buffer inside MySQL. If you access the data files as well (that is: fields not in indexes), those accesses will be cached by the operating system. The fluctuation could be explained by the random access pattern of your test. But linear progression is not very useful. You should better increase your data set sizes in geometric progression. Common are powers of two. > I then setup a test case where once it popped, it re-ran the tests on > the smaller tables. Once the query time has jumped to the 20-30 ms > point, queries on the small tables also take 20-30 ms. So, it has to > be something that is happening in mysql, not just the given table. No. It's just how caches work. The key_buffer is not instantly flushed and refilled, once you go back to a smaller table. Same goes for caches outside MySQL. Some caches may also be dirty and need to get flushed first. > Once I let it run for a while, eventually the query times drop down low > again. Right. As soon as the caches are "hot" again, the performance increases. > When I run "SHOW PROCESSLIST" while the test is running, I see only a > couple of queries running/waiting. Once the timing has popped, the > process list shows lots of queries, most of which are waiting. > Generally the waiting reason is either "Locked" or "System Lock". Locking comes from your operation mix. MyISAM tables can be read or written mutually exclusive. So locking happens all the time. Though you cannot see it unless there is a backlog of queries in the server. As long as the server can answer your questions faster than your testing environment asks new ones, everything will be fine. If you want to understand the phenomenon, you have to closely monitor both MySQL and your operating system. I.e. the efficiency of the key_buffer can be estimated by looking at the key_read_requests and key_reads counters from SHOW STATUS. With BLOBs in MyISAM tables you will experience table fragmentation. Some UPDATEs will create new records, leaving a hole where the old record was. Eventually some holes will be filled later, but overall the table size will increase and make your caches less usefull and performance drop. XL -- Axel Schwenke, Senior Software Developer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |