This is a discussion on slow query on big table within the MySQL Database forums, part of the Database Forums category; I have a table with more than 3,000,000 records. Frequently, I have to do an update on this ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I have a table with more than 3,000,000 records. Frequently, I have to
do an update on this table that is taking a long time and I'd like some advice as to what I can do to fix the problem. Let's call the table students. The primary key is an auto-indent ID which is unique for each student. However, the table itself basically works to connect students with groups and campaigns they are associated with. Because of this a student might exist in the table more than once. When I do this update I want it to change information in each of the instances of that student, so instead of updating based on the unique id in Students I have to use an identifier that is common to the student regardless of which campaign or group they belong to. Let's say this is their phone number, which is an integer for the purposes of this question. I realize this is a poor example, but for the purposes of the question it'll work. So, the table is indexed not only on the unique identifier, but on the phone number as well. When I do a SHOW INDEX on this table it shows me both of these. However, when I do an explain on both, the unique index shows me that it will have to inspect 1 row while the phone number shows me that it will have to inspect all of the rows in the table. Because this table is big, this takes a long time. So, I guess my question is, why isn't the phone number as efficient as the unique indentifier? Is it because it could exist multiple times in the database? Is there any way to make it more efficient? is the only solution to this problem to make the phone number a unique key for this table and having associations between groups and campaigns done elsewhere? Thanks for your help. I've been researching this for a while, and any assistance would be greatly appreciated. |
|
|||
|
lambelly@gmail.com wrote:
> I have a table with more than 3,000,000 records. Frequently, I have to > do an update on this table that is taking a long time and I'd like some > advice as to what I can do to fix the problem. > > Let's call the table students. The primary key is an auto-indent ID > which is unique for each student. However, the table itself basically > works to connect students with groups and campaigns they are associated > with. Because of this a student might exist in the table more than > once. When I do this update I want it to change information in each of > the instances of that student, so instead of updating based on the > unique id in Students I have to use an identifier that is common to the > student regardless of which campaign or group they belong to. Let's say > this is their phone number, which is an integer for the purposes of > this question. I realize this is a poor example, but for the purposes > of the question it'll work. > > So, the table is indexed not only on the unique identifier, but on the > phone number as well. When I do a SHOW INDEX on this table it shows me > both of these. However, when I do an explain on both, the unique index > shows me that it will have to inspect 1 row while the phone number > shows me that it will have to inspect all of the rows in the table. > Because this table is big, this takes a long time. > > So, I guess my question is, why isn't the phone number as efficient as > the unique indentifier? Is it because it could exist multiple times in > the database? Is there any way to make it more efficient? is the only > solution to this problem to make the phone number a unique key for this > table and having associations between groups and campaigns done > elsewhere? > > Thanks for your help. I've been researching this for a while, and any > assistance would be greatly appreciated. > updates and insert speeds are affected by the number of indexes you have, is there any reason you can't just use the autoincrement as the primary key index, and drop the other indexes (if you do you will see a lot faster update). |
|
|||
|
Kim Hunter wrote:
> > is there any reason you can't just use the autoincrement as the primary > key index, and drop the other indexes (if you do you will see a lot > faster update). I could, but I'd have to redesign the table structure. As it is, a student might be in the table more than once and I need to update this information on every instance of this student in the student table. So, the student might be in there maybe two or three times. If I were to use the autoincrement I'd have to move the associations with groups and campaigns into a different table. Which might not be a bad idea. |
|
|||
|
lambelly@gmail.com wrote:
> Kim Hunter wrote: > >>is there any reason you can't just use the autoincrement as the primary >>key index, and drop the other indexes (if you do you will see a lot >>faster update). > > > I could, but I'd have to redesign the table structure. As it is, a > student might be in the table more than once and I need to update this > information on every instance of this student in the student table. So, > the student might be in there maybe two or three times. > > If I were to use the autoincrement I'd have to move the associations > with groups and campaigns into a different table. Which might not be a > bad idea. > First of all, a phone number will be a char field, not an int. And chars fields always take longer to compare than ints. However, you have a bigger problem. You shouldn't have anyone in there more than once. (And what happens if you get two students sharing an apartment - and therefore the same phone number?) You need to normalize your database. Have one entry for a student in this table, and use another table with two columns - a student id and an organization id. Yes, you'll have to do a little work to do it this way, but it will cut down your table size, allow you to search in the (much quicker) auto increment field and make everything a lot faster. And google for tutorials on "database normalization". You'll probably find other places you can make this more efficient. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
|
|||
|
Jerry Stuckle wrote:
> You need to normalize your database. Yeah. I didn't design the database and kind of adopted it when I got this job. When I talk about normalizing it my boss doesn't really want to do that because of the changes we'd have to make to the software, which would be complicated. So, I'm hoping that there are solutions which will make both of us happy because right now we're hampered with large problems. However, it looks like all we can do is fix the database properly. |
|
|||
|
lambelly@gmail.com wrote:
> Jerry Stuckle wrote: > > >>You need to normalize your database. > > > Yeah. I didn't design the database and kind of adopted it when I got > this job. When I talk about normalizing it my boss doesn't really want > to do that because of the changes we'd have to make to the software, > which would be complicated. So, I'm hoping that there are solutions > which will make both of us happy because right now we're hampered with > large problems. However, it looks like all we can do is fix the > database properly. > With the current design, your options are going to be very limited. There probably isn't going to be a lot you can do to speed it up significantly. You can play with the MySQL tuning parameters some, and might get some gains. But I suspect since you're main problem is on an UPDATE you won't gain a lot. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |