This is a discussion on Speeding up massive (millions of rows) insertions and adding new INNODB indexes within the MySQL Database forums, part of the Database Forums category; Hello everyone, This is my first time posting (been reading for years) on Usenet so please apologize and do tell ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
Hello everyone,
This is my first time posting (been reading for years) on Usenet so please apologize and do tell if I'm doing something wrong. I'm trying to speed up 2 massive operations; below is the details on them and what have been tried up to now. 1 - Inserting millions of rows in a database. ---------------------------------------------------------------- The scenario is a production system to which new tables are added and are filled by a program using the existing data. On some systems this means 5-10 millions of new records to insert. What have been done up to now is wrapping up the 1000 inserts at a time around BEING; COMMIT; statements and disabling the mysql log completely. We have also tried using the INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX) syntax instead of doing one INSERT per record, and didn't see much gain with that. 2 - Adding new indexes on a huge INNODB table ------------------------------------------------------------ The table in question have around 12 million rows. We have tweaked innodb_buffer_pool_size to a much larger value than what it is usually and have seen a good improvement with that. Index creation may take 20-30 mins if my guess is good, but that is still a while since I have to create 4 indexes in this setting. Thanks for your time having a look at this ! |
|
|||
|
On 25 Jan, 04:06, "mastersox" <cool...@gmail.com> wrote: > Hello everyone, > > This is my first time posting (been reading for years) on Usenet so > please apologize and do tell if I'm doing something wrong. > > I'm trying to speed up 2 massive operations; below is the details on > them and what have been tried up to now. > > 1 - Inserting millions of rows in a database. > ---------------------------------------------------------------- > The scenario is a production system to which new tables > are added and are filled by a program using the existing data. > On some systems this means 5-10 millions of new records to insert. > > What have been done up to now is wrapping up the 1000 inserts > at a time around BEING; COMMIT; statements and disabling the > mysql log completely. > > We have also tried using the > INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES > (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX) > syntax instead of doing one INSERT per record, and didn't see much gain > with that. Where does the "existing data" come from? Have you tried LOAD DATA? > > 2 - Adding new indexes on a huge INNODB table > ------------------------------------------------------------ > The table in question have around 12 million rows. We have > tweaked > innodb_buffer_pool_size > to a much larger value than what it is usually and have seen > a good improvement with that. Index creation may take > 20-30 mins if my guess is good, but that is still a while since > I have to create 4 indexes in this setting. > > Thanks for your time having a look at this ! |
|
|||
|
Existing data is already in the database, it doesn't need to be
inserted. All the inserts are generated by a c++ program. I haven't experimented with this at all. Is the speed gain significant enough to justify dumping the inserts in a file then loading the data using LOAD DATA ? On Jan 25, 5:02 am, "Captain Paralytic" <paul_laut...@yahoo.com> wrote: > On 25 Jan, 04:06, "mastersox" <cool...@gmail.com> wrote: > > > Hello everyone, > > > This is my first time posting (been reading for years) on Usenet so > > please apologize and do tell if I'm doing something wrong. > > > I'm trying to speed up 2 massive operations; below is the details on > > them and what have been tried up to now. > > > 1 - Inserting millions of rows in a database. > > ---------------------------------------------------------------- > > The scenario is a production system to which new tables > > are added and are filled by a program using the existing data. > > On some systems this means 5-10 millions of new records to insert. > > > What have been done up to now is wrapping up the 1000 inserts > > at a time around BEING; COMMIT; statements and disabling the > > mysql log completely. > > > We have also tried using the > > INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES > > (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX) > > syntax instead of doing one INSERT per record, and didn't see much gain > > with that.Where does the "existing data" come from? Have you tried LOAD DATA? > > > > > 2 - Adding new indexes on a huge INNODB table > > ------------------------------------------------------------ > > The table in question have around 12 million rows. We have > > tweaked > > innodb_buffer_pool_size > > to a much larger value than what it is usually and have seen > > a good improvement with that. Index creation may take > > 20-30 mins if my guess is good, but that is still a while since > > I have to create 4 indexes in this setting. > > > Thanks for your time having a look at this ! |
|
|||
|
On 25 Jan, 14:57, "mastersox" <cool...@gmail.com> wrote: > Existing data is already in the database, it doesn't need to be > inserted. All the inserts are generated by a c++ program. > > I haven't experimented with this at all. Is the speed gain > significant enough to justify dumping the inserts in a file > then loading the data using LOAD DATA ? > > On Jan 25, 5:02 am, "Captain Paralytic" <paul_laut...@yahoo.com> wrote: > > > > > On 25 Jan, 04:06, "mastersox" <cool...@gmail.com> wrote: > > > > Hello everyone, > > > > This is my first time posting (been reading for years) on Usenet so > > > please apologize and do tell if I'm doing something wrong. > > > > I'm trying to speed up 2 massive operations; below is the details on > > > them and what have been tried up to now. > > > > 1 - Inserting millions of rows in a database. > > > ---------------------------------------------------------------- > > > The scenario is a production system to which new tables > > > are added and are filled by a program using the existing data. > > > On some systems this means 5-10 millions of new records to insert. > > > > What have been done up to now is wrapping up the 1000 inserts > > > at a time around BEING; COMMIT; statements and disabling the > > > mysql log completely. > > > > We have also tried using the > > > INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES > > > (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX) > > > syntax instead of doing one INSERT per record, and didn't see much gain > > > with that.Where does the "existing data" come from? Have you tried LOAD DATA? > > > > 2 - Adding new indexes on a huge INNODB table > > > ------------------------------------------------------------ > > > The table in question have around 12 million rows. We have > > > tweaked > > > innodb_buffer_pool_size > > > to a much larger value than what it is usually and have seen > > > a good improvement with that. Index creation may take > > > 20-30 mins if my guess is good, but that is still a while since > > > I have to create 4 indexes in this setting. > > > > Thanks for your time having a look at this !- Hide quoted text -- Show quoted text - First of all a request. Please don't top post. OK now I'm confused!!! In your first post you said: "The scenario is a production system to which new tables are added and are filled by a program using the existing data. On some systems this means 5-10 millions of new records to insert." Now you are saying that the "existing data" is already there and doesn't need to be inserted! If the existing data is what means that 5-10 millions of records need to be inserted, how can they also not need to be inserted. Would you care to sort your thoughts out and try describe the scenario once again? |
|
|||
|
On 25 jan, 10:08, "Captain Paralytic" <paul_laut...@yahoo.com> wrote: > On 25 Jan, 14:57, "mastersox" <cool...@gmail.com> wrote: > > > > > Existing data is already in the database, it doesn't need to be > > inserted. All the inserts are generated by a c++ program. > > > I haven't experimented with this at all. Is the speed gain > > significant enough to justify dumping the inserts in a file > > then loading the data using LOAD DATA ? > > > On Jan 25, 5:02 am, "Captain Paralytic" <paul_laut...@yahoo.com> wrote: > > > > On 25 Jan, 04:06, "mastersox" <cool...@gmail.com> wrote: > > > > > Hello everyone, > > > > > This is my first time posting (been reading for years) on Usenet so > > > > please apologize and do tell if I'm doing something wrong. > > > > > I'm trying to speed up 2 massive operations; below is the details on > > > > them and what have been tried up to now. > > > > > 1 - Inserting millions of rows in a database. > > > > ---------------------------------------------------------------- > > > > The scenario is a production system to which new tables > > > > are added and are filled by a program using the existing data. > > > > On some systems this means 5-10 millions of new records to insert. > > > > > What have been done up to now is wrapping up the 1000 inserts > > > > at a time around BEING; COMMIT; statements and disabling the > > > > mysql log completely. > > > > > We have also tried using the > > > > INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES > > > > (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX) > > > > syntax instead of doing one INSERT per record, and didn't see much gain > > > > with that.Where does the "existing data" come from? Have you tried LOAD DATA? > > > > > 2 - Adding new indexes on a huge INNODB table > > > > ------------------------------------------------------------ > > > > The table in question have around 12 million rows. We have > > > > tweaked > > > > innodb_buffer_pool_size > > > > to a much larger value than what it is usually and have seen > > > > a good improvement with that. Index creation may take > > > > 20-30 mins if my guess is good, but that is still a while since > > > > I have to create 4 indexes in this setting. > > > > > Thanks for your time having a look at this !- Hide quoted text -- Show quoted text -First of all a request. Please don't top post. > > OK now I'm confused!!! > > In your first post you said: "The scenario is a production system to > which new tables are added and are filled by a program using the > existing data. On some systems this means 5-10 millions of new records > to insert." > > Now you are saying that the "existing data" is already there and > doesn't need to be inserted! > If the existing data is what means that 5-10 millions of records need > to be inserted, how can they also not need to be inserted. > > Would you care to sort your thoughts out and try describe the scenario > once again? When you say don't top post, you mean I shouldn't reply at the top of the email, right ? Thanks for helping me doing this right. Second, the 'existing data' is the data that is already in the database. The new tables data are built from what are in the other tables. What it is really is something like this: Existing tables: TRANSACTION TR_MOPS TR_TARIF TR_SERVICE New tables are: TR_TARIF_TAX TR_SERVICE_TAX TR_TARIF_COMP TR_SERVICE_COMP Those are all INNODB So for each entry in TRANSACTION, there is 1 ... n entries in TR_TARIF, TR_SERVICE AND TR_MOPS. They are linked with TR_DATE, TR_NUM. It's the same for the new tables, there can be 1 ... n entries for each of the tables they are linked with. So for a particular system, it means there are 10+ millions of new records to put in those tables. What I'm trying to do is increase the raw speed of insertion. All the inserts are done on the same connection, and there is a COMMIT; BEGIN; statement at every 1000 inserts. I think I have optimized all that could be in the program that does the job of 'creating' those new records and inserting them in the database. I am trying to optimize mysql's my.cnf for maximum insertion speed. The best gain we've found for now is to stop query logging. When the data will get inserted, the server will be completely idle so I can change my.cnf and restart mysql as many times as needed, like with some beefed up the memory amounts allocation in some buffers. I am wondering if increasing the size of INNODB_BUFFER_POOL_SIZE will have a significant impact on insertion speed, from 512mb to 1024 or 1536. I also wonder if any other my.cnf settings could help with raw insertion speed. Or maybe putting all this data to insert in a file that could be loaded with LOAD DATA? This is running on a quad-cpu 3gb ram IBM rackmount unit. Thanks! |
|
|||
|
On 26 Jan, 01:07, "mastersox" <cool...@gmail.com> wrote: > On 25 jan, 10:08, "Captain Paralytic" <paul_laut...@yahoo.com> wrote: > > > > > > > On 25 Jan, 14:57, "mastersox" <cool...@gmail.com> wrote: > > > > Existing data is already in the database, it doesn't need to be > > > inserted. All the inserts are generated by a c++ program. > > > > I haven't experimented with this at all. Is the speed gain > > > significant enough to justify dumping the inserts in a file > > > then loading the data using LOAD DATA ? > > > > On Jan 25, 5:02 am, "Captain Paralytic" <paul_laut...@yahoo.com> wrote: > > > > > On 25 Jan, 04:06, "mastersox" <cool...@gmail.com> wrote: > > > > > > Hello everyone, > > > > > > This is my first time posting (been reading for years) on Usenet so > > > > > please apologize and do tell if I'm doing something wrong. > > > > > > I'm trying to speed up 2 massive operations; below is the details on > > > > > them and what have been tried up to now. > > > > > > 1 - Inserting millions of rows in a database. > > > > > ---------------------------------------------------------------- > > > > > The scenario is a production system to which new tables > > > > > are added and are filled by a program using the existing data. > > > > > On some systems this means 5-10 millions of new records to insert. > > > > > > What have been done up to now is wrapping up the 1000 inserts > > > > > at a time around BEING; COMMIT; statements and disabling the > > > > > mysql log completely. > > > > > > We have also tried using the > > > > > INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES > > > > > (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX) > > > > > syntax instead of doing one INSERT per record, and didn't see much gain > > > > > with that.Where does the "existing data" come from? Have you tried LOAD DATA? > > > > > > 2 - Adding new indexes on a huge INNODB table > > > > > ------------------------------------------------------------ > > > > > The table in question have around 12 million rows. We have > > > > > tweaked > > > > > innodb_buffer_pool_size > > > > > to a much larger value than what it is usually and have seen > > > > > a good improvement with that. Index creation may take > > > > > 20-30 mins if my guess is good, but that is still a while since > > > > > I have to create 4 indexes in this setting. > > > > > > Thanks for your time having a look at this !- Hide quoted text -- Show quoted text -First of all a request. Please don't top post. > > > OK now I'm confused!!! > > > In your first post you said: "The scenario is a production system to > > which new tables are added and are filled by a program using the > > existing data. On some systems this means 5-10 millions of new records > > to insert." > > > Now you are saying that the "existing data" is already there and > > doesn't need to be inserted! > > If the existing data is what means that 5-10 millions of records need > > to be inserted, how can they also not need to be inserted. > > > Would you care to sort your thoughts out and try describe the scenario > > once again?When you say don't top post, you mean I shouldn't reply at the top of > the email, right ? Thanks for helping me doing this right. > > Second, the 'existing data' is the data that is already in the > database. > The new tables data are built from what are in the other tables. > > What it is really is something like this: > > Existing tables: > TRANSACTION > TR_MOPS > TR_TARIF > TR_SERVICE > > New tables are: > TR_TARIF_TAX > TR_SERVICE_TAX > TR_TARIF_COMP > TR_SERVICE_COMP > > Those are all INNODB > > So for each entry in TRANSACTION, there is 1 ... n entries in TR_TARIF, > > TR_SERVICE AND TR_MOPS. They are linked with TR_DATE, TR_NUM. > > It's the same for the new tables, there can be 1 ... n entries for each > > of the tables they are linked with. > > So for a particular system, it means there are 10+ millions of new > records > to put in those tables. > > What I'm trying to do is increase the raw speed of insertion. All the > inserts > are done on the same connection, and there is a COMMIT; BEGIN; > statement > at every 1000 inserts. I think I have optimized all that could be in > the program > that does the job of 'creating' those new records and inserting them in > the database. > > I am trying to optimize mysql's my.cnf for maximum insertion speed. The > > best gain we've found for now is to stop query logging. > > When the data will get inserted, the server will be completely idle so > I can change my.cnf and restart mysql as many times as needed, like > with some > beefed up the memory amounts allocation in some buffers. I am > wondering if increasing the size of INNODB_BUFFER_POOL_SIZE will > have a significant impact on insertion speed, from 512mb to 1024 or > 1536. > > I also wonder if any other my.cnf settings could help with raw > insertion speed. > > Or maybe putting all this data to insert in a file that could be loaded > with LOAD DATA? > > This is running on a quad-cpu 3gb ram IBM rackmount unit. > > Thanks!- Hide quoted text -- Show quoted text - If the data in the new tables are built from data that is in the other existing tables, why are you taking the data out of MySQL at all? Why not use INSERT ... SELECT statements to let SQL copy the data for you? Then you have only one INSERT statement per table to be populated? |
|
|||
|
On Jan 26, 7:51 am, "Captain Paralytic" <paul_laut...@yahoo.com> wrote:
> On 26 Jan, 01:07, "mastersox" <cool...@gmail.com> wrote: > > > > > On 25 jan, 10:08, "Captain Paralytic" <paul_laut...@yahoo.com> wrote: > > > > On 25 Jan, 14:57, "mastersox" <cool...@gmail.com> wrote: > > > > > Existing data is already in the database, it doesn't need to be > > > > inserted. All the inserts are generated by a c++ program. > > > > > I haven't experimented with this at all. Is the speed gain > > > > significant enough to justify dumping the inserts in a file > > > > then loading the data using LOAD DATA ? > > > > > On Jan 25, 5:02 am, "Captain Paralytic" <paul_laut...@yahoo.com> wrote: > > > > > > On 25 Jan, 04:06, "mastersox" <cool...@gmail.com> wrote: > > > > > > > Hello everyone, > > > > > > > This is my first time posting (been reading for years) on Usenet so > > > > > > please apologize and do tell if I'm doing something wrong. > > > > > > > I'm trying to speed up 2 massive operations; below is the details on > > > > > > them and what have been tried up to now. > > > > > > > 1 - Inserting millions of rows in a database. > > > > > > ---------------------------------------------------------------- > > > > > > The scenario is a production system to which new tables > > > > > > are added and are filled by a program using the existing data. > > > > > > On some systems this means 5-10 millions of new records to insert. > > > > > > > What have been done up to now is wrapping up the 1000 inserts > > > > > > at a time around BEING; COMMIT; statements and disabling the > > > > > > mysql log completely. > > > > > > > We have also tried using the > > > > > > INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES > > > > > > (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX) > > > > > > syntax instead of doing one INSERT per record, and didn't see much gain > > > > > > with that.Where does the "existing data" come from? Have you tried LOAD DATA? > > > > > > > 2 - Adding new indexes on a huge INNODB table > > > > > > ------------------------------------------------------------ > > > > > > The table in question have around 12 million rows. We have > > > > > > tweaked > > > > > > innodb_buffer_pool_size > > > > > > to a much larger value than what it is usually and have seen > > > > > > a good improvement with that. Index creation may take > > > > > > 20-30 mins if my guess is good, but that is still a while since > > > > > > I have to create 4 indexes in this setting. > > > > > > > Thanks for your time having a look at this !- Hide quoted text -- Show quoted text -First of all a request. Please don't top post. > > > > OK now I'm confused!!! > > > > In your first post you said: "The scenario is a production system to > > > which new tables are added and are filled by a program using the > > > existing data. On some systems this means 5-10 millions of new records > > > to insert." > > > > Now you are saying that the "existing data" is already there and > > > doesn't need to be inserted! > > > If the existing data is what means that 5-10 millions of records need > > > to be inserted, how can they also not need to be inserted. > > > > Would you care to sort your thoughts out and try describe the scenario > > > once again?When you say don't top post, you mean I shouldn't reply at the top of > > the email, right ? Thanks for helping me doing this right. > > > Second, the 'existing data' is the data that is already in the > > database. > > The new tables data are built from what are in the other tables. > > > What it is really is something like this: > > > Existing tables: > > TRANSACTION > > TR_MOPS > > TR_TARIF > > TR_SERVICE > > > New tables are: > > TR_TARIF_TAX > > TR_SERVICE_TAX > > TR_TARIF_COMP > > TR_SERVICE_COMP > > > Those are all INNODB > > > So for each entry in TRANSACTION, there is 1 ... n entries in TR_TARIF, > > > TR_SERVICE AND TR_MOPS. They are linked with TR_DATE, TR_NUM. > > > It's the same for the new tables, there can be 1 ... n entries for each > > > of the tables they are linked with. > > > So for a particular system, it means there are 10+ millions of new > > records > > to put in those tables. > > > What I'm trying to do is increase the raw speed of insertion. All the > > inserts > > are done on the same connection, and there is a COMMIT; BEGIN; > > statement > > at every 1000 inserts. I think I have optimized all that could be in > > the program > > that does the job of 'creating' those new records and inserting them in > > the database. > > > I am trying to optimize mysql's my.cnf for maximum insertion speed. The > > > best gain we've found for now is to stop query logging. > > > When the data will get inserted, the server will be completely idle so > > I can change my.cnf and restart mysql as many times as needed, like > > with some > > beefed up the memory amounts allocation in some buffers. I am > > wondering if increasing the size of INNODB_BUFFER_POOL_SIZE will > > have a significant impact on insertion speed, from 512mb to 1024 or > > 1536. > > > I also wonder if any other my.cnf settings could help with raw > > insertion speed. > > > Or maybe putting all this data to insert in a file that could be loaded > > with LOAD DATA? > > > This is running on a quad-cpu 3gb ram IBM rackmount unit. > > > Thanks!- Hide quoted text -- Show quoted text -If the data in the new tables are built from data that is in the other > existing tables, why are you taking the data out of MySQL at all? Why > not use INSERT ... SELECT statements to let SQL copy the data for you? > Then you have only one INSERT statement per table to be populated? There are calculations that needs to be done by the external program that can't be handled by mysql. I'm really down to optimizing raw inserts speed. Even if the INSERTS would be done directly by mysql, insertion speed would still be a bottleneck. |