Bluehost.com Web Hosting $6.95

Speeding up massive (millions of rows) insertions and adding new INNODB indexes

This is a discussion on Speeding up massive (millions of rows) insertions and adding new INNODB indexes within the MySQL Database forums, part of the Database Forums category; Hello everyone, This is my first time posting (been reading for years) on Usenet so please apologize and do tell ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 01-25-2007
mastersox
 
Posts: n/a
Default Speeding up massive (millions of rows) insertions and adding new INNODB indexes

Hello everyone,

This is my first time posting (been reading for years) on Usenet so
please apologize and do tell if I'm doing something wrong.

I'm trying to speed up 2 massive operations; below is the details on
them and what have been tried up to now.

1 - Inserting millions of rows in a database.
----------------------------------------------------------------
The scenario is a production system to which new tables
are added and are filled by a program using the existing data.
On some systems this means 5-10 millions of new records to insert.

What have been done up to now is wrapping up the 1000 inserts
at a time around BEING; COMMIT; statements and disabling the
mysql log completely.

We have also tried using the
INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES
(XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX)
syntax instead of doing one INSERT per record, and didn't see much gain
with that.

2 - Adding new indexes on a huge INNODB table
------------------------------------------------------------
The table in question have around 12 million rows. We have
tweaked
innodb_buffer_pool_size
to a much larger value than what it is usually and have seen
a good improvement with that. Index creation may take
20-30 mins if my guess is good, but that is still a while since
I have to create 4 indexes in this setting.


Thanks for your time having a look at this !

Reply With Quote
  #2 (permalink)  
Old 01-25-2007
Captain Paralytic
 
Posts: n/a
Default Re: Speeding up massive (millions of rows) insertions and adding new INNODB indexes



On 25 Jan, 04:06, "mastersox" <cool...@gmail.com> wrote:
> Hello everyone,
>
> This is my first time posting (been reading for years) on Usenet so
> please apologize and do tell if I'm doing something wrong.
>
> I'm trying to speed up 2 massive operations; below is the details on
> them and what have been tried up to now.
>
> 1 - Inserting millions of rows in a database.
> ----------------------------------------------------------------
> The scenario is a production system to which new tables
> are added and are filled by a program using the existing data.
> On some systems this means 5-10 millions of new records to insert.
>
> What have been done up to now is wrapping up the 1000 inserts
> at a time around BEING; COMMIT; statements and disabling the
> mysql log completely.
>
> We have also tried using the
> INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES
> (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX)
> syntax instead of doing one INSERT per record, and didn't see much gain
> with that.

Where does the "existing data" come from? Have you tried LOAD DATA?
>
> 2 - Adding new indexes on a huge INNODB table
> ------------------------------------------------------------
> The table in question have around 12 million rows. We have
> tweaked
> innodb_buffer_pool_size
> to a much larger value than what it is usually and have seen
> a good improvement with that. Index creation may take
> 20-30 mins if my guess is good, but that is still a while since
> I have to create 4 indexes in this setting.
>
> Thanks for your time having a look at this !


Reply With Quote
  #3 (permalink)  
Old 01-25-2007
mastersox
 
Posts: n/a
Default Re: Speeding up massive (millions of rows) insertions and adding new INNODB indexes

Existing data is already in the database, it doesn't need to be
inserted. All the inserts are generated by a c++ program.

I haven't experimented with this at all. Is the speed gain
significant enough to justify dumping the inserts in a file
then loading the data using LOAD DATA ?

On Jan 25, 5:02 am, "Captain Paralytic" <paul_laut...@yahoo.com> wrote:
> On 25 Jan, 04:06, "mastersox" <cool...@gmail.com> wrote:
>
> > Hello everyone,

>
> > This is my first time posting (been reading for years) on Usenet so
> > please apologize and do tell if I'm doing something wrong.

>
> > I'm trying to speed up 2 massive operations; below is the details on
> > them and what have been tried up to now.

>
> > 1 - Inserting millions of rows in a database.
> > ----------------------------------------------------------------
> > The scenario is a production system to which new tables
> > are added and are filled by a program using the existing data.
> > On some systems this means 5-10 millions of new records to insert.

>
> > What have been done up to now is wrapping up the 1000 inserts
> > at a time around BEING; COMMIT; statements and disabling the
> > mysql log completely.

>
> > We have also tried using the
> > INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES
> > (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX)
> > syntax instead of doing one INSERT per record, and didn't see much gain
> > with that.Where does the "existing data" come from? Have you tried LOAD DATA?

>
>
>
> > 2 - Adding new indexes on a huge INNODB table
> > ------------------------------------------------------------
> > The table in question have around 12 million rows. We have
> > tweaked
> > innodb_buffer_pool_size
> > to a much larger value than what it is usually and have seen
> > a good improvement with that. Index creation may take
> > 20-30 mins if my guess is good, but that is still a while since
> > I have to create 4 indexes in this setting.

>
> > Thanks for your time having a look at this !


Reply With Quote
  #4 (permalink)  
Old 01-25-2007
Captain Paralytic
 
Posts: n/a
Default Re: Speeding up massive (millions of rows) insertions and adding new INNODB indexes



On 25 Jan, 14:57, "mastersox" <cool...@gmail.com> wrote:
> Existing data is already in the database, it doesn't need to be
> inserted. All the inserts are generated by a c++ program.
>
> I haven't experimented with this at all. Is the speed gain
> significant enough to justify dumping the inserts in a file
> then loading the data using LOAD DATA ?
>
> On Jan 25, 5:02 am, "Captain Paralytic" <paul_laut...@yahoo.com> wrote:
>
>
>
> > On 25 Jan, 04:06, "mastersox" <cool...@gmail.com> wrote:

>
> > > Hello everyone,

>
> > > This is my first time posting (been reading for years) on Usenet so
> > > please apologize and do tell if I'm doing something wrong.

>
> > > I'm trying to speed up 2 massive operations; below is the details on
> > > them and what have been tried up to now.

>
> > > 1 - Inserting millions of rows in a database.
> > > ----------------------------------------------------------------
> > > The scenario is a production system to which new tables
> > > are added and are filled by a program using the existing data.
> > > On some systems this means 5-10 millions of new records to insert.

>
> > > What have been done up to now is wrapping up the 1000 inserts
> > > at a time around BEING; COMMIT; statements and disabling the
> > > mysql log completely.

>
> > > We have also tried using the
> > > INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES
> > > (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX)
> > > syntax instead of doing one INSERT per record, and didn't see much gain
> > > with that.Where does the "existing data" come from? Have you tried LOAD DATA?

>
> > > 2 - Adding new indexes on a huge INNODB table
> > > ------------------------------------------------------------
> > > The table in question have around 12 million rows. We have
> > > tweaked
> > > innodb_buffer_pool_size
> > > to a much larger value than what it is usually and have seen
> > > a good improvement with that. Index creation may take
> > > 20-30 mins if my guess is good, but that is still a while since
> > > I have to create 4 indexes in this setting.

>
> > > Thanks for your time having a look at this !- Hide quoted text -- Show quoted text -


First of all a request. Please don't top post.

OK now I'm confused!!!

In your first post you said: "The scenario is a production system to
which new tables are added and are filled by a program using the
existing data. On some systems this means 5-10 millions of new records
to insert."

Now you are saying that the "existing data" is already there and
doesn't need to be inserted!
If the existing data is what means that 5-10 millions of records need
to be inserted, how can they also not need to be inserted.

Would you care to sort your thoughts out and try describe the scenario
once again?

Reply With Quote
  #5 (permalink)  
Old 01-26-2007
mastersox
 
Posts: n/a
Default Re: Speeding up massive (millions of rows) insertions and adding new INNODB indexes



On 25 jan, 10:08, "Captain Paralytic" <paul_laut...@yahoo.com> wrote:
> On 25 Jan, 14:57, "mastersox" <cool...@gmail.com> wrote:
>
>
>
> > Existing data is already in the database, it doesn't need to be
> > inserted. All the inserts are generated by a c++ program.

>
> > I haven't experimented with this at all. Is the speed gain
> > significant enough to justify dumping the inserts in a file
> > then loading the data using LOAD DATA ?

>
> > On Jan 25, 5:02 am, "Captain Paralytic" <paul_laut...@yahoo.com> wrote:

>
> > > On 25 Jan, 04:06, "mastersox" <cool...@gmail.com> wrote:

>
> > > > Hello everyone,

>
> > > > This is my first time posting (been reading for years) on Usenet so
> > > > please apologize and do tell if I'm doing something wrong.

>
> > > > I'm trying to speed up 2 massive operations; below is the details on
> > > > them and what have been tried up to now.

>
> > > > 1 - Inserting millions of rows in a database.
> > > > ----------------------------------------------------------------
> > > > The scenario is a production system to which new tables
> > > > are added and are filled by a program using the existing data.
> > > > On some systems this means 5-10 millions of new records to insert.

>
> > > > What have been done up to now is wrapping up the 1000 inserts
> > > > at a time around BEING; COMMIT; statements and disabling the
> > > > mysql log completely.

>
> > > > We have also tried using the
> > > > INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES
> > > > (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX)
> > > > syntax instead of doing one INSERT per record, and didn't see much gain
> > > > with that.Where does the "existing data" come from? Have you tried LOAD DATA?

>
> > > > 2 - Adding new indexes on a huge INNODB table
> > > > ------------------------------------------------------------
> > > > The table in question have around 12 million rows. We have
> > > > tweaked
> > > > innodb_buffer_pool_size
> > > > to a much larger value than what it is usually and have seen
> > > > a good improvement with that. Index creation may take
> > > > 20-30 mins if my guess is good, but that is still a while since
> > > > I have to create 4 indexes in this setting.

>
> > > > Thanks for your time having a look at this !- Hide quoted text -- Show quoted text -First of all a request. Please don't top post.

>
> OK now I'm confused!!!
>
> In your first post you said: "The scenario is a production system to
> which new tables are added and are filled by a program using the
> existing data. On some systems this means 5-10 millions of new records
> to insert."
>
> Now you are saying that the "existing data" is already there and
> doesn't need to be inserted!
> If the existing data is what means that 5-10 millions of records need
> to be inserted, how can they also not need to be inserted.
>
> Would you care to sort your thoughts out and try describe the scenario
> once again?


When you say don't top post, you mean I shouldn't reply at the top of
the email, right ? Thanks for helping me doing this right.

Second, the 'existing data' is the data that is already in the
database.
The new tables data are built from what are in the other tables.

What it is really is something like this:

Existing tables:
TRANSACTION
TR_MOPS
TR_TARIF
TR_SERVICE

New tables are:
TR_TARIF_TAX
TR_SERVICE_TAX
TR_TARIF_COMP
TR_SERVICE_COMP

Those are all INNODB

So for each entry in TRANSACTION, there is 1 ... n entries in TR_TARIF,

TR_SERVICE AND TR_MOPS. They are linked with TR_DATE, TR_NUM.

It's the same for the new tables, there can be 1 ... n entries for each

of the tables they are linked with.

So for a particular system, it means there are 10+ millions of new
records
to put in those tables.

What I'm trying to do is increase the raw speed of insertion. All the
inserts
are done on the same connection, and there is a COMMIT; BEGIN;
statement
at every 1000 inserts. I think I have optimized all that could be in
the program
that does the job of 'creating' those new records and inserting them in
the database.

I am trying to optimize mysql's my.cnf for maximum insertion speed. The

best gain we've found for now is to stop query logging.

When the data will get inserted, the server will be completely idle so
I can change my.cnf and restart mysql as many times as needed, like
with some
beefed up the memory amounts allocation in some buffers. I am
wondering if increasing the size of INNODB_BUFFER_POOL_SIZE will
have a significant impact on insertion speed, from 512mb to 1024 or
1536.

I also wonder if any other my.cnf settings could help with raw
insertion speed.

Or maybe putting all this data to insert in a file that could be loaded
with LOAD DATA?

This is running on a quad-cpu 3gb ram IBM rackmount unit.

Thanks!

Reply With Quote
  #6 (permalink)  
Old 01-26-2007
Captain Paralytic
 
Posts: n/a
Default Re: Speeding up massive (millions of rows) insertions and adding new INNODB indexes



On 26 Jan, 01:07, "mastersox" <cool...@gmail.com> wrote:
> On 25 jan, 10:08, "Captain Paralytic" <paul_laut...@yahoo.com> wrote:
>
>
>
>
>
> > On 25 Jan, 14:57, "mastersox" <cool...@gmail.com> wrote:

>
> > > Existing data is already in the database, it doesn't need to be
> > > inserted. All the inserts are generated by a c++ program.

>
> > > I haven't experimented with this at all. Is the speed gain
> > > significant enough to justify dumping the inserts in a file
> > > then loading the data using LOAD DATA ?

>
> > > On Jan 25, 5:02 am, "Captain Paralytic" <paul_laut...@yahoo.com> wrote:

>
> > > > On 25 Jan, 04:06, "mastersox" <cool...@gmail.com> wrote:

>
> > > > > Hello everyone,

>
> > > > > This is my first time posting (been reading for years) on Usenet so
> > > > > please apologize and do tell if I'm doing something wrong.

>
> > > > > I'm trying to speed up 2 massive operations; below is the details on
> > > > > them and what have been tried up to now.

>
> > > > > 1 - Inserting millions of rows in a database.
> > > > > ----------------------------------------------------------------
> > > > > The scenario is a production system to which new tables
> > > > > are added and are filled by a program using the existing data.
> > > > > On some systems this means 5-10 millions of new records to insert.

>
> > > > > What have been done up to now is wrapping up the 1000 inserts
> > > > > at a time around BEING; COMMIT; statements and disabling the
> > > > > mysql log completely.

>
> > > > > We have also tried using the
> > > > > INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES
> > > > > (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX)
> > > > > syntax instead of doing one INSERT per record, and didn't see much gain
> > > > > with that.Where does the "existing data" come from? Have you tried LOAD DATA?

>
> > > > > 2 - Adding new indexes on a huge INNODB table
> > > > > ------------------------------------------------------------
> > > > > The table in question have around 12 million rows. We have
> > > > > tweaked
> > > > > innodb_buffer_pool_size
> > > > > to a much larger value than what it is usually and have seen
> > > > > a good improvement with that. Index creation may take
> > > > > 20-30 mins if my guess is good, but that is still a while since
> > > > > I have to create 4 indexes in this setting.

>
> > > > > Thanks for your time having a look at this !- Hide quoted text -- Show quoted text -First of all a request. Please don't top post.

>
> > OK now I'm confused!!!

>
> > In your first post you said: "The scenario is a production system to
> > which new tables are added and are filled by a program using the
> > existing data. On some systems this means 5-10 millions of new records
> > to insert."

>
> > Now you are saying that the "existing data" is already there and
> > doesn't need to be inserted!
> > If the existing data is what means that 5-10 millions of records need
> > to be inserted, how can they also not need to be inserted.

>
> > Would you care to sort your thoughts out and try describe the scenario
> > once again?When you say don't top post, you mean I shouldn't reply at the top of

> the email, right ? Thanks for helping me doing this right.
>
> Second, the 'existing data' is the data that is already in the
> database.
> The new tables data are built from what are in the other tables.
>
> What it is really is something like this:
>
> Existing tables:
> TRANSACTION
> TR_MOPS
> TR_TARIF
> TR_SERVICE
>
> New tables are:
> TR_TARIF_TAX
> TR_SERVICE_TAX
> TR_TARIF_COMP
> TR_SERVICE_COMP
>
> Those are all INNODB
>
> So for each entry in TRANSACTION, there is 1 ... n entries in TR_TARIF,
>
> TR_SERVICE AND TR_MOPS. They are linked with TR_DATE, TR_NUM.
>
> It's the same for the new tables, there can be 1 ... n entries for each
>
> of the tables they are linked with.
>
> So for a particular system, it means there are 10+ millions of new
> records
> to put in those tables.
>
> What I'm trying to do is increase the raw speed of insertion. All the
> inserts
> are done on the same connection, and there is a COMMIT; BEGIN;
> statement
> at every 1000 inserts. I think I have optimized all that could be in
> the program
> that does the job of 'creating' those new records and inserting them in
> the database.
>
> I am trying to optimize mysql's my.cnf for maximum insertion speed. The
>
> best gain we've found for now is to stop query logging.
>
> When the data will get inserted, the server will be completely idle so
> I can change my.cnf and restart mysql as many times as needed, like
> with some
> beefed up the memory amounts allocation in some buffers. I am
> wondering if increasing the size of INNODB_BUFFER_POOL_SIZE will
> have a significant impact on insertion speed, from 512mb to 1024 or
> 1536.
>
> I also wonder if any other my.cnf settings could help with raw
> insertion speed.
>
> Or maybe putting all this data to insert in a file that could be loaded
> with LOAD DATA?
>
> This is running on a quad-cpu 3gb ram IBM rackmount unit.
>
> Thanks!- Hide quoted text -- Show quoted text -


If the data in the new tables are built from data that is in the other
existing tables, why are you taking the data out of MySQL at all? Why
not use INSERT ... SELECT statements to let SQL copy the data for you?
Then you have only one INSERT statement per table to be populated?

Reply With Quote
  #7 (permalink)  
Old 01-26-2007
mastersox
 
Posts: n/a
Default Re: Speeding up massive (millions of rows) insertions and adding new INNODB indexes

On Jan 26, 7:51 am, "Captain Paralytic" <paul_laut...@yahoo.com> wrote:
> On 26 Jan, 01:07, "mastersox" <cool...@gmail.com> wrote:
>
>
>
> > On 25 jan, 10:08, "Captain Paralytic" <paul_laut...@yahoo.com> wrote:

>
> > > On 25 Jan, 14:57, "mastersox" <cool...@gmail.com> wrote:

>
> > > > Existing data is already in the database, it doesn't need to be
> > > > inserted. All the inserts are generated by a c++ program.

>
> > > > I haven't experimented with this at all. Is the speed gain
> > > > significant enough to justify dumping the inserts in a file
> > > > then loading the data using LOAD DATA ?

>
> > > > On Jan 25, 5:02 am, "Captain Paralytic" <paul_laut...@yahoo.com> wrote:

>
> > > > > On 25 Jan, 04:06, "mastersox" <cool...@gmail.com> wrote:

>
> > > > > > Hello everyone,

>
> > > > > > This is my first time posting (been reading for years) on Usenet so
> > > > > > please apologize and do tell if I'm doing something wrong.

>
> > > > > > I'm trying to speed up 2 massive operations; below is the details on
> > > > > > them and what have been tried up to now.

>
> > > > > > 1 - Inserting millions of rows in a database.
> > > > > > ----------------------------------------------------------------
> > > > > > The scenario is a production system to which new tables
> > > > > > are added and are filled by a program using the existing data.
> > > > > > On some systems this means 5-10 millions of new records to insert.

>
> > > > > > What have been done up to now is wrapping up the 1000 inserts
> > > > > > at a time around BEING; COMMIT; statements and disabling the
> > > > > > mysql log completely.

>
> > > > > > We have also tried using the
> > > > > > INSERT INTO TABLE1 FIELD1, FIELD2, FIELD3 VALUES
> > > > > > (XXX,XXX,XXX)(XXX,XXX,XXX) .... (XXX,XXX,XXX)
> > > > > > syntax instead of doing one INSERT per record, and didn't see much gain
> > > > > > with that.Where does the "existing data" come from? Have you tried LOAD DATA?

>
> > > > > > 2 - Adding new indexes on a huge INNODB table
> > > > > > ------------------------------------------------------------
> > > > > > The table in question have around 12 million rows. We have
> > > > > > tweaked
> > > > > > innodb_buffer_pool_size
> > > > > > to a much larger value than what it is usually and have seen
> > > > > > a good improvement with that. Index creation may take
> > > > > > 20-30 mins if my guess is good, but that is still a while since
> > > > > > I have to create 4 indexes in this setting.

>
> > > > > > Thanks for your time having a look at this !- Hide quoted text -- Show quoted text -First of all a request. Please don't top post.

>
> > > OK now I'm confused!!!

>
> > > In your first post you said: "The scenario is a production system to
> > > which new tables are added and are filled by a program using the
> > > existing data. On some systems this means 5-10 millions of new records
> > > to insert."

>
> > > Now you are saying that the "existing data" is already there and
> > > doesn't need to be inserted!
> > > If the existing data is what means that 5-10 millions of records need
> > > to be inserted, how can they also not need to be inserted.

>
> > > Would you care to sort your thoughts out and try describe the scenario
> > > once again?When you say don't top post, you mean I shouldn't reply at the top of

> > the email, right ? Thanks for helping me doing this right.

>
> > Second, the 'existing data' is the data that is already in the
> > database.
> > The new tables data are built from what are in the other tables.

>
> > What it is really is something like this:

>
> > Existing tables:
> > TRANSACTION
> > TR_MOPS
> > TR_TARIF
> > TR_SERVICE

>
> > New tables are:
> > TR_TARIF_TAX
> > TR_SERVICE_TAX
> > TR_TARIF_COMP
> > TR_SERVICE_COMP

>
> > Those are all INNODB

>
> > So for each entry in TRANSACTION, there is 1 ... n entries in TR_TARIF,

>
> > TR_SERVICE AND TR_MOPS. They are linked with TR_DATE, TR_NUM.

>
> > It's the same for the new tables, there can be 1 ... n entries for each

>
> > of the tables they are linked with.

>
> > So for a particular system, it means there are 10+ millions of new
> > records
> > to put in those tables.

>
> > What I'm trying to do is increase the raw speed of insertion. All the
> > inserts
> > are done on the same connection, and there is a COMMIT; BEGIN;
> > statement
> > at every 1000 inserts. I think I have optimized all that could be in
> > the program
> > that does the job of 'creating' those new records and inserting them in
> > the database.

>
> > I am trying to optimize mysql's my.cnf for maximum insertion speed. The

>
> > best gain we've found for now is to stop query logging.

>
> > When the data will get inserted, the server will be completely idle so
> > I can change my.cnf and restart mysql as many times as needed, like
> > with some
> > beefed up the memory amounts allocation in some buffers. I am
> > wondering if increasing the size of INNODB_BUFFER_POOL_SIZE will
> > have a significant impact on insertion speed, from 512mb to 1024 or
> > 1536.

>
> > I also wonder if any other my.cnf settings could help with raw
> > insertion speed.

>
> > Or maybe putting all this data to insert in a file that could be loaded
> > with LOAD DATA?

>
> > This is running on a quad-cpu 3gb ram IBM rackmount unit.

>
> > Thanks!- Hide quoted text -- Show quoted text -If the data in the new tables are built from data that is in the other

> existing tables, why are you taking the data out of MySQL at all? Why
> not use INSERT ... SELECT statements to let SQL copy the data for you?
> Then you have only one INSERT statement per table to be populated?


There are calculations that needs to be done by the external program
that can't be handled by mysql. I'm really down to optimizing raw
inserts speed. Even if the INSERTS would be done directly by mysql,
insertion speed would still be a bottleneck.

Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On




All times are GMT +1. The time now is 03:13 PM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0