How to optimize primary key change on a large table?

This is a discussion on How to optimize primary key change on a large table? within the MySQL Database forums, part of the Database Forums category; I have a table with about 200 million rows with a AUTO INCREMENT primary key. However, the auto increment count ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 11-16-2007
eduardorochabr
 
Posts: n/a
Default How to optimize primary key change on a large table?

I have a table with about 200 million rows with a AUTO INCREMENT
primary key.

However, the auto increment count is being locked all the time,
because there are many concurrent inserts.

I haved decided to use an UUID primary key instead, which would
increse the primary key field from 32 bits to 96 bits.

How to do this migration using the least time possible? I have read
that changing primary keys is a very time consuming task, would it be
faster to dump and re-populate the table?

Thank you.
Reply With Quote
  #2 (permalink)  
Old 11-16-2007
Hugo
 
Posts: n/a
Default Re: How to optimize primary key change on a large table?

eduardorochabr wrote :

> I have a table with about 200 million rows with a AUTO INCREMENT
> primary key.


> I haved decided to use an UUID primary key instead, which would
> increse the primary key field from 32 bits to 96 bits.


Are you using InnoDB engine ? If so, I *highly* discourage you do to so.
Primary Key is very special for InnoDB, and:

1) Long PK is to avoid as much as possible.
2) Randomly valued PK is to avoid as much as possible.

By using an UUID, you are violating 1 and 2 statement at the same time.
So at least consider doing some tests before actually make the
migration, to validate it is a good solution (which I doubt).

> the auto increment count is being locked all the time, because there
> are many concurrent inserts.


As a workaround, you may be able to use an outside process to attribute
a pseudo auto-increment value to the PK.

In case you are using MyISAM (or other) engine, I don't know much about
those, and could not help you further.

--
Hugo
Reply With Quote
  #3 (permalink)  
Old 11-16-2007
eduardorochabr
 
Posts: n/a
Default Re: How to optimize primary key change on a large table?

On 16 nov, 09:04, Hugo <h...@nospam.invalid> wrote:
> eduardorochabr wrote :
>
> > I have a table with about 200 million rows with a AUTO INCREMENT
> > primary key.
> > I haved decided to use an UUID primary key instead, which would
> > increse the primary key field from 32 bits to 96 bits.

>
> Are you using InnoDB engine ? If so, I *highly* discourage you do to so.
> Primary Key is very special for InnoDB, and:
>
> 1) Long PK is to avoid as much as possible.
> 2) Randomly valued PK is to avoid as much as possible.
>
> By using an UUID, you are violating 1 and 2 statement at the same time.
> So at least consider doing some tests before actually make the
> migration, to validate it is a good solution (which I doubt).
>
> > the auto increment count is being locked all the time, because there
> > are many concurrent inserts.

>
> As a workaround, you may be able to use an outside process to attribute
> a pseudo auto-increment value to the PK.
>
> In case you are using MyISAM (or other) engine, I don't know much about
> those, and could not help you further.
>
> --
> Hugo


Hi Hugo,

Thank you for your answer. I am using innodb, sorry for not telling
this before.

I will look for alternatives, I am thinking about using a FK in the
primary key, so it could be kind of [user_id, some_sequential_value].
This way, I could have the benefits of clustering the index for
related records (user based), but the key would be bigger than it is
today (at least 32 bits + 32 bits, maybe 32 bits + 64 bits, since I am
about to use some kind of timestamping in "some_sequential_value").

Such primary key would still be a trouble for innodb?

Thank you.
Reply With Quote
  #4 (permalink)  
Old 11-16-2007
Willem Bogaerts
 
Posts: n/a
Default Re: How to optimize primary key change on a large table?

> I have a table with about 200 million rows with a AUTO INCREMENT
> primary key.
>
> However, the auto increment count is being locked all the time,
> because there are many concurrent inserts.
>
> I haved decided to use an UUID primary key instead, which would
> increse the primary key field from 32 bits to 96 bits.


I don't get it. You want to migrate the database structure during full
load of the database?

> How to do this migration using the least time possible? I have read
> that changing primary keys is a very time consuming task, would it be
> faster to dump and re-populate the table?


I would first be worried about the data itself, and only after that
about the time it takes.

Even then, key migration is not some "modify field" work. There are
references to the key, and even if there are no references to it, I'd
add a new key field first and drop the old one later, so I get back in
every step I take.

So: add a new field for UUID, with a uniqueness constraint and random
contents, if that is at all possible (I would not know how). If you have
to do this in steps (add a non-unique field, fill it with random data,
add the constraint), your concurrent inserts will make this impossible.
Really, you should get the inserts off-line before doing maintenance
like this.

Modify the other tables that have a foreign key to the above table, so
that they use your new field.

If all works well, drop the Primary Key index, The old primary key
column, and create a new Primary Key with the new field.

That should do it.

--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/
Reply With Quote
  #5 (permalink)  
Old 11-16-2007
eduardorochabr
 
Posts: n/a
Default Re: How to optimize primary key change on a large table?

On 16 nov, 11:36, eduardorochabr <eduardoroch...@gmail.com> wrote:
> On 16 nov, 09:04, Hugo <h...@nospam.invalid> wrote:
>
>
>
> > eduardorochabr wrote :

>
> > > I have a table with about 200 million rows with a AUTO INCREMENT
> > > primary key.
> > > I haved decided to use an UUID primary key instead, which would
> > > increse the primary key field from 32 bits to 96 bits.

>
> > Are you using InnoDB engine ? If so, I *highly* discourage you do to so.
> > Primary Key is very special for InnoDB, and:

>
> > 1) Long PK is to avoid as much as possible.
> > 2) Randomly valued PK is to avoid as much as possible.

>
> > By using an UUID, you are violating 1 and 2 statement at the same time.
> > So at least consider doing some tests before actually make the
> > migration, to validate it is a good solution (which I doubt).

>
> > > the auto increment count is being locked all the time, because there
> > > are many concurrent inserts.

>
> > As a workaround, you may be able to use an outside process to attribute
> > a pseudo auto-increment value to the PK.

>
> > In case you are using MyISAM (or other) engine, I don't know much about
> > those, and could not help you further.

>
> > --
> > Hugo

>
> Hi Hugo,
>
> Thank you for your answer. I am using innodb, sorry for not telling
> this before.
>
> I will look for alternatives, I am thinking about using a FK in the
> primary key, so it could be kind of [user_id, some_sequential_value].
> This way, I could have the benefits of clustering the index for
> related records (user based), but the key would be bigger than it is
> today (at least 32 bits + 32 bits, maybe 32 bits + 64 bits, since I am
> about to use some kind of timestamping in "some_sequential_value").
>
> Such primary key would still be a trouble for innodb?
>
> Thank you.


I also had a thought about using a simple field primary key with such
composition:

(user_id << 64) +
(some_sequential_value_maybe_using_timestamp_and_r andom_value)

Innodb would cluster this single field primary key in similar way as
using [user_id, some_sequential_value]?

Thank you.
Reply With Quote
  #6 (permalink)  
Old 11-16-2007
eduardorochabr
 
Posts: n/a
Default Re: How to optimize primary key change on a large table?

On 16 nov, 11:49, Willem Bogaerts
<w.bogae...@kratz.maardanzonderditstuk.nl> wrote:
> > I have a table with about 200 million rows with a AUTO INCREMENT
> > primary key.

>
> > However, the auto increment count is being locked all the time,
> > because there are many concurrent inserts.

>
> > I haved decided to use an UUID primary key instead, which would
> > increse the primary key field from 32 bits to 96 bits.

>
> I don't get it. You want to migrate the database structure during full
> load of the database?
>
> > How to do this migration using the least time possible? I have read
> > that changing primary keys is a very time consuming task, would it be
> > faster to dump and re-populate the table?

>
> I would first be worried about the data itself, and only after that
> about the time it takes.
>
> Even then, key migration is not some "modify field" work. There are
> references to the key, and even if there are no references to it, I'd
> add a new key field first and drop the old one later, so I get back in
> every step I take.
>
> So: add a new field for UUID, with a uniqueness constraint and random
> contents, if that is at all possible (I would not know how). If you have
> to do this in steps (add a non-unique field, fill it with random data,
> add the constraint), your concurrent inserts will make this impossible.
> Really, you should get the inserts off-line before doing maintenance
> like this.
>
> Modify the other tables that have a foreign key to the above table, so
> that they use your new field.
>
> If all works well, drop the Primary Key index, The old primary key
> column, and create a new Primary Key with the new field.
>
> That should do it.
>
> --
> Willem Bogaerts
>
> Application smith
> Kratz B.V.http://www.kratz.nl/


Hi William, thanks for your answer.

You're right, I will get the system offline while doing this
migration. However, I will try to minimize this downtime as much as
possible.

The steps you've described are one of the paths I am about to follow,
though I am afraid that recalculating the PK like this should be much
slower than export/import the table with the new PK structure.
Reply With Quote
  #7 (permalink)  
Old 11-16-2007
Willem Bogaerts
 
Posts: n/a
Default Re: How to optimize primary key change on a large table?

> The steps you've described are one of the paths I am about to follow,
> though I am afraid that recalculating the PK like this should be much
> slower than export/import the table with the new PK structure.


I don't know. You could be right. But whatever path you use, do not
forget to set the autocommit off. That really speeds things up!

Good luck,
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/
Reply With Quote
  #8 (permalink)  
Old 11-17-2007
Michael Martinek
 
Posts: n/a
Default Re: How to optimize primary key change on a large table?

Not entirely sure this would work with InnoDB.. but I've used it with
MyISAM tables with more than 50 million rows.

I'd suggest making a copy of the table. With the copy, do:

ALTER TABLE <table> CHANGE <key_column_name> <key_column_name> BIGINT
UNSIGNED AUTO_INCREMENT

The PRIMARY KEY status will remain there.

Per MySQL "However, BIGINT values have 64 bits, so do note that if you
were to insert one million rows per second, it would still take nearly
three hundred thousand years before BIGINT reached its upper bound."
Then you get the benefits of keeping it as an integer type instead of
a UUID.

Once it finishes, this is where you could expect some down time. Go
offline for a while, and select any entries from the first table that
don't exist in the original table. If you've been using an
auto_increment column, then you can do INSERT INTO <table> SELECT *
FROM <original_table> WHERE id > <last_entry_id>. Shutdown the DB
server and move the new tables/db over.

For changes to multiple tables, I'd suggest making a new DB and use
the same table names. This way you only need to rename the old DB
files (add a _ to the end or something, so MySQL won't load them)..
then move the new DB tables into the old DB directory and start the
server up again.

Note that this will work for MyISAM tables.. I'm not 100% sure about
InnoDB tables. Although with an InnoDB table, I'm sure it will work up
to the point of moving the new DB files into the old DB folder. From
there, I'd advise doing a little test of your own before trying any of
this.


Another method that I've used is very circumstantial. I have it set up
to work with a view for reads, but writes go directly to the table.
This was by design, so I could change the view over to an offline
table without losing any write requests. This allowed me to create the
"offline" table, copy the data I wanted into it, change the view to
read from the offline table, alter the original table, then change the
view back and drop the "offline" table. The number of columns was low,
and the row count would at most, EVER be around two million rows..
after years and years and years of collecting data.

Hope some of this is useful.

Regards,
Michael Martinek
Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 05:41 AM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0