Bulk load performance

This is a discussion on Bulk load performance within the MySQL Database forums, part of the Database Forums category; Hi all, I'm in the process of selecting a DBMS for a project. We've narrowed down our choices ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 12-18-2006
Xiaolei Li
 
Posts: n/a
Default Bulk load performance

Hi all,

I'm in the process of selecting a DBMS for a project. We've narrowed
down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not
sure is the speed of bulk loading. Some of the tables could contain
tens or hundreds of millions of tuples. So, does anyone know of any
benchmarks about this stuff? I tried searching online but couldn't find
anything. Thank you.

Reply With Quote
  #2 (permalink)  
Old 12-19-2006
Robert Klemme
 
Posts: n/a
Default Re: Bulk load performance

On 18.12.2006 17:50, Xiaolei Li wrote:
> I'm in the process of selecting a DBMS for a project. We've narrowed
> down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not
> sure is the speed of bulk loading. Some of the tables could contain
> tens or hundreds of millions of tuples. So, does anyone know of any
> benchmarks about this stuff? I tried searching online but couldn't find
> anything. Thank you.


There are tons of benchmarks out there but I am not sure whether you
will find something specialized for bulk loading. Why not set up a test
database of each brand and do your own tests?

Kind regards

robert
Reply With Quote
  #3 (permalink)  
Old 12-19-2006
Walter Vaughan
 
Posts: n/a
Default Re: Bulk load performance

Xiaolei Li wrote:

> I'm in the process of selecting a DBMS for a project. We've narrowed
> down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not
> sure is the speed of bulk loading.


Well I can address MySQL 5.0.27 and PostgreSQL 8.1.5 bulk loading on a freeBSD
server. PostgreSQL is can be about twice as fast in our testing (YMMV) when
mySQL is using InnoDB tables.

One thing that is either good/bad about mySQL vs pgSQL loading is that pgSQL is
extremely picky about datatypes, and the number of fields must match exactly. So
if you are missing fields or try to cram invalid data using MySQL you might not
get warned. At least I haven't been.

I've used mySQL for years, so don't take this message as FUD/bashing. We run a
large data warehouse on MySQL. It just that things that need a ton of data
integrity and data import verification work much better for *us* on postgresql.

I'm sure someone else has horror stories quite different from our experience.

You may want to post at pgsql-general@postgresql.org for postgresql input on
this topic. No idea about Oracle.

Good luck with your search...
--
Walter
Reply With Quote
  #4 (permalink)  
Old 12-20-2006
Axel Schwenke
 
Posts: n/a
Default Re: Bulk load performance

Walter Vaughan <wvaughan@steelerubber.com> wrote:
> Xiaolei Li wrote:
>
>> I'm in the process of selecting a DBMS for a project. We've narrowed
>> down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not
>> sure is the speed of bulk loading.

>
> Well I can address MySQL 5.0.27 and PostgreSQL 8.1.5 bulk loading on a freeBSD
> server. PostgreSQL is can be about twice as fast in our testing (YMMV) when
> mySQL is using InnoDB tables.


Bulk loading performance varies very much depending on how *exactly*
it is done. The MySQL manual contains a whole chapter on that topic.

In a nutshell:

- turn off AUTO-COMMIT !
- you can COMMIT in batches or just once after loading all data. I
suggest to COMMIT every 10.000 (or so) rows.
- if you import from SQL commands (SQL dump) - using MySQLs proprietary
multi-value INSERT will speedup the load
- prepared statements are good too, but can insert just one row per
execute. Multi-value INSERT and PS are approx. equally fast.
- LOAD DATA INFILE will be even faster, because parsing the raw file is
faster than parsing SQL
- keys should be deactivated for bulk loading


A MySQL issue could be the fact that MySQL uses only one thread - that
is: only one cpu core - per connection. So if you have multiple cpu
cores you should split your data and load in multiple connections.
I did so on a 16-core box and it scales quite nicely.


XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/
Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 06:06 AM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0