This is a discussion on Bulk load performance within the MySQL Database forums, part of the Database Forums category; Hi all, I'm in the process of selecting a DBMS for a project. We've narrowed down our choices ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
Hi all,
I'm in the process of selecting a DBMS for a project. We've narrowed down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not sure is the speed of bulk loading. Some of the tables could contain tens or hundreds of millions of tuples. So, does anyone know of any benchmarks about this stuff? I tried searching online but couldn't find anything. Thank you. |
|
|||
|
On 18.12.2006 17:50, Xiaolei Li wrote:
> I'm in the process of selecting a DBMS for a project. We've narrowed > down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not > sure is the speed of bulk loading. Some of the tables could contain > tens or hundreds of millions of tuples. So, does anyone know of any > benchmarks about this stuff? I tried searching online but couldn't find > anything. Thank you. There are tons of benchmarks out there but I am not sure whether you will find something specialized for bulk loading. Why not set up a test database of each brand and do your own tests? Kind regards robert |
|
|||
|
Xiaolei Li wrote:
> I'm in the process of selecting a DBMS for a project. We've narrowed > down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not > sure is the speed of bulk loading. Well I can address MySQL 5.0.27 and PostgreSQL 8.1.5 bulk loading on a freeBSD server. PostgreSQL is can be about twice as fast in our testing (YMMV) when mySQL is using InnoDB tables. One thing that is either good/bad about mySQL vs pgSQL loading is that pgSQL is extremely picky about datatypes, and the number of fields must match exactly. So if you are missing fields or try to cram invalid data using MySQL you might not get warned. At least I haven't been. I've used mySQL for years, so don't take this message as FUD/bashing. We run a large data warehouse on MySQL. It just that things that need a ton of data integrity and data import verification work much better for *us* on postgresql. I'm sure someone else has horror stories quite different from our experience. You may want to post at pgsql-general@postgresql.org for postgresql input on this topic. No idea about Oracle. Good luck with your search... -- Walter |
|
|||
|
Walter Vaughan <wvaughan@steelerubber.com> wrote:
> Xiaolei Li wrote: > >> I'm in the process of selecting a DBMS for a project. We've narrowed >> down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not >> sure is the speed of bulk loading. > > Well I can address MySQL 5.0.27 and PostgreSQL 8.1.5 bulk loading on a freeBSD > server. PostgreSQL is can be about twice as fast in our testing (YMMV) when > mySQL is using InnoDB tables. Bulk loading performance varies very much depending on how *exactly* it is done. The MySQL manual contains a whole chapter on that topic. In a nutshell: - turn off AUTO-COMMIT ! - you can COMMIT in batches or just once after loading all data. I suggest to COMMIT every 10.000 (or so) rows. - if you import from SQL commands (SQL dump) - using MySQLs proprietary multi-value INSERT will speedup the load - prepared statements are good too, but can insert just one row per execute. Multi-value INSERT and PS are approx. equally fast. - LOAD DATA INFILE will be even faster, because parsing the raw file is faster than parsing SQL - keys should be deactivated for bulk loading A MySQL issue could be the fact that MySQL uses only one thread - that is: only one cpu core - per connection. So if you have multiple cpu cores you should split your data and load in multiple connections. I did so on a 16-core box and it scales quite nicely. XL -- Axel Schwenke, Senior Software Developer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |