This is a discussion on Wikipedia 'pagelinks' - InnoDB Bulk Insert within the MySQL Database forums, part of the Database Forums category; I'm trying to import a dump of the most interesting Wikipedia tables. Specifically page.sql (560MB), categorylinks.sql (584MB) ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I'm trying to import a dump of the most interesting Wikipedia tables.
Specifically page.sql (560MB), categorylinks.sql (584MB) and pagelinks.sql (3GB) from [download.wikimedia.org]. I've successfully imported both page.sql and categorylinks.sql in about 30 minutes each but I haven't had the same success with pagelinks.sql. I've alotted as much memory as is available (1.5GB) but it fills up and CPU usage drops to about 2-5% with gratuitous disk usage. I'm going on 12 hours here so I thought I'd check and see if this is normal and if there's anything I can do to improve my import speed. A couple notes. pagelinks.sql begins with: DROP TABLE IF EXISTS `pagelinks`; CREATE TABLE `pagelinks` ( `pl_from` int(8) unsigned NOT NULL default '0', `pl_namespace` int(11) NOT NULL default '0', `pl_title` varchar(255) binary NOT NULL default '', UNIQUE KEY `pl_from` (`pl_from`,`pl_namespace`,`pl_title`), KEY `pl_namespace` (`pl_namespace`,`pl_title`) ) TYPE=InnoDB; /*!40000 ALTER TABLE `pagelinks` DISABLE KEYS */; -- -- Dumping data for table `pagelinks` -- LOCK TABLES `pagelinks` WRITE; INSERT INTO `pagelinks` VALUES (11327,0,'!'),(33640,0,'!'),(64486,0,'!'),... Importing with: mysql -u username -p schema < pagelinks.sql Running on a Core 2 Duo 6600 running Windows 2k3 Server with 2GB RAM. mysql.ini is: [mysqld] #Port number to use for connections. port=3306 #Path to installation directory. All paths are usually resolved relative to this. basedir=C:/Program Files/MySQL/MySQL Server 5.0 #Path to the database root datadir=D:/MySQL #If no specific storage engine/table type is defined in an SQL-Create statement the default type will be used. default-storage-engine=innodb #The bigger you set this the less disk I/O is needed to access data in tables. On a dedicated database server you may set this parameter up to 80% of the machine physical memory size. Do not set it too large, though, because competition of the physical memory may cause paging in the operating system. innodb_buffer_pool_size=1500M #Size of a memory pool InnoDB uses to store data dictionary information and other internal data structures. A sensible value for this might be 2M, but the more tables you have in your application the more you will need to allocate here. If InnoDB runs out of memory in this pool, it will start to allocate memory from the operating system, and write warning messages to the MySQL error log. innodb_additional_mem_pool_size=8M #Paths to individual datafiles and their sizes. innodb_data_file_path=ibdata1:1000M:autoextend:max :10000M #How much to increase datafile size by when full. innodb_autoextend_increment=512M #The common part of the directory path for all InnoDB datafiles. Leave this empty if you want to split the data files onto different drives. innodb_data_home_dir=D:/MySQL/InnoDB #Directory path to InnoDB log files. innodb_log_group_home_dir=D:/MySQL/InnoDB #Size of each log file in a log group in megabytes. Sensible values range from 1M to 1/n-th of the size of the buffer pool specified below, where n is the number of log files in the group. The larger the value, the less checkpoint flush activity is needed in the buffer pool, saving disk I/O. But larger log files also mean that recovery will be slower in case of a crash. The combined size of log files must be less than 4 GB on 32-bit computers. The default is 5M. innodb_log_file_size=300M #The size of the buffer which InnoDB uses to write log to the log files on disk. Sensible values range from 1M to 8M. A big log buffer allows large transactions to run without a need to write the log to disk until the transaction commit. Thus, if you have big transactions, making the log buffer big will save disk I/O. innodb_log_buffer_size=8M #Specifies when log files are flushed to disk. innodb_flush_log_at_trx_commit=1 #Helps in performance tuning in heavily concurrent environments. innodb_thread_concurrency=4 #Max packetlength to send/receive from to server. max_allowed_packet=128M #The size of the buffer used for index blocks. Increase this to get better index handling (for all reads and multiple writes) to as much as you can afford; 64M on a 256M machine that mainly runs MySQL is quite common. key_buffer_size=64M [mysql] #Max packetlength to send/receive from to client. max_allowed_packet=128M Any help/insight would be greatly appreciated. |
|
|||
|
"froggy000" <chris.richard@gmail.com> wrote:
> I'm trying to import a dump of the most interesting Wikipedia tables. > Specifically page.sql (560MB), categorylinks.sql (584MB) and > pagelinks.sql (3GB) from [download.wikimedia.org]. This is a 3GB uncompressed SQL dump? > I've successfully imported both page.sql and categorylinks.sql in about > 30 minutes each but I haven't had the same success with pagelinks.sql. > I've alotted as much memory as is available (1.5GB) but it fills up and > CPU usage drops to about 2-5% with gratuitous disk usage. Seems your system is I/O bound now. This was foreseeable with only 1.5GB for InnoDBs buffer pool. > A couple notes. pagelinks.sql begins with: > > DROP TABLE IF EXISTS `pagelinks`; > CREATE TABLE `pagelinks` ( > `pl_from` int(8) unsigned NOT NULL default '0', > `pl_namespace` int(11) NOT NULL default '0', > `pl_title` varchar(255) binary NOT NULL default '', > UNIQUE KEY `pl_from` (`pl_from`,`pl_namespace`,`pl_title`), > KEY `pl_namespace` (`pl_namespace`,`pl_title`) > ) TYPE=InnoDB; Bah! Bad one! I assume we're talking about 3GB of uncompressed SQL Dump. Since it uses the multi-value INSERT syntax, there is not much overhead from SQL in the dump. Assuming we have 6 digits for the numbers and k characters in the VARCHAR() column we have 6+6+k+7 characters in the dump and get 4+4+k bytes raw data from it. Estimating k~30 this leads to approx. 2.3GB of raw data. But rows are small with 38 bytes and I guess InnoDB adds another 10 Bytes of overhead so we're back at 3GB for the rows alone. Now for indexes: the first index is on *all* columns in the table and therefore duplicates the amount of memory used. The second index duplicates the bigger part of the data again. So overall you will have approx. 8GB of active data in InnoDB (when you create an index on a table you can consider the whole table "active"). With just 1.5GB of memory this will clearly become slow. You may try to remove both indexes from the table definition, loading the raw data and then creating the indexes. The following line > /*!40000 ALTER TABLE `pagelinks` DISABLE KEYS */; does that for the second index already, but not for the first one. But I guess index creation will still take very long. There is another workaround: switch the engine to MyISAM. MyISAM uses much less memory than InnoDB. You should put your memory in key_buffer instead of the innodb_buffer_pool then. Wikipedia uses InnoDB because they have a lot of writes. Unless you want to run a Wikipedia system yourself, MyISAM should be good. XL -- Axel Schwenke, Support Engineer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |