This is a discussion on Trouble with truncation error when loading data from a file within the MySQL Database forums, part of the Database Forums category.
|
|||
|
The table has many columns, but it includes a column like:

    CREATE TABLE f ( ... `x` DECIMAL(10,2) DEFAULT NULL, ... ) ENGINE=InnoDB;

I use a statement like the following to load the data from a file:

    LOAD DATA INFILE "path to file" INTO TABLE f (list of columns);

When I try to run this, I get error 1265, "Data truncated for column 'x' at row 3". There are thousands of rows in this file. The fields in the rows are tab delimited. Two consecutive tabs mean there is no data for that specific field in that specific row. The first two rows do not have any missing fields.

I do not understand this. MySQL knows, from the definition provided when creating the table, that data for field "x" may be missing, and that when this happens the value ought to be set to NULL. So why is it crashing and burning on the first null value it encounters?

How can I fix this so that I can reliably load data into my tables from these files, and be confident it properly handles missing values?

Ted
|
|||
|
Ted wrote:
> LOAD DATA INFILE "path to file" INTO TABLE f (list of columns);
>
> When I try to run this, I get an error 1265 "Data truncated for column
> "x" at row 3.
>
> How can I fix this so that I can reliably load data into my tables from
> these files, and be confident it properly handles missing values?

If you got data truncation while inserting to a DECIMAL column, I'd guess that the value contains illegal characters of some kind that are not part of a numeric value. You should clean up your import file so that it contains legal values.

One could apply some script (e.g. Perl) to the import file prior to loading it into the MySQL table. The script cleans up values to make sure they're legal for the format you're loading them into. Then it can either output a new data file for use with LOAD DATA, or else the script can use a database library to insert directly into the database (but LOAD DATA is undoubtedly faster for bulk loads).

Another solution is to LOAD DATA into a temporary table that has only VARCHAR columns instead of DECIMAL, and then do some cleanup in SQL expressions as you copy data from the temp table to your real table.

In my experience, real-world data drops contain values that are not formatted consistently enough to be loaded directly into columns with strict format requirements, such as INTEGER, NUMERIC, DATE, ENUM, etc., and there needs to be a "staging" load to provide an opportunity to clean up the data.

Regards,
Bill K.
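As a rough illustration of the "clean up the import file first" approach, here is a minimal sketch in Python (used in place of Perl purely for illustration). It rests on two points about LOAD DATA that are worth checking against the manual for your server version: an empty field is not read as NULL (for a numeric column it is treated as 0, which can surface as a warning or an error), while the two-character sequence \N is read as NULL under the default field-escaping settings. The function names `mark_missing` and `clean_file` are hypothetical, and treating whitespace-only fields as missing is an assumption.

```python
def mark_missing(line, null_token=r"\N"):
    """Replace empty tab-separated fields in one line with the NULL marker.

    Assumption: a field that is empty or whitespace-only means "no data".
    The returned string has no line terminator.
    """
    fields = line.rstrip("\r\n").split("\t")
    return "\t".join(f if f.strip() else null_token for f in fields)


def clean_file(src_path, dst_path):
    """Write a copy of src_path in which every missing field is spelled \\N."""
    # newline="" keeps the original line endings visible so that
    # mark_missing can strip both "\n" and "\r\n" itself.
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        for line in src:
            dst.write(mark_missing(line) + "\n")
```

The cleaned copy can then be given to LOAD DATA INFILE in place of the original file. Writing plain "\n" line endings also removes any carriage returns left over from a Windows editor.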
|
|||
|
If the file contains illegal characters, they are not visible when the file is opened in Notepad, and they appear to take up no space when using the right arrow key to move through a record one character at a time, but who knows?

I tried the option of creating a temporary table with the same column names, but with all columns defined as varchar(256) or varchar(45), depending on the field. (I used the MEMORY engine, and it didn't like the TEXT type I occasionally use.) But in this, too, I encountered trouble.

The second column in the first table is a date with the format mm/dd/yyyy. We discussed this particular problem before. What I don't understand is why, when loading this data into a table where everything is defined to be a varchar field, it would care what the date format is. Should it not just treat that date field as any other string?

As it is, no data gets loaded into the temporary tables.

Do you have any simple examples of the Perl scripts or "cleanup in SQL expressions" you mention?

Thanks,
Ted
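Characters that an editor does not display can still be found programmatically. The following is a small sketch, assuming a tab-delimited text file, that reports every field containing a non-printable character (for example a stray carriage return or a non-breaking space). The names `find_hidden_chars` and `scan_file` are made up for this example.

```python
def find_hidden_chars(line):
    """Return (field_index, repr_of_field) for each suspicious field.

    Only the trailing "\n" is stripped, so a leftover "\r" from Windows
    line endings is reported rather than silently removed.
    """
    hits = []
    for index, field in enumerate(line.rstrip("\n").split("\t")):
        if any(not ch.isprintable() for ch in field):
            hits.append((index, repr(field)))
    return hits


def scan_file(path):
    """Print the row number, field index, and raw content of each hit."""
    # newline="" disables newline translation so "\r" stays in the data.
    with open(path, newline="") as handle:
        for row_number, line in enumerate(handle, start=1):
            for index, shown in find_hidden_chars(line):
                print(f"row {row_number}, field {index}: {shown}")
```

Because `repr` is used for the output, hidden characters show up as escape sequences such as `\r` or `\xa0` instead of taking up no visible space.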
|
|||
|
Ted wrote:
> The second column in the first table is a date with the format
> mm/dd/yyyy. We discussed this particular problem before. What I don't
> understand is that when loading this data into a table where everything
> is defined to be a varchar field, why would it care what the date
> format is. Should it not just treat that date field as any other
> string?

That is indeed odd. What error does it give? Can you insert one record into the table manually with an INSERT statement, using the values from one of the rows of the input file? Does this still get an error if you use a non-MEMORY table? A non-TEMPORARY table?

> As it is, no data gets loaded into the temporary tables.
>
> Do you have any simple examples of the perl scripts or "cleanup in SQL
> expressions" you mention?

Something like the following (off the cuff, not tested). Supply the database name, user, and password where you see "...".

    #!perl -an
    use DBI;

    BEGIN {
        my $dsn = "DBI:mysql:database=...";
        my $user = '...';
        my $password = '...';
        $dbh = DBI->connect($dsn, $user, $password);
        if (!defined($dbh)) {
            die("Error:" . $DBI::errstr . "\n");
        }
        $sth = $dbh->prepare("INSERT INTO mytable (a, b, c) VALUES (?, ?, ?)");
        if (!defined($sth)) {
            die("Error:" . $dbh->errstr . "\n");
        }
    }

    # Now the code executes for each line of input, thanks to "-n".
    # @F is an array of the fields, auto-split by "-a".

    # modify elements of the array, for instance
    # reverse the characters of element 1:
    $F[1] = reverse($F[1]);

    # Execute our prepared INSERT statement, with the modified values:
    $sth->execute(@F) or die("Error:" . $dbh->errstr . "\n");

Regards,
Bill K.
|
|||
|
>> When I try to run this, I get an error 1265 "Data truncated for column
>> "x" at row 3.
>>
>> How can I fix this so that I can reliably load data into my tables from
>> these files, and be confident it properly handles missing values?
>
>If you got data truncation while inserting to a DECIMAL column, I'd
>guess that the value contains illegal characters of some kind, that are
>not part of a numeric value. You should clean up your import file so
>that it contains legal values.

It's easy to get data truncation inserting to a DECIMAL column. All it has to do is round the value. No illegal characters need be involved. For example:

    mysql> create table t1 ( a1 decimal(12,2) );
    mysql> insert into t1 values (4/3);
    mysql> show warnings;
    (OK, so I'm not drawing the cute boxes here)
    Note 1265 Data truncated for column 'a1' at row 1
    mysql> insert into t1 values (1.3456);
    mysql> show warnings;
    Note 1265 Data truncated for column 'a1' at row 1
    mysql>

Gordon L. Burditt
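To find which input values would be rounded before loading them, the same check can be approximated outside the database. This is only a sketch of the idea in Python: the helper `fits_decimal` is hypothetical, and it models DECIMAL(precision, scale) simply as "at most `scale` digits after the decimal point and at most `precision - scale` digits before it", which may not match every detail of MySQL's own conversion rules.

```python
from decimal import Decimal, InvalidOperation


def fits_decimal(text, precision=10, scale=2):
    """True if text is a number that needs no rounding for DECIMAL(precision, scale)."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return False  # empty string or not a number at all
    if not value.is_finite():
        return False  # NaN and Infinity parse, but are not storable values
    # Too many digits before the decimal point.
    if abs(value) >= Decimal(10) ** (precision - scale):
        return False
    # Rounding to `scale` places must leave the value unchanged.
    return value.quantize(Decimal(1).scaleb(-scale)) == value
```

For instance, `fits_decimal("1.3456")` is false because the value would be rounded to two places, which is the situation that produces the 1265 note in the session above.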
|
|||
|
Well, so far I have been able to verify that the data is clean: no bogus characters. I loaded the file into Open Office's spreadsheet. It properly detected that the data file is a tab-delimited, plain text file, and loaded and displayed the data without incident.

The error I get with the temporary tables having only varchar fields is that 11/22/1999 is an invalid value for a date. I don't understand this, since the table has only varchar fields.

I CAN insert data, a record at a time, using the query browser!

I'll work on the other questions you asked over the weekend.

Ted
|
|||
|
In my case, the issue seems to be related to the fact that the field is empty. All of the fields in all of the records prior to the problem record, and all of the fields on the problem record prior to the problem field, have proper data values. The field causing the problem, on the record causing the problem, is empty (represented in the data file by consecutive tabs).

Are there known issues in MySQL WRT how LOAD DATA INFILE handles missing values?

Ted
|
|||
|
The problem occurs regardless of the DB engine I use and regardless of whether the table is temporary. The error message is "Incorrect DATETIME value 11/22/1999 at line 1" and the error code is 1292. Remember, this is when trying to load the data into a temporary table with ALL of the fields defined as VARCHAR of varying lengths.

Since I can load the file into a spreadsheet with Open Office, I know I can create a Java program to load the data into memory and then insert it into MySQL from that program, but I have been resisting that option since it is bound to be as slow as molasses in January. The largest table has 26 fields and several thousand records.

I am now expecting to have to sanitize the data anyway since, on loading the data into a spreadsheet, I have learned that there are some problems, including the fact that the date fields are inconsistently formatted (especially in regard to the number of characters used to represent the year).

Ted
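If the dates have to be sanitized anyway, one option is to rewrite them in the YYYY-MM-DD form that MySQL uses for DATE values before loading. Below is a minimal sketch in Python, assuming the only variants present are mm/dd/yyyy and mm/dd/yy; the function name `to_iso_date` is hypothetical. For two-digit years it relies on Python's `%y` convention (69 to 99 become 1969 to 1999, 00 to 68 become 2000 to 2068), which should be checked against what the data actually means.

```python
from datetime import datetime


def to_iso_date(text):
    """Convert mm/dd/yyyy or mm/dd/yy to YYYY-MM-DD, or return None.

    None signals a value that matched neither format, so the caller can
    log the row for manual review instead of loading a wrong date.
    """
    text = text.strip()
    # Try the four-digit year first; %Y does not accept a two-digit year,
    # so "11/22/99" falls through to the second format.
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning None for unrecognized values, rather than guessing, keeps the inconsistently formatted rows visible so they can be fixed at the source.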
|
|||
|
Hi Ted,

Let me share my experience with you: I have stumbled on the exact same problem, with sometimes millions of warnings when loading geographical data into a table with decimal fields.

In the end, it is no big deal: either the input data has more decimal digits than your destination field and it gets properly truncated, or it's empty and gets converted to 0. Depending on your application requirements, you might want to change the 0 values to NULL after loading the initial data.

Warning 1265 is really just a note to draw your attention to the loss of precision; after you analyze the situation and understand what is happening, you can safely ignore it.

Hope this helps,
JFG
|
|||
|
But how do you ignore it when the warning or error stops the process of loading the data entirely? This database has 28 fields and several thousand records. The third record has an empty value for one field, and that is the field that gives the message about data being truncated. After I read the error message, I look in the table and I find only the first two records!

As I said before, I loaded the data into my spreadsheet application and examined the data. There are no illegal characters that I can find, but the empty field, and the record containing it, are the first such in the database.

Ted