This is a discussion on utf8 encoding problem within the MySQL Database forums, part of the Database Forums category; The instruction show variables like 'character%' lists all variables correctly set to utf8. I wrote a batch file, before launching ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
The instruction show variables like 'character%' lists all variables
correctly set to utf8. I wrote a batch file, before launching it, I checked the internal encoding with a hex editor, everything is ok, the font encoding is utf8. I launched the batch file everything was running nicely. The batch file is as follows: SET NAMES utf8; DROP DATABASE museumoz; CREATE DATABASE museumoz CHARACTER SET utf8 COLLATE utf8_general_ci; USE museumoz; CREATE TABLE naz_pt ( id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, nome VARCHAR(100) ) CHARACTER SET utf8; INSERT INTO naz_pt (nome) VALUES ('Moçambique'); select * from naz_pt; the last instruction is useful for checking the character ç But instead of ç I get on screen a couple of ascii characters ( ├º ) There must be an error, but I'm unable to see it. Any idea? Thanks in advance www.oreste.parlatano.com |
|
|||
|
Oreste wrote:
> The instruction show variables like 'character%' lists all variables > correctly set to utf8. > > I wrote a batch file, before launching it, I checked the internal > encoding with a hex editor, everything is ok, the font encoding is > utf8. > > I launched the batch file everything was running nicely. > > The batch file is as follows: > > SET NAMES utf8; > DROP DATABASE museumoz; > CREATE DATABASE museumoz CHARACTER SET utf8 COLLATE utf8_general_ci; > USE museumoz; > CREATE TABLE naz_pt ( > id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, > nome VARCHAR(100) > ) CHARACTER SET utf8; > INSERT INTO naz_pt (nome) VALUES ('Moçambique'); > select * from naz_pt; > > the last instruction is useful for checking the character ç > > But instead of ç I get on screen a couple of ascii characters ( ├º ) > > There must be an error, but I'm unable to see it. > > Any idea? Well, yes. I don't know what system you are using, but you only have instructed MySQL to use utf-8. So the INSERT command may insert a differently encoded string if your batch file is not utf-8 encoded. Likewise, the results are rendered by the operating system, not by MySQL. So a better check is to check HEX(nome) and count the bytes if hex results do not mean anything to you (two characters is one byte). Best regards, -- Willem Bogaerts Application smith Kratz B.V. http://www.kratz.nl/ |
|
|||
|
On May 21, 10:49 am, Willem Bogaerts <w.bogae...@kratz.nl> wrote:
> Oreste wrote: > > The instruction show variables like 'character%' lists all variables > > correctly set to utf8. > > > I wrote a batch file, before launching it, I checked the internal > > encoding with a hex editor, everything is ok, the font encoding is > > utf8. > > > I launched the batch file everything was running nicely. > > > The batch file is as follows: > > > SET NAMES utf8; > > DROP DATABASE museumoz; > > CREATE DATABASE museumoz CHARACTER SET utf8 COLLATE utf8_general_ci; > > USE museumoz; > > CREATE TABLE naz_pt ( > > id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, > > nome VARCHAR(100) > > ) CHARACTER SET utf8; > > INSERT INTO naz_pt (nome) VALUES ('Moçambique'); > > select * from naz_pt; > > > the last instruction is useful for checking the character ç > > > But instead of ç I get on screen a couple of ascii characters ( ├º ) > > > There must be an error, but I'm unable to see it. > > > Any idea? > > Well, yes. I don't know what system you are using, but you only have > instructed MySQL to use utf-8. So the INSERT command may insert a > differently encoded string if your batch file is not utf-8 encoded. > Likewise, the results are rendered by the operating system, not by > MySQL. So a better check is to check HEX(nome) and count the bytes if > hex results do not mean anything to you (two characters is one byte). > > Best regards, > -- > Willem Bogaerts > > Application smith > Kratz B.V.http://www.kratz.nl/ Thanks for your quick reply, you point the real problem a not yet uncovered mystery, the batch file and in particular INSERT INTO naz_pt (nome) VALUES ('Moçambique') appear to be correctly utf8 encoded by the system, I'm working under Windows Vista. The specific value 'Moçambique' that contains the ç character is correctly displayed by the hex editor. But what happens inside mysql command line client? The ascii utf8 table according to the following web page http://kellyjones.netfirms.com/webto...f8_table.shtml shows that the caracter ç has the decimal code 231 (hex 00e7) but when I input the statement SELECT ascii('ç'); I get 135 as result. According to the table the utf8 encoding for ç is equal to 0xc3,0xa7 if you type those values in the command line client you get on screen the two ascii characters ├º At the moment it seems that the possible solution is to replace the caracter with the code accepted by mysql command line client, therefore INSERT INTO naz_pt (nome) VALUES ('Moçambique'); becomes INSERT INTO naz_pt (nome) VALUES (concat('Mo',char(135),'ambique')); So even if it is quite painful, could be an acceptable solution, but it does not work, I get the error code "incorrect string value" I remind that show variables like 'character%' lists all variables correctly set to utf8 furthemore I add SET NAMES utf8; at the top of the batch file and CHARACTER SET utf8; for each create table instruction. It seems I have to spend some time before finding a suitable solution, this is exactly the reason why a good programmer cannot have a good social life. |
|
|||
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1 Oreste wrote: | It seems I have to spend some time before finding a suitable solution, | this is exactly the reason why a good programmer cannot have a good | social life. Programming is a good programmers social life. :) Sorry I can't add anything more helpful. What you are already doing should work. By using SET NAMES you are telling mysql that incoming statements and data are in utf8, and to return all results in utf8. All I can think is that the string itself in the INSERT is not in UTF8. If you save your output from the select to a file and compare it to the INSERTed string, how do they compare? -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with PCLinuxOS - http://enigmail.mozdev.org iEYEARECAAYFAkg0IcEACgkQzIf+rZpn0oSK3gCeIaGA7Pj3NH cM/n3Qwb17EyOg 0jsAn1C1WJ/rDH+ap0dh2Xf6Cbw1cw1p =TVoG -----END PGP SIGNATURE----- |
|
|||
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1 Oreste wrote: | | But instead of ç I get on screen a couple of ascii characters ( ├º ) | | I pretty much just duplicated your test and got the same results. That is, until I changed the character encoding of the terminal emaulator I'm using (konsole) from "default" to utf8. Then I saw the character the way it's supposed to look. HTH -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with PCLinuxOS - http://enigmail.mozdev.org iEYEARECAAYFAkg0JVQACgkQzIf+rZpn0oQeBwCfWQpOi2RXCi IJ78KJeMQhUZpj gv4AniO4450BWRQWBw0GCiKqs2R/cwu2 =cWhV -----END PGP SIGNATURE----- |
|
|||
|
....
> At the moment it seems that the possible solution is to replace the > caracter with the code accepted by mysql command line client, > therefore > INSERT INTO naz_pt (nome) VALUES ('Moçambique'); > becomes > INSERT INTO naz_pt (nome) VALUES (concat('Mo',char(135),'ambique')); If you really want to make sure that your statement survives reencoding, use something like: INSERT INTO naz_pt (nome) VALUES (_utf8 0x4D6FC3A7616D6269717565); (which is the output of the HEX function, prepended by "0x" and the used encoding) > It seems I have to spend some time before finding a suitable solution, > this is exactly the reason why a good programmer cannot have a good > social life. The coffee is ready. You are welcome to drop in if you like ;) As chuck already said, it is probably your terminal window that renders the characters wrongly. Xubuntu's terminal window renders strings it just fine after you have given the "SET NAMES utf8" command. Best regards, -- Willem Bogaerts Application smith Kratz B.V. http://www.kratz.nl/ |
|
|||
|
On May 21, 3:57 pm, Willem Bogaerts <w.bogae...@kratz.nl> wrote:
> ... > > > At the moment it seems that the possible solution is to replace the > > caracter with the code accepted by mysql command line client, > > therefore > > INSERT INTO naz_pt (nome) VALUES ('Moçambique'); > > becomes > > INSERT INTO naz_pt (nome) VALUES (concat('Mo',char(135),'ambique')); > > If you really want to make sure that your statement survives reencoding, > use something like: > > INSERT INTO naz_pt (nome) VALUES (_utf8 0x4D6FC3A7616D6269717565); > > (which is the output of the HEX function, prepended by "0x" and the used > encoding) > > > It seems I have to spend some time before finding a suitable solution, > > this is exactly the reason why a good programmer cannot have a good > > social life. > > The coffee is ready. You are welcome to drop in if you like ;) > > As chuck already said, it is probably your terminal window that renders > the characters wrongly. Xubuntu's terminal window renders strings it > just fine after you have given the "SET NAMES utf8" command. > > Best regards, > -- > Willem Bogaerts > > Application smith > Kratz B.V.http://www.kratz.nl/ It definetively an OS problem, I did the same tests on remote system based on Linux via cpanel and everything is working fine. It must be another headache gift from Microsoft. Thanks again to both of you. www.oreste.parlatano.com |
![]() |
| Thread Tools | |
| Display Modes | |
|
|