View Single Post

  #3 (permalink)  
Old 05-21-2008
Oreste
 
Posts: n/a
Default Re: utf8 encoding problem

On May 21, 10:49 am, Willem Bogaerts <w.bogae...@kratz.nl> wrote:
> Oreste wrote:
> > The instruction show variables like 'character%' lists all variables
> > correctly set to utf8.

>
> > I wrote a batch file, before launching it, I checked the internal
> > encoding with a hex editor, everything is ok, the font encoding is
> > utf8.

>
> > I launched the batch file everything was running nicely.

>
> > The batch file is as follows:

>
> > SET NAMES utf8;
> > DROP DATABASE museumoz;
> > CREATE DATABASE museumoz CHARACTER SET utf8 COLLATE utf8_general_ci;
> > USE museumoz;
> > CREATE TABLE naz_pt (
> > id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
> > nome VARCHAR(100)
> > ) CHARACTER SET utf8;
> > INSERT INTO naz_pt (nome) VALUES ('Moçambique');
> > select * from naz_pt;

>
> > the last instruction is useful for checking the character ç

>
> > But instead of ç I get on screen a couple of ascii characters ( ├º )

>
> > There must be an error, but I'm unable to see it.

>
> > Any idea?

>
> Well, yes. I don't know what system you are using, but you only have
> instructed MySQL to use utf-8. So the INSERT command may insert a
> differently encoded string if your batch file is not utf-8 encoded.
> Likewise, the results are rendered by the operating system, not by
> MySQL. So a better check is to check HEX(nome) and count the bytes if
> hex results do not mean anything to you (two characters is one byte).
>
> Best regards,
> --
> Willem Bogaerts
>
> Application smith
> Kratz B.V.http://www.kratz.nl/


Thanks for your quick reply,

you point the real problem a not yet uncovered mystery, the batch file
and in particular INSERT INTO naz_pt (nome) VALUES ('Moçambique')
appear to be correctly utf8 encoded by the system, I'm working under
Windows Vista. The specific value 'Moçambique' that contains the ç
character is correctly displayed by the hex editor.

But what happens inside mysql command line client?

The ascii utf8 table according to the following web page
http://kellyjones.netfirms.com/webto...f8_table.shtml
shows that the caracter ç has the decimal code 231 (hex 00e7) but when
I input the statement SELECT ascii('ç'); I get 135 as result.
According to the table the utf8 encoding for ç is equal to 0xc3,0xa7
if you type those values in the command line client you get on screen
the two ascii characters ç

At the moment it seems that the possible solution is to replace the
caracter with the code accepted by mysql command line client,
therefore
INSERT INTO naz_pt (nome) VALUES ('Moçambique');
becomes
INSERT INTO naz_pt (nome) VALUES (concat('Mo',char(135),'ambique'));

So even if it is quite painful, could be an acceptable solution, but
it does not work, I get the error code "incorrect string value"

I remind that show variables like 'character%' lists all variables
correctly set to utf8 furthemore I add SET NAMES utf8; at the top of
the batch file and CHARACTER SET utf8; for each create table
instruction.

It seems I have to spend some time before finding a suitable solution,
this is exactly the reason why a good programmer cannot have a good
social life.
Reply With Quote