utf8 encoding problem

This is a discussion on utf8 encoding problem within the MySQL Database forums, part of the Database Forums category; The instruction show variables like 'character%' lists all variables correctly set to utf8. I wrote a batch file, before launching ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 05-21-2008
Oreste
 
Posts: n/a
Default utf8 encoding problem

The instruction show variables like 'character%' lists all variables
correctly set to utf8.

I wrote a batch file, before launching it, I checked the internal
encoding with a hex editor, everything is ok, the font encoding is
utf8.

I launched the batch file everything was running nicely.

The batch file is as follows:

SET NAMES utf8;
DROP DATABASE museumoz;
CREATE DATABASE museumoz CHARACTER SET utf8 COLLATE utf8_general_ci;
USE museumoz;
CREATE TABLE naz_pt (
id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
nome VARCHAR(100)
) CHARACTER SET utf8;
INSERT INTO naz_pt (nome) VALUES ('Moçambique');
select * from naz_pt;

the last instruction is useful for checking the character ç

But instead of ç I get on screen a couple of ascii characters ( ├º )

There must be an error, but I'm unable to see it.

Any idea?

Thanks in advance

www.oreste.parlatano.com
Reply With Quote
  #2 (permalink)  
Old 05-21-2008
Willem Bogaerts
 
Posts: n/a
Default Re: utf8 encoding problem

Oreste wrote:
> The instruction show variables like 'character%' lists all variables
> correctly set to utf8.
>
> I wrote a batch file, before launching it, I checked the internal
> encoding with a hex editor, everything is ok, the font encoding is
> utf8.
>
> I launched the batch file everything was running nicely.
>
> The batch file is as follows:
>
> SET NAMES utf8;
> DROP DATABASE museumoz;
> CREATE DATABASE museumoz CHARACTER SET utf8 COLLATE utf8_general_ci;
> USE museumoz;
> CREATE TABLE naz_pt (
> id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
> nome VARCHAR(100)
> ) CHARACTER SET utf8;
> INSERT INTO naz_pt (nome) VALUES ('Moçambique');
> select * from naz_pt;
>
> the last instruction is useful for checking the character ç
>
> But instead of ç I get on screen a couple of ascii characters ( ├º )
>
> There must be an error, but I'm unable to see it.
>
> Any idea?


Well, yes. I don't know what system you are using, but you only have
instructed MySQL to use utf-8. So the INSERT command may insert a
differently encoded string if your batch file is not utf-8 encoded.
Likewise, the results are rendered by the operating system, not by
MySQL. So a better check is to check HEX(nome) and count the bytes if
hex results do not mean anything to you (two characters is one byte).

Best regards,
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/
Reply With Quote
  #3 (permalink)  
Old 05-21-2008
Oreste
 
Posts: n/a
Default Re: utf8 encoding problem

On May 21, 10:49 am, Willem Bogaerts <w.bogae...@kratz.nl> wrote:
> Oreste wrote:
> > The instruction show variables like 'character%' lists all variables
> > correctly set to utf8.

>
> > I wrote a batch file, before launching it, I checked the internal
> > encoding with a hex editor, everything is ok, the font encoding is
> > utf8.

>
> > I launched the batch file everything was running nicely.

>
> > The batch file is as follows:

>
> > SET NAMES utf8;
> > DROP DATABASE museumoz;
> > CREATE DATABASE museumoz CHARACTER SET utf8 COLLATE utf8_general_ci;
> > USE museumoz;
> > CREATE TABLE naz_pt (
> > id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
> > nome VARCHAR(100)
> > ) CHARACTER SET utf8;
> > INSERT INTO naz_pt (nome) VALUES ('Moçambique');
> > select * from naz_pt;

>
> > the last instruction is useful for checking the character ç

>
> > But instead of ç I get on screen a couple of ascii characters ( ├º )

>
> > There must be an error, but I'm unable to see it.

>
> > Any idea?

>
> Well, yes. I don't know what system you are using, but you only have
> instructed MySQL to use utf-8. So the INSERT command may insert a
> differently encoded string if your batch file is not utf-8 encoded.
> Likewise, the results are rendered by the operating system, not by
> MySQL. So a better check is to check HEX(nome) and count the bytes if
> hex results do not mean anything to you (two characters is one byte).
>
> Best regards,
> --
> Willem Bogaerts
>
> Application smith
> Kratz B.V.http://www.kratz.nl/


Thanks for your quick reply,

you point the real problem a not yet uncovered mystery, the batch file
and in particular INSERT INTO naz_pt (nome) VALUES ('Moçambique')
appear to be correctly utf8 encoded by the system, I'm working under
Windows Vista. The specific value 'Moçambique' that contains the ç
character is correctly displayed by the hex editor.

But what happens inside mysql command line client?

The ascii utf8 table according to the following web page
http://kellyjones.netfirms.com/webto...f8_table.shtml
shows that the caracter ç has the decimal code 231 (hex 00e7) but when
I input the statement SELECT ascii('ç'); I get 135 as result.
According to the table the utf8 encoding for ç is equal to 0xc3,0xa7
if you type those values in the command line client you get on screen
the two ascii characters ç

At the moment it seems that the possible solution is to replace the
caracter with the code accepted by mysql command line client,
therefore
INSERT INTO naz_pt (nome) VALUES ('Moçambique');
becomes
INSERT INTO naz_pt (nome) VALUES (concat('Mo',char(135),'ambique'));

So even if it is quite painful, could be an acceptable solution, but
it does not work, I get the error code "incorrect string value"

I remind that show variables like 'character%' lists all variables
correctly set to utf8 furthemore I add SET NAMES utf8; at the top of
the batch file and CHARACTER SET utf8; for each create table
instruction.

It seems I have to spend some time before finding a suitable solution,
this is exactly the reason why a good programmer cannot have a good
social life.
Reply With Quote
  #4 (permalink)  
Old 05-21-2008
Chuck
 
Posts: n/a
Default Re: utf8 encoding problem

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Oreste wrote:

| It seems I have to spend some time before finding a suitable solution,
| this is exactly the reason why a good programmer cannot have a good
| social life.

Programming is a good programmers social life. :)

Sorry I can't add anything more helpful. What you are already doing
should work. By using SET NAMES you are telling mysql that incoming
statements and data are in utf8, and to return all results in utf8. All
I can think is that the string itself in the INSERT is not in UTF8.

If you save your output from the select to a file and compare it to the
INSERTed string, how do they compare?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with PCLinuxOS - http://enigmail.mozdev.org

iEYEARECAAYFAkg0IcEACgkQzIf+rZpn0oSK3gCeIaGA7Pj3NH cM/n3Qwb17EyOg
0jsAn1C1WJ/rDH+ap0dh2Xf6Cbw1cw1p
=TVoG
-----END PGP SIGNATURE-----
Reply With Quote
  #5 (permalink)  
Old 05-21-2008
Chuck
 
Posts: n/a
Default Re: utf8 encoding problem

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Oreste wrote:

|
| But instead of ç I get on screen a couple of ascii characters ( ├º )
|
|

I pretty much just duplicated your test and got the same results. That
is, until I changed the character encoding of the terminal emaulator I'm
using (konsole) from "default" to utf8. Then I saw the character the way
it's supposed to look.

HTH
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with PCLinuxOS - http://enigmail.mozdev.org

iEYEARECAAYFAkg0JVQACgkQzIf+rZpn0oQeBwCfWQpOi2RXCi IJ78KJeMQhUZpj
gv4AniO4450BWRQWBw0GCiKqs2R/cwu2
=cWhV
-----END PGP SIGNATURE-----
Reply With Quote
  #6 (permalink)  
Old 05-21-2008
Willem Bogaerts
 
Posts: n/a
Default Re: utf8 encoding problem

....
> At the moment it seems that the possible solution is to replace the
> caracter with the code accepted by mysql command line client,
> therefore
> INSERT INTO naz_pt (nome) VALUES ('Moçambique');
> becomes
> INSERT INTO naz_pt (nome) VALUES (concat('Mo',char(135),'ambique'));


If you really want to make sure that your statement survives reencoding,
use something like:

INSERT INTO naz_pt (nome) VALUES (_utf8 0x4D6FC3A7616D6269717565);

(which is the output of the HEX function, prepended by "0x" and the used
encoding)

> It seems I have to spend some time before finding a suitable solution,
> this is exactly the reason why a good programmer cannot have a good
> social life.


The coffee is ready. You are welcome to drop in if you like ;)

As chuck already said, it is probably your terminal window that renders
the characters wrongly. Xubuntu's terminal window renders strings it
just fine after you have given the "SET NAMES utf8" command.

Best regards,
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/
Reply With Quote
  #7 (permalink)  
Old 05-21-2008
Oreste
 
Posts: n/a
Default Re: utf8 encoding problem

On May 21, 3:57 pm, Willem Bogaerts <w.bogae...@kratz.nl> wrote:
> ...
>
> > At the moment it seems that the possible solution is to replace the
> > caracter with the code accepted by mysql command line client,
> > therefore
> > INSERT INTO naz_pt (nome) VALUES ('Moçambique');
> > becomes
> > INSERT INTO naz_pt (nome) VALUES (concat('Mo',char(135),'ambique'));

>
> If you really want to make sure that your statement survives reencoding,
> use something like:
>
> INSERT INTO naz_pt (nome) VALUES (_utf8 0x4D6FC3A7616D6269717565);
>
> (which is the output of the HEX function, prepended by "0x" and the used
> encoding)
>
> > It seems I have to spend some time before finding a suitable solution,
> > this is exactly the reason why a good programmer cannot have a good
> > social life.

>
> The coffee is ready. You are welcome to drop in if you like ;)
>
> As chuck already said, it is probably your terminal window that renders
> the characters wrongly. Xubuntu's terminal window renders strings it
> just fine after you have given the "SET NAMES utf8" command.
>
> Best regards,
> --
> Willem Bogaerts
>
> Application smith
> Kratz B.V.http://www.kratz.nl/


It definetively an OS problem, I did the same tests on remote system
based on Linux via cpanel and everything is working fine.
It must be another headache gift from Microsoft.

Thanks again to both of you.

www.oreste.parlatano.com
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 08:56 AM.


Powered by vBulletin® Version 3.6.8
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0