multibyte post data automatically encoded?

Posted in alt.comp.lang.php (PHP Programming Forums)

#1
09-28-2005
Jos van Uden
multibyte post data automatically encoded?

Hi,

When I post some multibyte chars to a test script,
the html output is presented as numeric unicode entities,
like &#1720; etc, even though I didn't use any encoding
functions on it.

For instance: some arabic chars came out as
&#65177;&#65152;&#65168;

I looked at the mbstring settings in phpinfo but
it's all set to: off, NULL, pass, neutral

Could somebody explain this to me?

Here's the test script I used:


<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1"/>
</head>
<body>

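<?php
// Echo the submitted value as-is; isset() avoids a notice on the first load.
echo isset($_POST['val']) ? $_POST['val'] : '';
?>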

<form method="post">
<textarea name="val"></textarea><br>
<input type="submit" value="submit">
</form>

</body>
</html>


Thanks,

Jos

#2
09-28-2005
Andy Hassall
Re: multibyte post data automatically encoded?

On Wed, 28 Sep 2005 19:07:17 +0200, Jos van Uden <no@spam.nl> wrote:

>When I post some multibyte chars to a test script,
>the html output is presented as numeric unicode entities,
>like &#1720; etc, even though I didn't use any encoding
>functions on it.
>
>For instance: some arabic chars came out as
>&#65177;&#65152;&#65168;
>
>I looked at the mbstring settings in phpinfo but
>it's all set to: off, NULL, pass, neutral
>
>Could somebody explain this to me?
>
>Here's the test script I used:
>
><html>
><head>
><meta http-equiv="Content-Type"
>content="text/html; charset=iso-8859-1"/>


Browsers will typically send HTML numeric entities when you submit characters
that are not present in the encoding of the page; IE and Firefox do, at least.
The client is doing this; it is not a setting in PHP.
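
As a rough illustration of that substitution (not actual browser code), the sketch below mimics it in PHP: every character above U+00FF, which ISO-8859-1 cannot represent, becomes a decimal numeric character reference. The function names `encodeLikeBrowser` and `decodeEntities` are made up for this example, and it assumes PHP 7+ with the mbstring extension loaded. Real browsers may also treat a page declared as iso-8859-1 as windows-1252, which this simplification ignores.

```php
<?php
// Hypothetical helper: mimic the browser's fallback for a Latin-1 page.
// Characters above U+00FF cannot be stored in ISO-8859-1, so they are
// replaced by decimal references such as &#1720;. The result is still a
// UTF-8 string; only the unrepresentable characters are rewritten.
function encodeLikeBrowser(string $utf8): string
{
    // Convert map: [first code point, last code point, offset, mask].
    $map = [0x100, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

// Reverse direction: turn character references back into UTF-8 text.
// Caveat: this cannot tell a browser-generated reference apart from one
// the user literally typed, and it also decodes named entities like &amp;.
function decodeEntities(string $posted): string
{
    return html_entity_decode($posted, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}
```

Calling `decodeEntities($_POST['val'])` on the server would be one way to recover the original characters, with the ambiguity noted in the comment.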

If you want to avoid this, you likely need to send the page encoded in utf-8,
so the browser can send utf-8 back again, which covers pretty much everything.
You can then work out how you want to handle characters outside of your target
encoding yourself.
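
A minimal sketch of what that could look like for the test script above, assuming PHP 7+. The function name `renderPage` is hypothetical, and the `htmlspecialchars()` call is an addition for safe output that the original script did not have. The charset is declared both in the HTTP header and in the meta tag, and the form asks for UTF-8 submissions via `accept-charset`.

```php
<?php
// Hypothetical UTF-8 variant of the test script, wrapped in a function
// so the markup can be inspected without a web server.
function renderPage(array $post): string
{
    $val = isset($post['val']) ? (string) $post['val'] : '';
    // Escape for HTML output; the third argument says the bytes are UTF-8.
    $safe = htmlspecialchars($val, ENT_QUOTES, 'UTF-8');

    return '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
        . '</head><body>'
        . $safe
        . '<form method="post" accept-charset="utf-8">'
        . '<textarea name="val"></textarea><br>'
        . '<input type="submit" value="submit">'
        . '</form></body></html>';
}

// Under a web server, declare the encoding in the HTTP header as well,
// then print the page. Skipped when run from the command line.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/html; charset=utf-8');
    echo renderPage($_POST);
}
```

With the page served this way, the posted value should arrive as raw UTF-8 bytes instead of numeric entities.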
--
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool

#3
09-28-2005
Jos van Uden
Re: multibyte post data automatically encoded?

Andy Hassall wrote:
> On Wed, 28 Sep 2005 19:07:17 +0200, Jos van Uden <no@spam.nl> wrote:
>
>
>>When I post some multibyte chars to a test script,
>>the html output is presented as numeric unicode entities,
>>like &#1720; etc, even though I didn't use any encoding
>>functions on it.



> Browsers will typically send HTML numeric entities when you submit characters
> that are not present in the encoding of the page; IE and Firefox do, at least.
> The client is doing this; it is not a setting in PHP.


I see. Of course.

Another question, if I may: how does the browser tell
the difference between multiple bytes and multibytes?
Is it inferred from the bit patterns?

Thank you,

Jos
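
The thread does not answer this question, so the following is only a sketch for the UTF-8 case. There the first byte of each character carries its length in its high bits, and continuation bytes always start with the bits 10. The helper names `utf8SequenceLength` and `splitUtf8` are invented for this example, and it does no validation of overlong or out-of-range sequences. Note that the browser does not guess the encoding from these patterns; it decodes according to the charset the page declares. In ISO-8859-1 every byte is simply one character.

```php
<?php
// Hypothetical helper: length of a UTF-8 sequence, judged only from the
// high bits of its first byte. Returns 0 for a byte that cannot start one.
function utf8SequenceLength(int $byte): int
{
    if (($byte & 0x80) === 0x00) return 1; // 0xxxxxxx: single byte (ASCII)
    if (($byte & 0xE0) === 0xC0) return 2; // 110xxxxx: starts a 2-byte sequence
    if (($byte & 0xF0) === 0xE0) return 3; // 1110xxxx: starts a 3-byte sequence
    if (($byte & 0xF8) === 0xF0) return 4; // 11110xxx: starts a 4-byte sequence
    return 0;                              // 10xxxxxx continuation, or invalid
}

// Split a UTF-8 byte string into one substring per character, using
// nothing but the lead-byte patterns above.
function splitUtf8(string $bytes): array
{
    $chars = [];
    $i = 0;
    $n = strlen($bytes);
    while ($i < $n) {
        $len = utf8SequenceLength(ord($bytes[$i]));
        if ($len === 0) {
            $len = 1; // not a valid lead byte; keep it as a lone byte
        }
        $chars[] = substr($bytes, $i, $len);
        $i += $len;
    }
    return $chars;
}
```

For instance, `splitUtf8("a\u{06B8}")` yields two entries, a one-byte "a" and a two-byte Arabic letter.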