Thread: multibyte post data automatically encoded? (alt.comp.lang.php, part of the PHP Programming Forums category)
Hi,
When I post some multibyte chars to a test script, the html output is presented as numeric unicode entities, like &#1720; etc, even though I didn't use any encoding functions on it.

For instance: some arabic chars came out as &#65177;&#65152;&#65168;

I looked at the mbstring settings in phpinfo but it's all set to: off, NULL, pass, neutral

Could somebody explain this to me?

Here's the test script I used:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
</head>
<body>
<?php echo $_POST['val']; ?>
<form method="post">
<textarea name="val"></textarea><br>
<input type="submit" value="submit">
</form>
</body>
</html>

Thanks,
Jos
On Wed, 28 Sep 2005 19:07:17 +0200, Jos van Uden <no@spam.nl> wrote:
>When I post some multibyte chars to a test script,
>the html output is presented as numeric unicode entities,
>like &#1720; etc, even though I didn't use any encoding
>functions on it.
>
>For instance: some arabic chars came out as
>&#65177;&#65152;&#65168;
>
>I looked at the mbstring settings in phpinfo but
>it's all set to: off, NULL, pass, neutral
>
>Could somebody explain this to me?
>
>Here's the test script I used:
>
><html>
><head>
><meta http-equiv="Content-Type"
>content="text/html; charset=iso-8859-1"/>

Browsers will typically send HTML entity encoded values when you paste in characters that are not present in the encoding of the page - IE and Firefox do, at least. It's the client that's doing it, it's not a setting in PHP.

If you want to avoid this, you likely need to send the page encoded in utf-8, so the browser can send utf-8 back again, which covers pretty much everything. You can then work out how you want to handle characters outside of your target encoding yourself.

-- 
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
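The suggestion above (serve the page as utf-8 so the browser can post utf-8 back) might look roughly like the following. This is only a sketch of the test script from the thread, not a definitive fix: render_page() is a hypothetical helper name, and the accept-charset attribute, the header() call and the htmlspecialchars() escaping are additions that were not in the original script.

```php
<?php
// Sketch only: a UTF-8 variant of the test script from the thread.
// render_page() is a hypothetical helper, not a PHP built-in.

function render_page(array $post): string
{
    // Escape the submitted value before echoing it back.
    // (The original script echoed $_POST['val'] raw; escaping is an addition.)
    $val = htmlspecialchars($post['val'] ?? '', ENT_QUOTES, 'UTF-8');

    return <<<HTML
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
{$val}
<form method="post" accept-charset="utf-8">
<textarea name="val"></textarea><br>
<input type="submit" value="submit">
</form>
</body>
</html>
HTML;
}

// When run under a web server, declare the charset in the HTTP header as well;
// the header takes precedence over the <meta> tag.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/html; charset=utf-8');
    echo render_page($_POST);
}
```

With the page declared as utf-8, the pasted characters should arrive in $_POST['val'] as raw UTF-8 bytes rather than as numeric entities. One thing to keep in mind with the iso-8859-1 version: a substituted entity such as &#1720; looks the same on the server as that text typed literally by the user, so the two cases cannot be told apart after the fact.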
Andy Hassall wrote:
> On Wed, 28 Sep 2005 19:07:17 +0200, Jos van Uden <no@spam.nl> wrote:
>
>>When I post some multibyte chars to a test script,
>>the html output is presented as numeric unicode entities,
>>like &#1720; etc, even though I didn't use any encoding
>>functions on it.
>
> Browsers will typically send HTML entity encoded values when you paste in
> characters that are not present in the encoding of the page - IE and Firefox
> do, at least. It's the client that's doing it, it's not a setting in PHP.

I see. Of course.

Another question, if I may: how does the browser tell the difference between multiple bytes and multibytes? Is it inferred from the bit patterns?

Thank you,
Jos
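On the bit-pattern question, a hedged sketch rather than an answer from the thread: when an encoding is declared, the bytes are interpreted according to that encoding, since the same byte sequence can be valid in more than one. Within UTF-8 specifically, the high bits of the first byte do say how many bytes belong to the character. The helper names below (utf8_sequence_length, utf8_chars) are hypothetical and only illustrate that rule; the check is simplified and does not reject overlong or out-of-range sequences.

```php
<?php
// Sketch only: how UTF-8 marks character boundaries with bit patterns.
// Both function names are hypothetical, not PHP built-ins.

// Number of bytes in a UTF-8 sequence, judged from its first byte.
// Returns 0 for a continuation byte (10xxxxxx) or an invalid lead byte.
function utf8_sequence_length(int $byte): int
{
    if (($byte & 0x80) === 0x00) return 1; // 0xxxxxxx: single-byte (ASCII)
    if (($byte & 0xE0) === 0xC0) return 2; // 110xxxxx: lead byte of 2
    if (($byte & 0xF0) === 0xE0) return 3; // 1110xxxx: lead byte of 3
    if (($byte & 0xF8) === 0xF0) return 4; // 11110xxx: lead byte of 4
    return 0;
}

// Split a byte string into UTF-8 characters using only the bit patterns.
function utf8_chars(string $bytes): array
{
    $chars = [];
    $n = strlen($bytes);
    $i = 0;
    while ($i < $n) {
        $len = utf8_sequence_length(ord($bytes[$i]));
        if ($len === 0 || $i + $len > $n) {
            throw new InvalidArgumentException("Malformed UTF-8 at byte offset $i");
        }
        // Every following byte of the sequence must look like 10xxxxxx.
        for ($j = 1; $j < $len; $j++) {
            if ((ord($bytes[$i + $j]) & 0xC0) !== 0x80) {
                throw new InvalidArgumentException("Malformed UTF-8 at byte offset " . ($i + $j));
            }
        }
        $chars[] = substr($bytes, $i, $len);
        $i += $len;
    }
    return $chars;
}

$bytes = "\xDA\xB8"; // UTF-8 encoding of U+06B8
echo strlen($bytes), " bytes, ", count(utf8_chars($bytes)), " UTF-8 character(s)\n";
```

Read as utf-8, the two bytes DA B8 form one character. Read as iso-8859-1, the same two bytes are two separate characters, which is why the declared encoding matters and the bytes alone are not enough.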