04-23-2008
Robert William Vesterman
Re: [PHP] mb_convert_encoding converting to ASCII instead of UTF-8

I wasn't saying I was /telling/ it to go from UTF-8 to ASCII. I was
saying it /was/ going from UTF-8 to ASCII, despite the fact that I was
telling it to go from UTF-8 to UTF-8.

And as noted previously in this thread, it turned out to be because
mb_detect_encoding was /mistakenly/ detecting it as UTF-8 in the first
place. It was actually ISO-8859-1, not UTF-8. So when I told it to
convert from UTF-8 (which mb_detect_encoding said it was),
mb_convert_encoding ran into a non-UTF-8 character (the ñ), and so threw
it away. The generated output was therefore all straight ASCII
characters, which mb_detect_encoding therefore said was ASCII.
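
For anyone hitting the same thing, here is a minimal sketch of checking the input before trusting a detected encoding. It assumes the problem character is ñ stored as the single ISO-8859-1 byte 0xF1; the variable names are made up for illustration, and the array syntax needs PHP 5.4 or later. What mb_convert_encoding does with an invalid byte (drop it or substitute something) depends on the PHP version and the mb_substitute_character setting, so the sketch avoids relying on that.

```php
<?php
// Assumption: "Miñoso" stored as ISO-8859-1, where ñ is the single byte 0xF1.
$latin1 = "Mi\xF1oso";

// 0xF1 would have to start a multi-byte UTF-8 sequence, but the next byte
// ('o') is not a continuation byte, so the string is not valid UTF-8.
$isUtf8 = mb_check_encoding($latin1, 'UTF-8');

// Strict detection (third argument true) skips candidates that the bytes
// are not valid in, instead of settling for a loose match.
$detected = mb_detect_encoding($latin1, ['ASCII', 'UTF-8', 'ISO-8859-1'], true);

// Converting from the real source encoding keeps the character:
// ñ becomes the two UTF-8 bytes 0xC3 0xB1.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump($isUtf8);          // bool(false)
echo bin2hex($utf8), "\n";  // 4d69c3b16f736f
```

The point is only that the "from" encoding has to be the one the bytes are really in; a validity check like this is one way to catch a wrong guess before converting.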

tedd wrote:
> At 11:28 AM -0400 4/23/08, Robert William Vesterman wrote:
>> A little additional info: The "ASCII to ASCII" case for
>> "Minnie=Mouse" is merely because the UTF-8 encoding for "Mouse" is
>> the same as the ASCII encoding for "Mouse", and mb_detect_encoding is
>> matching on ASCII before UTF-8. So that's not an issue.
>>
>> But, the "UTF-8 to ASCII" case for "Minnie=Miñoso" is still
>> (seemingly) screwy.
>
> Going for "UTF-8 to ASCII" is not going to work. The ASCII to UTF-8
> works because ASCII is contained within UTF-8. But the reverse is not
> true. Not all of UTF-8 is contained within ASCII.
>
> For example, the character (code-point) ñ does not appear in ASCII, so
> that doesn't work.
>
> Cheers,
>
> tedd
>
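
The subset relationship tedd describes can be seen directly in the bytes. This is a rough sketch rather than anything from the original script: the variable names are hypothetical, and the second string assumes the character in question is ñ, which is the two bytes 0xC3 0xB1 in UTF-8.

```php
<?php
// Pure ASCII text: every byte is below 0x80, so it is valid as both
// ASCII and UTF-8, and converting ASCII to UTF-8 changes nothing.
$mouse = "Mouse";
$mouseIsAscii = mb_check_encoding($mouse, 'ASCII');
$mouseIsUtf8  = mb_check_encoding($mouse, 'UTF-8');
$mouseSame    = (mb_convert_encoding($mouse, 'UTF-8', 'ASCII') === $mouse);

// Assumption: "Miñoso" in UTF-8, with ñ as the two bytes 0xC3 0xB1.
// Both bytes are above 0x7F, so the string is valid UTF-8 but not ASCII.
$minoso = "Mi\xC3\xB1oso";
$minosoIsAscii = mb_check_encoding($minoso, 'ASCII');
$minosoIsUtf8  = mb_check_encoding($minoso, 'UTF-8');

var_dump($mouseIsAscii, $mouseIsUtf8, $mouseSame); // true, true, true
var_dump($minosoIsAscii, $minosoIsUtf8);           // false, true
```

That is also why a detector that tries ASCII before UTF-8 reports "Mouse" as ASCII: both answers are correct for those bytes.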