Re: fread and UTF-16 encodings
Chuck Anderson wrote:
> I am having trouble reading files that are using UTF-16 encoding.
>
> I noticed this when I started trying to read an xml file produced by
> Winamp. And now I see it in the id3tags of files created by Winamp as
> well.
OK. Is this application-related or a general question? If it is
application-related, you will have to learn the encoding from the
application's documentation, or find it out by trial and error.
> When I use fread to get the file contents (a playlist), I get this:
>
> ÿþ<�?�x�m�l� �v�e�r�s�i�o�n�=�"�1�.�0ï¿ ½"� �e�n�c�o�d�
> i�n�g�=�"�U�T�F�-�1�6�"�?�>�<�p�l�a�y�l�iï¿ ½s�t�s� �p�l
> �a�y�l�i�s�t�s�=�"�7�"�>ï¿ ½<�p�l�a�y�l�i�s�t� �f�i�l�e�
> n�a�m�e�=�"�p�l�f�2�2�C�Dï ¿½.�m�3�u�8�"� ....
>
> (I have wrapped that text manually)
>
> When I pass that string to xml_parse, however, it properly decodes it
> and gives me this:
>
> <PLAYLISTS PLAYLISTS="7"><PLAYLIST FILENAME="plf22CD.m3u8 ....
>
> How do I detect when file data is encoded like this, and then how should
> I work with it?
If you get the file from the internet or serve it yourself, the encoding
is in the Content-Type header. You have just discovered how stupid it is
to declare the encoding inside the document itself (here, in the
encoding attribute of the XML declaration): you have to be able to read
the file before you can read the declaration. It is like locking the key
to a safe inside the safe. But you'd be amazed by how many applications
lock away the key...
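For the HTTP case, here is a minimal sketch of pulling the charset out of a Content-Type header value. The function name charset_from_content_type is made up for illustration (it is not a PHP built-in), and the pattern only handles the common charset=... form:

```php
<?php
// Hypothetical helper (not a PHP built-in): extract the charset parameter
// from a Content-Type header value, or return null if none is given.
function charset_from_content_type(string $header): ?string
{
    // Accepts both charset=utf-16 and charset="utf-16".
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $header, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Example header value, made up for this sketch.
var_dump(charset_from_content_type('text/xml; charset=utf-16'));
```

If the header carries no charset parameter, the function returns null and you are back to inspecting the data itself.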
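Anyway, there are two UTF-16 encodings: big endian and little endian
(often abbreviated to UTF-16BE and UTF-16LE). The difference is the
order of the two bytes within each 16-bit unit. The ÿþ at the start of
your output is the byte order mark FF FE, which indicates little endian;
FE FF would indicate big endian.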
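To make the byte-order difference concrete, here is a small sketch that encodes the same letter both ways and prints the resulting bytes as hex. It assumes the mbstring extension is loaded:

```php
<?php
// Assumes the mbstring extension is available.
// Encode the letter "A" (U+0041) in both byte orders.
$le = mb_convert_encoding('A', 'UTF-16LE', 'UTF-8');
$be = mb_convert_encoding('A', 'UTF-16BE', 'UTF-8');

echo bin2hex($le), "\n"; // 4100: low byte first
echo bin2hex($be), "\n"; // 0041: high byte first
```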
You should be able to convert them with the mbstring functions, for
example mb_convert_encoding.
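A minimal sketch of one way to detect and convert such data, assuming the mbstring extension is loaded and the data starts with a byte order mark (FF FE for little endian, FE FF for big endian). The function name utf16_to_utf8 is made up here; it is not a PHP built-in. Without a BOM the sketch returns the data untouched rather than guess:

```php
<?php
// Hypothetical helper (not a PHP built-in): convert UTF-16 data that starts
// with a byte order mark to UTF-8. Assumes the mbstring extension is loaded.
function utf16_to_utf8(string $data): string
{
    $bom = substr($data, 0, 2);
    if ($bom === "\xFF\xFE") {
        $from = 'UTF-16LE';   // FF FE: little endian
    } elseif ($bom === "\xFE\xFF") {
        $from = 'UTF-16BE';   // FE FF: big endian
    } else {
        return $data;         // no UTF-16 BOM: leave the data alone
    }
    // Drop the 2-byte BOM, then convert the rest.
    return mb_convert_encoding(substr($data, 2), 'UTF-8', $from);
}

// Usage with a small in-memory sample instead of a real file:
$sample = "\xFF\xFE" . "<\x00a\x00>\x00";
echo utf16_to_utf8($sample), "\n";
```

With a real file you would pass in the string you got back from fread or file_get_contents.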
> I have found utf16_decode on the php.net site, but when I use that
> function I get an empty string.
>
I could not find that function in the manual. Is it actually defined in
your script, and is your error handling hiding an undefined-function
error?
Good luck!