Problem with length of Multibyte String

#1  lian, 09-05-2004

Hi all,
I want to write some UTF-8 Chinese characters to a file with the following
PHP code:

<code>
.......
$fp = fopen($filepath,'wb');
fwrite($fp,$utf8string,strlen($utf8string));
fclose($fp);
........
</code>

The problem is with the function strlen(). A UTF-8 string consists of
multibyte characters, and each character takes more than one byte, but
strlen() seems to treat every character as a single byte.
Take the Chinese character "我" for example: its UTF-8 encoding is "0xE6
0x88 0x91", obviously 3 bytes, but strlen() returns 1. fwrite() then
writes just 1 byte to the file.
So I wonder if there is any way to get the actual length of a multibyte
string in PHP?
Thank you for any suggestions!
#2  Alvaro G. Vicario, 09-05-2004

*** lian wrote (Sun, 05 Sep 2004 17:27:28 +0800):
> The problem is with the function strlen(). A UTF-8 string consists of
> multibyte characters, and each character takes more than one byte, but
> strlen() seems to treat every character as a single byte.


There's a chapter in the PHP manual titled "Multi-Byte String Functions".
There you have info about mb_strlen().

It's an extension, so if it isn't installed on your server you'll probably
have to write your own function. There are some user comments about it on
the strlen() manual page.
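
As a rough illustration of the "write your own function" route, here is a minimal sketch written for a current PHP version. The helper name utf8_char_count() is hypothetical (it is not part of PHP or the manual), and the fallback assumes the input is valid UTF-8. It relies on the fact that every UTF-8 character contains exactly one byte that is not a continuation byte (bit pattern 10xxxxxx).

```php
<?php
// Hypothetical helper, not a PHP built-in: count the characters (code points)
// in a UTF-8 string, using mb_strlen() when the mbstring extension is loaded.
function utf8_char_count(string $s): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }

    // Fallback, assuming valid UTF-8: count every byte that is not a
    // continuation byte (10xxxxxx). Each character has exactly one such byte.
    $count = 0;
    $bytes = strlen($s);
    for ($i = 0; $i < $bytes; $i++) {
        if ((ord($s[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

echo utf8_char_count("\xE6\x88\x91"), "\n"; // "我" written as its 3 UTF-8 bytes: prints 1
```

This counts characters, which is not the number fwrite() needs; fwrite() works in bytes.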

In any case, please note that the length parameter in fwrite is optional.
If not set, it'll write the whole string.
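
A small sketch of that point, assuming a scratch file in the system temp directory stands in for the original $filepath. fwrite() is called without the length argument and its return value (the number of bytes written) is compared with the resulting file size.

```php
<?php
// Assumption: a temporary file replaces the $filepath from the question.
$filepath   = tempnam(sys_get_temp_dir(), 'utf8');
$utf8string = "\xE6\x88\x91";          // "我" as UTF-8: 0xE6 0x88 0x91

$fp = fopen($filepath, 'wb');
$written = fwrite($fp, $utf8string);   // no length argument: whole string is written
fclose($fp);

$size = filesize($filepath);           // size of the file on disk, in bytes
unlink($filepath);                     // clean up the scratch file

echo $written, "\n";                   // 3
echo $size, "\n";                      // 3
```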

--
--
-+ Álvaro G. Vicario - Burgos, Spain - ICQ 46788716
+- http://www.demogracia.com (the humor website for people over 100)
++ «Smile, we're taking your photo for the obituary»
--
#3  Brion Vibber, 09-05-2004

lian wrote:
> The problem is with the function strlen(). A UTF-8 string consists of
> multibyte characters, and each character takes more than one byte, but
> strlen() seems to treat every character as a single byte.


Is mbstring.func_overload on? If so, it is overriding the normal
strlen() function with mb_strlen(), which returns the number of characters
instead of bytes. Try turning it off.

See http://www.php.net/mb_string
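
One way to check this from a script is sketched below. Variable names are illustrative. Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so on current versions ini_get() returns false for it. The sketch also shows mb_strlen() with the '8bit' encoding, a common way to get a byte count that does not depend on the overload setting.

```php
<?php
// ini_get() returns false when the directive does not exist
// (mbstring not loaded, or PHP 8.0+ where the directive was removed).
$overload = ini_get('mbstring.func_overload');

// Bit value 2 of the setting is the one that overloads the string
// functions, strlen() included.
$strlenOverloaded = $overload !== false && (((int) $overload) & 2) !== 0;

$s = "\xE6\x88\x91"; // "我" as UTF-8

// Byte length independent of the overload setting when mbstring is loaded;
// plain strlen() is used otherwise, where no overload can be active.
$byteLength = function_exists('mb_strlen') ? mb_strlen($s, '8bit') : strlen($s);

var_dump($strlenOverloaded);
echo $byteLength, "\n"; // 3
```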

-- brion vibber (brion @ pobox.com)
#4  Chung Leong, 09-05-2004

"lian" <lian@fed.com> wrote in message news:2q04g4Fokjp2U1@uni-berlin.de...
> Hi all,
> I want to write some UTF-8 Chinese characters to a file with the following
> PHP code:
>
> <code>
> .......
> $fp = fopen($filepath,'wb');
> fwrite($fp,$utf8string,strlen($utf8string));
> fclose($fp);
> ........
> </code>
>
> The problem is with the function strlen(). A UTF-8 string consists of
> multibyte characters, and each character takes more than one byte, but
> strlen() seems to treat every character as a single byte.
> Take the Chinese character "我" for example: its UTF-8 encoding is "0xE6
> 0x88 0x91", obviously 3 bytes, but strlen() returns 1. fwrite() then
> writes just 1 byte to the file.
> So I wonder if there is any way to get the actual length of a multibyte
> string in PHP?


First of all, you don't need to pass the length to fwrite() if you want the
whole string written. Just fwrite($fp, $utf8string) will do.

Second, your description of strlen() is wrong. It returns the byte count,
never the Unicode character count.
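
A quick way to see this for the bytes quoted in the question, assuming strlen() has not been replaced through mbstring.func_overload:

```php
<?php
$s = "\xE6\x88\x91";                 // the three bytes given for "我"

echo strlen($s), "\n";               // 3: strlen() counts bytes
echo bin2hex($s), "\n";              // e68891

$three = str_repeat($s, 3);          // three such characters
echo strlen($three), "\n";           // 9
```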

