This is a discussion on UTF-8 not decoding within the PHP Language forums, part of the PHP Programming Forums category.
Hi,
I am opening a stream that is UTF-8 encoded. I use fgetc to read the stream, which is binary safe, and I append every character read to a string. But when I look at the stream, I see some characters shown as a bunch of "?" question marks, and utf8_decode has no effect on it either. How do you go about decoding UTF-8? Does adding the characters to the string somehow mess it up? Please help. Running PHP 4.3.4 on Windows.

--
http://www.dbForumz.com/ This article was posted by author's request
Articles individually checked for conformance to usenet standards
Topic URL: http://www.dbForumz.com/PHP-UTF-deco...ict138860.html
Visit Topic URL to contact author (reg. req'd). Report abuse: http://www.dbForumz.com/eform.php?p=464220

"steve" <UseLinkToEmail@dbForumz.com> wrote in message news:411ab511$1_7@news.athenanews.com...
> Hi,
> I am opening a stream that is UTF encoded. I use fgetc to read the stream- which is binary safe. I add every character read to a string.
>
> But when I look at the stream, I see some characters with a bunch of "?" question marks, and then utf8_decode has no effect on it either.

Question marks mean that there are Unicode characters that aren't found within the current codepage. Basically the characters are there, they're just represented by ?s.

utf8_decode() does have an effect: it replaces characters outside of ISO-8859-1 with question marks.

> How do you go about decoding utf. Does adding the characters to the string somehow mess it up. Please help. Running 4.3.4 PHP on Win.

The question is, what do you mean by decoding UTF8? Using fgetc on UTF8 text is not a good idea, since fgetc returns a single byte and one Unicode character can span multiple bytes.
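
To make the byte versus character point concrete, here is a minimal sketch. The helper name utf8_char_count is made up for this illustration (it is not a PHP built-in), and the sample text is an assumption, not data from the original stream. It counts characters by skipping UTF-8 continuation bytes, which always have the bit pattern 10xxxxxx.

```php
<?php
// Hypothetical helper (not a PHP built-in): count UTF-8 characters by
// counting every byte that is NOT a continuation byte (10xxxxxx).
function utf8_char_count($bytes) {
    $count = 0;
    $len = strlen($bytes); // strlen() counts bytes, not characters
    for ($i = 0; $i < $len; $i++) {
        if ((ord($bytes[$i]) & 0xC0) != 0x80) {
            $count++;
        }
    }
    return $count;
}

// "naïve €" written out as raw UTF-8 bytes: ï takes 2 bytes, € takes 3.
$text = "na\xC3\xAFve \xE2\x82\xAC";

echo strlen($text), "\n";          // number of bytes
echo utf8_char_count($text), "\n"; // number of characters
```

Appending bytes from fgetc to a string does not damage the data, since the bytes end up in the same order. The trouble only starts when each byte is treated as a complete character, for example when displaying or converting them one at a time.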

"Chung Leong" wrote:
> The question is, what do you mean by decoding UTF8. Using fgetc on UTF8 text is not a good idea, since one Unicode character can span multiple bytes.

Thanks, Chung. I am interested in decoding usenet message headers that look like this: "=?Utf-8?B?YmVsZGVyYXo=?="
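
For orientation, a header value of this shape is a MIME "encoded word" (RFC 2047): a charset, an encoding letter (B for base64, Q for a quoted-printable variant) and the payload, separated by question marks. The sketch below only takes the sample string apart; the variable names are illustrative, and it assumes the input is exactly one well-formed encoded word.

```php
<?php
$word = "=?Utf-8?B?YmVsZGVyYXo=?=";

// Drop the leading "=?" and trailing "?=", then split on "?".
$fields = explode('?', substr($word, 2, -2));

$charset  = $fields[0]; // charset name as written in the header
$encoding = $fields[1]; // "B" means the payload is base64
$payload  = $fields[2]; // the encoded text itself

$decoded = base64_decode($payload);

echo "$charset / $encoding / $decoded\n";
```

The decoded bytes are still in the charset named in the first field, so no UTF-8 to Latin-1 conversion has happened at this point.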

"steve" wrote:
> Thanks, Chung. I am interested in decoding usenet message headers that look like this: "=?Utf-8?B?YmVsZGVyYXo=?="

Ok, figured it out. Take a string like this:

    $instr = "=?Utf-8?B?YmVsZGVyYXo=?=";

and feed it as the argument to this function:

    function decode_subject( $instr ) {
        $enstr = $instr;
        // Each pass decodes one encoded word: $match[1] is the text before it,
        // $match[2] the encoding (B or Q), $match[3] the payload, $match[4] the rest.
        while( preg_match( '/^([^?]+)?=\?[^?]+\?(B|Q)\?([^?]+)=?=?\?=(.+)?$/i', $enstr, $match ) ) {
            $rest = isset( $match[4] ) ? $match[4] : '';
            if( $match[2] == 'b' || $match[2] == 'B' )
                $enstr = $match[1] . base64_decode( $match[3] ) . $rest;
            else
                // In Q encoding "_" stands for a space; keep the trailing text too.
                $enstr = $match[1] . quoted_printable_decode( str_replace( '_', ' ', $match[3] ) ) . $rest;
        }
        return( $enstr );
    }

and it will return the decoded text ("belderaz" for the string above). The bytes come back in the charset named in the header, so the result is plain ASCII only when the encoded text was.
The function is included in: PHP Newsreader http://pnews.sourceforge.net/
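
For comparison, here is a shorter sketch of the same idea that handles every encoded word in a header in one pass and also covers the Q encoding. The function name decode_encoded_words is hypothetical and this is not the code from PHP Newsreader. It assumes PHP 5.3 or later (for the anonymous function), so it would not run on 4.3.4 as written. It does not convert between charsets and does not drop the whitespace that RFC 2047 says to ignore between adjacent encoded words.

```php
<?php
// Hypothetical helper: decode every RFC 2047 encoded word found in $header.
// The charset field is matched but not used; bytes are returned as-is.
function decode_encoded_words($header) {
    return preg_replace_callback(
        '/=\?([^?]+)\?([BQ])\?([^?]*)\?=/i',
        function ($m) {
            if (strtoupper($m[2]) === 'B') {
                return base64_decode($m[3]);
            }
            // Q encoding: "_" stands for a space, "=XX" is a hex-escaped byte.
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        },
        $header
    );
}

echo decode_encoded_words("=?Utf-8?B?YmVsZGVyYXo=?="), "\n";

// A Q-encoded ISO-8859-1 word with plain text on both sides.
$mixed = decode_encoded_words("Re: =?ISO-8859-1?Q?Andr=E9_Test?= ok");
```

Text outside the encoded words is left untouched, so a plain ASCII subject passes through unchanged.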