Character Set Question

  #1 (permalink)  
Old 11-28-2007
Zach
 
Posts: n/a
Default Character Set Question

Adrian Nievergelt wrote:

"...The only problem with UTF-8 is that some operating systems .... have
no or hardly sufficient support. About every modern system can handle
unicode though..."

Question:

is iso-8859-1 unicode?
is utf-8 unicode?

What is unicode?

Zach.
  #2 (permalink)  
Old 11-28-2007
Michael Fesser
 
Posts: n/a
Default Re: Character Set Question

.oO(Zach)

>Adrian Nievergelt wrote:
>
>"...The only problem with UTF-8 is that some operating systems .... have
>no or hardly sufficient support. About every modern system can handle
>unicode though..."
>
>Question:
>
>is iso-8859-1 unicode?


No.

>is utf-8 unicode?


No. But UTF-8 is an encoding for Unicode, in which each character is
encoded as a sequence of 1 to 4 bytes.
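
To make the two answers above concrete, here is a minimal sketch in Python (my choice for illustration; nothing in this thread uses it). It takes one Unicode character, é (code point U+00E9), and encodes it once as ISO-8859-1 and once as UTF-8. The sample characters are assumptions picked for the example.

```python
# One Unicode character: e with acute accent, code point U+00E9.
ch = "\u00e9"

latin1_bytes = ch.encode("iso-8859-1")  # legacy single-byte character set
utf8_bytes = ch.encode("utf-8")         # variable-length encoding of Unicode

print(latin1_bytes)  # b'\xe9'
print(utf8_bytes)    # b'\xc3\xa9'

# ISO-8859-1 only knows 256 characters, so most of Unicode does not fit.
euro = "\u20ac"  # euro sign, U+20AC
try:
    euro.encode("iso-8859-1")
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False

print(euro_fits_latin1)      # False
print(euro.encode("utf-8"))  # b'\xe2\x82\xac'
```

The same character yields different bytes depending on the encoding, which is why a page must declare the encoding it actually uses.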

>What is unicode?


http://en.wikipedia.org/wiki/Unicode

Micha
  #3 (permalink)  
Old 11-28-2007
Zach
 
Posts: n/a
Default Re: Character Set Question

"UTF-8 is not catered for properly by some operating systems"
"Every system can handle Unicode"
"ISO-8859-1 isn't Unicode"
"UTF-8 isn't Unicode"
"UTF-8 is an encoding for Unicode"
+ ---------------------------------
Add this together and the outcome is
.oO(Mich)

Zach.


  #4 (permalink)  
Old 11-29-2007
Michael Fesser
 
Posts: n/a
Default Re: Character Set Question

.oO(Zach)

> "UTF-8 is not catered for properly by some operating systems"
> "Every system can handle Unicode"
> "ISO-8859-1 isn't Unicode"
> "UTF-8 isn't Unicode"
> "UTF-8 is an encoding for Unicode"
> + ---------------------------------
> Add this together and the outcome is


Is what?

It's really not that complicated. Actually, I don't care about systems
that can't handle Unicode; even the old NN4 can handle most of it. So I
use it in all of my recent web projects without exception: from the
database to my scripts to the final HTML pages, it's all UTF-8, which
really makes things much easier (for example, no more ugly HTML character
references, except for a few special characters).
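
As a small illustration of the point about character references, here is a hedged Python sketch (Python is an assumption for illustration, as are the sample strings). With UTF-8 output, a reference like &eacute; can be replaced by the literal character, and only the few markup-significant characters still need escaping.

```python
import html

# A named character reference decodes to the literal character (U+00E9).
decoded = html.unescape("caf&eacute;")
assert decoded == "caf\u00e9"

# When writing UTF-8 output, only characters with a special meaning in
# markup (&, <, > and quotes) are escaped; the accented letter stays as-is.
escaped = html.escape("<caf\u00e9> & more")
assert escaped == "&lt;caf\u00e9&gt; &amp; more"
```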

A few words on the last two points from the list above: Simply put,
Unicode itself just assigns a number (a code point) to every character
that's part of the standard. So far, nearly 100,000(!) characters have
been registered, and more than a million code points are possible. But of
course you then have to find a way to transfer all these different
numbers/code points to a client (a browser, for example) in an efficient way.
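
The "character to number" mapping can be looked at directly. This is a minimal sketch in Python (an assumption for illustration; the helper name code_point and the sample characters are hypothetical, not from this thread):

```python
import sys

def code_point(ch: str) -> str:
    """Return the Unicode code point of one character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"

# ascii() is used so the output is printable on any terminal.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(ascii(ch), code_point(ch))

# Code points run from U+0000 to U+10FFFF, i.e. 1,114,112 possible values.
total_code_points = sys.maxunicode + 1
print(total_code_points)  # 1114112
```

Note that this only shows the numbers Unicode assigns; how those numbers become bytes is a separate question.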

That's where the different encodings come into play. UTF-32, for example,
uses 32 bits (4 bytes) for every character. This has the advantage that
every character in a string has the same size, but of course it wastes a
lot of memory. UTF-8, by contrast, uses a variable character length. The
most important characters (the entire ASCII charset) are encoded with just
a single byte; all other characters require two or more bytes (up to 4).
It can still represent every character in the entire Unicode space.
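
The size difference described above can be measured. A minimal Python sketch follows (Python, the helper name encoded_sizes, and the sample characters are assumptions for illustration). It uses the "utf-32-be" codec so that no byte order mark is added to the count.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-32 byte count) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-32-be"))

# One ASCII letter, one accented letter, the euro sign, and an emoji.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    utf8_len, utf32_len = encoded_sizes(ch)
    print(ascii(ch), utf8_len, utf32_len)
# UTF-8 lengths are 1, 2, 3 and 4 bytes; UTF-32 is always 4 bytes.

# Plain ASCII text is four times larger in UTF-32 than in UTF-8.
print(encoded_sizes("Hello"))  # (5, 20)
```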

So Unicode is one thing; the encoding used to transfer it is another.

Micha
  #5 (permalink)  
Old 11-29-2007
Zach
 
Posts: n/a
Default Re: Character Set Question

Micha,

Thank you for the explanation!

Zach
