This is a discussion on Character Set Question within the alt.comp.lang.php forums, part of the PHP Programming Forums category.
Adrian Nievergelt wrote:

"...The only problem with UTF-8 is that some operating systems .... have no or hardly sufficient support. About every modern system can handle unicode though..."

Question:

Is ISO-8859-1 Unicode?
Is UTF-8 Unicode?
What is Unicode?

Zach
.oO(Zach)

>Adrian Nievergelt wrote:
>
>"...The only problem with UTF-8 is that some operating systems .... have no or hardly sufficient support. About every modern system can handle unicode though..."
>
>Question:
>
>Is ISO-8859-1 Unicode?

No.

>Is UTF-8 Unicode?

No. But UTF-8 is an encoding for Unicode, in which every character is encoded as a sequence of 1 to 4 bytes.

>What is Unicode?

http://en.wikipedia.org/wiki/Unicode

Micha
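To make the "1 to 4 bytes" remark concrete, here is a minimal sketch in Python (an assumption; the thread contains no code) of the UTF-8 bit layout described in RFC 3629, checked against Python's built-in codec. The helper name `utf8_encode` is hypothetical. The last lines only illustrate the first answer: ISO-8859-1 is a separate single-byte character set covering just U+0000 to U+00FF, so a character such as the euro sign has no ISO-8859-1 byte at all.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point using the UTF-8 layout of RFC 3629."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point < 0x80:        # up to 7 bits:  0xxxxxxx (plain ASCII)
        return bytes([code_point])
    if code_point < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# A, é, the euro sign and an emoji (U+1F600): 1, 2, 3 and 4 bytes in UTF-8.
for ch in "A\u00e9\u20ac\U0001F600":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # matches Python's own UTF-8 codec
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

# ISO-8859-1 is a different, single-byte charset covering only U+0000..U+00FF.
assert "\u00e9".encode("iso-8859-1") == b"\xe9"
try:
    "\u20ac".encode("iso-8859-1")
except UnicodeEncodeError:
    print("U+20AC (euro sign) has no ISO-8859-1 byte")
```

Run as-is, it should report 1, 2, 3 and 4 bytes for the four sample characters, in that order.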
"UTF-8 is not catered for properly by some operating systems"
"Every system can handle Unicode"
"ISO-8859-1 isn't Unicode"
"UTF-8 isn't Unicode"
"UTF-8 is an encoding for Unicode"
+ ---------------------------------
Add this together and the outcome is .oO(Mich)

Zach
.oO(Zach)

> "UTF-8 is not catered for properly by some operating systems"
> "Every system can handle Unicode"
> "ISO-8859-1 isn't Unicode"
> "UTF-8 isn't Unicode"
> "UTF-8 is an encoding for Unicode"
> + ---------------------------------
> Add this together and the outcome is

Is what?

It's really not that complicated. Actually, I don't care about systems that can't handle Unicode; even the old Netscape Navigator 4 (NN4) can handle most of it. So I use it in all of my recent web projects without exception: from the database to my scripts to the final HTML pages, it's all UTF-8. That makes things much easier, for example no more ugly HTML character references, except for the few characters HTML itself requires you to escape (such as < and &).

A few words on the last two points from the list above: simply put, Unicode itself just assigns a number (a code point) to every character that is part of the standard. So far nearly 100,000(!) characters have been assigned, and the code space has room for more than a million. But of course you then need a way to transfer all these different numbers/code points to a client (a browser, for example) efficiently.

That's where the different encodings come into play. UTF-32, for example, uses 32 bits (4 bytes) for every character. This has the advantage that every character in a string has the same size, but of course it wastes a lot of memory. UTF-8, by contrast, uses a variable character length: the most important characters (the entire ASCII character set) are encoded in a single byte, and all other characters require two or more bytes (up to 4). It can still represent every character in the Unicode code space.

So Unicode is one thing; the encoding used to transfer it is another.

Micha
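The distinction between Unicode (numbers assigned to characters) and the encodings used to transfer those numbers can be sketched in a few lines of Python (again an assumption about language; `encoded_sizes` is a hypothetical helper). It prints the code points of a few sample strings and compares their size in UTF-8 and UTF-32 using only Python's standard codecs.

```python
from typing import Dict


def encoded_sizes(text: str) -> Dict[str, int]:
    """Compare the number of code points in a string with its size in UTF-8 and UTF-32."""
    return {
        # A Python str is a sequence of code points, so len() counts them.
        "code points": len(text),
        # Variable width: 1 to 4 bytes per code point.
        "UTF-8 bytes": len(text.encode("utf-8")),
        # Fixed width: 4 bytes per code point. "utf-32-le" is used instead of
        # "utf-32" so that Python does not prepend a 4-byte byte order mark.
        "UTF-32 bytes": len(text.encode("utf-32-le")),
    }


# The code space runs from U+0000 to U+10FFFF, i.e. 1,114,112 possible code points.
print(f"{0x10FFFF + 1:,} possible code points")

# Samples: plain ASCII, "Grüße" (German) and "日本語" (Japanese).
for sample in ["Hello, world", "Gr\u00fc\u00dfe", "\u65e5\u672c\u8a9e"]:
    # Unicode itself only assigns numbers; show them in U+XXXX notation.
    points = " ".join(f"U+{ord(c):04X}" for c in sample)
    print(points, encoded_sizes(sample))
    # Both encodings carry the same code points, so both decode back to the same text.
    assert sample.encode("utf-8").decode("utf-8") == sample
    assert sample.encode("utf-32-le").decode("utf-32-le") == sample
```

For plain ASCII text such as "Hello, world", UTF-8 needs 12 bytes where UTF-32 needs 48, which is the memory trade-off described above; for the three Japanese characters the gap narrows to 9 versus 12 bytes.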
Micha,
Thank you for the explanation!

Zach