how to tell server from PHP that charset is UTF-8??

#11, 09-21-2004, Daniel Tryba
Re: how to tell server that charset is UTF-8??

Andy Hassall <andy@andyh.co.uk> wrote:
> OK - so we officially default to ISO-8859-1, at least for text/* content
> types, which is a superset of ASCII, but definitely a well-defined character
> set and not just a raw stream of bytes. Makes sense.


Completely true... almost. text/html has unicode as its characterset
according to w3c[1]; the charset header is nothing more than the encoding
used to transport the data. iso-8859-1 is the best choice if you only need
the first 256 characters in unicode. If one needs more characters,
the utf-x encodings should be used.

[1] http://www.w3.org/TR/html401/charset.html
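
To make the point about the first 256 characters concrete, here is a minimal Python sketch (Python is used only for illustration; the helper name fits_latin1 is made up for this example). It checks whether a string can be transported as ISO-8859-1 at all, and prints the UTF-8 bytes for comparison.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text can be encoded as ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        # At least one character lies outside U+0000..U+00FF.
        return False


if __name__ == "__main__":
    # 'cafe' with e-acute fits in ISO-8859-1; the euro sign (U+20AC) does not.
    for sample in ["caf\u00e9", "\u20ac 10"]:
        print(repr(sample), fits_latin1(sample), sample.encode("utf-8"))
```

Text that fails the check needs one of the UTF-x encodings, and the charset label sent with the document has to name whichever encoding was actually used.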

--

Daniel Tryba

#12, 09-22-2004, John Dunlop
Re: how to tell server that charset is UTF-8??

Daniel Tryba wrote:

> Andy Hassall <andy@andyh.co.uk> wrote:
> > OK - so we officially default to ISO-8859-1, at least for text/* content
> > types, which is a superset of ASCII, but definitely a well-defined character
> > set and not just a raw stream of bytes. Makes sense.
>
> Completely true... almost. text/html has unicode as its characterset
> according to w3c[1],

'Character set', with or without a space, breeds confusion.

http://www.w3.org/MarkUp/html-spec/charset-harmful.html

If by 'characterset' you meant HTML4.01's document character
set, you're right. But HTML's document character set is
unrelated to this discussion. If however you meant
character encoding, you're wrong, because any encoding is
allowed. Did you mean something else?

RFC2854 sec. 6 lists sources that specify the default when a
text/html document is served without explicitly declaring
its character encoding. Despite RFC2616 defining text/*'s
default character encoding as ISO-8859-1, HTML4.01
conforming user-agents mustn't assume any default value:

'The HTTP protocol ([RFC2616], section 3.7.1) mentions
ISO-8859-1 as a default character encoding when the "charset"
parameter is absent from the "Content-Type" header field. In
practice, this recommendation has proved useless because
some servers don't allow a "charset" parameter to be sent,
and others may not be configured to send the parameter.
Therefore, user agents must not assume any default value for
the "charset" parameter.' (HTML4.01 sec. 5.2.2.)

So it'd be absurd to heed the advice given in RFC2616 sec.
19.3, which says that 'not labelling the entity is preferred
over labelling the entity with the labels US-ASCII or
ISO-8859-1'. The usual ciwa* (comp.infosystems.www.authoring.*)
recommendation stands, discord notwithstanding: send a
charset parameter.
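
As a rough illustration of what sending a charset parameter looks like from application code, here is a small Python WSGI sketch. The names app and call_app are hypothetical and exist only for this example; the point is simply that the response labels its own encoding instead of leaving the user agent to guess.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical WSGI app that labels its response explicitly as UTF-8."""
    body = "<p>caf\u00e9 \u20ac</p>".encode("utf-8")
    headers = [
        # The charset parameter names the encoding actually used for the body.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app():
    """Invoke the app in-process and return (status and headers, body bytes)."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured, body


if __name__ == "__main__":
    info, payload = call_app()
    print(info["status"], info["headers"]["Content-Type"], len(payload))
```

In PHP, the language this thread is about, the same label is sent with header('Content-Type: text/html; charset=utf-8'); called before any output is written.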

[ ... ]

Roll on the weekend!

--
Jock

#13, 09-22-2004, lawrence
Re: how to tell server that charset is UTF-8??

Andy Hassall <andy@andyh.co.uk> wrote in message news:<36v0l0d9sm2t2f1e0n9s51f3ajc692boda@4ax.com>...
> OK, so Apache sends out a character set header under the recommended
> configuration - although it's effectively hardcoded; it doesn't 'detect' the
> encoding of the file since that's basically impossible in isolation.
>
> To get Apache to send a different character set header for a particular file,
> you'd then need to use Apache content negotiation - either with a type-map or,
> I believe, it can base it off suffixes of the filename
> (index.html.iso8859-p15 and so on).
>
> Consider the following response from Apache:
>
> andyh@server:~/public_html$ touch utf8.html.utf8
> andyh@server:~/public_html$ telnet localhost 80
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> HEAD /~andyh/utf8.html HTTP/1.0
>
> HTTP/1.1 200 OK
> Date: Tue, 21 Sep 2004 19:19:03 GMT
> Server: Apache/2.0.51 (Unix) PHP/5.0.1 DAV/2 SVN/1.0.6
> Content-Location: utf8.html.utf8
> Vary: negotiate
> TCN: choice
> Last-Modified: Tue, 21 Sep 2004 19:18:47 GMT
> ETag: "3811f-0-7f9b93c0;7f9b93c0"
> Accept-Ranges: bytes
> Connection: close
> Content-Type: text/html; charset=utf-8
>
> Connection closed by foreign host.
>
> OK - so a filename of utf8.html.utf8 means that a request for utf8.html comes
> out in utf8 encoding. (I've got content negotiation enabled on my server).
>
> Presumably in the case of multiple encodings for the same URI then the
> browser's Accept-charset header comes into play for Apache to pick which to
> serve.


That's very interesting. Thanks for doing that bit of digging.

I'm sorry to say I've temporarily been handed responsibility for
keeping an Apache server going, though I don't know much about Apache.
We're hosting about 30 different domains on this machine. Most of
those domains have individuals who are handling all the web design for
that domain. If I set a default charset for Apache, how do the
individual web designers override the decision, if they need to? An
.htaccess file? http-equiv meta tags?

Just curious.