- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Tue, 10 Dec 2002 06:28:35 +0100
- To: "Molly E. Holzschlag" <molly@Molly.COM>
- Cc: public-evangelist@w3.org
* Molly E. Holzschlag wrote: >The first article is on properly specifying character sets. > >http://www.webstandards.org/learn/askw3c/dec2002.html > >Discussion can take place here. [...] There are several ways of specifying the character set for a particular document. Which of the following methods (or combination thereof) does the W3C recommend, and why? * Have the server administrator set the proper encoding via the HTTP headers returned by the Web server * Have the author add the encoding with a meta element * XHTML authors can add the character encoding using the XML prolog [...] The term "character set" should be avoided, see <http://www.w3.org/MarkUp/html-spec/charset-harmful.html>. The third item should be rephrased to match the prolog. It should be "XML declaration", not "XML prolog". [...] These three ways of providing the character encoding of a document are not equivalent. When trying to figure out the character encoding of a resource, user agents will try, in this order: * The HTTP Content-Type header sent by the server * The XML declaration (only for XHTML documents) * The HTML/XHTML meta element * Other ways. There are algorithms to guess the character encoding, for example [...] The XML declaration should be ignored for XHTML documents delievered as text/html (the HTML WG says, user agents should not use any heuristics to determine whether a document is HTML or XHTML and thus parse all XHTML documents delivered as text/html as beeing HTML and thus processing instructions (the XML declaration is a processing instruction from an HTML point-of-view) are to be ignored) (and this is what most user agents do), this information should be added to the document, it is otherwise confusing. [...] However, in at least two cases, this is simply not possible: * The document author does not have any way to configure the server to send the proper HTTP Content-Type header * The document is not served via HTTP. It's a standalone document, or served via MIME [...] MIME also has a Content-Type header, hence this is not a valid exception. [...] In these cases, an HTML document should provide the character encoding via a meta element, and an XML document must provide it via the XML declaration. [...] No, XML documents do not need to have an XML declaration in these cases if they are encoded using one of the default encodings (us-ascii for text/xml, utf-8 or utf-16 for most other cases). [...] Example of an HTML 4.01 document written in French with a UTF-8 encoding: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html lang="fr"> <head> <meta http-equiv="content-type" content="text/html; charset=UTF-8"> <title>Exemple de document HTML 4.01</title> </head> <body> <h1>Portrait Intérieur</h1> <h2>Rainer-Maria Rilke</h2> <p>Ce ne sont pas des souvenirs<br /> qui, en moi, t'entretiennent ;<br /> tu n'es pas non plus mienne<br /> par la force d'un beau désir.</p> </body> </html> [...] The "<br />"s have to be "<br>"s. regards.
Received on Tuesday, 10 December 2002 00:28:28 UTC