- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Tue, 10 Dec 2002 06:28:35 +0100
- To: "Molly E. Holzschlag" <molly@Molly.COM>
- Cc: public-evangelist@w3.org
* Molly E. Holzschlag wrote:
>The first article is on properly specifying character sets.
>
>http://www.webstandards.org/learn/askw3c/dec2002.html
>
>Discussion can take place here.
[...]
There are several ways of specifying the character set for a
particular document. Which of the following methods (or combination
thereof) does the W3C recommend, and why?
* Have the server administrator set the proper encoding via the
HTTP headers returned by the Web server
* Have the author add the encoding with a meta element
* XHTML authors can add the character encoding using the XML
prolog
[...]
The term "character set" should be avoided, see
<http://www.w3.org/MarkUp/html-spec/charset-harmful.html>. The third
item should say "XML declaration", not "XML prolog": the encoding is
given in the XML declaration, which is only one part of the prolog.
[...]
These three ways of providing the character encoding of a document
are not equivalent. When trying to figure out the character
encoding of a resource, user agents will try, in this order:
* The HTTP Content-Type header sent by the server
* The XML declaration (only for XHTML documents)
* The HTML/XHTML meta element
* Other ways. There are algorithms to guess the character
encoding, for example
[...]
The XML declaration should be ignored for XHTML documents delivered as
text/html. The HTML WG says that user agents should not use any
heuristics to determine whether a document is HTML or XHTML, so all
XHTML documents delivered as text/html are parsed as HTML. From an HTML
point of view the XML declaration is a processing instruction, and
processing instructions are to be ignored; this is what most user
agents do. This information should be added to the document, which is
otherwise confusing.
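
To make the point concrete, here is a minimal sketch in Python of the
lookup order quoted above, with the XML declaration consulted only when
the media type is an XML one. The function name pick_encoding, the
XML_TYPES set and the two regular expressions are assumptions made for
illustration; real user agents parse much more carefully than this.

```python
import re
from email.message import Message

# Media types treated as XML in this sketch (an assumption, not a complete list).
XML_TYPES = {"text/xml", "application/xml", "application/xhtml+xml"}

# Simplified patterns; a real user agent parses far more carefully.
XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def pick_encoding(content_type, body):
    """Return (encoding, source) for a response; encoding is None if unknown."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type

    # 1. A charset parameter on the Content-Type header wins.
    charset = msg.get_content_charset()
    if charset:
        return charset, "content-type"

    # 2. The XML declaration counts only when the media type is an XML one.
    #    For text/html it is skipped, like any other processing instruction.
    if msg.get_content_type() in XML_TYPES:
        m = XML_DECL.match(body)
        if m:
            return m.group(1).decode("ascii").lower(), "xml-declaration"

    # 3. Otherwise fall back to the meta element.
    m = META_CHARSET.search(body)
    if m:
        return m.group(1).decode("ascii").lower(), "meta"

    # 4. Nothing declared: the caller would have to guess.
    return None, "unknown"
```

With this sketch, an XHTML document that carries both an XML
declaration and a meta element takes its encoding from the meta element
when served as text/html, and from the XML declaration when served as
application/xhtml+xml.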
[...]
However, in at least two cases, this is simply not possible:
* The document author does not have any way to configure the
server to send the proper HTTP Content-Type header
* The document is not served via HTTP. It's a standalone
document, or served via MIME
[...]
MIME also has a Content-Type header, hence this is not a valid
exception.
[...]
In these cases, an HTML document should provide the character
encoding via a meta element, and an XML document must provide
it via the XML declaration.
[...]
No, XML documents do not need to have an XML declaration in these cases
if they are encoded using one of the default encodings (us-ascii for
text/xml, utf-8 or utf-16 for most other cases).
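
As a rough sketch of those defaults in Python: the helper below returns
the encoding to assume when an XML document has neither a charset
parameter nor an XML declaration. The name default_xml_encoding is
hypothetical, and the sketch only covers the cases named above
(us-ascii for text/xml, otherwise utf-16 if a byte order mark is
present, else utf-8).

```python
import codecs


def default_xml_encoding(media_type, body):
    """Default encoding for an XML document that declares none."""
    # text/xml without a charset parameter defaults to us-ascii.
    if media_type == "text/xml":
        return "us-ascii"
    # A UTF-16 document starts with a byte order mark.
    if body.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # Everything else is assumed to be UTF-8.
    return "utf-8"
```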
[...]
Example of an HTML 4.01 document written in French with a UTF-8
encoding:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="fr">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>Exemple de document HTML 4.01</title>
</head>
<body>
<h1>Portrait Intérieur</h1>
<h2>Rainer-Maria Rilke</h2>
<p>Ce ne sont pas des souvenirs<br />
qui, en moi, t'entretiennent ;<br />
tu n'es pas non plus mienne<br />
par la force d'un beau désir.</p>
</body>
</html>
[...]
Since this is an HTML 4.01 document, the "<br />"s have to be "<br>"s.
regards.
Received on Tuesday, 10 December 2002 00:28:28 UTC