W3C home > Mailing lists > Public > public-evangelist@w3.org > December 2002

Re: WaSP Asks the W3C

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 10 Dec 2002 06:28:35 +0100
To: "Molly E. Holzschlag" <molly@Molly.COM>
Cc: public-evangelist@w3.org
Message-ID: <3dfb778b.106401797@smtp.bjoern.hoehrmann.de>

* Molly E. Holzschlag wrote:
>The first article is on properly specifying character sets.
>
>http://www.webstandards.org/learn/askw3c/dec2002.html
>
>Discussion can take place here.

[...]
     There are several ways of specifying the character set for a
     particular document. Which of the following methods (or combination
     thereof) does the W3C recommend, and why?

       * Have the server administrator set the proper encoding via the
         HTTP headers returned by the Web server
       * Have the author add the encoding with a meta element
       * XHTML authors can add the character encoding using the XML
         prolog
[...]

The term "character set" should be avoided, see
<http://www.w3.org/MarkUp/html-spec/charset-harmful.html>. The third
item should be rephrased to match the prolog. It should be "XML
declaration", not "XML prolog".

[...]
     These three ways of providing the character encoding of a document 
     are not equivalent. When trying to figure out the character
     encoding of a resource, user agents will try, in this order:

       * The HTTP Content-Type header sent by the server
       * The XML declaration (only for XHTML documents)
       * The HTML/XHTML meta element
       * Other ways. There are algorithms to guess the character
         encoding, for example
[...]

The XML declaration should be ignored for XHTML documents delievered as
text/html (the HTML WG says, user agents should not use any heuristics
to determine whether a document is HTML or XHTML and thus parse all
XHTML documents delivered as text/html as beeing HTML and thus
processing instructions (the XML declaration is a processing instruction
from an HTML point-of-view) are to be ignored) (and this is what most
user agents do), this information should be added to the document, it is
otherwise confusing.

[...]
     However, in at least two cases, this is simply not possible:
       * The document author does not have any way to configure the 
         server to send the proper HTTP Content-Type header
       * The document is not served via HTTP. It's a standalone
         document, or served via MIME
[...]

MIME also has a Content-Type header, hence this is not a valid
exception.

[...]
   In these cases, an HTML document should provide the character
   encoding via a meta element, and an XML document must provide
   it via the XML declaration.
[...]

No, XML documents do not need to have an XML declaration in these cases
if they are encoded using one of the default encodings (us-ascii for
text/xml, utf-8 or utf-16 for most other cases).

[...]
     Example of an HTML 4.01 document written in French with a UTF-8
     encoding:

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
  
  <html lang="fr">
  
  <head>
  <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  
  <title>Exemple de document HTML 4.01</title>
  </head>
  
  <body>
  <h1>Portrait Intérieur</h1>
  <h2>Rainer-Maria Rilke</h2>
  
  <p>Ce ne sont pas des souvenirs<br />
  qui, en moi, t'entretiennent ;<br />
  tu n'es pas non plus mienne<br />
  par la force d'un beau désir.</p>
  </body>
  </html>
[...]

The "<br />"s have to be "<br>"s.

regards.
Received on Tuesday, 10 December 2002 00:28:28 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Friday, 15 July 2011 00:13:21 GMT