Re: WaSP Asks the W3C

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 10 Dec 2002 06:28:35 +0100
To: "Molly E. Holzschlag" <molly@Molly.COM>
Cc: public-evangelist@w3.org
Message-ID: <3dfb778b.106401797@smtp.bjoern.hoehrmann.de>

* Molly E. Holzschlag wrote:
>The first article is on properly specifying character sets.
>Discussion can take place here.

     There are several ways of specifying the character set for a
     particular document. Which of the following methods (or combination
     thereof) does the W3C recommend, and why?

       * Have the server administrator set the proper encoding via the
         HTTP headers returned by the Web server
       * Have the author add the encoding with a meta element
       * XHTML authors can add the character encoding using the XML
         prolog

The term "character set" should be avoided, see
<http://www.w3.org/MarkUp/html-spec/charset-harmful.html>. The third
item should be rephrased to match the prolog. It should be "XML
declaration", not "XML prolog".

     These three ways of providing the character encoding of a document 
     are not equivalent. When trying to figure out the character
     encoding of a resource, user agents will try, in this order:

       * The HTTP Content-Type header sent by the server
       * The XML declaration (only for XHTML documents)
       * The HTML/XHTML meta element
       * Other ways. There are algorithms to guess the character
         encoding, for example

The XML declaration should be ignored for XHTML documents delivered as
text/html. The HTML WG says user agents should not use any heuristics
to determine whether a document is HTML or XHTML, so all XHTML documents
delivered as text/html are parsed as HTML. From an HTML point of view
the XML declaration is a processing instruction, and processing
instructions are to be ignored. This is also what most user agents do.
This information should be added to the document; it is otherwise
confusing.
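
A rough sketch of that order in Python, with the XML declaration only
consulted for documents served with an XML media type. The function
name, the regular expressions and the list of XML media types are my
own assumptions for illustration, not taken from any specification, and
the sketch only covers documents that arrive with a Content-Type header:

```python
import re

# Hypothetical, deliberately simplistic patterns (illustration only).
CHARSET_PARAM = re.compile(r'charset\s*=\s*"?([A-Za-z0-9._:-]+)"?', re.I)
XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
META = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.I)

# Assumed list of media types treated as XML.
XML_TYPES = ("text/xml", "application/xml", "application/xhtml+xml")


def pick_encoding(content_type, body):
    """Return (encoding, source), or (None, "guess") if nothing is declared.

    content_type is the raw Content-Type header value (str or None),
    body is the document as bytes.
    """
    header = content_type or ""
    media_type = header.split(";")[0].strip().lower()

    # 1. The charset parameter of the Content-Type header wins.
    m = CHARSET_PARAM.search(header)
    if m:
        return m.group(1).lower(), "http"

    # 2. The XML declaration counts only when the document is served as XML.
    if media_type in XML_TYPES:
        m = XML_DECL.match(body)
        if m:
            return m.group(1).decode("ascii").lower(), "xml-declaration"

    # 3. For text/html the meta element is used; an XML declaration, if
    #    present, is skipped like any other processing instruction.
    if media_type == "text/html":
        m = META.search(body[:1024])
        if m:
            return m.group(1).decode("ascii").lower(), "meta"

    # 4. Nothing declared: the caller has to fall back to guessing.
    return None, "guess"
```

With this, an XHTML document sent as text/html that carries both an XML
declaration and a meta element gets its encoding from the meta element,
which is the behaviour described above.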

     However, in at least two cases, this is simply not possible:
       * The document author does not have any way to configure the 
         server to send the proper HTTP Content-Type header
       * The document is not served via HTTP. It's a standalone
         document, or served via MIME

MIME also has a Content-Type header, hence this is not a valid
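
To illustrate that the charset parameter is available outside HTTP as
well, here is a small sketch using Python's standard email package. The
message text is made up for the example:

```python
from email import message_from_string

# A made-up MIME entity carrying an HTML document (illustration only).
raw = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=ISO-8859-1\n"
    "\n"
    "<title>Exemple</title>\n"
)

msg = message_from_string(raw)

# The media type and the charset parameter both come from the MIME
# Content-Type header, just as they would from an HTTP response.
media_type = msg.get_content_type()
charset = msg.get_content_charset()
```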

   In these cases, an HTML document should provide the character
   encoding via a meta element, and an XML document must provide
   it via the XML declaration.

No, XML documents do not need to have an XML declaration in these cases
if they are encoded using one of the default encodings (us-ascii for
text/xml, utf-8 or utf-16 for most other cases).
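
A minimal sketch of those defaults, assuming the document has no XML
declaration and no charset parameter. The function name is hypothetical,
and the sketch ignores UTF-32 and other encodings a processor might
support:

```python
import codecs


def default_xml_encoding(media_type, body):
    """Encoding assumed when neither a charset parameter nor an XML
    declaration is present. body is the document as bytes."""
    # text/xml without a charset parameter defaults to us-ascii.
    if media_type == "text/xml":
        return "us-ascii"
    # Otherwise a byte order mark selects UTF-16 ...
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # ... and everything else is taken to be UTF-8.
    return "utf-8"
```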

     Example of an HTML 4.01 document written in French with a UTF-8

  <html lang="fr">
  <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  <title>Exemple de document HTML 4.01</title>
  <h1>Portrait Intérieur</h1>
  <h2>Rainer-Maria Rilke</h2>
  <p>Ce ne sont pas des souvenirs<br />
  qui, en moi, t'entretiennent ;<br />
  tu n'es pas non plus mienne<br />
  par la force d'un beau désir.</p>

The "<br />"s have to be "<br>"s, since this is an HTML 4.01 document
and "<br />" is XHTML syntax.

Received on Tuesday, 10 December 2002 00:28:28 UTC