Re: Auto-detect and encodings in HTML5

From: Jirka Kosek <jirka@kosek.cz>
Date: Fri, 12 Jun 2009 16:19:24 +0200
Message-ID: <4A3263EC.1040000@kosek.cz>
To: Leif Halvard Silli <lhs@malform.no>
CC: Ian Hickson <ian@hixie.ch>, public-html@w3.org

Leif Halvard Silli wrote:

> Thus, if a file has the name "file.html.utf8", then UAs should, when
> reading that file via the file URL protocol give precedence to the
> encoding expressed by the file suffix.
> Thus, I would suggest that HTML 5 a) specifies the file suffixes for all
> the encodings that it endorses (building on those that Apache by default
> uses), b) recommend Web browsers to recognize these suffixes, when
> reading files via file://

I don't think this is a good idea:

First, on the majority of systems, files ending with suffixes like .utf8
will simply not be opened in a web browser at all.

Second, a filename is too fragile to convey reliable metadata about the
content. A user can easily change the extension, and this will change how
the encoded text is interpreted.
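
To make the fragility concrete, here is a rough Python sketch of what
suffix-driven decoding would look like. The function name, the suffix table
and the windows-1252 fallback are assumptions made up for this illustration;
nothing in the proposal fixes them.

```python
from pathlib import PurePath

# Hypothetical suffix-to-encoding table (an assumption for this sketch).
SUFFIX_ENCODINGS = {".utf8": "utf-8", ".latin2": "iso-8859-2"}

def decode_by_suffix(filename: str, data: bytes) -> str:
    """Decode data using only the last suffix of the filename."""
    suffix = PurePath(filename).suffix.lower()
    # Fallback encoding is an arbitrary choice for the sketch.
    encoding = SUFFIX_ENCODINGS.get(suffix, "windows-1252")
    return data.decode(encoding, errors="replace")

# The same bytes on disk: the Czech word "Příliš" stored as UTF-8.
original = "P\u0159\u00edli\u0161"
data = original.encode("utf-8")

as_utf8 = decode_by_suffix("file.html.utf8", data)
# After a simple rename the bytes are unchanged but the text is not.
as_latin2 = decode_by_suffix("file.html.latin2", data)
```

Renaming the file is enough to turn correct text into mojibake, although
not a single byte of the content has changed.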

Third, there is already a widely used mechanism for conveying encoding
information inside the HTML document itself, using <meta charset="...">
(in HTML5) and
<meta http-equiv="content-type" content="text/html;charset=..."> (in
"legacy" HTML). Sure, this can conflict with HTTP headers, but the
problem is well known and webmasters have learned to cope with it.
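
As an illustration of the in-document mechanism, a minimal Python sketch
that looks for either form of the declaration near the start of a file
might read as follows. The function name and the 1024-byte window are
assumptions for this example, and the regular expression is a deliberate
simplification, not the prescan algorithm from the HTML5 draft.

```python
import re

# Matches both <meta charset="..."> and the legacy
# <meta http-equiv="content-type" content="text/html;charset=...">.
# Simplified: it ignores comments, scripts and other parsing subtleties.
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)

def sniff_meta_charset(data: bytes, window: int = 1024):
    """Return the lowercased charset declared in a <meta> tag, or None."""
    match = _META_CHARSET.search(data[:window])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

The declaration travels with the bytes it describes, so it survives
renaming or copying the file, which a filename suffix does not.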


  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
       Professional XML consulting and training services
  DocBook customization, custom XSLT/XSL-FO document processing
 OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member

Received on Friday, 12 June 2009 14:20:09 UTC