Re: autodetecting character encoding

From: Nick Kew <nick@webthing.com>
Date: Sat, 8 Feb 2003 11:29:15 +0000 (GMT)
To: Darin McGrew <mcgrew@stanfordalumni.org>
cc: www-validator@w3.org
Message-ID: <Pine.LNX.4.21.0302081123030.1173-100000@jarl.webthing.com>

On Fri, 7 Feb 2003, Darin McGrew wrote:

Maybe I'm just being a Luddite here, but ...

> Appendix F of the XML 1.0 Recommendation specifies ways to autodetect the
> character encoding of XML documents, and this works fine for documents that
> start with the four bytes 3C 3F 78 6D ("<?xm"). Maybe we need a similar
> mechanism for valid HTML documents, documents that start with the four
> bytes 3C 21 44 4F ("<!DO").

But HTML documents are not required to start with those four bytes:
they can be preceded by whitespace or an SGML comment.  Nor does HTML
have a BOM to deal with multibyte character encodings, which I think
is the key feature in XML that enables autodetection.
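
For illustration, the kind of prefix check under discussion might look
like the sketch below, in Python.  It covers only a subset of the byte
patterns listed in Appendix F, and the names sniff_xml and
looks_like_html are made up for this example; they are not taken from
the validator or from any library.

```python
from typing import Optional

# Subset of the XML 1.0 Appendix F byte patterns (illustrative only).
# Four-byte patterns come first so a UCS-4 BOM is not mistaken for UTF-16.
_SIGNATURES = [
    (b"\x00\x00\xfe\xff", "ucs-4-be"),   # BOM, big-endian
    (b"\xff\xfe\x00\x00", "ucs-4-le"),   # BOM, little-endian
    (b"\xef\xbb\xbf", "utf-8"),          # UTF-8 BOM
    (b"\xfe\xff", "utf-16-be"),          # UTF-16 BOM, big-endian
    (b"\xff\xfe", "utf-16-le"),          # UTF-16 BOM, little-endian
    (b"\x00<\x00?", "utf-16-be"),        # "<?" in UTF-16BE, no BOM
    (b"<\x00?\x00", "utf-16-le"),        # "<?" in UTF-16LE, no BOM
    (b"<?xm", "ascii-compatible"),       # read the encoding declaration next
]


def sniff_xml(data: bytes) -> Optional[str]:
    """Return an encoding family guessed from the leading bytes, or None."""
    for prefix, family in _SIGNATURES:
        if data.startswith(prefix):
            return family
    return None


def looks_like_html(data: bytes) -> bool:
    """Hypothetical HTML analogue: test for the bytes 3C 21 44 4F ("<!DO")."""
    return data.startswith(b"<!DO")


# A document that opens with whitespace or a comment defeats the check:
# looks_like_html(b"\n<!DOCTYPE html>") is False
# looks_like_html(b"<!-- note --><!DOCTYPE html>") is False
```

The XML check works because the prefix is fixed; the HTML check fails
on perfectly valid input, and an arbitrary amount of leading whitespace
or comment text would have to be skipped before any fixed pattern
could be tested.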

> key ring /'kE 'ri[ng]/ n. device enabling simultaneous loss of multiple keys

Why do you need a device for such a simple and routine task?

-- 
Nick Kew
Received on Saturday, 8 February 2003 06:29:18 GMT
