Re: autodetecting character encoding from Nick Kew on 2003-02-08 (www-validator@w3.org from February 2003)

From: Nick Kew <nick@webthing.com>
Date: Sat, 8 Feb 2003 11:29:15 +0000 (GMT)
To: Darin McGrew <mcgrew@stanfordalumni.org>
cc: www-validator@w3.org
Message-ID: <Pine.LNX.4.21.0302081123030.1173-100000@jarl.webthing.com>

On Fri, 7 Feb 2003, Darin McGrew wrote:

Maybe I'm just being a Luddite here, but ...

> Appendix F of the XML 1.0 Recommendation specifies ways to autodetect the
> character encoding of XML documents, and this works fine for documents that
> start with the four bytes 3C 3F 78 6D ("<?xm"). Maybe we need a similar
> mechanism for valid HTML documents, documents that start with the four
> bytes 3C 21 44 4F ("<!DO").

But there's no requirement for HTML documents to start with those four
bytes: they can be preceded by whitespace or an SGML comment.  Nor does
HTML have a BOM to deal with multibyte character encodings, which I
think is the key feature in XML that enables autodetection.
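
To make the difference concrete, here is a rough Python sketch. The
byte patterns are the ones I recall from Appendix F of XML 1.0; the
function names sniff_xml and sniff_doctype and the returned labels are
made up for illustration, and sniff_doctype is only a hypothetical
"<!DO" analogue, not anything HTML defines.

```python
# Byte-order marks, longest first so a UCS-4 BOM is not read as UTF-16.
BOMS = [
    (b"\x00\x00\xfe\xff", "ucs-4be"),
    (b"\xff\xfe\x00\x00", "ucs-4le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
]

# The start of "<?xml" as it appears in each encoding family (no BOM).
XML_DECL = [
    (b"\x00\x00\x00\x3c", "ucs-4be"),
    (b"\x3c\x00\x00\x00", "ucs-4le"),
    (b"\x00\x3c\x00\x3f", "utf-16be"),
    (b"\x3c\x00\x3f\x00", "utf-16le"),
    (b"\x3c\x3f\x78\x6d", "ascii-compatible"),
    (b"\x4c\x6f\xa7\x94", "ebcdic"),
]


def sniff_xml(data):
    """Guess the encoding family of an XML entity from its first bytes."""
    for prefix, family in BOMS + XML_DECL:
        if data.startswith(prefix):
            return family
    return None  # no BOM and no XML declaration: nothing to go on


def sniff_doctype(data):
    """Hypothetical HTML analogue: matches "<!DO" only at byte 0."""
    if data.startswith(b"\x3c\x21\x44\x4f"):
        return "ascii-compatible"
    return None


if __name__ == "__main__":
    print(sniff_xml(b'<?xml version="1.0"?>'))
    print(sniff_doctype(b"<!DOCTYPE HTML PUBLIC ...>"))
    # Both of these are legal before the doctype, and both defeat the sniff:
    print(sniff_doctype(b"\n<!DOCTYPE HTML PUBLIC ...>"))
    print(sniff_doctype(b"<!-- comment --><!DOCTYPE HTML PUBLIC ...>"))
```

The XML check works because the declaration, when present, must be the
very first thing in the entity.  The doctype check returns None as soon
as anything legal precedes the doctype.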

> key ring /'kE 'ri[ng]/ n. device enabling simultaneous loss of multiple keys

Why do you need a device for such a simple and routine task?

-- 
Nick Kew

Received on Saturday, 8 February 2003 06:29:18 UTC