Re: autodetecting character encoding

On Fri, 7 Feb 2003, Darin McGrew wrote:

Maybe I'm just being a Luddite here, but ...

> Appendix F of the XML 1.0 Recommendation specifies ways to autodetect the
> character encoding of XML documents, and this works fine for documents that
> start with the four bytes 3C 3F 78 6D ("<?xm"). Maybe we need a similar
> mechanism for valid HTML documents, documents that start with the four
> bytes 3C 21 44 4F ("<!DO").

But HTML documents aren't required to start with those four bytes:
the doctype can be preceded by whitespace or an SGML comment.  Nor
does HTML define a BOM for multibyte character encodings, which I
think is the key feature in XML that enables autodetection.
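
To make the objection concrete, here is a rough Python sketch of
prefix sniffing in the style of Appendix F.  The function name, the
labels and the "<!DO" entry are my own illustration, not taken from
any spec, and the table is only a subset of the Appendix F rows
(UTF-32 is left out).

```python
# Leading-byte signatures, tried in order. All but the last entry are a
# subset of the XML 1.0 Appendix F table; the last is the hypothetical
# "<!DO" rule from the quoted proposal and is not part of any spec.
SIGNATURES = [
    (b"\xef\xbb\xbf", "utf-8 (BOM)"),
    (b"\xfe\xff", "utf-16be (BOM)"),
    (b"\xff\xfe", "utf-16le (BOM)"),
    (b"\x00\x3c\x00\x3f", "utf-16be (no BOM)"),
    (b"\x3c\x00\x3f\x00", "utf-16le (no BOM)"),
    (b"\x4c\x6f\xa7\x94", "ebcdic"),
    (b"\x3c\x3f\x78\x6d", "ascii-compatible (<?xm)"),
    (b"\x3c\x21\x44\x4f", "ascii-compatible (<!DO, hypothetical)"),
]


def sniff(data):
    """Return a label for the encoding family, or None if nothing matches."""
    for prefix, label in SIGNATURES:
        if data.startswith(prefix):
            return label
    return None


if __name__ == "__main__":
    samples = (
        b'<?xml version="1.0"?><doc/>',
        b"<!DOCTYPE HTML PUBLIC ...>",
        b"\n<!DOCTYPE HTML PUBLIC ...>",        # leading whitespace
        b"<!-- a comment --><!DOCTYPE HTML>",   # leading SGML comment
    )
    for sample in samples:
        print(repr(sample[:16]), "->", sniff(sample))
```

The last two samples are perfectly reasonable HTML, yet a fixed
four-byte test finds nothing to match in them.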

> key ring /'kE 'ri[ng]/ n. device enabling simultaneous loss of multiple keys

Why do you need a device for such a simple and routine task?

-- 
Nick Kew

Received on Saturday, 8 February 2003 06:29:18 UTC