- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 26 Jul 2001 11:22:18 +0900
- To: Lloyd Wood <L.Wood@eim.surrey.ac.uk>, Terje Bless <link@pobox.com>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-validator@w3.org
At 14:03 01/07/25 +0100, Lloyd Wood wrote:
>On Wed, 25 Jul 2001, Terje Bless wrote:
>
> > The issue is that the transport protocol sez that an absense of an explicit
> > charset parameter on the Content-Type means "ISO-8859-1"; HTML or XML rules
> > don't apply here. When it comes time to parse the markup, you already have
> > a charset; the XML/HTML rules do not govern HTTP.
>
>well, that's handy.

But as I wrote, it's not correct.

>I've always wondered how you define the charset for the line that
>defines the charset so that you can interpret it.

The HTTP headers are defined to be in ASCII. For the 'in-document'
information, either you assume ASCII (for HTML) or there are
more complicated heuristics (see XML app. F). The validator
currently assumes ASCII (or anything compatible with it).

> > In practice you have to decide between "Assume ISO-8859-1 as that's what
> > /people/ tend to assume" or "Assume nothing as people will get it wrong
> > some part of the time".
>
>I don't see how you can ever assume nothing.

Well, for the validator, 'assume nothing' just means 'document
doesn't validate'. That's quite easy :-).

Regards,   Martin.
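
For illustration only, the lookup order discussed above could be sketched
roughly as follows. This is not the validator's actual code: the function
names, the 1024-byte limit and the regular expressions are all assumptions
made for the example. It reads the start of the document as ASCII to find an
in-document label, prefers an explicit HTTP charset when there is one, and
returns None for the 'assume nothing' case instead of falling back to
ISO-8859-1. The byte-order-mark heuristics of XML app. F are left out.

```python
import re
from typing import Optional

# Hypothetical patterns for the two kinds of in-document label.
_XML_DECL = re.compile(
    r'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')
_META = re.compile(
    r'<meta[^>]*\bcharset\s*=\s*["\']?([A-Za-z][A-Za-z0-9._-]*)',
    re.IGNORECASE)


def charset_from_http(content_type: str) -> Optional[str]:
    """Return the explicit charset parameter of a Content-Type value, if any."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()
    return None


def charset_from_document(raw: bytes, limit: int = 1024) -> Optional[str]:
    """Read the head of the document as ASCII and look for a charset label."""
    # Non-ASCII bytes become U+FFFD; the label itself is expected to be ASCII.
    head = raw[:limit].decode("ascii", errors="replace")
    match = _XML_DECL.search(head) or _META.search(head)
    return match.group(1).lower() if match else None


def determine_charset(content_type: str, raw: bytes) -> Optional[str]:
    """Explicit HTTP charset first, then the in-document label, else None."""
    return charset_from_http(content_type) or charset_from_document(raw)
```

A caller that gets None back would simply report that the document cannot be
validated, rather than guessing an encoding.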
Received on Wednesday, 25 July 2001 22:23:27 UTC