
Re: charset parameter

From: Nick Kew <nick@webthing.com>
Date: Thu, 26 Jul 2001 09:08:23 +0100 (BST)
To: Terje Bless <link@pobox.com>
cc: W3C Validator <www-validator@w3.org>
Message-ID: <Pine.BSF.4.21.0107260848100.1599-100000@fenris.webthing.com>
On Thu, 26 Jul 2001, Terje Bless wrote:

> On 25.07.01 at 14:03, Lloyd Wood <l.wood@eim.surrey.ac.uk> wrote:
> 
> >I've always wondered how you define the charset for the line that defines
> >the charset so that you can interpret it.
> 
> For the HTTP header fields it's fairly simple; they're US-ASCII period. For
> that bogosity called "META" the waters are substantially more muddy.
> Especially since there aren't any clear rules for whether the charset in
> the META element overrides the one in the HTTP header... Or vice versa...

Surely that at least is clear: HTTP headers take precedence over
<META bogus="hot-air">?  Can't cite references OTTOMH (and no time
to go looking just now), but ... 
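A minimal Python sketch of that precedence, assuming the rule is simply "an explicit charset parameter on the HTTP Content-Type header wins; otherwise fall back to a META http-equiv declaration" (HTML 4.01 gives this same order when it describes how user agents determine the character encoding). The names `charset_from_content_type`, `MetaCharsetFinder` and `resolve_charset` are hypothetical. The META scan is deliberately naive, and it takes already-decoded text, which quietly assumes the document was first read with some ASCII-compatible guess. That is exactly the chicken-and-egg problem Lloyd raises.

```python
from email.message import Message
from html.parser import HTMLParser


def charset_from_content_type(value):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    charset = msg.get_param("charset")
    # get_param may return a tuple for RFC 2231-encoded values; ignore those here.
    return charset.lower() if isinstance(charset, str) else None


class MetaCharsetFinder(HTMLParser):
    """Naive scan for <meta http-equiv="Content-Type" content="...; charset=...">."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # HTMLParser lower-cases tag and attribute names; values may be None.
        values = {name: (value or "") for name, value in attrs}
        if values.get("http-equiv", "").lower() == "content-type":
            self.charset = charset_from_content_type(values.get("content"))


def resolve_charset(http_content_type, decoded_html):
    """Return (charset, source): the HTTP header wins, then META, else (None, None)."""
    charset = charset_from_content_type(http_content_type)
    if charset:
        return charset, "http"
    finder = MetaCharsetFinder()
    finder.feed(decoded_html)
    finder.close()
    if finder.charset:
        return finder.charset, "meta"
    return None, None


page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-8"></head></html>')
print(resolve_charset("text/html; charset=ISO-8859-1", page))  # ('iso-8859-1', 'http')
print(resolve_charset("text/html", page))                      # ('utf-8', 'meta')
```

Note that the second call only falls through to META because the header carries no charset parameter at all; whether that absence should instead count as an implicit default is the question raised just below.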

> Or what this means for the case when the charset in the HTTP header is
> there by inference (as a default, not explicitly)...

But *ML rules don't apply to HTTP, so whence the conclusion that
*anything* is implicit (as opposed to absent) in the headers?
Sure, if we take the whole thing (*TP transmission + *ML document),
then we can start to talk about undeclared charsets being implicit.

> "I'm sorry, but that Document Type is not in my Catalog. I cannot Validate
> this document"

We are happy with SYSTEM FPIs.  It's the No FPI case (or FPIs which
are not accessible to the validator) you need to complain about.

>	 and "I'm sorry, but that Character Encoding is not in my
> database. I cannot Validate this document."

Hmmm ..

Would it not be fair to say US-ASCII is a subset of every other encoding
that might be considered as a default (certainly iso-8859-1 and utf-8),
so that a document that validates as US-ASCII should always be fine?
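A quick Python check of that argument: bytes that decode as US-ASCII decode to the same text under iso-8859-1 and utf-8, so for those defaults the choice makes no difference to what a validator sees. This covers only ASCII-compatible encodings like the two named; an encoding such as UTF-16 is not a superset in this byte-level sense, as the last line shows. The helper name `same_text_under_defaults` is hypothetical.

```python
def same_text_under_defaults(data: bytes,
                             encodings=("us-ascii", "iso-8859-1", "utf-8")) -> bool:
    """True if `data` decodes, and to identical text, under every listed encoding."""
    results = set()
    for enc in encodings:
        try:
            results.add(data.decode(enc))
        except UnicodeDecodeError:
            return False  # not even valid in this encoding
    return len(results) == 1


ascii_doc = b"<p>Hello, world</p>"
latin1_doc = "caf\u00e9".encode("iso-8859-1")  # b"caf\xe9": one non-ASCII byte

print(same_text_under_defaults(ascii_doc))                         # True
print(same_text_under_defaults(latin1_doc))                        # False: 0xE9 is not US-ASCII
print(same_text_under_defaults(b"Hi", ("us-ascii", "utf-16-le")))  # False: UTF-16 reads 2 bytes as 1 char
```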

>	 or "I'm sorry, but I was unable
> to determine the Character Encoding based on available information. Please
> make your Character Encoding explicit in the HTTP headers".

Except when the "HTTP" happens to be FTP or a file upload, and there is no header...

> To "assume nothing" in this context means that if we cannot get a clear,
> unambiguous, indication, we abort instead of guessing or, in this case,
> instead of interpreting the internally inconsistent specifications (that's
> the HTML-WG's job ;D).
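One way to express that "assume nothing" policy, as a Python sketch: accept only an explicitly declared charset, check it against the codecs Python knows about, and abort with one of the messages quoted above rather than guess. Using Python's codec registry as a stand-in for the validator's "database" is an assumption, and the names `CannotValidate` and `decode_strictly` are hypothetical.

```python
import codecs


class CannotValidate(Exception):
    """Raised instead of guessing when the encoding is missing or unknown."""


def decode_strictly(data: bytes, declared_charset):
    """Decode `data` using only an explicitly declared charset; never guess."""
    if not declared_charset:
        raise CannotValidate(
            "I was unable to determine the Character Encoding based on "
            "available information.")
    try:
        codec = codecs.lookup(declared_charset)
    except LookupError:
        raise CannotValidate(
            "that Character Encoding is not in my database.") from None
    # errors="strict": bytes that do not match the declaration raise
    # UnicodeDecodeError rather than being silently replaced.
    return data.decode(codec.name, errors="strict")


print(decode_strictly(b"<p>ok</p>", "ISO-8859-1"))  # <p>ok</p>
for charset in (None, "x-no-such-charset"):
    try:
        decode_strictly(b"<p>ok</p>", charset)
    except CannotValidate as err:
        print("I'm sorry, but", err)
```

Nick's objection to this policy is visible in the `None` case: for FTP or a file upload there is no header to supply `declared_charset`, so a strict validator would refuse every such document.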

Have you never had to do someone else's job because they made too
much of a hash of it?  Not that I'm saying that's relevant here,
but in general.

-- 
Nick Kew
Received on Thursday, 26 July 2001 04:08:37 GMT
