Re: charset parameter

On 26.07.01 at 09:08, Nick Kew <nick@webthing.com> wrote:

>Surely that at least is clear: [HTTP] takes precedence over [META]?

Nope. HTTP 1.1 doesn't mention META, and HTML just sez it's supposed to be
read by _servers_ to initialize the HTTP header... :-(


>>Or what this means for the case when the charset in the HTTP header is
>>there by inference (as a default, not explicitly)...
>
>But *ML rules don't apply to HTTP, so whence the conclusion that
>*anything* is implicit (as opposed to absent) in the headers?

The lack of a "charset" parameter on the HTTP 1.1 "Content-Type" header
field means that you should assume it is there with a value of "ISO-8859-1"
according to the HTTP 1.1 RFC. HTML doesn't specify a default (it actually
discourages it). But if HTTP overrides META, and the HTTP charset is only
there by default, does HTTP's default still override an explicitly inserted
META?

That is, if the META sez EUC-JP and HTTP implicitly defines ISO-8859-1 (by
being absent), does that really mean that we should use ISO-8859-1 (which
the user obviously does _not_ want) over EUC-JP (which s/he _does_ want)?
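To make the two readings concrete, here is a minimal sketch (not what any validator actually does) of a charset resolver with a switch for the open question. It assumes an explicit HTTP charset beats META under both readings, and that the implied ISO-8859-1 default applies only to text/* media types, as RFC 2616 section 3.7.1 states. The function names and the `default_overrides_meta` flag are hypothetical, and the Content-Type parsing is deliberately simplified (no quoted semicolons).

```python
from typing import Optional

# RFC 2616, section 3.7.1: a text/* entity received via HTTP with no
# charset parameter is treated as ISO-8859-1.
HTTP_TEXT_DEFAULT = "ISO-8859-1"


def explicit_http_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            value = value.strip().strip('"')
            return value or None
    return None


def resolve_charset(content_type: str,
                    meta_charset: Optional[str],
                    default_overrides_meta: bool) -> Optional[str]:
    """Pick a charset under one of two readings of HTTP-vs-META precedence.

    default_overrides_meta=True:  the implied HTTP default beats META.
    default_overrides_meta=False: only an explicit HTTP charset beats META.
    """
    explicit = explicit_http_charset(content_type)
    if explicit is not None:
        return explicit  # assumed to win under both readings
    media_type = content_type.split(";")[0].strip().lower()
    # The HTTP default only exists for text/* media types.
    http_default = HTTP_TEXT_DEFAULT if media_type.startswith("text/") else None
    if default_overrides_meta and http_default is not None:
        return http_default
    return meta_charset or http_default


if __name__ == "__main__":
    print(resolve_charset("text/html", "EUC-JP", default_overrides_meta=True))   # ISO-8859-1
    print(resolve_charset("text/html", "EUC-JP", default_overrides_meta=False))  # EUC-JP
```

The EUC-JP case above is exactly where the two settings disagree; with an explicit `charset` on the header they give the same answer.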


>>"I'm sorry, but that Document Type is not in my Catalog. I cannot
>>Validate this document"
>
>We are happy with SYSTEM FPIs.  It's the No FPI case (or FPIs which
>are not accessible to the validator) you need to complain about.

A DOCTYPE Declaration referencing an External Subset by Formal Public
Identifier not in our Catalog, and without a System Identifier, should
generate the error message above. An Internal Subset, an External Subset
referenced by an FPI that is in our Catalog, or an External Subset giving a
resolvable System Identifier, should all generate normal validation
results.
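The rule in the paragraph above can be sketched as a small decision function. This is an illustration under assumptions: the `Doctype` record, `doctype_error`, and the `is_resolvable` callback are hypothetical names, the catalog is modelled as a plain set of FPIs, and a System Identifier that fails to resolve (a case the paragraph does not cover) is treated like a missing one.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set

CATALOG_ERROR = ("I'm sorry, but that Document Type is not in my Catalog. "
                 "I cannot Validate this document")


@dataclass
class Doctype:
    """Hypothetical summary of a parsed DOCTYPE declaration."""
    has_internal_subset: bool = False
    public_id: Optional[str] = None  # Formal Public Identifier (FPI)
    system_id: Optional[str] = None  # System Identifier


def doctype_error(doctype: Doctype,
                  catalog: Set[str],
                  is_resolvable: Callable[[str], bool]) -> Optional[str]:
    """Return the catalog error message, or None if normal validation can proceed."""
    if doctype.has_internal_subset:
        return None  # declarations are in the document itself
    if doctype.public_id is not None and doctype.public_id in catalog:
        return None  # FPI found in our Catalog
    if doctype.system_id is not None and is_resolvable(doctype.system_id):
        return None  # External Subset can be fetched directly
    # FPI not in the Catalog and no usable System Identifier.
    return CATALOG_ERROR
```

Passing resolution in as a callback keeps the sketch independent of how (or whether) the validator fetches external DTDs.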


>>and "I'm sorry, but that Character Encoding is not in my
>>database. I cannot Validate this document."
>
>Would it not be fair to say US-ASCII is a subset of every other encoding
>that might be considered as a default (certainly iso-8859-1 and utf-8)?
>so that a document that validates to it should always be fine?

This is again very much Western thinking. US-ASCII is a subset only of
common Western encodings. This means the answer to your question depends on
whether you accept the validity of these "defaulted" charset parameters.

I must admit to being both uncertain and ambivalent on this issue.
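Whether "US-ASCII is a subset" holds for a given encoding can at least be spot-checked: encode some ASCII-only markup and compare the bytes with plain US-ASCII. This is a minimal sketch using Python's standard codecs (`cp037` is Python's codec for an EBCDIC code page); it tests one sample string rather than proving the subset relationship, and the helper name `is_ascii_compatible` is hypothetical.

```python
SAMPLE = "<html><head><title>Test</title></head></html>"  # pure US-ASCII text


def is_ascii_compatible(encoding: str, text: str = SAMPLE) -> bool:
    """True if `encoding` gives this ASCII-only text the same bytes as US-ASCII."""
    try:
        return text.encode(encoding) == text.encode("us-ascii")
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for enc in ["iso-8859-1", "utf-8", "utf-16", "cp037"]:
        print(enc, is_ascii_compatible(enc))
```

ISO-8859-1 and UTF-8 pass, while UTF-16 (two bytes per character plus a byte order mark) and EBCDIC do not, so a document that validates as US-ASCII is only safe under the former.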


>>or "I'm sorry, but I was unable to determine the Character Encoding based
>>on available information. Please make your Character Encoding explicit in
>>the HTTP headers".
>
>Except if HTTP happens to be FTP or file upload, and there is no header...

Or a fragment pasted into the form (not finished yet)... Or...

It must be dealt with, but these are sufficiently fringe cases that we can
add exceptions for those. I think... :-)

What does Site Valet do?

Received on Thursday, 26 July 2001 05:48:10 UTC