Re: charset=us-ascii mandatory? from Jukka K. Korpela on 2007-05-08 (www-validator@w3.org from May 2007)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Tue, 8 May 2007 22:17:25 +0300 (EEST)
To: www-validator@w3.org
Message-ID: <Pine.GSO.4.64.0705082204180.1264@hopeatilhi.cs.tut.fi>

On Tue, 8 May 2007, Andreas Prilop wrote:

> I specifically mean (HTML 4) documents with only US-ASCII characters.

They, too, should have their encoding declared. I can't really say "must" 
instead of "should", since the specifications are vague. But the same 
vagueness also means that if a sequence of octets purports to be HTML 4 
and does not have its encoding declared (in an HTTP header, in a meta tag 
or, nominally, in a charset parameter of a referring link), its meaning is 
formally left unspecified. In principle, its validity is undecidable, 
since we don't even know how to interpret the octets.
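The places where an encoding can be declared can be sketched in Python. This 
is a minimal sketch, assuming the priority order given in HTML 4.01 section 
5.2.2 (HTTP Content-Type charset first, then a meta declaration, then the 
charset attribute of a referring element). The function name 
declared_encoding and its parameters are hypothetical, and the regular 
expressions are a rough approximation, not a conforming parser. Note that 
scanning the raw octets for a meta tag already assumes an ASCII-compatible 
encoding, which is exactly the assumption that is formally unwarranted.

```python
import re
from typing import Optional

# Rough patterns; a real implementation would use an HTML parser.
CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)
META_RE = re.compile(rb'<meta\b[^>]*http-equiv\s*=\s*["\']?content-type[^>]*>', re.IGNORECASE)


def declared_encoding(content_type: Optional[str], body: bytes,
                      link_charset: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: return the declared encoding, or None if undeclared."""
    # 1. charset parameter of the HTTP Content-Type header.
    if content_type:
        m = CHARSET_RE.search(content_type.encode('ascii', 'replace'))
        if m:
            return m.group(1).decode('ascii').lower()
    # 2. <meta http-equiv="Content-Type" content="...; charset=...">.
    #    Scanning raw bytes already assumes an ASCII-compatible encoding.
    meta = META_RE.search(body)
    if meta:
        m = CHARSET_RE.search(meta.group(0))
        if m:
            return m.group(1).decode('ascii').lower()
    # 3. charset attribute on the referring link, e.g. <a charset="...">.
    if link_charset:
        return link_charset.lower()
    # Nothing declared: formally, there is no specified way to read the octets.
    return None


print(declared_encoding(None, b'<title>x</title><p>ASCII only</p>'))  # None
```

A document consisting only of US-ASCII characters but with no declaration 
still ends up in the final branch, which is the "formally unspecified" case.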

In practice, of course, browsers will do what you want and infer US-ASCII 
or some 8-bit encoding that contains US-ASCII as a subset. In principle, 
they could do otherwise; maybe some browser running in an EBCDIC 
environment even does that, for local documents.
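Why the practical guess works, and why it is still only a guess, can be 
shown with Python's built-in codecs. This is an illustrative sketch: the 
sample octets are made up, and cp037 is used as one example of an EBCDIC 
code page.

```python
# ASCII-only octets, e.g. a fragment of an HTML 4 document.
octets = b'<p>Hello, world!</p>'

# Encodings that contain US-ASCII as a subset all agree on these octets.
ascii_supersets = ['us-ascii', 'iso-8859-1', 'windows-1252', 'utf-8']
decoded = {enc: octets.decode(enc) for enc in ascii_supersets}
assert len(set(decoded.values())) == 1

# An EBCDIC code page (cp037) maps the same octets to different characters.
ebcdic = octets.decode('cp037')

print(decoded['us-ascii'])             # <p>Hello, world!</p>
print(ebcdic == decoded['us-ascii'])   # False
print('A'.encode('cp037'))             # b'\xc1' (EBCDIC "A")
```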

> If I'm not mistaken, it is still correct to send e-mail in US-ASCII
> without any MIME header and charset declaration.

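Yes, because that's specified in the e-mail protocol: under MIME (RFC 2045), 
a message without a Content-Type header defaults to 
text/plain; charset=us-ascii.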

> How is that with HTML 4?

Not specified. If you are thinking about a non-MIME e-mail message 
containing an HTML document, then I'm afraid we must formally treat the 
content as plain text, since that's what e-mail messages are by default.
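Python's standard email package reflects this default. The sketch below 
parses a hypothetical non-MIME message whose body happens to be HTML; the 
addresses and body are made up. Python reports no charset when none is 
declared, so the RFC 2045 default is passed in explicitly as the fallback 
argument of get_content_charset.

```python
from email import message_from_string

# A hypothetical non-MIME message whose body happens to be HTML markup.
raw = (
    "From: someone@example.org\n"
    "To: someone-else@example.org\n"
    "Subject: test\n"
    "\n"
    "<html><body><p>Hello</p></body></html>\n"
)
msg = message_from_string(raw)

# No Content-Type header: the body is treated as plain text, not HTML.
print(msg.get_content_type())               # text/plain

# No charset is declared; supply the RFC 2045 default explicitly.
print(msg.get_content_charset())            # None
print(msg.get_content_charset('us-ascii'))  # us-ascii
```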

Is there some practical problem behind your question?

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Tuesday, 8 May 2007 19:17:35 UTC