Re: Weird thing from Masayasu Ishikawa on 2003-05-07 (www-validator@w3.org from May 2003)

From: Masayasu Ishikawa <mimasa@w3.org>
Date: Wed, 07 May 2003 21:55:16 +0900 (JST)
To: albie@alfarrabio.di.uminho.pt
Cc: www-validator@w3.org
Message-Id: <20030507.215516.112607062.mimasa@w3.org>

albie@alfarrabio.di.uminho.pt wrote:

>  I'm validating http://alfarrabio.di.uminho.pt/~albie which was valid
> xhtml, but now, I've tested again and the validator says there are
> weird chars on the file. The problem is that I think the file is correct,
> and because my first line shows the correct encoding.

Even though it looks weird, the Validator's behavior is correct
according to RFC 3023.  The problem is that you served your resource
as 'text/xml' with no charset parameter, so the encoding named in
your XML declaration is not what the Validator is required to use.
Section 3.4, "'text/xml'", of the XHTML Media Types Note warns about
this as follows:

    Authors should also be aware of the difference between
    'application/xml' (and for that matter 'application/xhtml+xml' as
    well) and 'text/xml' with regard to the treatment of character
    encoding. According to "3.1 Text/xml Registration" of [RFC3023],
    "if a text/xml entity is received with the charset parameter omitted,
    MIME processors and XML processors MUST use the default charset value
    of "us-ascii"[ASCII]". This default value is authoritative over
    the encoding information specified in the XML declaration, or the XML
    default encodings of UTF-8 and UTF-16 when no encoding declaration is
    supplied, so omitting the charset parameter of a 'text/xml' entity
    might cause an unexpected result. As mentioned in [RFC3023], the
    use of the charset parameter is STRONGLY RECOMMENDED.

  cf. http://www.w3.org/TR/xhtml-media-types/#text-xml

Adding an explicit charset parameter to the Content-Type header, or
changing the media type to 'application/xhtml+xml' or
'application/xml', would solve the problem.
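
To illustrate the rule, here is a minimal Python sketch of how a
processor following RFC 3023 might pick the encoding.  The function
name effective_encoding and the sample document are hypothetical, and
the sketch ignores byte order marks and other details of XML encoding
detection, so please read it as an illustration rather than the
Validator's actual code.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration (simplified).
XML_DECL_ENCODING = re.compile(
    rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']'
)


def effective_encoding(content_type, body):
    """Return the encoding to use for an XML entity (hypothetical helper).

    content_type: value of the HTTP Content-Type header
    body: the raw bytes of the document
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()   # lowercased, e.g. "text/xml"
    charset = msg.get_content_charset()   # lowercased, or None if absent

    # An explicit charset parameter is authoritative for every media type.
    if charset:
        return charset

    # RFC 3023, section 3.1: 'text/xml' without charset means us-ascii,
    # regardless of what the XML declaration says.
    if media_type == "text/xml":
        return "us-ascii"

    # Assumption: for 'application/xml' and 'application/xhtml+xml' we fall
    # back to the XML declaration, then to UTF-8 (BOM sniffing is omitted).
    match = XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


# Hypothetical document that declares ISO-8859-1 on its first line.
doc = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<html/>'

print(effective_encoding("text/xml", doc))                      # us-ascii
print(effective_encoding("text/xml; charset=ISO-8859-1", doc))  # iso-8859-1
print(effective_encoding("application/xhtml+xml", doc))         # iso-8859-1
```

The first call corresponds to how your resource is served now: the
declaration on the first line is overridden, so any non-ASCII byte is
reported as an error.  The other two calls correspond to the two fixes
mentioned above.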

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium

Received on Wednesday, 7 May 2003 08:55:18 UTC