Re: Public beta test of the W3C Markup Validator

On Tuesday, October 22, 2002, 11:37:30 PM, Karl wrote:


KD>      * Character Encoding issues are more fully handled and the majority of
KD>        the Character Encoding related code has been rewritten. This should
KD>        mean better and more robust handling of Character Encodings --
KD>        particularly for non-US/European Encodings -- but is also stricter
KD>        with sloppy encoding declarations and malformed encoded documents
KD>        (Windows-1252 users take note!).

A quick test (of a well-formed but non-valid UTF-8 encoded SVG
document) revealed:

Note: The HTTP Content-Type field did not contain a "charset"
attribute, but the Content-Type was one of the XML text/* sub-types.
The relevant specification (RFC 3023) specifies a strong default of
"us-ascii" for such documents so we will use this value regardless of
any encoding you may have indicated elsewhere. If you would like to
use a different encoding, you should arrange to have your server send
this new encoding information.

Firstly, that is neither desirable nor an improvement.

Plus, it's arguably not true: the file was sent from local disk using
file upload, so it's a mystery where the 'HTTP Content-Type' field came
from or how it figured out that a 'text/*' type had been sent.
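
For reference, here is a rough sketch of the RFC 3023 rule the note
cites, as I read it. The function name and the None return convention
are my own invention for illustration, not anything from the validator's
code, and the sketch only covers the two text/* XML types that RFC 3023
itself registers. None here means "no answer from the header, look at
the document itself (BOM, XML encoding declaration)".

```python
from email.message import Message

# The text/* XML types registered by RFC 3023. When one of these arrives
# without a charset parameter, the RFC gives "us-ascii" as the default.
TEXT_XML_TYPES = {"text/xml", "text/xml-external-parsed-entity"}


def charset_from_content_type(header_value):
    """Hypothetical helper: return the charset implied by a Content-Type
    header value, or None if the header alone does not decide it."""
    msg = Message()
    msg["Content-Type"] = header_value
    media_type = msg.get_content_type()   # lowercased "type/subtype"
    charset = msg.get_param("charset")    # None when the parameter is absent

    if charset is not None:
        # An explicit charset parameter always wins.
        return charset.lower()
    if media_type in TEXT_XML_TYPES:
        # The "strong default" the validator note refers to.
        return "us-ascii"
    # Anything else: the header does not settle the question.
    return None


if __name__ == "__main__":
    for value in ("text/xml", "text/xml; charset=UTF-8", "image/svg+xml"):
        print(value, "->", charset_from_content_type(value))
```

Under that reading, an SVG file that was never served as text/xml should
not hit the us-ascii default at all, which is the point of the complaint
above.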




-- 
 Chris                            mailto:chris@w3.org

Received on Tuesday, 22 October 2002 18:01:59 UTC