- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Fri, 24 Jun 2005 21:08:09 +1000
- To: Rodrigo Witzel <rodrigo.witzel@gmx.de>
- CC: "www-validator@w3.org" <www-validator@w3.org>
Rodrigo Witzel wrote:
> ... the Content-Type was one of
> the XML text/* sub-types (text/xml). The relevant specification (RFC
> 3023) specifies a strong default of "us-ascii" for such documents so we
> will use this value regardless of any encoding you may have indicated
> elsewhere. ..."
>
> As a matter of fact, your website tests BOTH the markup and the
> behaviour of my web server. Or even worse, it refuses to test my markup
> if my server fails the test. If my XML is valid, the test should be
> passed even though my server doesn't fulfil any other requirements.

How can the validator possibly validate your document if it does not know which character encoding to use to read the file? If the encoding is not correctly specified, the validator must default to something, which may result in errors being reported that would not be present had it known the correct encoding.

Say, for example, your document was encoded as UTF-8 and contained characters outside of the US-ASCII subset. Because your server declared the content type as text/xml but did not indicate the encoding with a charset parameter, the validator *must* follow the rules specified in RFC 3023 and parse the file as though it were encoded in US-ASCII. Since the document contains characters outside of the US-ASCII subset, the validator would issue a well-formedness error and your document would not validate, even though it would validate if it were parsed as UTF-8.

The moral of the story is to specify the encoding with a charset parameter if you are going to continue using text/xml. Note, however, that for this reason it is not recommended that you use text/* media types for XML documents. The alternative is to use application/xml, application/xhtml+xml or another appropriate application/*+xml media type.
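To make the text/xml case concrete, here is a rough Python sketch of the rule as described above. It is not the validator's actual code; the function names transport_charset and try_decode and the sample document are made up for illustration, and only the header-level decision is modelled.

```python
from email.message import Message


def transport_charset(content_type_header):
    """Charset implied by the Content-Type header alone (hypothetical helper).

    Returns None when the header does not settle the question and the
    document itself has to be consulted.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    charset = msg.get_param("charset")
    if charset:
        return charset.lower()
    if msg.get_content_maintype() == "text":
        # text/* with no charset parameter: the us-ascii default applies.
        return "us-ascii"
    return None


def try_decode(data, content_type_header):
    """Report whether the raw bytes can be read under the implied charset."""
    charset = transport_charset(content_type_header)
    if charset is None:
        return "deferred to XML declaration or BOM"
    try:
        data.decode(charset)
        return "decoded as " + charset
    except UnicodeDecodeError:
        return "not decodable as " + charset


# A UTF-8 document containing one character outside US-ASCII (e with acute).
document = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'.encode("utf-8")

for header in ("text/xml", "text/xml; charset=utf-8", "application/xml"):
    print(header, "->", try_decode(document, header))
```

The same bytes fail under a bare text/xml header and are readable once the charset parameter is present, which is why the validator's complaint is about the server's header and not about the markup itself.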
With an application/* media type, the validator will obey the encoding declared in the XML declaration, if present, or default to UTF-8 or UTF-16, as described in the XML Recommendation, based on the presence (or absence) of the Byte Order Mark.

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Friday, 24 June 2005 11:08:24 UTC