- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Sun, 16 Sep 2007 21:58:09 +0200
- To: www-validator@w3.org
Olivier Thereaux wrote:

>> Apparently (= reported by the validator) my browser claims to send
>> Content-Type: text/xml without charset. Therefore the validator
>> expects US-ASCII ignoring the first input line:

> It doesn't just expect us-ascii. It *has* to process as us-ascii,
> per http://tools.ietf.org/html/rfc3023#section-8.5

Yes, I'm aware of RFC 3023, and instead of "expects" I should have
written "MUST assume" US-ASCII. But after that step we get to the
business at hand: I'm the author of a tool creating an XML document,
and I wish to validate it. I'm not the author of the OS and the
browser used to upload this document to the validator (for the upload
interface), and I'm unfortunately not the admin of the Web server
where the validator finds this document (in the case of the URL
interface).

For obvious reasons browsers and Web servers of 3rd parties might be
sloppy and assume that anyfile.xml is text/xml without bothering to
figure out the correct charset. That's sad or something, but it's not
what I'm really interested in. I want to see what's wrong (if
anything) _within_ my document.

So if the validator would tell me "BTW, your upload tool failed to
announce the correct charset" it would be okay. But what it really
does is refuse to start working at all. It doesn't even show me the
source with the offending octet when I explicitly ask for it :-(

> see also: http://annevankesteren.nl/2005/03/text-xml

Nice... :-) Wrt the validator you have two kinds of users: those who
implement tools like Firefox or administer Web servers, and another
group writing documents or implementing tools to create documents.
The second group should be much larger, and IMO the validator should
help them as well as possible without giving up on being strict.

The validator reported the _last_ offending octet in line 35591, so
obviously it didn't run into serious processing problems in this
case. It could finish its processing in an orderly manner, e.g. show
the source when I want this, and any errors it finds, instead of
throwing a fatal error and giving up.

Just "giving up" is an option when there are too many errors, like,
say, somebody uploading a binary, or tons of NULs in UTF-16
interpreted as UTF-8 or ASCII. But for the common cases "ASCII turns
out to be UTF-8", "Latin-1 turns out to be windows-1252", or similar,
it shouldn't take the fast "fatal error" exit.

Frank
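The lenient behaviour asked for above can be illustrated with a small sketch. It is not the validator's actual code: the function name `lenient_ascii_check`, the `AsciiReport` type, and the `max_errors` threshold of 100 are all assumptions made for this example. The idea is to treat the input as US-ASCII as RFC 3023 requires, record every offending octet with its line and column, still produce a displayable source, and only give up when the number of offending octets suggests a binary or a badly mislabelled file.

```python
from typing import List, NamedTuple, Tuple


class AsciiReport(NamedTuple):
    text: str                             # source with offending octets shown as U+FFFD
    problems: List[Tuple[int, int, int]]  # (line, column, octet value), both 1-based
    gave_up: bool                         # True when there are too many offending octets


def lenient_ascii_check(data: bytes, max_errors: int = 100) -> AsciiReport:
    """Scan data as US-ASCII and report non-ASCII octets instead of aborting.

    Hypothetical helper: max_errors is an arbitrary cut-off for "too many".
    """
    problems = []
    line, col = 1, 1
    for octet in data:
        if octet == 0x0A:          # LF starts a new line
            line, col = line + 1, 1
            continue
        if octet > 0x7F:           # outside the US-ASCII range
            problems.append((line, col, octet))
        col += 1
    # Keep the source viewable: each offending octet becomes U+FFFD.
    text = data.decode("ascii", errors="replace")
    return AsciiReport(text, problems, len(problems) > max_errors)


# The "ASCII turns out to be UTF-8" case: two offending octets, still usable.
sample = b'<?xml version="1.0" encoding="UTF-8"?>\n<a>caf\xc3\xa9</a>\n'
report = lenient_ascii_check(sample)
for line, col, octet in report.problems:
    print(f"line {line}, column {col}: non-ASCII octet 0x{octet:02X}")
```

With a report like this, a caller could show the source and the list of offending octets as ordinary errors, and reserve the fatal exit for the case where `gave_up` is set.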
Received on Sunday, 16 September 2007 19:59:44 UTC