- From: Terje Bless <link@pobox.com>
- Date: Sat, 29 Jun 2002 09:47:43 +0200
- To: William F Hammond <hammond@csc.albany.edu>
- cc: www-validator@w3.org
William F Hammond <hammond@csc.albany.edu> wrote:

>>Could you please send this to www-validator so it doesn't get lost?

Yes, please do send these issues to www-validator@w3.org, precisely so they do not get lost or forgotten. :-)

>Shouldn't the W3C validator attempt to parse any content submitted as
>text/html (RFC 2854), text/xml (RFC 3023), application/xml (RFC 3023),
>or application/xhtml+xml (RFC 3236)?

The application/xhtml+xml media type was not defined when the last update of the public version of the Validator was released (some would argue that it still isn't in any meaningful way![0]). The current development version -- which can usually be found on http://validator.w3.org:8001/ -- should support application/xhtml+xml (but I haven't checked in a while so it may be broken).

The other content types should be supported in the public version of the Validator. Please let me know if any are missing!

>Isn't it assumed for text/html transfer that any necessary non-default
>encoding information is to be derived from a "charset" spec in the
>Content-Type transfer header?

The W3C and the IETF have incompatible specifications of HTTP defaulting rules in the text/* media type hierarchy. The IETF specifies "ISO-8859-1" as the default and the W3C recommends applying no defaulting rules.

At the same time, the W3C seems fond of the "Chicken _And_ Egg" practice of specifying the encoding information _inside_ the encoded entity (cf. HTML's "meta" element, and XML's "encoding" attribute), and providing defaulting rules in the absence of explicit encoding indication.

The only sane course of action is to apply _no_ defaulting rules and refuse to attempt validation of any resource whose character encoding is not explicitly labelled. This unfortunately means that our heuristics for identifying the "Chicken&Egg Encoding" are less than ideal and will fail when presented with XML defaulting rules.
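
To make the "no defaulting rules" policy concrete, here is a minimal Python sketch. It is not the Validator's actual code; the function name explicit_charset, the two regular expressions, and the 1024-byte sniffing window are all assumptions made for illustration. It takes the charset from the Content-Type header if present, otherwise looks for an in-document label (the XML declaration for the XML media types, a "meta" element for text/html), and refuses to guess when nothing is labelled.

```python
import re
from email.message import Message

# Media types named in this thread as ones the Validator should parse.
SUPPORTED_TYPES = {
    "text/html", "text/xml", "application/xml", "application/xhtml+xml",
}

# Simplified in-document labels (assumption: the document is in an
# ASCII-compatible encoding, so the label itself can be read as bytes).
XML_DECL = re.compile(
    rb'<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def explicit_charset(content_type, body):
    """Return (media_type, charset, source) or raise ValueError.

    No default is ever applied: if neither the HTTP header nor the
    document itself labels the encoding, the resource is rejected.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    if media_type not in SUPPORTED_TYPES:
        raise ValueError("unsupported media type: " + media_type)

    # 1. The "charset" parameter of the Content-Type header.
    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        return media_type, charset.lower(), "http"

    # 2. The label inside the entity ("Chicken&Egg Encoding").
    head = body[:1024]
    if media_type == "text/html":
        match, source = META_CHARSET.search(head), "meta"
    else:
        match, source = XML_DECL.match(head), "xml-declaration"
    if match:
        return media_type, match.group(1).decode("ascii").lower(), source

    # 3. Nothing explicit: refuse rather than fall back to a default.
    raise ValueError("no explicit character encoding for " + media_type)
```

Note that step 3 deliberately ignores the XML rule that an unlabelled document is UTF-8 or UTF-16, which is exactly the case where this approach fails on otherwise well-formed XML.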

This code is up for review in the not too distant future -- recent developments have (potentially) badly broken all of Martin Dürst's hard work in that area -- and it is possible that this situation can be improved then. Most likely this will be achieved by making anything served as text/html a special case, with "Tagsoup Semantics", and then making XML syntax and semantics the "primary" method for dealing with character encoding determination.

Unfortunately, the deployment of resources with the application/xhtml+xml media type is likely to be pretty much equal to "nil" until certain popular UAs decide to treat it in a reasonable manner.

-- 
Everytime I write a rhyme these people thinks its a crime
I tell `em what's on my mind. I guess I'm a CRIMINAL!
I don't gotta say a word I just flip `em the bird and keep goin,
I don't take shit from no one. I'm a CRIMINAL!
Received on Saturday, 29 June 2002 03:49:35 UTC