- From: Michael(tm) Smith <mike@w3.org>
- Date: Sun, 20 Jun 2010 13:05:25 +0900
- To: Ville Skyttä <ville.skytta@iki.fi>
- Cc: public-qa-dev@w3.org, ted@w3.org, Dominique Hazael-Massieux <dom@w3.org>, jean-gui@w3.org, tgambet@w3.org
Ville Skyttä <ville.skytta@iki.fi>, 2010-06-19 00:06 +0300:

> On Friday 18 June 2010, Michael(tm) Smith wrote:
>
> > About the idea of disabling XML wellformedness checks, I want to
> > raise something for discussion here that I've already also brought
> > up off-list, which is: I don't think we should do XML
> > wellformedness checking on pages that are served as text/html.
>
> If we don't do that (for non-XML docs or at all) and leave it to
> SGML::Parser::OpenSP, validator will be bitten by OpenSP's XML limitations. I
> gather this is pretty much the reason the "extra" XML wellformedness check
> exists in the first place; it was added in April 2007, in validator 0.8.0 beta
> 1. More info: http://openjade.sourceforge.net/doc/xml.htm

Looking at that one-by-one:

  - "XML constrains processing instructions with a target matching
    [Xx][Mm][Ll], both in terms of where they can occur and their
    content."

    That one, to me, does not seem important enough to justify adding
    an additional dependency (on XML::LibXML or whatever)

  - "XML does not allow a parameter separator that is adjacent to a
    delimiter to be omitted."

    I don't know what that means. I see it's mentioned also in
    http://www.w3.org/TR/NOTE-sgml-xml-971215 but I still don't know
    what it means.

  - "XML has constraints on the use of & in parameter literals. In
    SGML terms, XML says that the ero delimiter is recognized in a
    parameter literal, and that it must be followed by an entity
    reference, but the entity reference is not expanded."

  - "Line ends are normalized using SGML conventions to a CR/LF
    character pair rather than using the XML convention of a single
    LF character."

    I think that does not make any difference as far as validation.

> One example of this is that XHTML documents (no matter with what content type
> they are served with) containing something like:
>
> <p id="foo"class="bar">

That's an error in the non-XML HTML language, as well -- even in HTML4.
The HTML4 spec says:

  http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2

  Any number of (legal) attribute value pairs, separated by spaces,
  may appear in an element's start tag.

I realize that's not how normative conformance requirements are
typically stated these days, but I suspect there are other nominal
HTML4 requirements which the validator is already enforcing that are
stated in the HTML4 spec itself less clearly than that.

> (missing space between "foo" and class) will start to go unnoticed and
> declared valid by the validator,

Couldn't we patch our copy of OpenSP to always report a lack of space
between attributes as an error? Isn't that something that could be
caught and reported in the lexer/tokenizer (or whatever else it might
be called in SGML terms) part of the OpenSP code? (Rather than
introducing another dependency on XML::LibXML or whatever.)

> of course assuming there are no other errors the validator does
> catch. I think this would be such a serious problem that it
> should be considered only as a last resort, and if done, the
> note about XML support limitations that was there in validator <
> 0.8.0 should be brought back.

Yeah, I think everybody would agree that not being able to report XML
conformance violations in documents served with an XML MIME type would
not be acceptable.

  --Mike

-- 
Michael(tm) Smith
http://people.w3.org/mike
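
To make the missing-space case from the thread concrete (<p id="foo"class="bar">), here is a minimal sketch of the kind of tokenizer-level check being asked about. It is only an illustration under stated assumptions: it is written in Python, it is not OpenSP code and not part of the validator, the name missing_attr_spaces is hypothetical, and it handles quoted attribute values only (it does not skip comments or CDATA sections).

```python
import re

# Hypothetical illustration only: this is not OpenSP's tokenizer.
# Matches the start of a tag and captures the element name.
_TAG_OPEN = re.compile(r"<([A-Za-z][A-Za-z0-9:._-]*)")

# Matches one attribute with a quoted value, capturing the whitespace
# (possibly empty) that precedes the attribute name.
_ATTR = re.compile(
    r"""(\s*)([A-Za-z_:][A-Za-z0-9:._-]*)\s*=\s*("[^"]*"|'[^']*')"""
)


def missing_attr_spaces(markup):
    """Return (element, attribute) pairs where the attribute is not
    preceded by whitespace, e.g. ("p", "class") for
    <p id="foo"class="bar">."""
    problems = []
    for tag in _TAG_OPEN.finditer(markup):
        pos = tag.end()
        while True:
            attr = _ATTR.match(markup, pos)
            if attr is None:
                break  # end of the attribute list (or unsupported syntax)
            if attr.group(1) == "":
                problems.append((tag.group(1), attr.group(2)))
            pos = attr.end()
    return problems


if __name__ == "__main__":
    print(missing_attr_spaces('<p id="foo"class="bar">'))
    print(missing_attr_spaces('<p id="foo" class="bar">'))
```

Walking the attributes in order, rather than searching the whole tag for a quote followed by a letter, avoids flagging text that merely appears inside another attribute's quoted value.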
Received on Sunday, 20 June 2010 04:05:31 UTC