Re: Proposal: Don't do XML wf-ness checks on text/html pages [was: Validator timeout and XML-LibXML bug]

From: Michael(tm) Smith <mike@w3.org>
Date: Sun, 20 Jun 2010 13:05:25 +0900
To: Ville Skyttä <ville.skytta@iki.fi>
Cc: public-qa-dev@w3.org, ted@w3.org, Dominique Hazael-Massieux <dom@w3.org>, jean-gui@w3.org, tgambet@w3.org
Message-ID: <20100620040519.GB6370@sideshowbarker>

Ville Skyttä <ville.skytta@iki.fi>, 2010-06-19 00:06 +0300:

> On Friday 18 June 2010, Michael(tm) Smith wrote:
> > About the idea of disabling XML wellformedness checks, I want to
> > raise something for discussion here that I've already also brought
> > up off-list, which is: I don't think we should do XML
> > wellformedness checking on pages that are served as text/html.
> If we don't do that (for non-XML docs or at all) and leave it to 
> SGML::Parser::OpenSP, validator will be bitten by OpenSP's XML limitations.  I 
> gather this is pretty much the reason the "extra" XML wellformedness check 
> exists in the first place; it was added in April 2007, in validator 0.8.0 beta 
> 1.  More info: http://openjade.sourceforge.net/doc/xml.htm

Looking at that one-by-one:

- "XML constrains processing instructions with a target matching
  [Xx][Mm][Ll], both in terms of where they can occur and their

  That one, to me, does not seem important enough to justify
  adding an additional dependency (on XML::LibXML or whatever).

- "XML does not allow a parameter separator that is adjacent to a
  delimiter to be omitted."

  I don't know what that means. I see it's mentioned also in
  http://www.w3.org/TR/NOTE-sgml-xml-971215 but I still don't
  know what it means.

- "XML has constraints on the use of & in parameter literals. In
  SGML terms, XML says that the ero delimiter is recognized in a
  parameter literal, and that it must be followed by an entity
  reference, but the entity reference is not expanded."

- "Line ends are normalized using SGML conventions to a CR/LF
  character pair rather than using the XML convention of a single
  LF character."

  I think that does not make any difference as far as validation
  is concerned.
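
To illustrate the line-end point, here is a minimal Python sketch of the XML convention (XML 1.0 section 2.11: a CR LF pair or a lone CR becomes a single LF). The function name is hypothetical and is not taken from OpenSP or the validator; the sketch only shows that the two conventions differ in the characters reported, not in document structure.

```python
def normalize_line_ends_xml(text):
    """Normalize line ends the XML way: CR LF and lone CR both become LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


crlf_doc = "<p>\r\nhello\r\n</p>"
lf_doc = "<p>\nhello\n</p>"

# After normalization both inputs are identical, so the element structure
# a validator checks does not depend on the line-end style.
same_after_normalization = (
    normalize_line_ends_xml(crlf_doc) == normalize_line_ends_xml(lf_doc)
)
```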

> One example of this is that XHTML documents (no matter what content type 
> they are served with) containing something like:
>     <p id="foo"class="bar">

That's an error in the non-XML HTML language, as well -- even in
HTML4. The HTML4 spec says:

  Any number of (legal) attribute value pairs, separated by
  spaces, may appear in an element's start tag.

I realize that's not how normative conformance requirements are
typically stated these days, but I suspect there are other
nominal HTML4 requirements which the validator is already
enforcing that are stated in the HTML4 spec itself less clearly
than that.

> (missing space between "foo" and class) will start to go unnoticed and 
> declared valid by the validator,

Couldn't we patch our copy of OpenSP to always report a lack of
space between attributes as an error? Isn't that something that
could be caught and reported in the lexer/tokenizer (or
whatever else it might be called in SGML terms) part of the OpenSP
code? (Rather than introducing another dependency on XML::LibXML
or whatever.)
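
As a rough illustration of what such a tokenizer-level check could look like, here is a sketch in Python. It is not OpenSP code (OpenSP is C++), and the function name and regular expressions are assumptions made up for this example. It walks a single start tag and reports any attribute that begins with no whitespace before it.

```python
import re

# Start of a tag: "<" plus the element name.
TAG_NAME = re.compile(r"<[A-Za-z][^\s/>]*")
# One attribute: a name, optionally followed by =value
# (double-quoted, single-quoted, or unquoted).
ATTR = re.compile(
    r"""([^\s"'<>/=]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'<>=`]+))?"""
)
WHITESPACE = re.compile(r"\s*")


def find_unseparated_attributes(start_tag):
    """Return names of attributes not preceded by whitespace."""
    tag = TAG_NAME.match(start_tag)
    if not tag:
        return []
    pos = tag.end()
    problems = []
    while True:
        gap = WHITESPACE.match(start_tag, pos)
        had_space = gap.end() > pos
        pos = gap.end()
        attr = ATTR.match(start_tag, pos)
        if not attr:
            break  # reached ">", "/>", or something that is not an attribute
        if not had_space:
            problems.append(attr.group(1))
        pos = attr.end()
    return problems


print(find_unseparated_attributes('<p id="foo"class="bar">'))
print(find_unseparated_attributes('<p id="foo" class="bar">'))
```

The first call reports the `class` attribute; the second reports nothing.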

> of course assuming there are no other errors the validator does
> catch.  I think this would be such a serious problem that it
> should be considered only as a last resort, and if done, the
> note about XML support limitations that was there in validator <
> 0.8.0 should be brought back.

Yeah, I think everybody would agree that not being able to report
XML conformance violations in documents served with an XML MIME
type would not be acceptable.
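
The proposal amounts to gating the wellformedness check on the MIME type. A minimal Python sketch of that gating follows. It uses Python's expat-based `xml.etree.ElementTree` as a stand-in for XML::LibXML, and the function names are hypothetical, not the validator's.

```python
import xml.etree.ElementTree as ET


def is_xml_mime_type(content_type):
    """True for text/xml, application/xml, and any type with a +xml suffix."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/xml", "application/xml") or media_type.endswith("+xml")


def xml_wf_error(document, content_type):
    """Return a parser error message for a malformed XML-served document.

    Returns None if the document is well-formed, or if it is not served
    with an XML MIME type (in which case the check is skipped).
    """
    if not is_xml_mime_type(content_type):
        return None
    try:
        ET.fromstring(document)
    except ET.ParseError as exc:
        return str(exc)
    return None


doc = '<p id="foo"class="bar">text</p>'
print(xml_wf_error(doc, "text/html; charset=utf-8"))  # check skipped
print(xml_wf_error(doc, "application/xhtml+xml"))     # parser error reported
```

With this gating, the missing space in a text/html document would have to be caught by the SGML side instead.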


Michael(tm) Smith
Received on Sunday, 20 June 2010 04:05:31 UTC
