Proposal: Don't do XML wf-ness checks on text/html pages [was: Validator timeout and XML-LibXML bug]

Ville Skyttä <ville.skytta@iki.fi>, 2010-06-14 09:34 +0300:

> Thanks, but please note that this fix is not a silver bullet: it just works 
> around one (common, I hope) instance of the problem; XML::LibXML's slowness 
> when it has lots of errors to report hasn't gone anywhere.  And I think the 
> only things that could be done about that is get XML::LibXML fixed, disable 
> XML wellformedness checks in the validator, or switch to another XML parser.

About the idea of disabling XML wellformedness checks, I want to
raise something for discussion here that I've already also brought
up off-list, which is: I don't think we should do XML
wellformedness checking on pages that are served as text/html. And
since it sounds like disabling XML wellformedness checks for
text/html might win us a significant reduction in load on the
validator, I think it should be something to seriously consider.

Rationale:

I think it could be reasonably argued that it's not helpful to be
running an XML well-formedness check on a page that's served as
text/html. I do realize there are others who may argue that
because it has an XHTML doctype and an XHTML namespace
declaration, we can assume that its author somehow intends for it
to be considered as an XML instance. But I think we should not be
having the validator making that assumption.

So if a site serves a page as text/html:

  - the validator should, by default, evaluate it as text/html -- not
    as XML -- and should therefore not do any XML well-formedness
    checking on it
              
  - if we provide a means for doing XML wf-ness checking on
    text/html pages at all, it should be an option that the user
    needs to manually select; it should not be the default
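
To make the proposed behavior concrete, here is a rough sketch of the
decision in Python. It is not the validator's actual code: the function
name, the user_opt_in flag, and the set of media types are all
hypothetical, and treating other non-XML types as "no check" is my own
assumption.

```python
# Hypothetical sketch of the proposed policy; none of these names come
# from the validator's real code base.
XML_MEDIA_TYPES = {"text/xml", "application/xml", "application/xhtml+xml"}


def should_check_wellformedness(content_type, user_opt_in=False):
    """Decide whether to run the XML well-formedness check.

    content_type: the Content-Type header value the page was served with.
    user_opt_in:  True only if the user explicitly selected the check.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()

    # Pages served with an XML media type are always checked.
    if media_type in XML_MEDIA_TYPES or media_type.endswith("+xml"):
        return True

    # Pages served as text/html are checked only on explicit request.
    if media_type == "text/html":
        return user_opt_in

    # Assumption: anything else is not XML and is not checked.
    return False
```

With something like this, a page served as "text/html; charset=utf-8"
would skip the check unless the user opted in, while the same markup
served as application/xhtml+xml would still get it.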

Browsers and other conformant UAs do not parse text/html pages
using XML parsers, so any XML wf-ness errors in them are not
relevant to the actual processing/rendering of the pages.

So we are arguably wasting users' time -- and wasting our limited
system resources -- running what seems to be a very expensive
check that we should arguably not be doing to begin with.

  --Mike

-- 
Michael(tm) Smith
http://people.w3.org/mike

Received on Friday, 18 June 2010 02:58:40 UTC