Re: Proposal: Don't do XML wf-ness checks on text/html pages [was: Validator timeout and XML-LibXML bug]

From: Ian Jacobs <ij@w3.org>
Date: Thu, 17 Jun 2010 22:54:06 -0500
To: "Michael(tm) Smith" <mike@w3.org>
Message-Id: <F228F981-2F68-4EC2-AB89-18D4181DA240@w3.org>
Cc: Ville Skyttä <ville.skytta@iki.fi>, ted@w3.org, public-qa-dev@w3.org, Dominique Hazael-Massieux <dom@w3.org>, jean-gui@w3.org, tgambet@w3.org

On 17 Jun 2010, at 9:58 PM, Michael(tm) Smith wrote:

> Ville Skyttä <ville.skytta@iki.fi>, 2010-06-14 09:34 +0300:
>
>> Thanks, but please note that this fix is not a silver bullet: it just
>> works around one (common, I hope) instance of the problem; XML::LibXML's
>> slowness when it has lots of errors to report hasn't gone anywhere.  And
>> I think the only things that could be done about that are to get
>> XML::LibXML fixed, disable XML wellformedness checks in the validator,
>> or switch to another XML parser.
>
> About the idea of disabling XML wellformedness checks, I want to
> raise something for discussion here that I've already also brought
> up off-list, which is: I don't think we should do XML
> wellformedness checking on pages that are served as text/html. And
> since it sounds like disabling XML wellformedness checks for
> text/html might win us a significant reduction in load on the
> validator, I think it should be something to seriously consider.


How about making the well-formedness check optional (e.g., off by
default)?
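
For illustration only, here is a minimal sketch of what an off-by-default
check could look like. It is written in Python rather than in the
validator's own code, and every name in it (XML_MEDIA_TYPES,
should_check_wellformedness, user_opt_in) is hypothetical, not part of
the validator. It assumes that documents served with an XML media type
keep getting the check and that other non-HTML types never do; the
thread itself only discusses text/html.

```python
# Hypothetical sketch: decide from the served media type whether to run
# an XML well-formedness check. Not the validator's actual logic.

# Assumption: these types (plus any "+xml" suffix type) count as XML.
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def should_check_wellformedness(content_type: str, user_opt_in: bool = False) -> bool:
    """Return True if an XML well-formedness check should be run."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()

    # Served as XML: the check stays on (assumption, see lead-in).
    if media_type in XML_MEDIA_TYPES or media_type.endswith("+xml"):
        return True

    # Served as text/html: off by default, on only if the user asks.
    if media_type == "text/html":
        return user_opt_in

    # Anything else: no XML check (assumption).
    return False
```

With this sketch, should_check_wellformedness("text/html; charset=utf-8")
is False unless the caller passes user_opt_in=True.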

  _ Ian

>
> Rationale:
>
> I think it could be reasonably argued that it's not helpful to be
> running an XML well-formedness check on a page that's served as
> text/html. I do realize there are others who may argue that
> because it has an XHTML doctype and an XHTML namespace
> declaration, we can assume that its author somehow intends for it
> to be considered as an XML instance. But I think we should not
> have the validator make that assumption.
>
> So if a site serves a page as text/html:
>
>  - the validator should, by default, evaluate it as text/html -- not
>    as XML -- and should therefore not do any XML well-formedness
>    checking on it
>
>  - if we provide a means for doing XML wf-ness checking on
>    text/html pages at all, it should be an option that the user
>    needs to manually select; it should not be the default
>
> Browsers and other conformant UAs do not parse text/html pages
> using XML parsers, so any XML wf-ness errors in them are not
> relevant to the actual processing/rendering of the pages.
>
> So we are arguably wasting users' time -- and wasting our limited
> system resources -- running what seems to be a very expensive
> check that we should arguably not be doing to begin with.
>
>  --Mike
>
> -- 
> Michael(tm) Smith
> http://people.w3.org/mike
>
>

--
Ian Jacobs (ij@w3.org)    http://www.w3.org/People/Jacobs/
Tel:                                      +1 718 260 9447
Received on Friday, 18 June 2010 03:54:11 GMT
