Re: Validation & errors from Nick Kew on 2004-04-25 (www-validator@w3.org from April 2004)

From: Nick Kew <nick@webthing.com>
Date: Sun, 25 Apr 2004 08:18:13 +0100 (BST)
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: www-validator@w3.org
Message-ID: <Pine.LNX.4.53.0404250807180.2038@hugin.webthing.com>

On Sun, 25 Apr 2004, Bjoern Hoehrmann wrote:

>
> * Nick Kew wrote:
> >> You don't need to be, you could use an output filter for your webserver
> >> that passes invalid documents to HTML Tidy first and thus correct most
> >> errors automatically...
> >
> >Indeed you could.  Except that Tidy has neither a SAX (or comparable
> >linear) parse mode nor a parseChunk API, and would therefore be
> >seriously inefficient in this context.  libxml2 does the job a whole
> >lot more efficiently.
>
> Well, you can't really compare these two tools;

Of course you can!

>	 AFAIK, libxml2 helps to
> parse tag soup to some extent, but lacks most of Tidy's functionality
> which requires a DOM and could thus not be realized with a streaming
> API.

It doesn't have all of Tidy's capabilities.  But it has ample for the
task in question.  In fact for the purpose of ensuring valid output,
the fact that it now supports DTDs and validation makes it the better
tool.

>  And performance does not matter much as you could cache Tidy's
> output,

Automatically?  That's adding complexity.  Or by preprocessing?
Yes, I'll grant you the second, but the question seems to have changed.

>	 unless you have truly dynamic content that changes for each
> request (personalized services, for example).

The greatest virtue of an output filter is that it's reusable in a wide
range of situations.  And it's most useful in cases where preprocessing
is not an option, such as dynamic or proxied content.
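Here is a minimal sketch of what a linear, chunk-at-a-time output filter looks like. It uses Python's standard-library `html.parser.HTMLParser`, whose `feed()` method accepts input in arbitrary pieces. That parser is a stand-in chosen for illustration. It is not libxml2's SAX or push (`htmlParseChunk`) interface, and it is not Tidy. The class `StreamingFixup` and the helpers `filter_chunk`, `finish` and `filter_stream` are hypothetical names. Rather than a DOM, the filter keeps only a stack of open element names. It lowercases tags, quotes attribute values, drops stray end tags, and closes whatever is still open at end of input.

```python
from html import escape
from html.parser import HTMLParser

# HTML 4.01 EMPTY elements: they never take an end tag.
VOID = {"area", "base", "basefont", "br", "col", "frame", "hr", "img",
        "input", "isindex", "link", "meta", "param"}
# Elements whose content the parser hands over as raw, unparsed text.
RAW_TEXT = {"script", "style"}


class StreamingFixup(HTMLParser):
    """Hypothetical linear (non-DOM) fix-up filter.

    Only a stack of open element names is kept, so memory use does not
    grow with the size of the document.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pending = []    # output produced since the last flush
        self.open_tags = []  # names of currently open non-void elements

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names. Quote every
        # value and expand minimized attributes (checked -> checked="checked").
        parts = [tag]
        for name, value in attrs:
            value = name if value is None else value
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.pending.append("<" + " ".join(parts) + ">")
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: nothing to close, so drop it
        # Close anything still open inside `tag`, innermost first.
        while self.open_tags:
            name = self.open_tags.pop()
            self.pending.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        if self.open_tags and self.open_tags[-1] in RAW_TEXT:
            self.pending.append(data)  # script/style content stays verbatim
        else:
            # Character references were decoded by the parser; re-escape &, <, >.
            self.pending.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.pending.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.pending.append(f"<!{decl}>")

    def _flush(self):
        out = "".join(self.pending)
        self.pending.clear()
        return out

    def filter_chunk(self, chunk):
        """Parse one chunk and return whatever output it completed."""
        self.feed(chunk)  # a tag cut off at the chunk end is buffered by the parser
        return self._flush()

    def finish(self):
        """End of input: flush the parser and close any elements left open."""
        self.close()
        while self.open_tags:
            self.pending.append(f"</{self.open_tags.pop()}>")
        return self._flush()


def filter_stream(chunks):
    """Yield repaired output as each input chunk arrives."""
    fixer = StreamingFixup()
    for chunk in chunks:
        out = fixer.filter_chunk(chunk)
        if out:
            yield out
    yield fixer.finish()
```

For example, `"".join(filter_stream(['<P CLASS=note>Tom &amp; <B>Jerry', '</P><IMG SRC=a.png ALT=""><DIV>unclosed']))` gives `<p class="note">Tom &amp; <b>Jerry</b></p><img src="a.png" alt=""><div>unclosed</div>`. Each yielded string can go to the client as soon as it is produced, so the whole document is never held in memory. The price is that this sketch knows nothing of HTML's implied end tags: `<p>one<p>two` nests the second paragraph inside the first. Avoiding that needs a DTD-aware parser such as libxml2's.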

-- 
Nick Kew

Nick's manifesto: http://www.htmlhelp.com/~nick/

Received on Sunday, 25 April 2004 03:18:59 UTC