W3C home > Mailing lists > Public > html-tidy@w3.org > October to December 2001

Re: Cleaning HTML Sources With Tidy

From: Dave Raggett <dsr@w3.org>
Date: Mon, 8 Oct 2001 09:22:08 +0100 (GMT Daylight Time)
To: BARRY MEEHAN <bmeehan@us.ibm.com>
cc: html-tidy@w3.org, gerald@w3.org
Message-ID: <Pine.WNT.4.10.10110080911420.-858583@hazel>
The maintenance of HTML Tidy has moved to
http://tidy.sourceforge.net and you are advised to download the
latest version which fixes a number of bugs. You are invited to
submit change requests and details of problems you have found.

Note that Tidy doesn't produce an output file when it comes
across severe errors requiring human attention. Examples of
this are missing trailing quotes in attribute values, and
missing closing ">" for tags.

Barry may well find what he wants if he looks at the configuration
options. Tidy works by applying rules to build a clean parse tree
which it then pretty prints provided there are no severe errors.

On Tue, 2 Oct 2001, BARRY MEEHAN wrote:

> IBM Product Lifecycle Management has a business partner that
> develops a software product that IBM markets and supports.  As
> part of the development effort, the business partner creates the
> end-user product information in HTML.  When we send this source
> to our IBM translation centers, the tool that checks the HTML
> files for compliance with our HTML guidelines (which are based
> on W3's) always finds errors.  The business partner creates the
> files with MS FrontPage.  We have a deviation that allows them
> to use the FONT attribute for "human factors" reasons, even
> though it's a problem for Japanese translation.
> 
> So, I asked the business partner to give Tidy a spin becuase it
> looks like it would catch and correct the majority of the
> problems our checking tool finds.  Attached is their findings.  
> It's possible they missed something in the instructions or
> release notes because it does seem to have missed things it
> should have fixed.  I am interested in your reaction, in
> particular, which of the errors can it not fix?
> 
> Thanks,
> 
> Barry Meehan
> Internet:    bmeehan@us.ibm.com
> _______________________
> 
> Barry,
> 
> I downloaded and tested the Tidy tool on five HTML files. The results are
> not impressive: very few errors are fixed and even one file was totally
> empty after cleaning.
> 
> The test consisted of opening the file and run the HTML Tidy "Clean,
> correct, convert and format" function. I did not try to customize the
> cleaning. I did not find yet how to remove the comments.
> 
> I attached hereafter the data (check tool output) before and after
> cleaning.
> 
> 
> 
> BEFORE CLEANING:
> Checker Output:
> (See attached file: BEFORE.SUM)
> 
> 
> AFTER CLEANING:
> Checker Output:
> (See attached file: AFTER.TXT)
> 
> 
>       (See attached file: Before.txt)            (See attached file:
> After.txt)

Regards,

-- Dave Raggett <dsr@w3.org> or <dave.raggett@openwave.com>
W3C Visiting Fellow, see http://www.w3.org/People/Raggett 
tel/fax: +44 122 586-6240 (or 7351) +44 771 213 7629 (mobile)
Received on Monday, 8 October 2001 04:22:14 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:46 GMT