
Cleaning HTML Sources With Tidy

From: BARRY MEEHAN <bmeehan@us.ibm.com>
Date: Tue, 2 Oct 2001 11:03:13 -0400 (EDT)
To: html-tidy@w3.org
Message-ID: <OF20E8AC90.399AA7E9-ON85256AD8.0077B0C1@pok.ibm.com>
IBM Product Lifecycle Management has a business partner that develops a
software product that IBM markets and supports.  As part of the development
effort, the business partner creates the end-user product information in
HTML.  When we send this source to our IBM translation centers, the tool
that checks the HTML files for compliance with our HTML guidelines (which
are based on the W3C's) always finds errors.  The business partner creates
the files with MS FrontPage.  We have a deviation that allows them to use
the FONT element for "human factors" reasons, even though it's a problem
for Japanese translation.

So, I asked the business partner to give Tidy a spin because it looks like
it would catch and correct the majority of the problems our checking tool
finds.  Attached are their findings.  It's possible they missed something
in the instructions or release notes, because Tidy does seem to have missed
things it should have fixed.  I am interested in your reaction; in
particular, which of the errors can it not fix?


Barry Meehan
Internet:    bmeehan@us.ibm.com


I downloaded and tested the Tidy tool on five HTML files. The results are
not impressive: very few errors were fixed, and one file was even
completely empty after cleaning.

The test consisted of opening each file and running the HTML Tidy "Clean,
correct, convert and format" function. I did not try to customize the
cleaning. I have not yet found how to remove the comments.

The data (checking tool output) from before and after cleaning is attached
below.

Checker output before cleaning:
(See attached file: BEFORE.SUM)

Checker output after cleaning:
(See attached file: AFTER.TXT)

      (See attached file: Before.txt)            (See attached file:

Received on Friday, 5 October 2001 15:05:50 UTC
