
Re: Cleaning up with TIDY

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Tue, 15 Feb 2011 09:58:44 +0200
Message-ID: <72AF96E49C154ED6878B61AADA80D39B@JukanPC>
To: <historysoul@earthlink.net>, <www-validator@w3.org>

historysoul@earthlink.net wrote:

> Upon validating some pages, with the "clean up with HTML Tidy" box
> checked, I'm wondering why the 'tidy' version would show some coding
> that is exactly the same code as what it gave me errors for.  It
> still wont pass validation, but why would it just give me the same
> code I gave to it?

I'm curious to know what might cause that (as usual, a URL would have been 
essential), but as the validator says in its report (adjacent to the Tidy'ed 
version),
"HTML-Tidy is a third-party software not developed at W3C, and its output is 
provided without any guarantee."
The text has a link to http://search.cpan.org/dist/HTML-Tidy/ which might be 
a good start for looking for a solution or filing a bug report.

I'd like to add that HTML Tidy is a rather crude tool that does not simply 
"tidy up" documents but converts them with no guarantee that meaning and 
behavior are preserved. In a validation context, I would recommend it only for 
getting an idea of how some features, disallowed in some HTML versions, 
_might_ be replaced by the use of different HTML and some CSS.

In fact I'm afraid people can get rather confused if they e.g. try to 
validate a document as HTML5, say

<!doctype html>
<title>Hello</title>
<big>Hello world!</big>

and request HTML-Tidy output, which would in this case be

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy for Linux (vers 6 November 2007), see www.w3.org">
<title>Hello</title>
</head>
<body>
<big>Hello world!</big>
</body>
</html>

I wonder why the meta tag refers to www.w3.org... But anyway, here Tidy has 
changed the doctype to an HTML 3.2 doctype, rather oddly. It has preserved 
the <big> markup, which is OK in HTML 3.2 (and HTML 4), so apart from the 
uncalled-for changes in the doctype and the insertion of a meta tag, and the 
addition of some optional tags, it has not changed the document.

If you take the Tidy output and submit it to validation, it passes, so now 
you have been helped to move from HTML5 to HTML 3.2. If you manually 
override the doctype and try to validate as HTML5, it fails of course.
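
A quick way to notice this kind of silent doctype switch is to compare the 
doctype declaration before and after tidying. The following is only a 
minimal sketch using Python's standard html.parser module; the names 
DoctypeFinder, find_doctype and doctype_changed are my own illustrations and 
are not part of Tidy or the validator, and the two sample strings are simply 
the documents shown above.

```python
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Hypothetical helper: records the first doctype declaration seen."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "doctype html".
        if self.doctype is None and decl.lower().startswith("doctype"):
            # Collapse runs of whitespace so line wrapping does not matter.
            self.doctype = " ".join(decl.split())


def find_doctype(markup):
    """Return the doctype declaration of markup, or None if there is none."""
    finder = DoctypeFinder()
    finder.feed(markup)
    finder.close()
    return finder.doctype


def doctype_changed(before, after):
    """True if the two documents declare different doctypes (case-insensitive)."""
    a = find_doctype(before) or ""
    b = find_doctype(after) or ""
    return a.lower() != b.lower()


# The document submitted for validation (from the message above).
original = """<!doctype html>
<title>Hello</title>
<big>Hello world!</big>
"""

# The HTML Tidy output quoted above.
tidied = """<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy for Linux (vers 6 November 2007), see www.w3.org">
<title>Hello</title>
</head>
<body>
<big>Hello world!</big>
</body>
</html>
"""

if doctype_changed(original, tidied):
    print("Doctype changed:")
    print("  before:", find_doctype(original))
    print("  after: ", find_doctype(tidied))
```

This only reports that the declared doctype differs; it says nothing about 
whether either document is valid, which still has to be checked with the 
validator itself.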

Perhaps the bottom line of this is:
HTML-Tidying a document does not imply that it will validate, even though 
people may assume that it does (a belief reinforced by the fact that you 
don't get HTML-Tidy output if a page validates, even if you have selected 
"Clean up Markup with HTML-Tidy").

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/ 
Received on Tuesday, 15 February 2011 08:00:06 GMT
