Re: Mark-up/clean-up

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Mon, 20 Aug 2012 07:28:06 +0300
Message-ID: <5031BCD6.9040203@cs.tut.fi>
To: "www-validator@w3.org" <www-validator@w3.org>
CC: richard lebeau <lebrpl61@gmail.com>

2012-08-19 23:32, David Dorward wrote:

> On 19 Aug 2012, at 19:51, richard lebeau <lebrpl61@gmail.com> wrote:
>
>> Mark-up/clean-up, Is it possible to copy clean-up and markup fixed errors into my existing website so I can correct errors on validator
>
> The validator has an option marked "Clean up Markup with HTML-Tidy". Obviously this is imperfect as guessing what an author meant is difficult.

Moreover, the "clean-up" simply runs HTML Tidy, performing questionable 
transformations. It does not actually clean things up.

(There's also the technical issue that automatic copying of "fixed" 
markup into a website would require quite a lot of new code and could 
not work if the pages are actually generated by PHP, ASP, or other 
server-side technologies.)

Here's a simple document with some errors:

<!doctype html>
<p align=center>Hello</p>

Here's what HTML-Tidy turns it into:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy for Linux (vers 25 March 2009), see www.w3.org">
<title></title>

<style type="text/css">
  p.c1 {text-align: center}
</style>
</head>
<body>
<p class="c1">Hello</p>
</body>
</html>

So it changes the doctype to a historic one, HTML 3.2, with the result 
that the "cleaned-up" document does not validate (due to code that 
HTML-Tidy has added!). It adds a generator meta tag that refers to 
www.w3.org, as if HTML-Tidy were W3C software, contrary to the statement 
"HTML-Tidy is a third-party software not developed at W3C, and its 
output is provided without any guarantee." It inserts a title element 
with empty content - hardly an improvement in practice. It replaces the 
attribute align=center with CSS, using a generated class name, without 
checking whether that name is already used in the document. Oh, and it 
adds some optional tags like <html> and <head>, which are not required 
for validity.
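
To illustrate the class name point: a tool that generates class names 
could first collect the names already present in the document and pick 
one that is not taken. The following is only a minimal sketch in Python 
using the standard html.parser module. It is not how HTML-Tidy works, 
and the names ClassNameCollector, used_class_names and fresh_class_name 
are hypothetical. It looks only at class attributes, so a name that 
appears solely in a style sheet or a script would still be missed.

```python
from html.parser import HTMLParser


class ClassNameCollector(HTMLParser):
    """Collect every class name found in class attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute holds a space-separated list of names.
                self.names.update(value.split())


def used_class_names(markup):
    """Return the set of class names used in the given markup."""
    collector = ClassNameCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


def fresh_class_name(markup, prefix="c"):
    """Return the first of prefix1, prefix2, ... that the markup does not use."""
    used = used_class_names(markup)
    n = 1
    while f"{prefix}{n}" in used:
        n += 1
    return f"{prefix}{n}"


# A page that already uses the class name c1 for something else.
page = '<!doctype html>\n<p class="c1">Intro</p>\n<p align=center>Hello</p>'
print(fresh_class_name(page))  # c2
```

With such a check, the generated rule for the centered paragraph would 
be named c2 here, so it could not collide with the author's own c1.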

Yucca
Received on Monday, 20 August 2012 04:28:37 GMT