- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Sun, 29 Jul 2012 10:37:16 +0300
- To: Rob^_^ <iecustomizer@hotmail.com>
- CC: "w3.org Validator List" <www-validator@w3.org>
2012-07-29 5:40, Rob^_^ wrote:

> consider this simple html document.
[...]
> which the w3c validator ‘Tidy html’ corrects to
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
[...]

You are referring to the "Clean up Markup with HTML-Tidy" option in the extended user interface ("More Options") at <http://validator.w3.org>. As the output of that option itself states, "HTML-Tidy is a third-party software not developed at W3C, and its output is /provided without any guarantee/".

The option is more or less bogus. Just don't use it.

In addition to the feature you have observed, the option can cause the incomplete HTML 3.2 doctype to be emitted even if you used a different doctype or implied the HTML 4.01 doctype. Moreover, when getting rid of presentational markup, HTML-Tidy uses automatically generated class names, so the result is really less readable than the original.

And it can go very wrong. Consider this:

    <!doctype html>
    <title>Hello world</title>
    <p class=c1>Hello
    <p align=center>Hi!

This results in the following "tidied" document:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>
    <meta name="generator" content="HTML Tidy for Linux (vers 25 March 2009), see www.w3.org">
    <title>Hello world</title>
    <style type="text/css">
     p.c1 {text-align: center}
    </style>
    </head>
    <body>
    <p class="c1">Hello</p>
    <p class="c1">Hi!</p>
    </body>
    </html>

Not only has it changed the HTML5 doctype (which ensures Standards Mode as far as possible) to an incomplete HTML 4.01 Transitional doctype with no system identifier, with the risk of Quirks Mode; it has also "cleaned up" align=center by introducing the class name c1 and associated CSS, without checking whether that name is already in use. So both paragraphs end up centered (and whatever external CSS applies to class c1 now applies to the second paragraph as well).

In addition to this, people trying to use HTML5 have been misled into thinking that HTML-Tidy generally fixes presentational markup, converting it to use CSS.
For example, if you submit the following document, you will get an error message saying "The width attribute on the td element is obsolete. Use CSS instead.":

    <!doctype html>
    <title>Hello world</title>
    <table><tr><td width=100>foo</table>

Now, as HTML-Tidy has been advertised as fixing problems of this type, and since it is available as an option in the validator's user interface, people take this option and get a "tidied" version, which has the same <table> markup, just with different formatting.

It gets worse. Suppose you are validating an HTML5 document, with <!doctype html>, that contains some element introduced in HTML5, say

    <aside>What is going on?</aside>

You will get no error message about it, of course, since you are validating as HTML5, but if you use the HTML-Tidy option, the "tidied" document has silently been stripped of the <aside> and </aside> tags. (This happens when there is _some_ error message.)

The "HTML-Tidy" option should simply be removed. The HTML-Tidy software should be used, at most, by people who know well what it really does. And such people can surely run it separately on their documents.

Yucca
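
The silent loss of elements such as <aside> described above is easy to miss by eye. As a minimal sketch of a sanity check one might run after using HTML-Tidy separately, the following compares the element names found in the original and in the tidied document. The names TagCollector, element_names and dropped_elements are hypothetical, and the tidied string is a hand-written stand-in for real HTML-Tidy output, not something the tool is known to emit verbatim.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the name of every start tag seen in a document."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        self.tags.add(tag)


def element_names(markup):
    """Return the set of element names that occur in `markup`."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def dropped_elements(original, tidied):
    """Return element names present in `original` but absent from `tidied`."""
    return element_names(original) - element_names(tidied)


original = (
    "<!doctype html><title>Hello world</title>"
    "<aside>What is going on?</aside>"
)
# Assumed stand-in for tidied output in which the <aside> tags were stripped.
tidied = (
    "<html><head><title>Hello world</title></head>"
    "<body>What is going on?</body></html>"
)

print(sorted(dropped_elements(original, tidied)))
```

A non-empty result only signals that some element type vanished entirely. It does not catch the case where one of several <aside> elements is dropped, so it is a first check rather than a full comparison.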
Received on Sunday, 29 July 2012 07:37:48 UTC