RE: Tidy becomes less forgiving from J. David Bryan on 2001-09-17 (html-tidy@w3.org from July to September 2001)

From: J. David Bryan <jdbryan@acm.org>
Date: Mon, 17 Sep 2001 19:21:29 -0400
To: HTML Tidy List <html-tidy@w3.org>
Message-Id: <200109172321.f8HNLVX23638@mail.bcpl.net>

On 17 Sep 2001, at 16:54, Reitzel, Charlie wrote:

> But consider those cases where most browsers support a feature not
> included in any HTML spec (e.g. cols attribute of <table>).  Tidy knows
> all about this one and doesn't emit a peep. I.e. Tidy respects the de
> facto standard implemented by major browsers.

Your observation is correct, but your conclusion is suspect, I'm afraid.

Tidy doesn't check to ensure that otherwise valid attributes are placed on 
the correct tags.  So Tidy accepts the "cols" attribute (valid for use on 
the "textarea" tag) on the "table" tag as you say.

However, Tidy also doesn't complain if the "table" tag contains the 
attributes "type=disc" or "charset=iso-8859-1" or "rel=stylesheet".  To 
which de facto standard do these belong?  ;-)
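
As a rough illustration of the missing check, here is a minimal Python
sketch of a per-element attribute lookup.  It is not Tidy's actual code:
the attribute table is a trimmed, illustrative subset of HTML 4.01, and the
names ALLOWED, COMMON, AttrChecker, and misplaced_attributes are
hypothetical ones chosen for this example.

```python
from html.parser import HTMLParser

# Assumption: an illustrative, trimmed subset of the element-specific
# attributes in HTML 4.01.  This is not the table Tidy uses.
ALLOWED = {
    "table": {"summary", "width", "border", "frame", "rules",
              "cellspacing", "cellpadding", "align", "bgcolor"},
    "textarea": {"name", "rows", "cols", "disabled", "readonly"},
    "ul": {"type", "compact"},
    "link": {"charset", "href", "rel", "rev", "type", "media"},
}

# Attributes accepted on (nearly) every element; also a trimmed subset.
COMMON = {"id", "class", "style", "title", "lang", "dir"}


class AttrChecker(HTMLParser):
    """Collects (tag, attribute) pairs where the attribute is not
    listed for that tag in ALLOWED or COMMON."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        allowed = ALLOWED.get(tag)
        if allowed is None:
            return  # element not covered by this small table; skip it
        for name, _value in attrs:
            if name not in allowed and name not in COMMON:
                self.problems.append((tag, name))


def misplaced_attributes(markup):
    """Return a list of (tag, attribute) pairs that look misplaced."""
    checker = AttrChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

Fed the "table" tag from the examples above, with "cols", "type",
"charset", and "rel" attributes, this reports all four as misplaced, while
"cols" on a "textarea" tag passes without complaint.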

> Nor am I so quick to label as "invalid" large bodies of working code.

If the markup doesn't adhere to the syntactic and semantic requirements of 
the HTML specification, then it is invalid HTML by definition.  Whether 
such invalid HTML behaves as one wants is an entirely separate issue.

A misspelled English word is misspelled regardless of whether it is 
understood by the reader or whether it conveys the appropriate meaning 
(i.e., whether or not it "works").

> If everybody took the rigid view, the web simply wouldn't exist.

Personally, I believe that if everybody had taken the rigid view, we would 
have a much more interoperable and accessible Web.

But this is all really beside the point.  If you believe that Tidy should 
morph into a program that generates syntactically invalid HTML, that's 
fine.  The result won't be HTML Tidy, though, because the function of the 
program will have changed significantly.  Whether that changed function is 
desirable is open to debate, but if it is, then I believe that it 
should be implemented separately from a program that has, as its 
fundamental goal, a transformation to valid HTML.

I use Tidy to turn junk markup into valid HTML.  I suspect others might be 
using it for similar purposes.  If the output of Tidy doesn't validate and 
doesn't warn about the use of proprietary constructs, then I would consider 
that to be a Tidy bug (as the mix-and-match attribute problem is a bug).

If it is deemed desirable to have a program that generates invalid HTML, 
then a new program could be created from the existing Tidy code.  I 
suppose, as an alternative to creating a separate program, that adding an 
option (perhaps "--broken-html" :-) to Tidy would be acceptable, although 
at the expense of additional complexity.  But changing Tidy to generate 
invalid HTML, "de facto standards" notwithstanding, will render Tidy 
useless for a segment of the user population.

                                      -- Dave

Received on Monday, 17 September 2001 19:21:35 UTC