Re: 30apr00 nested <CODE>

"Richard A. O'Keefe" <ok@atlas.otago.ac.nz> wrote:

>HTML Tidy makes the same mistake with other examples of
><CODE>a<CODE>b</CODE>c</CODE>.  In fact, it makes the same
>mistake with _all_ the %font and %phrase elements that I have
>tested.  I enclose a test file at the end of this message.
>I've tried it with Amaya and Netscape, and more importantly,
>I've tried it with Nsgmls.  They all love it.  Only htmltidy hates it.

Tidy has had a problem with its attempts to clean up such nesting.  I
thought a correction had been made so that BIG inside BIG and SMALL inside
SMALL are left alone (because there the nesting has a cumulative effect).
Otherwise, the nesting is presentationally meaningless and should be
cleaned up.  Tidy's method of presuming that the second <CODE> was meant to
be </CODE>, though, is a very poor solution.  IMO, the cleanup should wait
until after all other parsing is done, and then the interior tags can be
eliminated as a pair.  Otherwise, first-pass parsing can only result in
these erroneous outputs:

	<CODE>a</CODE>bc
	<CODE>a</CODE><CODE>b</CODE>c
	<CODE>a</CODE>b<CODE></CODE>c<CODE></CODE>
	<CODE>a</CODE><CODE>b</CODE>c<CODE></CODE>

because it can't see that the tags as written were paired, and not the ideal:

	<CODE>abc</CODE>
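
To make the "wait until parsing is done, then eliminate the interior tags as
a pair" idea concrete, here is a rough sketch in Python.  It is not Tidy's
code, only an illustration.  The names Node, TreeBuilder, unwrap_redundant,
serialize and clean are all made up for this example.  It assumes
well-nested, attribute-free input with no empty elements such as BR, and
Python's html.parser lowercases tag names, so the output is lowercase.

```python
from html import escape
from html.parser import HTMLParser

# Nesting of these is cumulative, so it is never treated as redundant.
CUMULATIVE = {"big", "small"}


class Node:
    """Hypothetical tree node: tag is None for the document root."""
    def __init__(self, tag=None, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # mix of Node instances and text strings


class TreeBuilder(HTMLParser):
    """Pass 1: build the whole tree; no cleanup decisions are made here."""
    def __init__(self):
        super().__init__()
        self.root = Node()
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # Assumption: attributes are absent, so attrs is ignored.
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Assumption: input is well nested, so the tag closes the current node.
        if self.current.tag == tag:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.children.append(data)


def unwrap_redundant(node, open_tags=()):
    """Pass 2: splice out any element already open as an ancestor."""
    kept = []
    for child in node.children:
        if isinstance(child, str):
            kept.append(child)
            continue
        unwrap_redundant(child, open_tags + (child.tag,))
        if child.tag in open_tags and child.tag not in CUMULATIVE:
            # Start and end tag go away together; the content stays put.
            kept.extend(child.children)
        else:
            kept.append(child)
    node.children = kept


def serialize(node):
    inner = "".join(
        escape(c, quote=False) if isinstance(c, str) else serialize(c)
        for c in node.children
    )
    return inner if node.tag is None else f"<{node.tag}>{inner}</{node.tag}>"


def clean(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    unwrap_redundant(builder.root)
    return serialize(builder.root)


print(clean("<CODE>a<CODE>b</CODE>c</CODE>"))
print(clean("<BIG>a<BIG>b</BIG>c</BIG>"))
```

Because the decision is made on the finished tree, the inner element is
known to be a matched pair before anything is removed, which is exactly what
a first-pass guess cannot know.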

>     Purpose: Test HTML Tidy.  This is all LEGAL and should not be changed.

Tidy doesn't just make invalid markup valid.  It also cleans up useless yet
valid markup.  I think this is very appropriate and addresses some types of
bad markup still found in the wild, mostly empty tags created by so-called
WYSIWYG editors.

And sometimes even those are intended, perhaps as a hook for styles which
use the "content" property, or as a way to detect whether or not stylesheets
are enabled: things Tidy can't know about without adding lots of code to
parse stylesheets, Javascript, VBScript, etc.  You can argue that such
markup is still perfectly valid, but if Tidy became paranoid and avoided
making any changes to valid markup, next you'd want it to leave certain
invalid markup alone because the browsers still do what you mean (already
dealt with such a request), and eventually you'd end up with Tidy doing
absolutely nothing to anything.  Once that happens, Tidy ends up being just
a glorified cat.

Received on Monday, 26 June 2000 00:05:06 UTC