- From: <html-tidy@war-of-the-worlds.org>
- Date: Sun, 25 Jun 2000 23:05:03 -0500
- To: html-tidy@w3.org
"Richard A. O'Keefe" <ok@atlas.otago.ac.nz> wrote:

>HTML Tidy makes the same mistake with other examples of
><CODE>a<CODE>b</CODE>c</CODE>. In fact, it makes the same
>mistake with _all_ the %font and %phrase elements that I have
>tested. I enclose a test file at the end of this message.
>I've tried it with Amaya and Netscape, and more importantly,
>I've tried it with Nsgmls. They all love it. Only htmltidy hates it.

There has been a problem with Tidy's attempt to clean such things. I thought a correction was made regarding BIG inside BIG and SMALL inside SMALL to not clean those up (because they do have cumulative effect). Otherwise, the nesting is presentationally meaningless and should be cleaned up.

Tidy's method of presuming that the second <CODE> should be </CODE> though is a very poor solution. IMO, the cleanup should wait until after all other parsing is done, and then the interior tags can be eliminated as a pair. Otherwise, first-pass parsing can only result in these erroneous outputs:

    <CODE>a</CODE>bc
    <CODE>a</CODE><CODE>b</CODE>c
    <CODE>a</CODE>b<CODE></CODE>c<CODE></CODE>
    <CODE>a</CODE><CODE>b</CODE>c<CODE></CODE>

because it can't see that the tags as written were paired, and not the ideal:

    <CODE>abc</CODE>

> Purpose: Test HTML Tidy. This is all LEGAL and should not be changed.

Tidy doesn't just make invalid markup valid. It also cleans up useless yet valid markup. I think this is very appropriate and addresses some types of bad markup still found in the wild, mostly empty tags created by so-called WYSIWYG editors. And sometimes even they are intended, perhaps as a hook for styles which use the "content" property, or used to detect whether or not stylesheets are enabled, things Tidy can't know about without adding lots of code to parse stylesheets, Javascript, VBScript, etc.

You can argue that it's still perfectly valid, but if Tidy became paranoid and avoided making any changes to valid markup, next you'll want it to leave certain invalid markup alone because the browsers still do what you mean (already dealt with such a request), and eventually you'll end up with Tidy doing absolutely nothing to anything. Once that happens Tidy ends up being just a glorified cat.
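
To make the "eliminate the interior tags as a pair" idea concrete, here is a rough sketch in Python. It is not Tidy's code and not a proposal for how Tidy should be structured internally; it only illustrates the pairing logic on a tag stream that is assumed to be properly nested already (that is, run after all other parsing and repair). The function name `collapse_nested`, the regex-based tokenizer, and the `COLLAPSIBLE` set are all made up for the illustration, and the set is just a guess at which elements have no cumulative effect. BIG and SMALL are deliberately left out.

```python
import re

# Hypothetical list of inline elements whose self-nesting is assumed to be
# presentationally meaningless. BIG and SMALL are omitted on purpose because
# nesting them has a cumulative effect.
COLLAPSIBLE = {"tt", "i", "b", "em", "strong", "code", "samp", "kbd", "var", "cite"}

# Very rough tag matcher: (slash, name, attribute text). Good enough for a
# sketch; it does not handle '>' inside attribute values.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")


def collapse_nested(html):
    """Drop an inner <X>...</X> pair when an <X> is already open.

    Assumes the input is already properly nested.
    """
    out = []
    stack = []  # (name, keep) for each open collapsible element
    pos = 0
    for m in TAG.finditer(html):
        out.append(html[pos:m.start()])  # text before this tag
        pos = m.end()
        closing = m.group(1) == "/"
        name = m.group(2).lower()
        attrs = m.group(3).strip()

        if name not in COLLAPSIBLE:
            out.append(m.group(0))  # pass every other tag through untouched
        elif not closing:
            # Redundant only if the same element is already open and this
            # tag carries no attributes that might be a style or script hook.
            redundant = not attrs and any(n == name for n, _ in stack)
            stack.append((name, not redundant))
            if not redundant:
                out.append(m.group(0))
        elif stack and stack[-1][0] == name:
            # The matching close tag shares the fate of its open tag.
            _, keep = stack.pop()
            if keep:
                out.append(m.group(0))
        else:
            out.append(m.group(0))  # unmatched close: leave it alone
    out.append(html[pos:])
    return "".join(out)


print(collapse_nested("<CODE>a<CODE>b</CODE>c</CODE>"))
print(collapse_nested("<BIG>a<BIG>b</BIG>c</BIG>"))
```

The point of the stack is that the decision to drop the inner open tag is remembered and applied to its own close tag, so the pair disappears together and the outer close tag is never mistaken for the inner one. A tag with attributes is kept, since it may be one of those intentional hooks.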
Received on Monday, 26 June 2000 00:05:06 UTC