
Re: 30apr00 nested <CODE>

From: Richard A. O'Keefe <ok@atlas.otago.ac.nz>
Date: Mon, 26 Jun 2000 17:39:22 +1200 (NZST)
Message-Id: <200006260539.RAA17692@atlas.otago.ac.nz>
To: html-tidy@w3.org, html-tidy@war-of-the-worlds.org
I wrote about <CODE>...<CODE>...</CODE>...</CODE>

gjames@pop.war-of-the-worlds.org (Unverified) wrote
	Otherwise, the nesting is presentationally meaningless and
	should be cleaned up.

Well, no.  A style sheet could perfectly well say that CODE inside
CODE is formatted differently from CODE that is not inside CODE.
It doesn't require a CLASS attribute to do this.

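  /* CODE that is not inside another CODE */
  code {
     color: red; white-space: pre; font-family: monospace;
  }
  /* CODE inside CODE: a descendant selector, no CLASS attribute needed */
  code code {
     color: blue; white-space: pre; font-family: monospace;
  }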

That applies to things like <SAMP><SAMP> and <VAR><VAR> as well.
They *were* presentationally redundant (not meaningless) before the
introduction of CSS, but *aren't* now.

	Tidy's method of presuming that the second <CODE> should be
	</CODE> though is a very poor solution.  IMO, the cleanup should
	wait until after all other parsing is done, and then the
	interior tags can be eliminated as a pair.

Exactly.  But even then, not without explicit permission.
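To make "clean up after parsing, remove the interior tags as a pair, and only with permission" concrete, here is a minimal Python sketch. It is not Tidy's own (C) code; the names TreeBuilder, unwrap_nested, tidy_nested and the allow flag are hypothetical. It assumes the fragment contains only ordinary elements (void elements such as <BR> would need extra handling), and Python's html.parser lowercases tag names.

```python
from html import escape
from html.parser import HTMLParser


class Node:
    """Minimal element node: tag name, attribute list, children (Nodes or text)."""

    def __init__(self, tag, attrs=()):
        self.tag = tag
        self.attrs = list(attrs)
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds the complete tree first; no cleanup happens during parsing."""

    def __init__(self):
        super().__init__()
        self.root = Node(None)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def unwrap_nested(node, tags, ancestors=frozenset()):
    """Drop an element's start and end tag, as a pair, when an ancestor
    has the same name; the element's content stays where it was."""
    kept = []
    for child in node.children:
        if isinstance(child, Node):
            unwrap_nested(child, tags, ancestors | {child.tag})
            if child.tag in tags and child.tag in ancestors:
                kept.extend(child.children)
            else:
                kept.append(child)
        else:
            kept.append(child)
    node.children = kept


def serialize(node):
    """Write the tree back out as markup, escaping text and attribute values."""
    parts = []
    for child in node.children:
        if isinstance(child, Node):
            attrs = "".join(
                f" {k}" if v is None else f' {k}="{escape(v)}"'
                for k, v in child.attrs
            )
            parts.append(f"<{child.tag}{attrs}>{serialize(child)}</{child.tag}>")
        else:
            parts.append(escape(child, quote=False))
    return "".join(parts)


def tidy_nested(markup, tags=frozenset({"code", "samp", "var"}), allow=False):
    """Parse everything, then unwrap nested pairs only if explicitly allowed."""
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    if allow:
        unwrap_nested(builder.root, tags)
    return serialize(builder.root)
```

With the default allow=False, `tidy_nested("<code>a<code>b</code>c</code>")` returns the input unchanged; with allow=True it returns `<code>abc</code>`, keeping all the text and removing the inner start and end tag together rather than turning the second start tag into an end tag.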

	Tidy doesn't just make invalid markup valid.  It also cleans up
	useless yet valid markup.  I think this is very appropriate and
	addresses some types of bad markup still found in the wild,
	mostly empty tags created by so-called WYSIWYG editors.
	
True.  I see a lot of empty <P> elements, and <TABLE>s inside <FONT>s,
and really unbelievable rubbish, from big-name products.

	You can argue that it's still perfectly valid, but if Tidy
	became paranoid and avoided making any changes to valid markup,
	next you'll want it to leave certain invalid markup alone
	because the browsers still do what you mean (already dealt with
	such a request), and eventually you'll end up with Tidy doing
	absolutely nothing to anything.  Once that happens Tidy ends up
	being just a glorified cat.

This is a "slippery slope" argument.  Such arguments can be valid, but
there is a huge difference between syntactically legal HTML and
syntactically illegal HTML, which makes the claim "next you'll want it
to leave certain invalid markup alone" unwarranted.

In this case, we are talking about a construction which is not just
legal but *harmless*.  Now HTML Tidy has this "feature" because there
are HTML generators out there that get their brackets wrong; sometimes
this correction is *essential*.
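A program can tell the two cases apart before changing anything: a generator that writes <CODE> where it meant </CODE> leaves start tags unclosed, while deliberate nesting pairs up exactly. The following Python sketch (hypothetical names is_balanced and _PairChecker, not part of Tidy) checks this with a stack over a chosen set of element names.

```python
from html.parser import HTMLParser


class _PairChecker(HTMLParser):
    """Tracks start and end tags of selected elements on a stack."""

    def __init__(self, tags):
        super().__init__()
        self.tags = tags
        self.stack = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.tags:
            if self.stack and self.stack[-1] == tag:
                self.stack.pop()
            else:
                # End tag that does not close the innermost open tracked element
                # (stray or crossing end tag).
                self.ok = False


def is_balanced(markup, tags=frozenset({"code", "samp", "var"})):
    """True if every tracked start tag is closed, in properly nested order."""
    checker = _PairChecker(tags)
    checker.feed(markup)
    checker.close()
    return checker.ok and not checker.stack
```

Under this assumption, the "second start tag means end tag" repair would be attempted only when is_balanced returns False, so `<code>a<code>b</code>c</code>` (balanced) is left alone while `<code>a<code>b` (two unclosed start tags) is a candidate for repair.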

I suppose that there is no disagreement that
 - a warning is appropriate for technically legal "lint"
 - this kind of stuff is sometimes a mistake so should SOMETIMES be fixed
 - HTML Tidy's configuration files are a Good Thing
The questions are
 - *when* should the correction be made
 - should it be made at all if the start- and end-tags are in fact
   correctly balanced?
 - what form should the correction take? (is there any disagreement
   that HTML Tidy's treatment of *this* example is undesirable?)
 - should the alteration of legal nested %fonts and %phrases be
   enabled by default?
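For the first point, a warning without any alteration, a lint pass only needs to report same-name nesting of %phrase and %fontstyle elements. This Python sketch is hypothetical (NestingLint and lint_nesting are illustrative names, not Tidy options); the element lists are taken from the HTML 4.01 %phrase entity and the Strict DTD's %fontstyle entity, and the reported offset is html.parser's 0-based column.

```python
from html.parser import HTMLParser

# Element names from the HTML 4.01 %phrase and (Strict) %fontstyle entities.
PHRASE = frozenset({"em", "strong", "dfn", "code", "samp", "kbd",
                    "var", "cite", "abbr", "acronym"})
FONTSTYLE = frozenset({"tt", "i", "b", "big", "small"})
TRACKED = PHRASE | FONTSTYLE


class NestingLint(HTMLParser):
    """Collects warnings about same-name nesting; never alters the input."""

    def __init__(self):
        super().__init__()
        self.open = []       # tracked elements currently open, innermost last
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        if tag in TRACKED:
            if tag in self.open:
                line, offset = self.getpos()
                self.warnings.append(
                    f"line {line}, offset {offset}: <{tag}> nested inside <{tag}>")
            self.open.append(tag)

    def handle_endtag(self, tag):
        if tag in TRACKED and tag in self.open:
            # Close back to the most recent open element with this name.
            last = len(self.open) - 1 - self.open[::-1].index(tag)
            del self.open[last:]


def lint_nesting(markup):
    """Return a list of warnings; the markup itself is not modified."""
    checker = NestingLint()
    checker.feed(markup)
    checker.close()
    return checker.warnings
```

Keeping the warning separate from any fix means the answer to "enabled by default?" can differ for the two: the lint can always run, while the rewrite stays opt-in.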

How you answer may well depend on what the HTML you most commonly have
to clean up looks like.  This particular example is generated
automatically by someone else's program, which I'd rather not have to try
to fix, especially as it technically isn't broken.
Received on Monday, 26 June 2000 01:39:28 GMT
