Re: Repairing incorrect tag minimisation was Re: Tags lacking a terminating '>' are spotted

From: Richard A. O'Keefe <ok@atlas.otago.ac.nz>
Date: Wed, 13 Feb 2002 13:22:49 +1300 (NZDT)
Message-Id: <200202130022.NAA380462@atlas.otago.ac.nz>
To: bfowler@ewitness.co.uk, html-tidy@w3.org, ok@atlas.otago.ac.nz
bfowler@ewitness.co.uk (ewitness - Ben Fowler) quotes the example
		   <body>
		   <li>1st list item
		   <li>2nd list item
being mapped to	

		   <body>
		   <ul>
		   <li>1st list item</li>
		   <li>2nd list item</li>
		   </ul>
	>That rule being adopted, Tidy could never repair anything at all.
	
	The docs give ten examples, including the <ul> container element
	that I mentioned earlier
	
This is, on the contrary, an excellent example of Tidy making a guess
about the correction, a guess which cannot be relied on in general.

(A) It's true that <ul> will fit here, but so will <ol>.
    The heuristic is "this page probably looked OK in some browser;
    it would not have had numbers, so the author probably didn't want
    numbers."  But it's only a heuristic; if someone types HTML manually
    and runs Tidy on it _before_ viewing it in a browser, it is quite
    likely to be the wrong change.

    It's _still_ a good starting point for manual completion of the repair
    even if it _is_ wrong.
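
    To make the point concrete, here is a minimal sketch in Python (not
    Tidy's actual code; the function name wrap_stray_items and its
    container parameter are made up for illustration). The container is
    an explicit argument precisely because the input does not determine
    it: "ul" is only the default guess, and "ol" yields an equally valid
    document.

```python
def wrap_stray_items(items, container="ul"):
    """Wrap bare list items in a list container.

    `container` is a guess: nothing in the input says whether the
    author wanted <ul> or <ol>. (Hypothetical helper, not Tidy's API.)
    """
    # This sketch only considers the two candidates discussed above.
    if container not in ("ul", "ol"):
        raise ValueError("container must be 'ul' or 'ol'")
    lines = [f"<{container}>"]
    lines += [f"<li>{text}</li>" for text in items]
    lines.append(f"</{container}>")
    return lines


items = ["1st list item", "2nd list item"]
as_bullets = wrap_stray_items(items)        # the default guess
as_numbers = wrap_stray_items(items, "ol")  # just as valid
```

    Both results are well-formed; only the author knows which one was
    meant.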

(B) Let's generalise the example a little bit:
	<ul>
	<li>One item
	<li>Another item.
	<!-- the </ul> was supposed to be here -->
	This is supposed to be a new paragraph.
	<p>And so is this.
    Tidy will convert this to
	<ul>
	<li>One item</li>
	<li>Another item.
	<!-- the </ul> was supposed to be here -->
	This is supposed to be a new paragraph.</li>
	<p>And so is this.</p>
    instead of to
	<ul>
	<li>One item</li>
	<li>Another item.
	<!-- the </ul> was supposed to be here --></li>
	</ul>
	This is supposed to be a new paragraph.
	<p>And so is this.</p>

    Any time that an element (such as <ul>) can be followed by material
    that would be allowed inside it (possibly inside some nest of
    descendant elements that have omissible end-tags, like <li>), it is
    impossible for Tidy to be sure where to put the end-tags.  Placing
    them as far to the right as possible is a good rule, and the result
    is a good starting-point for manual correction, but it is only a
    heuristic: not only can it go wrong, it does go wrong.
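
    The ambiguity can be shown with a toy model in Python. This is a
    sketch under stated assumptions, not a description of how Tidy is
    implemented: the content after the last <li> start-tag is split into
    chunks, every chunk is assumed to be legal inside <li>, and the names
    candidate_repairs and rightmost_repair are hypothetical.

```python
END_TAGS = "</li></ul>"  # the end-tags the author forgot


def candidate_repairs(chunks):
    """Return every repair that closes the list after 1..n chunks.

    Each candidate is a valid document under the assumption that all
    chunks are allowed inside <li>, so none can be ruled out.
    """
    return [
        chunks[:cut] + [END_TAGS] + chunks[cut:]
        for cut in range(1, len(chunks) + 1)
    ]


def rightmost_repair(chunks):
    """The 'as far to the right as possible' heuristic."""
    return candidate_repairs(chunks)[-1]


chunks = [
    "<li>Another item.",
    "<!-- the </ul> was supposed to be here -->",
    "This is supposed to be a new paragraph.",
]
repairs = candidate_repairs(chunks)
intended = repairs[1]              # close right after the comment
guessed = rightmost_repair(chunks)  # close after the last chunk
```

    Here there are three candidates, the heuristic picks the last one,
    and the author wanted the second. Nothing in the input distinguishes
    them.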

The <td nowrap>... example is very similar to (B): a missing right
bracket, and something following the place where the right bracket
should have been that would have been legal before the right bracket.
There is no perfect rule for where to place the ">", but the rule that
has been proposed is quite as good as the rule for restoring missing
</ul> end-tags.
Received on Tuesday, 12 February 2002 19:22:54 GMT
