Re: Repairing incorrect tag minimisation was Re: Tags lacking a terminating '>' are spotted

I proposed a simple rule that could recover from missing '>'.
bfowler@ewitness.co.uk (ewitness - Ben Fowler) wrote:

	(Of course Tidy should close <td> and <tr> tags).
	
I can't tell whether this refers to adding '>' or to adding </td>.
The </tr> and </td> end-tags are optional in both HTML 3.2 and
HTML 4.01, so omitting them is not an error.

	I have an objection to the process that generates <td compact>
	from the input you gave.
	
	IMHO, Tidy should adopt a "first do no harm" rule.

That would be a nice rule, *IF* one could make it mean something definite.

	In other words, it should not attempt a partial repair of
	elements that it does not fully understand.

If that rule were adopted, Tidy could never repair anything at all.

	In my view, because it is impossible to determine whether the
	author intended
	
		<td nowrap="nowrap">
	
	or
	
		nowrap is a deprecated attribute for table data cell elements
	
	Tidy should do nothing.
	
I fail to see how logging the problem is an advance on logging the
problem and attempting the repair.  The reason I gave the entire list
of words that could incorrectly be swallowed was to point out that it is
a very short list, and most of the "words" are not actual English words
at all (as this one is not).  For realistic examples, the odds against a
real word being swallowed in this way are surely extremely low.
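
To make the trade-off concrete, here is a minimal sketch of one way a
repair of this kind could work.  It is not Tidy's code and not
necessarily the exact rule I proposed; the function name
close_unterminated_tag and the regular expressions are invented for
illustration.  The assumption is that the only bare words allowed
inside a start-tag are the HTML 4.01 boolean attributes, so the '>' is
inserted before the first word that is neither name=value nor one of
those.

```python
import re

# Assumption: the HTML 4.01 boolean attributes are the only bare words
# that may appear in a start-tag without "=value".
BOOLEAN_ATTRS = {
    "checked", "compact", "declare", "defer", "disabled", "ismap",
    "multiple", "nohref", "noresize", "noshade", "nowrap", "readonly",
    "selected",
}

TAG_NAME = re.compile(r"<[A-Za-z][A-Za-z0-9]*")
SPACES = re.compile(r"\s*")
WORD = re.compile(r"[^\s<>]+")
# name="value", name='value' or name=value
ATTR_WITH_VALUE = re.compile(
    r"""[A-Za-z][\w:.-]*\s*=\s*("[^"]*"|'[^']*'|[^\s<>]+)"""
)


def close_unterminated_tag(text):
    """Insert a missing '>' into the start-tag at the beginning of text.

    Hypothetical helper: text that does not begin with a start-tag, or
    whose start-tag is already terminated, is returned unchanged.
    """
    tag = TAG_NAME.match(text)
    if tag is None:
        return text
    pos = tag.end()  # end of the last token accepted as part of the tag
    while True:
        nxt = SPACES.match(text, pos).end()
        if nxt < len(text) and text[nxt] == ">":
            return text  # the tag is properly terminated already
        attr = ATTR_WITH_VALUE.match(text, nxt)
        if attr is not None:
            pos = attr.end()
            continue
        word = WORD.match(text, nxt)
        if word is not None and word.group().lower() in BOOLEAN_ATTRS:
            pos = word.end()  # a bare boolean attribute is "swallowed"
            continue
        break  # first token that cannot be an attribute
    return text[:pos] + ">" + text[pos:]
```

Applied to '<td nowrap is a deprecated attribute' this yields
'<td nowrap> is a deprecated attribute'.  The only way it can swallow
text that was meant as content is when that text begins with one of the
thirteen names in BOOLEAN_ATTRS, which is the short list referred to
above.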

	I appreciate that there is a case for changing to <td>, and
	expecting the author to re-check the page in an HTML editor
	or browser, but I personally would not do it.
	
	The best that Tidy could do is log the problem. (Or perhaps
	add a comment).
	
That is rather far from the best.  _Whichever_ is the intended correction,
"<td nowrap> is a deprecated ...." is an excellent starting point for a
manual correction.  No matter WHAT changes Tidy makes, it is always
necessary for someone to check the result.

Received on Sunday, 10 February 2002 17:27:20 UTC