- From: ewitness - Ben Fowler <bfowler@ewitness.co.uk>
- Date: Tue, 12 Feb 2002 07:58:56 +0000
- To: "Richard A. O'Keefe" <ok@atlas.otago.ac.nz>, html-tidy@w3.org
At 11:27 am +1300 11/2/02, Richard A. O'Keefe wrote: >I proposed a simple rule that could recover from missing '>'. >bfowler@ewitness.co.uk (ewitness - Ben Fowler) wrote: > > (Of course Tidy should close <td> and <tr> tags). > >I don't understand whether this refers to adding '>' or to adding </td>. ></tr> and </td> end-tags are not required for legality in HTML 3.2 or >HTML 4.01. I meant the latter. I thought that it was tidy's job to normalise SGML like this. If it is not, then I am simply wrong. The documentation for tidy includes this ambiguous text: Perfecting lists by putting in tags missed out: <body> <li>1st list item <li>2nd list item is mapped to <body> <ul> <li>1st list item</li> <li>2nd list item</li> </ul> 1) Tidy is not 'perfecting' it is 'correcting', 'amending' or my favoured term 'repairing'. 2) It has done two distinct things: a) Added a missing <ul> start tag b) Normalised omitted </li> end tags It is this 'passing over' or paraleipsis which is the rhetorical term <URL: http://www.chem.leeds.ac.uk/roget/entries/460.html > of normalising the <li> element there that would suggest that tidy normalises tags as a matter of course. Independent of the argument in favour of normalising a tidied document to a canonical form; the benefits of including the end tags I mentioned include coddling a bug in Netscape which sometimes fails to render table content when presented with legal markup omitting these end tags (some 'wysiwyg' editors omit these tags as a matter of routine), <URL: http://www.htmlhelp.com/faq/html/all.html#tables > <URL: http://www.evolt.org/article/Don_t_let_Netscape_cramp_your_styles/17/3268/ > <URL: http://www.hwg.org/resources/faqs/1cssFAQ.html#optiontags > <URL: http://www.cyber.com.au/users/clancy/clancys_tables.html > and improving the performance of layout engines when CSS is in use. <URL: http://www.htmlhelp.com/faq/html/all.html#page-inconsistent > <URL: http://www.leeds.ac.uk/acom/html/lesson10text.html > <URL: http://www.blooberry.com/indexdot/css/syntax/generalbugs.htm > <URL: http://www.webreference.com/html/tutorial11/1.html > Logically, to make a page 'tidy' we should either omit all omissible tags, or include all includable ones, I prefer the latter, and believe that I can support that preference, perhaps you differ. > I have an objection to the process that generates <td compact> > from the input you gave. > > IMHO, Tidy should adopt a "first do no harm" rule. > >That would be a nice rule, *IF* one could make it mean something definite. Surely 'first do no harm' is one of the most definite rules that there is; which is why I used that form of words. > In other words, it should not attempt a partial repair of > elements that it does not fully understand. > >That rule being adopted, Tidy could never repair anything at all. The docs give ten examples, including the <ul> container element that I mentioned earlier > In my view, because it is impossible to determine whether the > author intended > > <td nowrap="nowrap"> > > or > > nowrap is a deprecated attribute for table data cell elements > > Tidy should do nothing. > >I fail to see how logging the problem is an advance on logging the >problem and attempting the repair. The reason I gave the entire list >of words that could incorrectly be swallowed was to point out that it is >a very short list, and most of the "words" are not actual English words >at all (as this one is not). For realistic examples, the odds against a >real word being swallowed in this way are surely extremely low. All right. You have convinced me. I had thought that there was a general issue here, but I now see that it is fairly specific, and you are right. > I appreciate that there is a case for changing to <td>, and > expecting the author to re-check the page in an HTML editor > or browser, but I personally would not do it. > > The best that Tidy could do is log the problem. (Or perhaps > add a comment). > >That is rather far from the best. _Whichever_ is the intended correction, >"<td nowrap> is a deprecated ...." is an excellent starting point for a >manual correction. No matter WHAT changes Tidy makes, it is always >necessary for someone to check the result. I was taking human nature into account. I suspect that many people would run the document through tidy and call it a day. Many people are minded to take the output of computer as 'vox dei'. I would doubt that in the real world the results would be rigorously checked, and I suspect that often would not be checked at all. In the case in point, the check might have to be done in a text editor rather than an HTML editor. Ben.
Received on Tuesday, 12 February 2002 03:00:54 UTC