- From: Advocate <wotw@inetnebr.com>
- Date: Sat, 28 Aug 1999 17:35:34 -0500
- To: html-tidy@w3.org
Terry Teague <teague@mailandnews.com> wrote: >At 6:58 PM -0700 8/21/99, Terry Teague wrote: >>At 12:16 AM -0700 8/16/99, Terry Teague wrote: >>>At 6:13 PM -0500 8/15/99, Advocate wrote: >>>> <table border="1" summary="Demonstration of HTML Tidy's erroneous \ >>>> elimination of empty rows."> >>>> >>>> (Hmm, I seem to have found a bug in MacTidy 1.0b2, specifically that >>>> trailing slash in the summary. That can't be right, can it? No, it >>>> isn't being inserted by my mailer, I've double-checked that already. >>> Would like to see what happens on other platforms. >> I have now confirmed this bug occurs also on other platforms > Just to followup - this is NOT a bug. Tidy's pretty print splits up long > strings (e.g. "Demonstration of HTML Tidy's erroneous elimination of empty > rows.") across lines, using a "\". This appears to be valid HTML - i.e. if > you run the output described above through Tidy again, it finds no > problems; also I tested in Netscape, and it was happy. Only because backslash is a legal character in an attribute value of format CDATA. It is still adding information that wasn't there before, and would likely add it elsewhere where it would be in the way of visual browsers (such as a long ALT attribute). Unless someone can point me to where in the HTML specification it says that newlines in attribute values are to be escaped with "\" (which is the only purpose I could see for HTML Tidy in generating it), I will still consider it to be a bug needing repair. In <http://www.w3.org/TR/html40/types.html#type-cdata>, it is written: | o CDATA is a sequence of characters from the document character | set and may include character entities. User agents should | interpret attribute values as follows: | | o Replace character entities with characters, | o Ignore line feeds, | o Replace each carriage return or tab with a single space. | | User agents may ignore leading and trailing white space in | CDATA attribute values (e.g., " myval " may be interpreted as | "myval"). Authors should not declare attribute values with leading | or trailing white space. Note the second sub-item: "Ignore line feeds". It does not say, "Ignore '\'-escaped line feeds" or assign any special meaning to '\' in CDATA attributes. Also note the third: "Replace each carriage return... with a single space", again lacking reference to a special meaning for backslash. Thus, HTML Tidy is in error generating a backslash where there was not one before. It _is_ a bug. (HTML observation: those two sub-items pointed out above seems to convey a prejudice over end-of-line markers. Under a document with UNIX line-ends, the linefeed would be ignored; under a document with Mac line-ends, the carriage return would be translated to a space; and under a document with PC line-ends, the CR would be turned to a space and the LF ignored. And you can't mix-and-match line-ends in a single document. One has to know what kind of line-ends they are authoring with and provide or not provide a whitespace character accordingly. One could argue that one can err on the side of tolerance and always provide the whitespace, either at the end of the previous line or the start of the next, except if whitespace isn't being collapsed in that part of the document, for examples when PRE or the style rule "white-space: pre;" is being used.) -- ,=<#)-=# <http://www.war-of-the-worlds.org/> ,_--//--_, _-~_-(####)-_~-_ "Did you see that Parkins boy's body in the tunnels?" "Just (#>_--'~--~`--_<#) the photos. Worst thing I've ever seen; kid had no face."
Received on Saturday, 28 August 1999 18:36:11 UTC