- From: E. Stephen Mack <estephen@emf.net>
- Date: Sat, 12 Jul 1997 14:46:54 -0700
- To: www-html@w3.org
The HTML 4.0 specification contains the following phrase in
http://www.w3.org/TR/WD-html40/struct/text.html#h-7.3.1

> A line break occurring immediately following a
> start tag should be discarded, as should a line
> break occurring immediately before an end tag.
> This applies to all HTML elements without exceptions.

(HTML 3.2 and earlier versions had a similar statement, if not
as strongly worded.)

Of course this rule is not always followed by popular browsers
such as Navigator and IE. Even Lynx (2.4 and 2.7) renders the
following with an extra space:

<A HREF="http://www.w3.org/">The W3C
</A>

I have a recently-updated page that demonstrates how this browser
behavior can affect the presentation of tables and images. The
document is available at:

http://www.emf.net/~estephen/htmlner/whitespacebugs.html

The above page establishes the position, and documents some
inconsistencies. I'd like to take the argument further and explore
some esoteric parts of white space rules.

I have two questions arising from ambiguities of the implications
of discarding carriage returns.

1. Should multiple carriage returns also be discarded?

The next sentence in the specification is:

> In addition, for all elements except PRE, a sequence
> of contiguous white space characters such as spaces,
> horizontal tabs, form feeds and line breaks, should be
> replaced by a single word space.

(I agree with Arnoud's earlier comment that the specification
should mention non-breaking spaces here, or else specifically
disqualify non-breaking spaces if that is the intention.)

"In addition" is ambiguous here.
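
To make the literal reading of the two quoted rules concrete, here is a minimal sketch in Python. It is not taken from the specification or from any browser: the function names are hypothetical, the rules are applied to the raw markup string with regular expressions, and PRE, comments and attribute values are ignored. It assumes the first rule removes exactly one line break and is applied before the second rule.

```python
import re

# One line break in any of the usual encodings (CR LF, CR, or LF).
LINE_BREAK = r"(?:\r\n|\r|\n)"

def discard_tag_adjacent_break(markup):
    """Rule 1, read literally: drop ONE line break immediately after a
    start tag and ONE line break immediately before an end tag."""
    markup = re.sub(r"(<[A-Za-z][^>]*>)" + LINE_BREAK, r"\1", markup)
    markup = re.sub(LINE_BREAK + r"(</[A-Za-z][^>]*>)", r"\1", markup)
    return markup

def collapse_white_space(markup):
    """Rule 2: replace each run of spaces, tabs, form feeds and line
    breaks with a single space (PRE is not handled in this sketch)."""
    return re.sub(r"[ \t\f\r\n]+", " ", markup)

def normalize(markup):
    # Assumed order: rule 1 first, then rule 2 ("in addition").
    return collapse_white_space(discard_tag_adjacent_break(markup))

ONE_BREAK = '<A HREF="http://www.w3.org/">The W3C\n</A>'
TWO_BREAKS = '<A HREF="http://www.w3.org/">The W3C\n\n</A>'

print(normalize(ONE_BREAK))   # <A HREF="http://www.w3.org/">The W3C</A>
print(normalize(TWO_BREAKS))  # <A HREF="http://www.w3.org/">The W3C </A>
```

Under these assumptions a single line break before the end tag disappears, while two line breaks leave one space behind.
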
Consider the following:

-----8<-----begin_code-----8<-----
<A HREF="http://www.w3.org/">The W3C

</A>
-----8<-----end_code-----8<-----

A user agent is told by the HTML 4.0 specification to collapse
multiple carriage returns into a single space, which turns this into:

<A HREF="http://www.w3.org/">The W3C </A>

(These rules of white space apply to all HTML elements, so don't
object that I'm using the anchor as an example here. It applies
equally well to </TD>, say.)

This means that the number of carriage returns before an end tag
is significant, which seems counter to HTML's white space philosophy
to me; but I'm not sure of the intention here, and I'm not familiar
enough with the underlying SGML end token rules. So perhaps it's
intentional that two carriage returns are different than one
carriage return.

If we agree that there shouldn't be a difference between one
carriage return and two carriage returns, perhaps the specification
could demand that "Any number of line breaks" after a start tag or
before an end tag be discarded.

2. What about end tags that are implied by the presence of a
start tag?

Consider a table example.

<TABLE border="1">
<TR>
<TD>Data Cell One
<TD>Data Cell Two
<TR>
<TD>Data Cell Three
<TD>Data Cell Four
</TABLE>

Popular browsers actually treat the following arrangement of the
table elements differently:

<TABLE border="1">
<TR><TD>Data Cell One</TD>
<TD>Data Cell Two</TD></TR>
<TR><TD>Data Cell Three</TD>
<TD>Data Cell Four</TD></TR></TABLE>

The removal of white space before the end tags causes the table
to be rendered without any trailing spaces in the data cells.
However, if the end tags are not present or if there is white
space, then trailing spaces *are* rendered in the data cells.
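
The difference between the two table arrangements can be seen in the character data itself. The following is a rough sketch using Python's standard html.parser module. The class and function names are hypothetical, and the line breaks in the two table strings are assumed to fall after each cell's text, as in the table example. It records the raw text that follows each TD start tag until the next table-related tag, without applying any white space rule.

```python
from html.parser import HTMLParser

class CellText(HTMLParser):
    """Collect the raw character data belonging to each TD cell."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag == "td":
            self.cells.append("")   # a new TD implies the end of the last
            self.in_cell = True
        elif tag in ("tr", "table"):
            self.in_cell = False    # a new TR also implies </TD>

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data

def cell_texts(markup):
    parser = CellText()
    parser.feed(markup)
    parser.close()
    return parser.cells

IMPLIED_END_TAGS = """<TABLE border="1">
<TR>
<TD>Data Cell One
<TD>Data Cell Two
<TR>
<TD>Data Cell Three
<TD>Data Cell Four
</TABLE>"""

EXPLICIT_END_TAGS = """<TABLE border="1">
<TR><TD>Data Cell One</TD>
<TD>Data Cell Two</TD></TR>
<TR><TD>Data Cell Three</TD>
<TD>Data Cell Four</TD></TR></TABLE>"""

print(cell_texts(IMPLIED_END_TAGS))   # every cell ends with "\n"
print(cell_texts(EXPLICIT_END_TAGS))  # no cell has trailing white space
```

With implied end tags, each cell's text ends in a line break that sits before a start tag rather than before an end tag, so the rule as quoted does not cover it.
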
I believe the specification should state explicitly that

<TD>Data Cell One
<TD>Data Cell Two

is equivalent to:

<TD>Data Cell One
</TD><TD>Data Cell Two

and therefore equivalent to:

<TD>Data Cell One</TD><TD>Data Cell Two

The current statement in the 4.0 draft specification doesn't apply
to the removal of line breaks *before* start tags.

* * *

The presence or absence of a single trailing space in the rendering
may not seem particularly important since the structure is presented,
but sometimes the presentation of the extra white space can be
confusing to viewers.

Consider the examples of using images in tables. A trailing space
causes the images to not be rendered next to each other, when the
images should be considered continuous. This has structural
implications.

Furthermore, an image being used as an anchor for a link such as:

<A HREF="http://www.foo.com/"><IMG SRC="foo.gif">
</A>

is rendered incorrectly by browsers with a trailing underlined space
that is confusing to viewers.

To avoid these ambiguities and behaviors, I applaud the strong
wording of the current HTML 4.0 draft. I just want to make sure
that the demand for line break removal is as robust as possible,
so that user agents don't have any excuse for non-compliance.

-- 
E. Stephen Mack <estephen@emf.net>
http://www.emf.net/~estephen/
Received on Saturday, 12 July 1997 17:46:02 UTC