- From: Ian Hickson <py8ieh@bath.ac.uk>
- Date: Sat, 13 Feb 1999 16:15:09 +0000 (BST)
- To: kebl0820@sable.ox.ac.uk, www-html-editor@w3.org
- cc: HTML mailing list <www-html@w3.org>
On Sat, 13 Feb 1999, Tim Bagot wrote:
>>> ...Does this mean that the following:
>>>
>>> <PRE>
>>> Oranges
>>> Lemons
>>> </PRE>
>>>
>>> ...where each line is separated by a linefeed character (i.e., where
>>> the document was created on Unix), and the following:
>>>
>>> <PRE> Oranges Lemons </PRE>
>>>
>>> ...should be parsed as being the same thing?
>>>
>>> And therefore, does it mean that any PRE blocks in HTML files that
>>> have lines separated only by linefeed characters (as opposed to
>>> carriage returns (SGML "RE" characters, code point 13)) should be
>>> rendered as a single continuous line?
> In theory at least, this would appear to be the case.

Ok, that's what I figured.

In that case, for the purposes of making the specification something
that browser authors can actually support while not breaking every
single page out there, I suggest adding the following rule to HTML4
somewhere:

   Any RS in the input document shall imply an RE before it, and any
   RE in the input document shall imply an RS after it, unless such
   characters are already directly adjacent with no intervening
   characters whatsoever (either as an RE RS pair, or an RS RE pair).

   For example, the following:

      abc &RE; def &RS; ghi &RS;&RE; jkl

   ...would be interpreted by strict SGML parsers as:

      ----------- abc def ghi jkl -----------

   ...but according to the rule defined above, must be interpreted by
   conforming HTML4 UAs as:

      ----------- abc def ghi jkl -----------

   This is required to circumvent the lack of a consistent cross
   platform agreement on which characters should indicate line ends.

This rule will make section B.3.1 of HTML 4 have the expected (and
almost implied) behaviour of treating CR and LF as equivalents, and
will allow browsers to both correctly implement all of HTML4 and be
compliant with the "de facto" standard behaviour expected of Unix web
authors.

Note. The rule I suggest above is so obvious I cannot work out why
SGML is missing it in the first place. Surely if you have an RE
(record end delimiter) it *implies* an RS (record start) delimiter
immediately afterwards!? No? Otherwise, what goes between records?!

> Fortunately, no implementation of HTML of which I am aware uses SGML
> directly.

It's the first time I've seen a positive angle on the
not-using-SGML-parsers "bug"...

--
Ian Hickson
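
The proposed rule can be sketched in a few lines of Python. This is only an illustration, not text from the HTML 4 or SGML specifications. It assumes RE is carriage return (code point 13) and RS is linefeed (code point 10), and it assumes that adjacent pairs are matched greedily from left to right, which the rule itself does not spell out. The function names are hypothetical.

```python
import re

RE = "\r"  # SGML record end, assumed to be carriage return (code point 13)
RS = "\n"  # SGML record start, assumed to be linefeed (code point 10)

# The two-character pairs come first in the alternation, so an adjacent
# RE RS or RS RE is matched as one unit before a lone RE or RS is tried.
_BOUNDARY = re.compile(r"\r\n|\n\r|\r|\n")


def imply_record_delimiters(text: str) -> str:
    """Give every lone RE an RS after it and every lone RS an RE before it.

    Pairs that are already adjacent (RE RS or RS RE) are left untouched.
    """
    def fix(match: re.Match) -> str:
        found = match.group(0)
        # A two-character match is already a pair; a lone RE or lone RS
        # both end up as the same RE RS pair.
        return found if len(found) == 2 else RE + RS

    return _BOUNDARY.sub(fix, text)


def split_records(text: str) -> list:
    """Split text into records, counting each pair or lone delimiter once."""
    return _BOUNDARY.split(text)
```

Applied to the example from the rule, written here as `"abc \r def \n ghi \n\r jkl"`, `split_records` yields four records, and a PRE block whose lines are separated only by linefeeds splits exactly like one whose lines are separated only by carriage returns.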
Received on Saturday, 13 February 1999 11:15:18 UTC