Re: Line feeds in PRE blocks from Ian Hickson on 1999-02-13 (www-html@w3.org from February 1999)

From: Ian Hickson <py8ieh@bath.ac.uk>
Date: Sat, 13 Feb 1999 16:15:09 +0000 (BST)
To: kebl0820@sable.ox.ac.uk, www-html-editor@w3.org
cc: HTML mailing list <www-html@w3.org>
Message-ID: <Pine.GSO.4.04.9902131541370.24261-100000@midge.bath.ac.uk>

On Sat, 13 Feb 1999, Tim Bagot wrote:
>>> ...Does this mean that the following:
>>>
>>> <PRE>
>>>   Oranges
>>>   Lemons
>>> </PRE>
>>>
>>> ...where each line is separated by a linefeed character (i.e., where
>>> the document was created on Unix), and the following:
>>>
>>> <PRE>   Oranges   Lemons </PRE>
>>>
>>> ...should be parsed as being the same thing?
>>>
>>> And therefore, does it mean that any PRE blocks in HTML files that
>>> have lines separated only by linefeed characters (as opposed to
>>> carriage returns (SGML "RE" characters, code point 13)) should be
>>> rendered as a single continuous line?
> In theory at least, this would appear to be the case.
Ok, that's what I figured.

In that case, to make the specification something that browser authors can
actually support without breaking every single page out there, I suggest
adding the following rule to HTML 4 somewhere:

   Any RS in the input document shall imply an RE before it, and any RE in
   the input document shall imply an RS after it, unless such characters
   are already directly adjacent with no intervening characters whatsoever
   (either as an RE RS pair, or an RS RE pair).

   For example, the following:

   abc &RE; def &RS; ghi &RS;&RE; jkl

   ...would be interpreted by strict SGML parsers as:

   -----------
   abc 
    def  ghi 
    jkl
   -----------

   ...but according to the rule defined above, must be interpreted by
   conforming HTML4 UAs as:

   -----------
   abc
    def 
    ghi 
    jkl
   -----------

   This is required to circumvent the lack of a consistent cross-platform
   agreement on which characters should indicate line ends.
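
For illustration, here is a minimal Python sketch of the rule above. All
names in it (apply_proposed_rule, render_strict) are hypothetical, and it
makes two simplifying assumptions that are not part of the proposal itself:
the "strict" renderer just drops every RS and turns every RE into a line
break (real SGML also ignores some REs, e.g. next to tags), and adjacent
RS/RE characters are paired greedily from left to right.

```python
RS = "\n"  # SGML record start, code point 10 (line feed)
RE = "\r"  # SGML record end, code point 13 (carriage return)


def apply_proposed_rule(text: str) -> str:
    """Insert the implied RE before a lone RS and the implied RS after a lone RE.

    Existing RE RS or RS RE pairs are left untouched. Pairing is greedy,
    left to right (an assumption; the rule does not say how to pair runs).
    """
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in (RE + RS, RS + RE):
            out.append(pair)      # already adjacent: nothing is implied
            i += 2
        elif text[i] in (RS, RE):
            out.append(RE + RS)   # lone RS or lone RE becomes RE RS
            i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def render_strict(text: str) -> str:
    """Simplified strict-SGML view: RS is ignored, RE ends the line."""
    return text.replace(RS, "").replace(RE, "\n")


# The example from the proposed rule: abc &RE; def &RS; ghi &RS;&RE; jkl
source = "abc " + RE + " def " + RS + " ghi " + RS + RE + " jkl"

strict_lines = render_strict(source).split("\n")
fixed_lines = render_strict(apply_proposed_rule(source)).split("\n")
```

Under these assumptions strict_lines has three entries ("def" and "ghi"
share a line) and fixed_lines has four, matching the two renderings shown
above. A Unix-style PRE block separated only by linefeeds likewise keeps
its line breaks once apply_proposed_rule has been run over it.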

This rule would give section B.3.1 of HTML 4 the expected (and almost
implied) behaviour of treating CR and LF as equivalent, and would allow
browsers both to implement all of HTML 4 correctly and to comply with the
"de facto" standard behaviour that Unix web authors expect.

Note. The rule I suggest above is so obvious I cannot work out why SGML is
missing it in the first place. Surely if you have an RE (record end
delimiter) it *implies* an RS (record start delimiter) immediately
afterwards!? No? Otherwise, what goes between records?!

> Fortunately, no implementation of HTML of which I am aware uses SGML
> directly.
That's the first time I've seen a positive angle on the
not-using-SGML-parsers "bug"...

-- 
Ian Hickson

Received on Saturday, 13 February 1999 11:15:18 UTC