Re: Line feeds in PRE blocks

From: Ian Hickson <py8ieh@bath.ac.uk>
Date: Mon, 15 Feb 1999 00:15:02 +0000 (BST)
To: Inanis Brooke <alatus@earthlink.net>
cc: www-html@w3.org
Message-ID: <Pine.GSO.4.04.9902150005320.17699-100000@midge.bath.ac.uk>

On Sat, 13 Feb 1999, Inanis Brooke wrote:

>>> Also, would the answer to this question involve how the OS renders
>>> text, or is it entirely dependent upon the web client software?
>>> (my curiosity.)
>> Only the UA (user agent = web browser) is involved.
> If the web client software is the only thing involved, then am I to
> understand that the w3c specifies how a UA should react to these
> conditions, or is it something undefined?

It is very well defined (see my first post on the subject) - the only
problem is that the current definition is at odds with virtually every
use of the PRE block currently on the web.

Off-topic bit:

> Also, I have a question as to the character referenced. For example,
> I know that the ANSI character for starting a new line of text is
> not supported in ASCII, or at least, I gather that from seeing those
> annoying rectangular blocks where there should be line breaks on
> some text documents in windows notepad.

The problem there is that MS-DOS (and thus now Windows) has always
used two characters in a special sequence to define newlines, while
Unix uses only one of those two characters.

To be precise:
 * In Unix, lines end at &#10; characters.
 * In DOS/Windows, lines end at &#13;&#10; sequences.

...where &#10; is the line-feed (LF) character, called RS (record
start) by the SGML standard (ISO 8879), and &#13; is the carriage
return (CR) character, called RE (record end) by the SGML standard.
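
As an illustration of the two conventions above, here is a minimal
Python sketch that converts between them at the byte level. The
function names are made up for this example and are not taken from any
particular tool.

```python
LF = b"\n"  # &#10;, line feed
CR = b"\r"  # &#13;, carriage return


def dos_to_unix(data: bytes) -> bytes:
    """Turn every CR LF pair into a lone LF."""
    return data.replace(CR + LF, LF)


def unix_to_dos(data: bytes) -> bytes:
    """Turn every line ending into a CR LF pair."""
    # Normalise first, so that a CR LF pair already in the input
    # does not come out as CR CR LF.
    return dos_to_unix(data).replace(LF, CR + LF)
```

A lone CR that is not followed by an LF is left untouched by both
functions, since neither convention above treats it as a line ending.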

When Notepad comes across a lone LF character, it displays the
'unknown character' glyph (the annoying square block). When you hit
return, it inserts both a CR and an LF, in that order, into your
document. Similarly, when Emacs comes across a CR LF pair, it starts a
new line at the LF, but shows the leftover CR at the end of the
previous line as "^M", because that is the symbol for Control-M, the
carriage return (CR is character 13, and M is the 13th letter of the
alphabet).
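
The "^M" notation can be worked out mechanically. The following Python
sketch only mimics the display described above; it is not how Emacs is
actually implemented, and the function names are hypothetical.

```python
def caret(code: int) -> str:
    """Caret notation for a control character: '^' plus the
    character 64 positions above it in ASCII."""
    if not 0 <= code < 32:
        raise ValueError("not a control character in the range 0-31")
    return "^" + chr(code + 64)


def display_lines(data: bytes) -> list:
    """Split at LF only, as a Unix-minded editor would, and make any
    leftover control characters visible in caret notation."""
    shown = []
    for raw in data.split(b"\n"):
        shown.append("".join(caret(b) if b < 32 else chr(b) for b in raw))
    return shown
```

With this sketch, caret(13) gives "^M" and caret(10) gives "^J", and a
DOS-style input such as b"one\r\ntwo" is shown as the two lines
"one^M" and "two".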

BTW - This difference in line endings is the reason that when
transferring ASCII files in text mode between Unix and DOS PCs, a file
with any line breaks in it will be bigger on the DOS PC: each line
gains one extra byte for the CR.
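
The size difference is simple arithmetic: one extra byte per LF. A
small Python sketch, assuming the Unix copy contains no CR characters
and that the transfer rewrites every LF as CR LF (the function name is
invented for this example):

```python
def dos_size(unix_data: bytes) -> int:
    """Size in bytes of the same text once every LF has become CR LF."""
    return len(unix_data) + unix_data.count(b"\n")


sample = b"line one\nline two\n"
# Two line endings, so the DOS copy is two bytes larger.
extra_bytes = dos_size(sample) - len(sample)
```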

Ian Hickson
Received on Sunday, 14 February 1999 19:15:11 UTC
