Re: Line feeds in PRE blocks

From: Ian Hickson <py8ieh@bath.ac.uk>
Date: Mon, 15 Feb 1999 00:15:02 +0000 (BST)
To: Inanis Brooke <alatus@earthlink.net>
cc: www-html@w3.org
Message-ID: <Pine.GSO.4.04.9902150005320.17699-100000@midge.bath.ac.uk>
On Sat, 13 Feb 1999, Inanis Brooke wrote:

>>> Also, would the answer to this question involve how the OS renders
>>> text, or is it entirely dependent upon the web client software?
>>> (my curiosity.)
>> Only the UA (user agent = web browser) is involved.
> If the web client software is the only thing involved, then am I to
> understand that the w3c specifies how a UA should react to these
> conditions, or is it something undefined?

It is very well defined (see my first post on the subject) - the only
problem is that the current definition is at odds with virtually every
use of the PRE block currently on the web.


Off-topic bit:

> Also, I have a question as to the character referenced. For example,
> I know that the ANSI character for starting a new line of text is
> not supported in ASCII, or at least, I gather that from seeing those
> annoying rectangular blocks where there should be line breaks on
> some text documents in windows notepad.

The problem there is that MS-DOS (and thus now Windows) has always
used two characters in a special sequence to define newlines, while
Unix uses only one of those two characters.

To be precise: 
 * In Unix, lines end at &#10; characters.
 * In DOS/Windows, lines end at &#13;&#10; sequences.

...where &#10; is the line-feed (LF) character, called RS (record
start) by the SGML standard (ISO 8879), and &#13; is the carriage
return (CR) character, called RE (record end) by the SGML standard.
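
A minimal sketch of the two conventions above, in Python. The helper
names to_unix and to_dos are made up for illustration and are not part
of any standard tool; the sketch assumes the text contains no lone CR
characters.

```python
LF = "\n"  # &#10;, line feed
CR = "\r"  # &#13;, carriage return


def to_unix(text: str) -> str:
    """Turn every CR LF pair into a lone LF (Unix convention)."""
    return text.replace(CR + LF, LF)


def to_dos(text: str) -> str:
    """Turn every line ending into a CR LF pair (DOS/Windows convention).

    Normalising to LF first avoids doubling the CR on lines that
    already end in CR LF.
    """
    return to_unix(text).replace(LF, CR + LF)
```

For example, to_dos("a\nb\n") gives "a\r\nb\r\n", and to_unix undoes
it again.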

When Notepad comes across a lone LF character, it displays the
'unknown character' glyph (the annoying square block). When you hit
return, it inserts both a CR and an LF, in that order, into your
document. Similarly, when Emacs comes across a CR LF pair, it starts a
new line, but ends the previous one with "^M", because that is the
usual way of writing Control-M, the carriage return (M is the 13th
letter of the alphabet, and CR is character 13).
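
The "^M" spelling can be worked out mechanically. The sketch below
assumes the common caret convention (flip bit 0x40 of the character
code); the function name caret is just a label chosen here.

```python
def caret(ch: str) -> str:
    """Return the caret spelling of an ASCII control character."""
    code = ord(ch)
    if code >= 32 and code != 127:
        raise ValueError("not an ASCII control character")
    # Flipping bit 0x40 maps 1..26 onto the letters A..Z.
    return "^" + chr(code ^ 0x40)


print(caret("\r"))  # carriage return, character 13
print(caret("\n"))  # line feed, character 10
```

This prints ^M and then ^J, since M and J are the 13th and 10th
letters of the alphabet.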

BTW - This difference in newline endings is the reason that when
transferring ASCII files between Unix and DOS PCs with newline
conversion, a file containing any line breaks will be bigger on the
DOS PC: it gains one extra byte (the CR) for every line.
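
A quick way to check the size difference, as a sketch: the three-line
sample below is invented for the illustration, and the conversion
assumes the Unix copy has no CR characters to begin with.

```python
# The same three lines, stored with each convention.
unix_copy = b"one\ntwo\nthree\n"
dos_copy = unix_copy.replace(b"\n", b"\r\n")

line_count = unix_copy.count(b"\n")
extra_bytes = len(dos_copy) - len(unix_copy)

# One extra byte (the CR) per line ending.
print(len(unix_copy), len(dos_copy), extra_bytes)  # 14 17 3
```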

-- 
Ian Hickson
Received on Sunday, 14 February 1999 19:15:11 GMT