W3C home > Mailing lists > Public > html-tidy@w3.org > January to March 2000

Re: Help on "whitespace"

From: P. T. Rourke <ptrourke@mediaone.net>
Date: Fri, 24 Mar 2000 11:45:35 -0600
To: "Martin Brunecky" <mbrunecky@OneRealm.com>, <html-tidy@w3.org>
Message-ID: <OF16BA2A04.79951153-ON8625686C.0057CCEC@rfdinc.com>

According to the List-Owner's own specification of HTML 3.2
( http://www.w3.org/TR/REC-html32.html ; sometimes a little easier to read
than the current 4.01 recommendation, especially for students):

"Except within literal text (e.g. the PRE element), HTML treats contiguous
sequences of white space characters as being equivalent to a single space
character (ASCII decimal 32). These rules allow authors considerable
flexibility when editing the marked-up text directly. "

Another way to put it is that any sequence of 1 or more blank spaces and 0
or more hard returns is treated as a single white space unless it occurs
within an element classed as literal text (like PRE); however, the &nbsp;
(and its numeric equivalent) entity, though displayed as a space, is
treated
for the purposes of text flow as a non-space character, and so spaces
before
and after an &nbsp; count as a space additional to the &nbsp;  .

0 blank spaces and 1 or more hard returns usually behave like a white
space,
except between most element tags (e.g., between a </ td> and a <td>, or a
</p> and a <p>), where they're ignored. A special situation: when images
are
to occur side by side, most browsers require that the close of the first
image tag be immediately followed by the open of the second image tag, like
so

<img src="one.png" /><img src="two.png" />

or so

<img src="one.png" /><img
not so

<img src="one.png" />
<img src="two.png" />

because some browsers will add a white space between the images.  (The
quotation marks around the attribute value and the slash at the end of the
empty elements are XML-valid conventions that as far as I know all browsers
in use accept; see the XHTML 1.0 proposed rec.). I don't know whether that
behavior (the hard return between images = white space behavior) is
accounted in the Recommendation or not; it is analogous to the way a hard
return is treated between text strings; for instance,

Mississi
ppi

is treated as two words, not one. Thus I imagine this behavior has
something
to do with the way image elements are described in the DTD.

P. T. Rourke

----- Original Message -----
From: Martin Brunecky
To: html-tidy@w3.org
Sent: Thursday, January 20, 2000 9:33 AM
Subject: Help on "whitespace"


Hi:
can someone point me to a document (document portion) or an article
which clearly defines the rules for treatment of white-space and
new-line characters within the HTML source ? Including the
rules for using entities such as &nbsp;

All my HTML books seem to be really vague about this subject, and
I am probably stupid 'cause I don't see how to get this info from the
DTDs.
Received on Friday, 24 March 2000 13:15:17 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:43 GMT