
HTML 4.01 and 5.0: HTML does not process white spaces

From: Kenneth Kin Lum <kenneth.kin.lum@gmail.com>
Date: Wed, 23 Apr 2008 14:36:03 -0700
Message-ID: <d8b3b8e80804231436s26048139p55a1c2d2af326788@mail.gmail.com>
To: "Anne van Kesteren" <annevk@opera.com>, public-html-comments@w3.org, "Dave Raggett" <dsr@w3.org>, "Ian Hickson" <ian@hixie.ch>
Is the following true and accurate? If it is, I hope something like it can
be included in the next HTML spec.

HTML WILL NOT and shall not process white space in any way. ALL white
space is left unprocessed and goes into the DOM tree. It is the CSS layer
that decides what is done with that white space. For example, setting
"white-space:pre" causes all the spaces to be shown exactly as they appear
in the original document.

So for example,
   <div>    hello world
      <div>       and again

      </div>
  </div>

all of that white space (before or after each <div>) goes into the DOM
tree; it is simply not rendered by virtually all CSS engines. When
white-space:pre is applied dynamically, using JavaScript, to the body or to
any ancestor of those divs, all of the white space is displayed.
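
The distinction described above (the characters are kept in the tree, and the
style decides what is shown) can be sketched as a small function. This is only
a rough model, not the CSS algorithm itself: the name renderedText is
hypothetical, and the "normal" branch assumes that each run of spaces, tabs,
and line breaks collapses to one space and that a space at either end of the
text is dropped.

```javascript
// Simplified model of how a renderer might treat a text node's characters.
// Assumption: "pre" shows the text as-is; anything else collapses each run of
// spaces, tabs, and line breaks to one space and drops a space at either end.
function renderedText(text, whiteSpace) {
  if (whiteSpace === "pre") {
    return text;
  }
  return text.replace(/[ \t\r\n\f]+/g, " ").replace(/^ | $/g, "");
}

// The stored text is the same in both calls; only the style differs.
const source = "    hello world\n      ";
console.log(JSON.stringify(renderedText(source, "normal")));
console.log(JSON.stringify(renderedText(source, "pre")));
```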

This was verified in IE 7, Firefox 2.0.0.14, and Safari 3.1, where the DOM
tree actually shows those spaces, and using

document.getElementById("startdiv").style.whiteSpace = "pre"

on the HTML code below makes the white space appear again. One exception:
IE 7 does not show the white space in its DOM tree dump, but when
style.whiteSpace is changed to "pre", the spaces do show for
   <div id="startdiv">            starting div
      <div>    another div
      </div>
   </div>

with the browser displaying

starting div another div
(The line document.getElementById("startdiv").style.whiteSpace = "pre" is
run after the document has loaded.)
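
One way to check what the DOM tree holds, without relying on a browser's tree
dump, is to walk the text nodes directly. The following is a minimal sketch:
collectTextNodes, text, and element are hypothetical helper names, and the
hand-built startdiv object only approximates what a parser would produce for
the markup above, so that the sketch also runs outside a browser.

```javascript
// Collect the raw character data of every text node under `node`, in document
// order. Works on any object exposing nodeType, nodeValue, and childNodes.
function collectTextNodes(node, out = []) {
  const TEXT_NODE = 3; // numeric value of Node.TEXT_NODE in the DOM
  if (node.nodeType === TEXT_NODE) {
    out.push(node.nodeValue);
  }
  const children = node.childNodes || [];
  for (let i = 0; i < children.length; i++) {
    collectTextNodes(children[i], out);
  }
  return out;
}

// Hand-built stand-in for the parsed "startdiv" markup (an assumption, not a
// real parse result).
const text = (value) => ({ nodeType: 3, nodeValue: value, childNodes: [] });
const element = (...children) => ({ nodeType: 1, nodeValue: null, childNodes: children });

const startdiv = element(
  text("            starting div\n      "),
  element(text("    another div\n      ")),
  text("\n   ")
);

// Print each text node quoted, so leading and trailing spaces are visible.
collectTextNodes(startdiv).forEach((value) => console.log(JSON.stringify(value)));
```

In a browser, the same function can be called as
collectTextNodes(document.getElementById("startdiv")) once the document has
loaded.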





On Mon, Mar 31, 2008 at 8:44 AM, Anne van Kesteren <annevk@opera.com> wrote:

> On Sun, 30 Mar 2008 12:57:48 -0700, Kenneth Kin Lum <kenneth.kin.lum@gmail.com> wrote:
>
> > It does feel a little different from the idea that HTML is the "content"
> > and CSS is the "presentation", because if CSS decides whether the white
> > spaces that an author put in the HTML file get rendered or not, then it
> > seems like the CSS is deciding on what the content is too, as whitespace
> > characters can also be considered part of the content.
> >
>
> CSS can also decide that an entire element is not rendered using
> display:none.
>
>
> > > More or less, yes. You'd have to read the parsing algorithm in HTML5 to
> > > get the exact details.
> > >
> >
> > I think one thing is that the HTML 4.01 spec seems to hint at how
> > white space can be processed, such as collapsing white space or how
> > it is handled in different languages. So it may lead readers to think
> > that white space is first processed in the HTML layer to decide
> > whether it gets stored in the parsed result, even before the CSS
> > layer can touch it. Could the HTML spec state that white space
> > processing is not done at all in the HTML layer, and that all white
> > space is retained in the parsed result (in the DOM tree)?
> >
>
> This is already done in the "Parsing HTML Documents" section.
>
>
>
> --
> Anne van Kesteren
> <http://annevankesteren.nl/>
> <http://www.opera.com/>
>
Received on Wednesday, 23 April 2008 21:36:40 GMT
