W3C home > Mailing lists > Public > www-dom@w3.org > April to June 1999

Re: RFC: White Space Handling In XML Parsing

From: Arkin <arkin@trendline.co.il>
Date: Thu, 20 May 1999 09:40:31 -0400
Message-ID: <374410CF.6D0AB7C7@trendline.co.il>
CC: www-dom@w3.org
Larry,

Try the following trick with your Web browser:

<b>this is packed</b>
<b>this
is formatted</b>
<b>this    is    excessive</b>

See what happens? They all end up the same. HTML does not preserve
whitespaces by definition expect for the PRE element. By definition (and
I recommend you pay close attention to the specs), whitespaces are not
used for formatting in HTML documents, except for the PRE element.

You pretty much have the same ratio in XML documents:

<book>
  <author>John   lots-of-spaces Doe</author>
     <title>  Whatever
     </title>
  </author>
</book>

Can you spot which whitespaces are redundant?

The problems you are seeing with editors mutilating HTML documents has
to do with improper structure, not whitespace handling. You can try
running their output through Tidy to see how messy it was to begin with.

Arkin


Larry Watanabe wrote:
> 
> I think preservation of whitespace is important. Consider implementing any
> application where
>         (a) user may format XML
>         (b) program may read in the XML file
>         (c) changes may be made to the DOM
>         (d) the DOM is written out
> If the whitespace is not preserved, then the formatting is lost. This is a
> complaint
> against many html editors.
> 
> I think that rather than being in the minority, the majority of XML
> applications
> cannot make the assumption that a) their input was formatted by someone, and
> b) someone (possibly the same person) may want to view the output.
> 
>  Larry Watanabe
>  Jetform Corp
>  lwatanab@jetform.com
> 
> > -----Original Message-----
> > From: Booth, David [SMTP:booth@bluestone.com]
> > Sent: Wednesday, May 19, 1999 6:18 PM
> > To:   www-dom@w3.org
> > Cc:   'Joel A. Nava'; Bickel, Bob; Nigro, Mark
> > Subject:      RE: RFC: White Space Handling In XML Parsing
> >
> > Some general thoughts:
> >
> > 1. It would be really, really nice if the DEFAULT behavior were
> > deterministic, i.e., guaranteed to produce the same DOM tree
> > under different parser implementations.  If one is concerned
> > about parser efficiency (i.e., not unduly constraining parser
> > implementations), then an option could allow the parser to
> > produce its preferred DOM tree for maximal speed.
> > In other words, it would be nice if the specifications supported
> > an application development model of: "FIRST make it correct,
> > THEN make it fast".  This would be best supported by making
> > the default behavior be the most deterministic, and the non-default
> > behavior be the fastest (assuming one must choose between the two
> > objectives).
> >
> > 2. It seems to me that the most appropriate
> > default should be to NOT preserve whitespace, since:
> >
> >       a. This is likely to be the behavior that is
> >       desired by most applications; and
> >
> >       b. The people writing the applications that wish to
> >       PRESERVE whitespace will be a somewhat more select group
> >       of programmers; thus it is more reasonable to require
> >       them to know a little more than others, rather than
> >       the other way around.
> >
> > Personally, I like to think of XML as a very direct
> > representation of an abstract syntax tree.  Under this
> > kind of thinking, syntactic details such as whitespace
> > are irrelevant.  This viewpoint may reflect a bias of my
> > background, but I suspect I am not alone in this view.
> > This is another factor in favor of making the default
> > behavior be to NOT preserve whitespace.  However, I am
> > certainly open to other viewpoints.
> >
> > David Booth
> > Bluestone Software, Inc.      +1 609 727 4600   ext. 1740
> >
> >
> > > -----Original Message-----
> > > From: Joel A. Nava [mailto:jnava@Adobe.COM]
> > > Sent: Tuesday, May 18, 1999 8:19 AM
> > > To: www-dom@w3.org
> > > Cc: www-dom@w3.org
> > > Subject: RE: RFC: White Space Handling In XML Parsing
> > >
> > >
> > > xml:space=default implies that the application should use it's
> > > default whitespace handling mode, which could be what ever the
> > > application has decided it's default mode. If xml:space is not
> > > defined on the root element, then the application has to decide
> > > for itself what it will do. There may be some circumstances where
> > > is default is expressed the application handles things one way,
> > > and if it is not defined, then the application might handle things
> > > a different way. I agree that it is clearly permissible to say
> > > that parsers and application conforming to your RFC will have
> > > the same behavior whether default or undefined.
> > >
> > > --
> > > Joel A. Nava                  (408)536-6209
> > > Adobe Systems, Inc.         jnava@adobe.com
> > >
> > > > -----Original Message-----
> > > > From: Arkin [mailto:arkin@trendline.co.il]
> > > > Sent: Monday, May 17, 1999 6:41 AM
> > > > To: John Cowan
> > > > Cc: Joel A. Nava; www-dom@w3.org
> > > > Subject: Re: RFC: White Space Handling In XML Parsing
> > > >
> > > >
> > > >
> > > >
> > > > John Cowan wrote:
> > > > >
> > > > > Joel A. Nava wrote:
> > > > >
> > > > > > 3) Elements that do not specify a value for the 'xml:space'
> > > > > >    attribute inherit that value from the element in which
> > > > > >    they are contained up to the root element. If the root
> > > > > >    element does not specify a value for the 'xml:space'
> > > > > >    attribute, the value 'default' is assumed.
> > > > > >
> > > > > > [This is what the XML REC requires.]
> > > > >
> > > > > Actually not.  The last paragraph of clause 2.10 says:
> > > > >
> > > > > # The root element of any document is considered to have
> > > > > # no intentions as regards application space handling,
> > > > > # unless it provides a value for this attribute or the
> > > > > # attribute is declared with a default value.
> > > > >
> > > > > So there are really three possible states of "xml:space" as
> > > > inherited:
> > > > > default, preserve, and clueless.
> > > >
> > > > That pretty much depends on how you define "xml:space=default". Is
> > > > 'default' a specific behavior, or is 'default' really "no
> > > > intentions as
> > > > regards application space handling". My understanding of the
> > > > XML REC is
> > > > that there are only two possible states: preserve and default, where
> > > > default and clueless are pretty much the same.
> > > >
> > > > In no place does it really say what 'default' should be,
> > > which is what
> > > > the WS RFC attempts to clarify.
> > > >
> > > > Arkin
> > > >
> > > > >
> > > > > --
> > > > > John Cowan      http://www.ccil.org/~cowan
> > > > cowan@ccil.org
> > > > >         You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
> > > > >         You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
> > > > >                 Clear all so!  'Tis a Jute.... (Finnegans
> > > Wake 16.5)
> > > >
> > > >
> > >
Received on Thursday, 20 May 1999 09:50:40 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Friday, 22 June 2012 06:13:46 GMT