W3C home > Mailing lists > Public > www-dom@w3.org > April to June 1999

Re: RFC: White Space Handling In XML Parsing

From: Arkin <arkin@trendline.co.il>
Date: Thu, 20 May 1999 10:47:01 -0400
Message-ID: <37442065.70AD0189@trendline.co.il>
To: Larry Watanabe <LWatanab@JetForm.com>, www-dom@w3.org
Larry Watanabe wrote:
> 
> I understand that HTML does not preserve whitespace, but that does
> not mean that DOM should not preserve whitespace. Although the whitespace
> will
> not be used to format the document, it is desirable to retain the formatting
> in the
> original XML document.

Why? What purpose does this formatting serve in 90% of the cases? (Keep
in mind that white space handling is a default behavior, it can be
turned off; e.g. XSL requires to turn it off)

If you want to print out a formatted, easy to read, nested with line
wrapping, image of the DOM structure (which I guess you do), you can use
OpenXML to do that. But does it make sense to store all that excessive
spaces in memory and have the application try to avoid them as it
processes the document, just to implement this feature?

It is very clear that the input document might include redundant spaces
just for the sake of formatting the document. It is very clear that the
output document should include them for the sake of readability. It is
very clear that the document tree does not need them for any use, and
that the application should not be expected to go around them. So the
whole issue is how to define a clear path between source to DOM back to
source, that will allow us to do all that.

Arkin


> -Larry
> 
> > -----Original Message-----
> > From: Arkin [SMTP:arkin@trendline.co.il]
> > Sent: Thursday, May 20, 1999 9:41 AM
> > To:   www-dom@w3.org
> > Cc:   www-dom@w3.org
> > Subject:      Re: RFC: White Space Handling In XML Parsing
> >
> > Larry,
> >
> > Try the following trick with your Web browser:
> >
> > <b>this is packed</b>
> > <b>this
> > is formatted</b>
> > <b>this    is    excessive</b>
> >
> > See what happens? They all end up the same. HTML does not preserve
> > whitespaces by definition expect for the PRE element. By definition (and
> > I recommend you pay close attention to the specs), whitespaces are not
> > used for formatting in HTML documents, except for the PRE element.
> >
> > You pretty much have the same ratio in XML documents:
> >
> > <book>
> >   <author>John   lots-of-spaces Doe</author>
> >      <title>  Whatever
> >      </title>
> >   </author>
> > </book>
> >
> > Can you spot which whitespaces are redundant?
> >
> > The problems you are seeing with editors mutilating HTML documents has
> > to do with improper structure, not whitespace handling. You can try
> > running their output through Tidy to see how messy it was to begin with.
> >
> > Arkin
> >
> >
> > Larry Watanabe wrote:
> > >
> > > I think preservation of whitespace is important. Consider implementing
> > any
> > > application where
> > >         (a) user may format XML
> > >         (b) program may read in the XML file
> > >         (c) changes may be made to the DOM
> > >         (d) the DOM is written out
> > > If the whitespace is not preserved, then the formatting is lost. This is
> > a
> > > complaint
> > > against many html editors.
> > >
> > > I think that rather than being in the minority, the majority of XML
> > > applications
> > > cannot make the assumption that a) their input was formatted by someone,
> > and
> > > b) someone (possibly the same person) may want to view the output.
> > >
> > >  Larry Watanabe
> > >  Jetform Corp
> > >  lwatanab@jetform.com
> > >
> > > > -----Original Message-----
> > > > From: Booth, David [SMTP:booth@bluestone.com]
> > > > Sent: Wednesday, May 19, 1999 6:18 PM
> > > > To:   www-dom@w3.org
> > > > Cc:   'Joel A. Nava'; Bickel, Bob; Nigro, Mark
> > > > Subject:      RE: RFC: White Space Handling In XML Parsing
> > > >
> > > > Some general thoughts:
> > > >
> > > > 1. It would be really, really nice if the DEFAULT behavior were
> > > > deterministic, i.e., guaranteed to produce the same DOM tree
> > > > under different parser implementations.  If one is concerned
> > > > about parser efficiency (i.e., not unduly constraining parser
> > > > implementations), then an option could allow the parser to
> > > > produce its preferred DOM tree for maximal speed.
> > > > In other words, it would be nice if the specifications supported
> > > > an application development model of: "FIRST make it correct,
> > > > THEN make it fast".  This would be best supported by making
> > > > the default behavior be the most deterministic, and the non-default
> > > > behavior be the fastest (assuming one must choose between the two
> > > > objectives).
> > > >
> > > > 2. It seems to me that the most appropriate
> > > > default should be to NOT preserve whitespace, since:
> > > >
> > > >       a. This is likely to be the behavior that is
> > > >       desired by most applications; and
> > > >
> > > >       b. The people writing the applications that wish to
> > > >       PRESERVE whitespace will be a somewhat more select group
> > > >       of programmers; thus it is more reasonable to require
> > > >       them to know a little more than others, rather than
> > > >       the other way around.
> > > >
> > > > Personally, I like to think of XML as a very direct
> > > > representation of an abstract syntax tree.  Under this
> > > > kind of thinking, syntactic details such as whitespace
> > > > are irrelevant.  This viewpoint may reflect a bias of my
> > > > background, but I suspect I am not alone in this view.
> > > > This is another factor in favor of making the default
> > > > behavior be to NOT preserve whitespace.  However, I am
> > > > certainly open to other viewpoints.
> > > >
> > > > David Booth
> > > > Bluestone Software, Inc.      +1 609 727 4600   ext. 1740
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Joel A. Nava [mailto:jnava@Adobe.COM]
> > > > > Sent: Tuesday, May 18, 1999 8:19 AM
> > > > > To: www-dom@w3.org
> > > > > Cc: www-dom@w3.org
> > > > > Subject: RE: RFC: White Space Handling In XML Parsing
> > > > >
> > > > >
> > > > > xml:space=default implies that the application should use it's
> > > > > default whitespace handling mode, which could be what ever the
> > > > > application has decided it's default mode. If xml:space is not
> > > > > defined on the root element, then the application has to decide
> > > > > for itself what it will do. There may be some circumstances where
> > > > > is default is expressed the application handles things one way,
> > > > > and if it is not defined, then the application might handle things
> > > > > a different way. I agree that it is clearly permissible to say
> > > > > that parsers and application conforming to your RFC will have
> > > > > the same behavior whether default or undefined.
> > > > >
> > > > > --
> > > > > Joel A. Nava                  (408)536-6209
> > > > > Adobe Systems, Inc.         jnava@adobe.com
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Arkin [mailto:arkin@trendline.co.il]
> > > > > > Sent: Monday, May 17, 1999 6:41 AM
> > > > > > To: John Cowan
> > > > > > Cc: Joel A. Nava; www-dom@w3.org
> > > > > > Subject: Re: RFC: White Space Handling In XML Parsing
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > John Cowan wrote:
> > > > > > >
> > > > > > > Joel A. Nava wrote:
> > > > > > >
> > > > > > > > 3) Elements that do not specify a value for the 'xml:space'
> > > > > > > >    attribute inherit that value from the element in which
> > > > > > > >    they are contained up to the root element. If the root
> > > > > > > >    element does not specify a value for the 'xml:space'
> > > > > > > >    attribute, the value 'default' is assumed.
> > > > > > > >
> > > > > > > > [This is what the XML REC requires.]
> > > > > > >
> > > > > > > Actually not.  The last paragraph of clause 2.10 says:
> > > > > > >
> > > > > > > # The root element of any document is considered to have
> > > > > > > # no intentions as regards application space handling,
> > > > > > > # unless it provides a value for this attribute or the
> > > > > > > # attribute is declared with a default value.
> > > > > > >
> > > > > > > So there are really three possible states of "xml:space" as
> > > > > > inherited:
> > > > > > > default, preserve, and clueless.
> > > > > >
> > > > > > That pretty much depends on how you define "xml:space=default". Is
> > > > > > 'default' a specific behavior, or is 'default' really "no
> > > > > > intentions as
> > > > > > regards application space handling". My understanding of the
> > > > > > XML REC is
> > > > > > that there are only two possible states: preserve and default,
> > where
> > > > > > default and clueless are pretty much the same.
> > > > > >
> > > > > > In no place does it really say what 'default' should be,
> > > > > which is what
> > > > > > the WS RFC attempts to clarify.
> > > > > >
> > > > > > Arkin
> > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > John Cowan      http://www.ccil.org/~cowan
> > > > > > cowan@ccil.org
> > > > > > >         You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
> > > > > > >         You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
> > > > > > >                 Clear all so!  'Tis a Jute.... (Finnegans
> > > > > Wake 16.5)
> > > > > >
> > > > > >
> > > > >
Received on Thursday, 20 May 1999 10:57:18 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Friday, 22 June 2012 06:13:46 GMT