RE: xml:space from Julian Reschke on 2002-05-29 (w3c-dist-auth@w3.org from April to June 2002)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Wed, 29 May 2002 22:55:13 +0200
To: "Lisa Dusseault" <ldusseault@xythos.com>, "'Julian Reschke'" <julian.reschke@gmx.de>, "'Jason Crawford'" <ccjason@us.ibm.com>, "'Babich, Alan'" <ABabich@filenet.com>
Cc: <w3c-dist-auth@w3c.org>
Message-ID: <JIEGINCHMLABHJBIGKBCGENIEKAA.julian.reschke@gmx.de>

Lisa,

I don't agree at all.

First of all, whitespace is only ignorable if a DTD tells you that this is
the case. If you don't know an element, there's no reliable way to tell
whether whitespace inside it is ignorable or not.

Whether CDATA is present or not isn't relevant at all. It's just a markup
construct that allows the (human) author of an XML file not to escape the
special characters "<", ">", "&" and the quote characters.
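
To illustrate the point, here is a minimal sketch using Python's standard
xml.etree.ElementTree module (my choice for the illustration, nothing in
this thread depends on it). It parses the same character data written once
as a CDATA section and once with escapes, and then a CDATA section with
whitespace around it. The sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same character data, once inside a CDATA section, once escaped.
with_cdata = ET.fromstring('<foo2><![CDATA[a < b & "c"]]></foo2>')
with_escapes = ET.fromstring('<foo2>a &lt; b &amp; "c"</foo2>')

# After parsing, nothing records which of the two notations was used.
cdata_matches_escaped = with_cdata.text == with_escapes.text
print(cdata_matches_escaped)

# Whitespace outside the CDATA section is reported as part of the same
# text content as the characters inside it.
padded = ET.fromstring('<foo2>  <![CDATA[x y]]>\n</foo2>')
print(repr(padded.text))
```

With this parser the application only ever sees character data, so a rule
that depends on where a CDATA section started cannot be applied after
parsing.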

In particular:

> <foo1>  <whitespace-can-be-removed/>  </foo1>
> <foo2>  <![CDATA[Whitespace inside CDATA is important, but
> outside isn't]]>
> </foo2>
> <foo3>  All this whitespace is important.  </foo3>

For each of these I can construct an example where the opposite is true.

Finally,

> No, we should follow the XML rules for where whitespace is significant.
> According to those rules, whitespace is significant inside our date and
> content length (integer) properties, inside depth elements, etc.

Yes, we should do that. And in XML, you can't ignore whitespace unless you
have a validated file and a DTD which tells you where whitespace is
ignorable.
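
As a rough sketch of what this means in practice, the following uses
Python's xml.dom.minidom (again just an assumption for the illustration; it
is a non-validating DOM parser) on the foo1 example with no DTD present.
The whitespace around the child element is handed to the application as
ordinary text nodes rather than being stripped.

```python
from xml.dom import minidom

# The foo1 example from above, parsed without any DTD.
doc = minidom.parseString('<foo1>  <whitespace-can-be-removed/>  </foo1>')
children = doc.documentElement.childNodes

# Node names in document order: text, element, text.
kinds = [node.nodeName for node in children]
print(kinds)

# The text nodes carry exactly the whitespace that was sent.
texts = [node.data for node in children if node.nodeType == node.TEXT_NODE]
print(texts)
```

Whether those two text nodes may be dropped is something the parser cannot
decide here; only a DTD declaring element content for foo1 would say so.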

> -----Original Message-----
> From: w3c-dist-auth-request@w3.org
> [mailto:w3c-dist-auth-request@w3.org]On Behalf Of Lisa Dusseault
> Sent: Wednesday, May 29, 2002 7:57 PM
> To: 'Julian Reschke'; 'Jason Crawford'; 'Babich, Alan'
> Cc: w3c-dist-auth@w3c.org
> Subject: RE: xml:space
>
>
>
> > How do you decide that it only consists of tags (element content), if it
> > *does* include whitespace?
>
> If an element is declared to contain PCDATA, like the depth element in
> 12.1.1 in RFC 2518, then whitespace is important. If an element is declared
> to contain other elements, e.g. activelock in 12.1 of RFC 2518, then
> whitespace around the contained elements is removable according to the rules
> of XML parsing.
>
> If an element is not recognized, typically it is ignored.  However, WebDAV
> properties may be retrieved which are not recognized elements, it is not
> known whether they are supposed to contain elements or text, and yet it is
> still desirable to display to the user.  In this case, a couple heuristics
> may help:
>  - if the value contains unescaped < after some white space, it probably
> contains XML elements, and the whitespace before the < and after the end of
> the value can be removed.  However you must recurse and make the same
> evaluation before removing internal whitespace.
>  - if the value begins with "<![CDATA[", the whitespace before it can be
> removed.
>  - Otherwise white space is crucial.
>
> E.g.
> <foo1>  <whitespace-can-be-removed/>  </foo1>
> <foo2>  <![CDATA[Whitespace inside CDATA is important, but
> outside isn't]]>
> </foo2>
> <foo3>  All this whitespace is important.  </foo3>
>
> Your XML parser should be able to do this for you.  E.g. a DOM parser will
> return a regular element as the child of foo1, automatically stripping
> whitespace, and a text thing as the child of foo3 with whitespace
> preserved.
>
> > Alternatively, we could also say that ignorable WS never should be sent,
> > which would make the spec much simpler.
>
> I don't think so. That would make requests/responses hard to debug. In
> practice, it's very helpful to have line returns and tabs to make XML easier
> to read.  It would also be adding rules on top of XML, to little purpose.
> Finally, it would be very difficult to put examples in specification text
> that conformed to this rule.
>
> > Almost. I think the easiest solution is to say that whitespace *always* is
> > significant (== ignorable whitespace SHOULD NOT be sent), and then possibly
> > allow some workarounds for old known DAV: properties if we need to avoid
> > breaking existing code.
>
> No, we should follow the XML rules for where whitespace is significant.
> According to those rules, whitespace is significant inside our date and
> content length (integer) properties, inside depth elements, etc.
>
> Lisa
>

Received on Wednesday, 29 May 2002 16:56:23 UTC