RE: xml:space

> How do you decide that it only consists of tags (element content), if it
> *does* include whitespace?

If an element is declared to contain PCDATA, like the depth element in
section 12.1.1 of RFC 2518, then whitespace is important. If an element is
declared to contain other elements, e.g. activelock in section 12.1 of
RFC 2518, then whitespace around the contained elements is removable
according to the rules of XML parsing.
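
For recognized elements, the rule above can be sketched in Python roughly as
follows. This is only an illustration: the ELEMENT_CONTENT table and the
strip_ignorable helper are made-up names, and the table lists just two DAV:
elements that RFC 2518 declares with element content. Python's
xml.dom.minidom does not read these declarations, so the sketch removes the
whitespace-only text nodes by hand.

```python
from xml.dom import Node, minidom

XML_WS = " \t\r\n"  # the four whitespace characters XML recognizes

# Hypothetical, partial table of DAV: elements declared with element content.
ELEMENT_CONTENT = {"activelock", "lockdiscovery"}


def strip_ignorable(elem):
    """Drop whitespace-only text nodes under elements known to hold elements."""
    holds_elements = (elem.namespaceURI == "DAV:"
                      and elem.localName in ELEMENT_CONTENT)
    for child in list(elem.childNodes):
        if child.nodeType == Node.ELEMENT_NODE:
            strip_ignorable(child)
        elif (holds_elements
              and child.nodeType == Node.TEXT_NODE
              and child.data.strip(XML_WS) == ""):
            elem.removeChild(child)


doc = minidom.parseString(
    '<D:activelock xmlns:D="DAV:">\n  <D:depth> 0 </D:depth>\n</D:activelock>'
)
strip_ignorable(doc.documentElement)
# activelock now has depth as its only child; the text inside depth
# (PCDATA) is left exactly as it was sent, spaces included.
```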

If an element is not recognized, typically it is ignored.  However, a client
may retrieve WebDAV properties that are not recognized elements.  It is not
known whether they are supposed to contain elements or text, and yet it is
still desirable to display them to the user.  In this case, a couple of
heuristics may help:
 - If the value contains an unescaped < after some whitespace, it probably
contains XML elements, and the whitespace before the < and after the end of
the value can be removed.  However, you must recurse and make the same
evaluation before removing internal whitespace.
 - If the first thing after any leading whitespace is "<![CDATA[", the
whitespace before it can be removed.
 - Otherwise, whitespace is crucial.

E.g.
<foo1>  <whitespace-can-be-removed/>  </foo1>
<foo2>  <![CDATA[Whitespace inside CDATA is important, but outside isn't]]>
</foo2>
<foo3>  All this whitespace is important.  </foo3>

Your XML parser should be able to do this for you.  E.g. a DOM parser that
drops ignorable whitespace (which typically requires it to know the content
model, for instance from a DTD) will return a regular element as the only
child of foo1, and a text node as the child of foo3 with whitespace preserved.
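
Where the parser does not do this on its own, the heuristics above can be
applied to the DOM tree by hand. The following is a rough Python sketch, not
a definitive implementation: trim_unknown, is_blank_text and trimmed are
made-up names, and the sketch assumes xml.dom.minidom, which keeps
whitespace-only text nodes when no DTD is involved.

```python
from xml.dom import Node, minidom

XML_WS = " \t\r\n"  # the four whitespace characters XML recognizes


def is_blank_text(node):
    """True for a text node that holds nothing but XML whitespace."""
    return node.nodeType == Node.TEXT_NODE and node.data.strip(XML_WS) == ""


def trim_unknown(elem):
    """Apply the heuristics to an unrecognized element, recursing as needed."""
    significant = [c for c in elem.childNodes if not is_blank_text(c)]
    if not significant:
        return  # empty or whitespace only: treat the whitespace as crucial
    first = significant[0].nodeType
    if first not in (Node.ELEMENT_NODE, Node.CDATA_SECTION_NODE):
        return  # value starts with text: all whitespace is crucial
    for child in list(elem.childNodes):
        if is_blank_text(child):
            elem.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            trim_unknown(child)  # make the same evaluation one level down


def trimmed(xml_text):
    """Parse one element, trim it, and serialize it again."""
    doc = minidom.parseString(xml_text)
    trim_unknown(doc.documentElement)
    return doc.documentElement.toxml()


for sample in (
    "<foo1>  <whitespace-can-be-removed/>  </foo1>",
    "<foo2>  <![CDATA[Whitespace inside CDATA is important]]>\n</foo2>",
    "<foo3>  All this whitespace is important.  </foo3>",
):
    print(trimmed(sample))
```

With this sketch, foo1 and foo2 lose the whitespace around their content,
while foo3 comes back unchanged.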

> Alternatively, we could also say that ignorable WS never should be sent,
> which would make the spec much simpler.

I don't think so. That would make requests/responses hard to debug. In
practice, it's very helpful to have line returns and tabs to make XML easier
to read.  It would also be adding rules on top of XML, to little purpose.
Finally, it would be very difficult to put examples in specification text
that conformed to this rule.

> Almost. I think the easiest solution is to say that whitespace *always* is
> significant (== ignorable whitespace SHOULD NOT be sent), and then possibly
> allow some workarounds for old known DAV: properties if we need to avoid
> breaking existing code.

No, we should follow the XML rules for where whitespace is significant.
According to those rules, whitespace is significant inside our date and
content length (integer) properties, inside depth elements, etc.

Lisa

Received on Wednesday, 29 May 2002 13:56:56 UTC