[FS] [XQuery] empty text nodes from Per Bothner on 2004-05-19 (public-qt-comments@w3.org from May 2004)

From: Per Bothner <per@bothner.com>
Date: Wed, 19 May 2004 16:40:12 -0700
To: public-qt-comments@w3.org
Message-ID: <40ABF05C.4040406@bothner.com>

Both the Formal Semantics and the informal XQuery specification seem to
allow empty text nodes, but this conflicts with the Data Model:

   6.7 Text Nodes
   6.7.1 Overview
   Text nodes must satisfy the following constraint:
     1. A text node must not contain the empty string as its content.

The formal semantics uses empty text nodes so it can suppress spaces
between adjacent enclosed expressions:

   4.7.1 Direct Element Constructors
   Normalization
   [ElementContent1 ..., ElementContentn]ElementContent-unit, n > 1
   ==
   fs:item-sequence-to-node-sequence([ ElementContent1 ]ElementContent ,
     text { "" }, ..., text { "" }, [ ElementContentn ]ElementContent)

   4.7.1.1 Attributes
   Normalization
   [ AttributeValueContent1 ...,
     AttributeValueContentn]AttributeContent-unit, n > 1
   ==
   fs:item-sequence-to-untypedAtomic(
    [ AttributeValueContent1 ]AttributeContent , text { "" }, ...,
      text { "" }, [ AttributeValueContentn ]AttributeContent)
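
To illustrate why the empty text nodes suppress the spaces, here is a
minimal Python sketch of a simplified model of
fs:item-sequence-to-node-sequence.  It assumes that adjacent atomic
values are joined with a single space into one text node and that
adjacent text nodes are then merged.  The Text class and the function
body are illustrative assumptions, not the Formal Semantics definition.

```python
from dataclasses import dataclass


@dataclass
class Text:
    """Hypothetical stand-in for a text node."""
    content: str


def item_sequence_to_node_sequence(items):
    """Simplified model: items are either str (atomic values) or Text."""
    # Step 1: join each run of adjacent atomic values with single spaces.
    nodes, run = [], []
    for item in items:
        if isinstance(item, Text):
            if run:
                nodes.append(Text(" ".join(run)))
                run = []
            nodes.append(item)
        else:
            run.append(item)
    if run:
        nodes.append(Text(" ".join(run)))

    # Step 2: merge adjacent text nodes by plain concatenation.
    merged = []
    for node in nodes:
        if merged and isinstance(merged[-1], Text) and isinstance(node, Text):
            merged[-1] = Text(merged[-1].content + node.content)
        else:
            merged.append(node)
    return merged


# Without a separator the two values are space-joined; with an empty
# text node between them they are no longer adjacent atomic values.
print(item_sequence_to_node_sequence(["a", "b"]))
print(item_sequence_to_node_sequence(["a", Text(""), "b"]))
```

In this model the separator only works if text { "" } actually yields a
node, which is exactly what the Data Model constraint forbids.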

Note also that the rule for Dynamic Evaluation of Text Node
Constructors does not check for an empty string.  Only the
static typing rule accounts for this, by returning 'text?'.

Also, the informal XQuery specification has a problem here:

   3.7.3.4 Text Node Constructors
   2. If the result of atomization is an empty sequence, no text node
   is constructed. ...

However, it says nothing about a content expression that consists of
a single empty string.  I suggest removing the above sentence, and
modifying the 3rd rule:
   3. The individual strings resulting from the previous step are merged
   into a single string by concatenating them with a single space
   character between each pair.  If the resulting string is the empty
   string, no text node is constructed (and the expression returns an
   empty sequence).  Otherwise, the resulting string becomes the content
   of the constructed text node.
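
A minimal Python sketch of the modified rule 3, assuming the content
expression has already been atomized and each value cast to a string.
The function name and the dict representation of a text node are
hypothetical, and None stands in for the empty sequence.

```python
def construct_text_node(strings):
    """Apply the proposed rule 3 to a list of already-cast strings."""
    # Concatenate with a single space between each pair of strings.
    content = " ".join(strings)
    # Proposed change: an empty result constructs no text node at all.
    if content == "":
        return None  # models the empty sequence
    return {"kind": "text", "content": content}


print(construct_text_node([]))          # empty atomization
print(construct_text_node([""]))        # single empty string
print(construct_text_node(["a", "b"]))  # ordinary content
```

Both the empty atomization and the single empty string fall into the
same branch, so the separate rule 2 is no longer needed.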

Alternatively, change the data model to allow empty text nodes.  I don't
see any particular reason to prohibit them, even if they're not in the
XML infoset.  Just extend the sentence "In addition, document and
element nodes impose the constraint that two consecutive text nodes can
never occur as adjacent siblings" by adding "nor can any of their text
node children have the empty string as the content".

My recommendation:
1. Change the Data Model to allow empty text nodes.
2. Change the Text Node Constructor to allow empty string content.
3. Change element and document constructor normalization so that they
remove empty text nodes *and* combine adjacent text nodes.
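
Recommendation 3 could be sketched in Python as follows.  This is only
an illustration under assumed names: children are modeled as
(kind, value) tuples, and normalize_children is a hypothetical helper,
not a function from any of the specifications.

```python
def normalize_children(children):
    """Drop empty text nodes and merge adjacent text nodes."""
    result = []
    for kind, value in children:
        if kind == "text":
            if value == "":
                continue  # remove empty text nodes
            if result and result[-1][0] == "text":
                # combine with the preceding text node
                result[-1] = ("text", result[-1][1] + value)
                continue
        result.append((kind, value))
    return result


children = [("text", "a"), ("text", ""), ("text", "b"),
            ("element", "x"), ("text", "")]
print(normalize_children(children))
```

With this step in the constructors, empty text nodes can exist as
free-standing values but never survive as children of an element or
document node.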
-- 
	--Per Bothner
per@bothner.com   http://per.bothner.com/

Received on Wednesday, 19 May 2004 19:40:16 UTC