Re: XQuery/XPath Data model comments from Marton Nagy on 2001-11-01 (www-xml-query-comments@w3.org from November 2001)

From: Marton Nagy <MARTON.NAGY@saic.com>
Date: Thu, 01 Nov 2001 17:21:52 -0500
To: Jeni Tennison <jeni@jenitennison.com>
CC: XML Query comments <www-xml-query-comments@w3.org>
Message-Id: <3BE1CB00.541D0E50@apo.saic.com>

Hi Jeni,

This is a response to the following message, which you posted to the
XML Query Working Group's comments list:

"XQuery/XPath Data model comments" at 
http://lists.w3.org/Archives/Public/www-xml-query-comments/2001Sep/0012.html


> Hi,
> 
> Here are some comments on the XQuery/XPath data model WD (dated 7th
> June). I hope they're helpful.

Yes indeed!  Thanks for providing them.  I apologize it's taken so long
to respond to them.

> 3.2 Document Order. The second paragraph states that the relative
> order of nodes in different documents is implementation-dependent but
> stable. How 'stable' is 'stable'? Within a single XPath? Within an
> XSLT stylesheet? Within multiple runs of the same stylesheet on the
> same document?

We have reworded that section and, in the process, removed the word
'stable', which gave rise to possible misinterpretations:

"The relative order of nodes in distinct documents is implementation-
dependent but satisfies the following property:
given two distinct documents A and B,
if a node in document A is before a node in document B,
then every node in document A is before every node in document B."
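One way an implementation might satisfy this property is to pick any fixed ranking of documents and compare that rank before comparing positions within a document. A minimal Python sketch, assuming a hypothetical Node type that records only its document and its position in that document's own order (neither is part of the data model):

```python
from dataclasses import dataclass

# Hypothetical node model for illustration only: each node records the
# document it belongs to and its position in that document's own order.
@dataclass(frozen=True)
class Node:
    doc: str
    pos: int

def in_document_order(nodes, doc_rank):
    """Sort nodes drawn from several documents.

    doc_rank is an implementation-chosen ranking of documents. Because the
    rank is compared before the position, all nodes of one document end up
    before all nodes of the next, which is the property quoted above.
    """
    return sorted(nodes, key=lambda n: (doc_rank[n.doc], n.pos))

def documents_are_contiguous(ordered):
    """Check the property: no document's nodes interleave with another's."""
    runs = []
    for node in ordered:
        if not runs or runs[-1] != node.doc:
            runs.append(node.doc)
    return len(runs) == len(set(runs))
```

Any ranking works here, for example the order in which documents were loaded; the text above leaves that choice to the implementation.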

> 3.4 Schema Components and Values. The third paragraph gives xs:ID and
> xs:IDREF as examples of primitive value types, when actually they are
> derived (from xs:NCName).

Fixed.

> 3.6 Ignoring Comments, Processing Instructions, and Whitespace. The
> definition of insignificant whitespace means that a text node can only
> be classified as whitespace if its parent element has been validated
> according to an XML Schema. This seems to be very limiting; perhaps it
> could be rephrased, possibly something like:
> 
>   1. contains no characters other than white space characters (as
>   defined in XML 1.0), and
>   2. does not have a parent element with a [validity] property with
>   the value 'valid' and a [type definition] property yielding a simple
>   type definition or a complex type definition with a content type of
>   mixed.

I don't think this works.  The only way to tell whether a text node
consisting only of white space characters is insignificant is with a
schema, DTD, or some other annotation (such as <xsl:strip-space/>).
There's no way to infer from the instance alone that structured data
like:
  <name>
    <first>Jeni</first>
    <last>Tennison</last>
  </name>
should have spaces stripped, but:
  <p><b>Jeni</b> <i>Tennison</i></p>
should not.
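A parser produces the same kind of whitespace-only text node in both fragments, so the decision has to come from outside the instance. A minimal Python sketch using the standard library's xml.dom.minidom; the strip_elements set is a hypothetical stand-in for an xsl:strip-space declaration or schema information:

```python
from xml.dom import minidom

STRUCTURED = """<name>
  <first>Jeni</first>
  <last>Tennison</last>
</name>"""
MIXED = "<p><b>Jeni</b> <i>Tennison</i></p>"

XML_WHITESPACE = " \t\r\n"  # the white space characters of XML 1.0

def whitespace_only_text_children(elem):
    """Text-node children of elem that contain only XML white space."""
    return [c for c in elem.childNodes
            if c.nodeType == c.TEXT_NODE and c.data.strip(XML_WHITESPACE) == ""]

def strip_space(elem, strip_elements):
    """Remove whitespace-only text children of elements whose names are in
    strip_elements, recursing through the tree. strip_elements is supplied
    from outside (like xsl:strip-space); it is not inferred from the data."""
    if elem.tagName in strip_elements:
        for text in whitespace_only_text_children(elem):
            elem.removeChild(text)
    for child in elem.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            strip_space(child, strip_elements)
```

Passing {"name"} removes the indentation from the first fragment while leaving the significant space between the b and i elements of the second fragment untouched.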

There are still open issues surrounding whitespace, though, and we'll
be investigating this area further.

> 4 Nodes.
> 
> Why don't namespace nodes have parents? It's useful to be able to
> continue to traverse a tree from namespace nodes. For example, in
> stylesheets for browsing XML documents, you can only work out whether
> a namespace needs to be declared by looking at the namespace nodes on
> its ancestors (e.g. ancestor::*/namespace::*[name() = name(current())
> and . = current()]).

In order to have parents, namespace nodes must have unique identity.
This results in either a huge explosion in the number of nodes that must
be stored in the data model, or requires contextual information to be
passed around to simulate unique nodes virtually. Nevertheless, your
expression will continue to work in XPath 2.0; it depends neither on
the identity of namespace nodes nor on their parentage. The namespace
axis, followed from an element, will continue to give all the namespaces
that are in-scope for that element, and this, we believe, accounts for
99% of actual usage.
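The in-scope namespaces returned by the namespace axis can be computed from an element and its ancestors alone, without giving namespace nodes identity or parents. A minimal Python sketch, assuming xml.dom.minidom (which keeps xmlns declarations as ordinary attributes); in_scope_namespaces is a hypothetical helper, not an XPath function:

```python
from xml.dom import minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"

def in_scope_namespaces(elem):
    """Map each in-scope prefix to its URI ('' is the default namespace).

    Walks from the outermost ancestor down to elem, applying each xmlns
    declaration in turn, so inner declarations override outer ones.
    """
    chain = []
    node = elem
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        chain.append(node)
        node = node.parentNode
    scope = {"xml": XML_NS}  # the xml prefix is always in scope
    for e in reversed(chain):
        for name, value in e.attributes.items():
            if name == "xmlns":
                if value:
                    scope[""] = value
                else:
                    scope.pop("", None)  # xmlns="" undeclares the default
            elif name.startswith("xmlns:"):
                scope[name[len("xmlns:"):]] = value
    return scope
```

Comparing the result for an element with the result for its parent answers the "does this namespace need declaring here?" question without any namespace node needing a parent.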

> Can attributes be roots of trees? In other words, is it possible to
> have a node tree that contains a single attribute node? I don't think
> it's explicitly prohibited by the description here.

The document node cannot have an attribute child. But the model allows
any node to have no parent, and a node that has no parent (except in
the special case of a namespace node) is the root of a tree. In the case
of an attribute, an attribute that has no parent will be the only node
in its tree. (There are some issues we haven't resolved yet about
whether such a tree needs to contain a namespace node to resolve the
prefix of the attribute name.)
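Under this model, finding the root of any node is a walk up its parent links, and a parentless attribute is the root of its own one-node tree. A minimal Python sketch with a hypothetical TreeNode type in which an attribute points at its owner element through parent:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, minimal node model: only the parent link matters here.
@dataclass(eq=False)
class TreeNode:
    kind: str                            # "document", "element", "attribute", ...
    name: str = ""
    parent: Optional["TreeNode"] = None

def root(node: TreeNode) -> TreeNode:
    """Follow parent links until reaching a node that has no parent."""
    while node.parent is not None:
        node = node.parent
    return node
```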

Current plans are that XSLT 2.0 will not use this feature of the data
model: in XSLT 2.0 the root of every tree will be a document node, and
every node will belong to such a tree. The facility to have nodes
existing outside the context of a document is to satisfy the semantics
of XQuery.

> The last couple of paragraphs in the introduction to Section 4 are
> confusing because they introduce the concept of an InfoItem object
> type. We were told in Section 1 (Introduction) that there were five
> categories of values - nodes, simple values, sequences, errors and
> schema components. InfoItem objects seem to be another type
> altogether, and it's unclear how they fit in. Can this be elaborated
> earlier in the document?

Yes, I've fixed this.  InfoItem objects are intended solely for purposes
of defining the construction of a data model instance from an XML
document (infoset).  They are not intended to be "visible" outside the
data model specification.

> 4.2 Elements.
> 
> The constructor for the element node probably should include the type
> definition in the constructor as well, for cases where the [type
> definition] of the element information item is not the same as the
> [type definition] of the [element declaration], which can occur if
> xsi:type is used in the document.

I've added this as Issue-0064.  We're currently revisiting
SchemaComponents, to see if simplifications can be made.

> Possibly it will be part of the update to incorporate schema-less and
> DTD valid data, but the declaration and type accessors should probably
> return Sequences that might be empty.

The current plan is to return SchemaComponents representing anyType or
anySimpleType.  In other words, there is no schema-less document, only
documents with a default, highly generic, schema applied.

> I think it might be useful to be able to access the [member type
> definition] property of the PSVI for element information items, to
> know exactly which type the element value is, perhaps as a separate
> accessor:
>
>   member-type : ElementNode -> Sequence(0,1)<SchemaComponent>
> 
> or altering the type accessor to return a sequence in such cases:
> 
>   type : ElementNode -> Sequence(1,2)<SchemaComponent>
> 
> or changing the type accessor to return the [member type definition]
> where appropriate. If included, the constructor should involve the
> member type definition as well.
>
> It's not clear what happens with nil elements. Is something special
> done with the type to indicate that they have a nil value or
> something?

The presence of xsi:nil="true" means that the element is empty;
nothing special needs to be done with the type (though the element
declaration must be nillable).

The typed-value() function works as follows on an empty element: if
the type of the element is string or derived from string, and the
element does not have xsi:nil="true", then typed-value() returns the
zero-length string.  Otherwise, typed-value() returns the empty
sequence.
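The rule for empty elements can be restated as a small decision function. A minimal Python sketch; the function name and the hard-coded set of string-derived built-in types are assumptions for illustration, and a tuple stands in for a sequence:

```python
from typing import Tuple

# Built-in types derived (directly or indirectly) from xs:string. A real
# implementation would consult the schema's derivation hierarchy, which
# would also cover user-defined types derived from string.
STRING_DERIVED = {
    "xs:string", "xs:normalizedString", "xs:token", "xs:language",
    "xs:Name", "xs:NCName", "xs:NMTOKEN", "xs:ID", "xs:IDREF", "xs:ENTITY",
}

def typed_value_of_empty(type_name: str, nilled: bool) -> Tuple[str, ...]:
    """typed-value() of an element with no content."""
    if type_name in STRING_DERIVED and not nilled:
        return ("",)   # the zero-length string
    return ()          # the empty sequence
```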

> 4.3 Attributes.
> 
> As with the element nodes, it would be useful to access the [member
> type definition] properties of the attribute information items as well
> as their type definitions. The constructor doesn't need to incorporate
> the type definition, since that cannot be set through xsi:type, but it
> would have to include the member type definition.

Issue-0064 again.

> 4.4 Namespaces. There's a typo: "The accessors name, node-kind and
> string-value also apply to comment nodes." should read "The accessors
> name, node-kind and string-value also apply to namespace nodes."

Fixed.

> 4.7 References. It's unclear how reference nodes fit into the data
> model, or what their purpose is. As far as I can tell, document nodes
> and element nodes cannot have reference nodes as children, so I
> suspect that the parent accessor applied to a reference node will
> result in an empty sequence? In which case that should be indicated
> with:
> 
>   parent(ReferenceNode) : Sequence(0,0)<ElementNode | DocumentNode>
> 
> Some informative examples of the kind of thing that *might* be
> returned by an implementation accessing the string value of a
> reference node would be helpful.

Reference nodes have been removed from the data model.

> 5.1 Primitive Values.
> 
> There's a typo in the first paragraph, which contains 'xs:hexbinary'
> rather than 'xs:hexBinary'.

Fixed.

> I think that the id accessor needs to be altered to include a document
> context, since a single IDREF value might access different element
> nodes in different documents. I think this needs to be a function
> rather than an accessor, or something?

Indeed, we recently moved it into the Functions and Operators document
as a function.
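The need for a document context is easy to see in code: the same IDREF value can name different elements in different documents, so the lookup has to take the document as an argument. A minimal Python sketch using xml.etree.ElementTree; lookup_id is a hypothetical helper, not the F&O function, and treating attributes named "id" as IDs is an assumption (a real processor would use DTD or schema type information):

```python
import xml.etree.ElementTree as ET

def id_index(document_root, id_attr="id"):
    """Map ID values to elements within one document. Assumes attributes
    named id_attr are IDs; real processors use DTD or schema types."""
    return {e.get(id_attr): e for e in document_root.iter()
            if e.get(id_attr) is not None}

def lookup_id(idref, document_root):
    """Resolve an IDREF value against one particular document."""
    return id_index(document_root).get(idref)
```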

> Similarly, I think that the referent accessor of xs:anyURI requires
> some extra contextual information in case it's a relative URI rather
> than an absolute URI. Plus, given that the URI could have an XPointer
> fragment (I imagine), then shouldn't it return a sequence consisting
> of any number of any type of nodes, for full flexibility?

Also moved to F&O.
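Here the missing context is a base URI. A minimal Python sketch using urllib.parse from the standard library; resolve_reference is a hypothetical name, and evaluating the fragment (for example as an XPointer) is deliberately left out:

```python
from urllib.parse import urldefrag, urljoin

def resolve_reference(uri, base_uri):
    """Resolve a possibly relative xs:anyURI value against a base URI and
    split off any fragment identifier (which might be an XPointer).
    Returns (URI without fragment, fragment); the fragment is '' if absent."""
    absolute = urljoin(base_uri, uri)
    return urldefrag(absolute)
```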

> 5.2 Derived Simple Values. The way that this section uses the term
> 'primitive' is confusing. Is the intention that the only value types
> that are supported within the data model are the primitive types from
> XML Schema? 

All values (but not all types) must be one of the primitive types from
XML Schema.

> If so, what's the purpose of separate constructors for all
> the built-in data types, as given in the F&O WD, as xf:short() will
> give exactly the same type of value as xf:decimal()? 

The values xf:short(1) and xf:decimal(1) are the same; the types are
different.
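The distinction can be modelled by carrying a value and its type separately. A minimal Python sketch, assuming Python Decimal objects as the value space of the xs:decimal family; construct and the tables below are illustrative stand-ins for xf:short() and xf:decimal(), not their definitions:

```python
from decimal import Decimal

# Part of the XML Schema built-in hierarchy: each type -> its base type.
BASE_TYPE = {"xs:short": "xs:int", "xs:int": "xs:long",
             "xs:long": "xs:integer", "xs:integer": "xs:decimal"}
PRIMITIVE_TYPES = {"xs:decimal", "xs:string", "xs:boolean", "xs:double"}  # subset

# Value ranges of the bounded integer types above.
RANGES = {"xs:short": (-2**15, 2**15 - 1), "xs:int": (-2**31, 2**31 - 1),
          "xs:long": (-2**63, 2**63 - 1)}

def primitive_of(type_name):
    """Follow base types until reaching a primitive type."""
    while type_name not in PRIMITIVE_TYPES:
        type_name = BASE_TYPE[type_name]
    return type_name

def construct(type_name, lexical):
    """Hypothetical constructor returning (value, type). The value comes from
    the primitive xs:decimal value space; the type is kept alongside it and
    constrains which values are allowed."""
    if primitive_of(type_name) != "xs:decimal":
        raise ValueError("this sketch only covers the xs:decimal family")
    value = Decimal(lexical)
    if type_name != "xs:decimal" and value != value.to_integral_value():
        raise ValueError(f"{lexical!r} is not an integer")
    low, high = RANGES.get(type_name, (None, None))
    if low is not None and not low <= value <= high:
        raise ValueError(f"{lexical!r} is out of range for {type_name}")
    return value, type_name
```

Constructing "1" as xs:short and as xs:decimal yields equal values with different types, while a value such as "40000" is rejected only because of the xs:short type.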

> If that's not the
> intention, could you use something else instead of the term 'primitive
> value' within this section?

I think we correctly use the terms primitive values and primitive types,
and although the difference is subtle, I don't quite know how to improve
the text.  I'd welcome suggestions.

> 9 Equality. I wonder whether it would also be useful to define
> a string-value-equal function that could be used to test the equality
> of the string values of two nodes.

That's a Functions and Operators question, but my feeling is that we
should have a clear use case before we bloat the F&O document; right
now we're trying to trim it down to a manageable size.
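For reference, the proposed comparison is short to express; for single nodes, XPath 1.0 already offers it as string($a) = string($b). A minimal Python sketch using xml.etree.ElementTree; string_value_equal is a hypothetical name:

```python
import xml.etree.ElementTree as ET

def string_value(elem):
    """String value of an element: all descendant text, concatenated."""
    return "".join(elem.itertext())

def string_value_equal(a, b):
    """Hypothetical helper: compare two elements by string value only."""
    return string_value(a) == string_value(b)
```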

> 10 Example.
> 
> The example document isn't valid according to the namespace Rec
> because there's no namespace declaration for the 'xs' prefix used for
> xs:schemaLocation. It's also not well-formed because there aren't any
> quotes around the version number in the XML declaration. I think you
> want:
> 
> <?xml version="1.0"?>
> <p:part xmlns:p="http://www.mywebsite.com/PartSchema"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xsi:schemaLocation = "http://www.mywebsite.com/PartSchema
>                               http://www.mywebsite.com/PartSchema"
>         name="nutbolt">
>   <mfg>Acme</mfg>
>   <price>10.50</price>
> </p:part>

Fixed.
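Both problems in the original example are caught by a namespace-aware parser. A minimal Python sketch using xml.etree.ElementTree, which reports unbound prefixes and malformed XML declarations as ParseError; is_namespace_well_formed is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

CORRECTED = """<?xml version="1.0"?>
<p:part xmlns:p="http://www.mywebsite.com/PartSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation = "http://www.mywebsite.com/PartSchema
                              http://www.mywebsite.com/PartSchema"
        name="nutbolt">
  <mfg>Acme</mfg>
  <price>10.50</price>
</p:part>"""

def is_namespace_well_formed(text):
    """True if the parser accepts text; ElementTree's parser is namespace
    aware, so an undeclared prefix is a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

This checks well-formedness and namespace well-formedness only; it does not validate the document against the schema.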

> I don't think that the schema is valid. The namespace declaration uses
> the wrong namespace; it's using an old namespace anyway; and it uses
> both a type attribute and the content of an xs:element to indicate the
> type, which is not legal. I think you want either:
> 
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
>            targetNamespace="http://www.mywebsite.com/PartSchema">
>   <xs:element name="part">
>     <xs:complexType>
>       <xs:sequence>
>         <xs:element name = "mfg" type="xs:string"/>
>         <xs:element name = "price" type="xs:decimal"/>
>       </xs:sequence>
>       <xs:attribute name = "name" type="xs:string"/>
>     </xs:complexType>
>   </xs:element>
> </xs:schema>
> 
> or:
> 
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
>            targetNamespace="http://www.mywebsite.com/PartSchema"
>            xmlns="http://www.mywebsite.com/PartSchema">
>   <xs:element name="part" type="part-type" />
>   <xs:complexType name="part-type">
>     <xs:sequence>
>       <xs:element name = "mfg" type="xs:string"/>
>       <xs:element name = "price" type="xs:decimal"/>
>     </xs:sequence>
>     <xs:attribute name = "name" type="xs:string"/>
>   </xs:complexType>
> </xs:schema>

I think the latter is correct, so I'll use that.

> Cheers,
> 
> Jeni


We appreciate your feedback on the XML Query specifications. Please let
us know if this response is satisfactory. If not, please respond to this 
message, explaining your concerns.

Jonathan Marsh and Marton Nagy
On behalf of the XML Query Working Group
Received on Thursday, 1 November 2001 17:13:53 UTC