RE: Comments on the XPath data model, from a DOM perspective. from Paul Cotton on 2002-05-01 (www-xml-query-comments@w3.org from May 2002)

From: Paul Cotton <pcotton@microsoft.com>
Date: Wed, 1 May 2002 04:21:46 -0400
To: "Ray Whitmer" <rayw@netscape.com>
Cc: <www-xml-query-comments@w3.org>, <mhkay@iclway.co.uk>
Message-ID: <E7AC4500EAB7A442ABA7521D18814397032F948F@tor-msg-01.northamerica.corp.microsoft>

> I don't know how this posting wound up here

I am sorry, Ray, but you originally posted your comments to this list,
not the XPath comments list.  See [1].

>Or am I expected to subscribe to this comments list?

I apologize that Michael Kay did not copy you on his reply.  It is
better form for responders to copy the original correspondent, and I try
to get responders from the XML Query WG to do this, but this one slipped
through.

/paulc

[1]
http://lists.w3.org/Archives/Public/www-xml-query-comments/2002Apr/0000.html

Paul Cotton, Microsoft Canada 
17 Eleanor Drive, Nepean, Ontario K2E 6A3 
Tel: (613) 225-5445 Fax: (425) 936-7329 
<mailto:pcotton@microsoft.com> 


> -----Original Message-----
> From: Ray Whitmer [mailto:rayw@netscape.com]
> Sent: Monday, April 29, 2002 4:43 PM
> To: www-xml-query-comments@w3.org
> Subject: Re: Comments on the XPath data model, from a DOM perspective.
> 
> Sorry, I don't know how this posting wound up here, when I thought I
> posted to www-xpath-comments.  Somehow I expected to be copied on a
> response.  Or am I expected to subscribe to this comments list?
> 
> >Sections marked ">" are from Ray Whitmer:
> >
> >>* It seems clear that the XPath 2.0 specification has no type
> >>comparable to the node set or other built-in types of XPath 1.0.  The
> >>concept of a typeless sequence does not seem to work as effectively.
> >>In many languages, arrays of objects are typed.
> >
> >In the published December drafts, the type system is not very well
> >developed. A lot of work has been done on this in the last few months,
> >some of which is visible in the recent Formal Semantics draft. It has
> >always been intended that XQuery should offer strong typing. In
> >practice it will usually be possible to detect statically that a
> >sequence is of a particular type, e.g. a sequence of nodes or a
> >sequence of integers, though arbitrary heterogeneous sequences are
> >permitted as the most general case.
> >
> We are more worried about the common XPath 1.0 case of a set of nodes,
> which appears to require an incompatible degradation of the API to
> support XPath 2.0.  It may be acceptable to do this in Lisp, where
> there is no typing and which, we get the idea, must have heavily
> influenced XPath 2.0 given the choices it makes, but in other languages
> lists have types and are not equally useful if the typing is disabled,
> as is done for sequences in XPath 2.0.
> 
> This answer does not seem to answer the question.  An API can claim to
> never break anyone by just using the most abstract object type
> everywhere, but that is simply not useful, which is why most
> programming languages use types, and why a node list is more useful
> than a list.  There are many things, including ordering, that apply to
> nodes that do not apply to untyped objects.  Just saying the new spec
> uses untyped everywhere does not solve compatibility with the old.
> 
> >>* XPath 1.0 was based on explicitly unordered sets of nodes that
> >>could be accessed in order.  XPath 2.0 claims that every sequence is
> >>ordered, but there is not sufficient discussion of what that means,
> >>which has caused significant confusion.  The logical conclusion could
> >>be drawn that it is referring to document order, which is the only
> >>order it seems to define and was the order of XPath 1.0, but this
> >>makes no sense when considering non-node items now possible in the
> >>result sets.  Also, the incompatible treatment of duplicates is
> >>confusing, if the sets are now ordered, rather than unordered, it
> >>seems pointless to not eliminate the duplicates, but there is
> >>probably something lost between the different versions of the
> >>specification.
> >
> >Essentially, those expressions which in XPath 1.0 returned a
> >"node-set" have been redefined in XPath 2.0 to return an "ordered
> >sequence of nodes in document order without duplicates". Since there
> >is a one-to-one correspondence between unordered node-sets and ordered
> >node-sequences in document order, compatibility is preserved. However,
> >XPath 2.0 can also return sequences in an order other than document
> >order (important when the user of a Query wants to specify an
> >application-oriented ordering of the results).
> >
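The correspondence described in the paragraph above can be illustrated
with a minimal sketch. It assumes a hypothetical Node class whose `pos`
field stands in for document order; neither the class nor the function
name comes from the XPath drafts.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class Node:
    name: str
    pos: int  # hypothetical stand-in for the node's position in document order


def to_document_order(nodes):
    """Drop duplicate nodes (by identity) and sort the rest by document order."""
    unique = {id(n): n for n in nodes}.values()
    return sorted(unique, key=lambda n: n.pos)


a, b, c = Node("a", 1), Node("b", 2), Node("c", 3)
result = to_document_order([c, a, c, b, a])
print([n.name for n in result])  # prints ['a', 'b', 'c']
```

Any unordered collection of the same three nodes maps to the same
sequence, which is the one-to-one correspondence being claimed. The
sketch says nothing about sequences that mix nodes with non-node items.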
> I thought that these, and all, return a sequence, not of nodes, but of
> untyped objects.  While the writer of the expression may believe that
> the return only contains nodes, that does not help at all in a formal
> type system, and it confuses greatly the concept of ordering.
> 
> This is not compatible at all, unless Lisp is your language and you
> always disregarded types anyway.
> 
> >Basically, a sequence can contain items in any order. The order of
> >the result is determined by the semantics of the expression that
> >created the sequence. Path expressions produce results in document
> >order, but other expressions may produce results in a different order.
> >
> But in XPath 1.0 a node set could always be accessed in document order
> and with guaranteed uniqueness of results.  In XPath 2.0, document
> order makes less sense, because your items may not even be nodes.  This
> seems to require different semantics than accessing the items of a
> result in document order.
> 
> >>Based upon recent discussions, it seems that the XPath 2.0
> >>specification may not be comparable or compatible with the XPath 1.0
> >>specification in its use of these terms, but the specification needs
> >>better treatment of the concepts, and explanation of the impact on
> >>backwards compatibility.  Elimination of duplicates also seems like a
> >>significant compatibility problem since 1.0 implementations went to
> >>great lengths to accomplish this.
> >
> >We think we have solved all the important backwards compatibility
> >issues, but you are right that there is a significant change in
> >terminology and that we could do a lot more to explain the
> >relationship between XPath 2.0 terms and their XPath 1.0 equivalents.
> >
> And I am still looking for evidence of a solution to the compatibility
> issues that were raised at the beginning such as the ordering, typing,
> and returns which seem to be incompatible, except for Lisp programmers
> in some cases, let alone all the compatibility issues with the extended
> Lisp DOM APIs being created by the XPath group.
> 
> >>* The copy semantics of node constructors seem wrong even if it was
> >>the only way to model the Lisp semantics that the authors of XPath 2.0
> >>seem to be using throughout the specification.  It would seem that a
> >>constructed node should not lose its identity when inserted into a
> >>hierarchy, but XPath 2.0 seems to mandate that.
> >
> >In XSLT, we never make a node available for manipulation until it is
> >inserted into its hierarchy, so this problem does not appear. It is
> >potentially a problem for XQuery, where I think the semantics of
> >element construction still require some further work. The reason it is
> >specified the way it is, I think, is to ensure that nodes are
> >immutable: you can't have the parent() accessor on the same node
> >giving different results at different times.
> >
> Then how are the constructor arguments passed if there is no reference
> to them?
> 
> I think there is a reference to them before they are passed to the
> constructor, so the id of the copy will be different from the id of the
> passed object.
> 
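The exchange above about copy semantics can be made concrete with a
small sketch. The Node class and its copy() method are hypothetical and
only illustrate the behaviour under discussion; they are not the
constructors defined in the data model draft.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = []
        for child in children:
            # Copy each argument so the caller's node is never re-parented.
            clone = child.copy()
            clone.parent = self
            self.children.append(clone)

    def copy(self):
        """Return a fresh, parentless deep copy of this subtree."""
        return Node(self.name, self.children)


title = Node("title")
book = Node("book", [title])

print(book.children[0] is title)       # prints False
print(title.parent is None)            # prints True
print(book.children[0].parent is book) # prints True
```

The node passed in keeps a stable parent (none), which is the
immutability argument, while the node that ends up in the tree has a
different identity from the one the caller still holds, which is the
objection.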
> >The model on namespace nodes is certainly broken in the current
> >draft. We are still debating how best to fix it. We know that we want
> >to relax the XPath 1.0 rules to allow namespace nodes to be shared
> >between elements, and we know this has inevitable side-effects on the
> >parentage and ordering of namespace nodes. But we haven't yet decided
> >exactly what the new rules should be. All the proposals currently on
> >the table still have namespace nodes belonging exclusively to a single
> >document.
> >
> If they belong to a document, then you will have to add an
> ownerDocument attribute, which the infoset does not have, to allow that
> ordering and identity checking to occur.
> 
> It is hard without resolution on the issue.
> 
> >>Requiring document order between documents to be stable requires
> >>much better document identification than we have today, because if a
> >>document is persisted and brought back into memory, which can happen
> >>at any time during processing, you need to be able to go back to
> >>something to reestablish the sort in the same way.
> >
> >The stability of ordering across documents is only required within
> >the scope of a single query or transformation (though I don't know if
> >we currently say this very well). Given that document node identity
> >must also be stable within this scope, I don't think it's difficult to
> >devise implementation strategies that work, e.g. basing document order
> >on the order of the internal identifiers of the document nodes.
> >
> That works only if you require adding internal identifiers to the DOM
> implementation.  In a Java implementation, for example, there is no id
> available that is guaranteed to be unique for any object.
> 
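One way to read the "internal identifiers" strategy, sketched under
assumptions: a per-query registry hands each document a serial number
the first time it is seen, so the identifiers live outside the document
objects. QueryScope and its methods are hypothetical names, not part of
DOM or of the XPath drafts.

```python
import itertools


class QueryScope:
    """Assigns each document a serial number the first time it is seen."""

    def __init__(self):
        self._next = itertools.count()
        # Maps id(doc) -> (serial, doc). Holding doc keeps it alive for the
        # lifetime of the scope, so its id() cannot be reused by another object.
        self._serials = {}

    def serial(self, doc):
        entry = self._serials.get(id(doc))
        if entry is None:
            entry = (next(self._next), doc)
            self._serials[id(doc)] = entry
        return entry[0]

    def order_key(self, doc, pos):
        """Sort key: document serial first, then position within the document."""
        return (self.serial(doc), pos)


scope = QueryScope()
doc_a, doc_b = object(), object()
items = [(doc_b, 2), (doc_a, 7), (doc_b, 1)]
ordered = sorted(items, key=lambda item: scope.order_key(*item))
print([(scope.serial(d), p) for d, p in ordered])  # prints [(0, 1), (0, 2), (1, 7)]
```

The ordering is stable only while the scope object exists. It does not
address documents that are persisted and reloaded, which is the
lifetime concern raised in this thread.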
> And while you may be able to wave away the issue of lifetimes, those
> working with a model such as DOM may not be able to.
> 
> >>* The model claims: "The data model does not support XML documents
> >>that are not supported by the XML Information Set, for example,
> >>non-well-formed documents and documents that don't conform to XML
> >>Namespaces."  But the constructors seem perfectly able to construct
> >>objects which are not well-formed, for example, by putting "--" into
> >>the text of a comment node or other illegal characters generally
> >>anywhere.
> >
> >I suspect you are right: there are probably quite a few error
> >conditions that still need to be documented. The intention is to
> >disallow operations that create an inconsistent structure, e.g.
> >multiple attributes with the same name.
> >
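A rough sketch of what "disallowing operations that create an
inconsistent structure" could look like at construction time. The
function names and tuple representation are hypothetical; the two
checks are the "--" case and the duplicate-attribute case mentioned
above, plus the XML 1.0 rule that comment text may not end with "-".

```python
def make_comment(text):
    """Build a comment item, rejecting text that cannot be serialized as XML."""
    if "--" in text or text.endswith("-"):
        raise ValueError("comment text must not contain '--' or end with '-'")
    return ("comment", text)


def make_element(name, attributes):
    """Build an element item from a name and a list of (name, value) pairs."""
    names = [attr_name for attr_name, _ in attributes]
    if len(names) != len(set(names)):
        raise ValueError("duplicate attribute name on element " + name)
    return ("element", name, dict(attributes))


print(make_comment("a fine comment"))
print(make_element("book", [("id", "b1"), ("lang", "en")]))

try:
    make_comment("bad -- comment")
except ValueError as err:
    print("rejected:", err)
```

Whether the same conditions would be rejected by a DOM implementation
underneath is a separate question that this sketch does not answer.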
> But what if these conditions do not match between DOM and XPath and
> you then try to build XPath on top of DOM?
> 
> >>* The model appears to make it possible to construct text nodes that
> >>have empty strings, elements with multiple adjacent text nodes, and
> >>other non-normalized result trees.
> >
> >Same comment applies.
> >
> But what if these conditions do not match between DOM and XPath and
> you then try to build XPath on top of DOM?
> 
> >At present we have a set of rules for this in the XSLT specification,
> >and we have a documented issue that we would like to move these rules
> >into the Data Model instead. The XSLT rules go under the name of
> >"namespace fixup", and are described essentially as a set of rules to
> >be followed on element construction to make sure that a valid infoset
> >results.
> >
> If it were to rely on a fixup, then why pass namespace nodes to the
> element constructor at all?  Also, how do copy semantics work with the
> namespace nodes if there is only one per document of a particular
> type?
> 
> Also, there is likely to be confusion with the DOM notion of namespace
> fixup, which is apparently not very similar in what it will fix and
> what it will not.
> 
> When reading these sections, I have a lot of questions created by the
> over-simple description, naturally because you are redefining a
> document object model.  I guess I just need to create a much longer
> issue list.
> 
> >>I might suggest that you thoroughly study the DOM specification and
> >>you will find many more border cases you have missed.  Construction
> >>of a hierarchy using an API is the same problem that DOM solves.
> >
> >I would hope that our problem is simpler, because the set of update
> >operations is much smaller. But I fear you may have put your finger on
> >a problem, namely that the set of operations provided by the data
> >model actually permits sequences of operations that neither XSLT nor
> >XQuery intends to use, and we need to either explicitly disallow such
> >sequences of operations, or define their effect precisely. Personally,
> >I've never been all that happy with the construction side of the data
> >model, because it has a very procedural feel to it, which seems wrong
> >as it is designed to underpin a declarative language. XPath 1.0 got
> >round this, of course, by not describing data model construction at
> >all, describing only the valid states of the model.
> >
> I doubt that the constructors are simpler.  XPath constructors seem
> quite a bit more complex due to copy constraints.  You require lots of
> arguments, copying, etc. and so XPath has lots of failures that in DOM
> occur later during manipulation because it does all of its
> construction through arguments.
> 
> You are recreating DOM in many ways, but incompatibly.  In many cases,
> DOM has solved the issues and XPath 2.0 has not. NIH should have no
> place at W3C.  We need resolution of the many issues now, as compatibly
> as possible with DOM, or they will be issues for last call and beyond.
> 
> Ray Whitmer
> rayw@netscape.com
>
Received on Wednesday, 1 May 2002 04:21:50 UTC