Re: XPath 1.0 change proposal from C. M. Sperberg-McQueen on 2013-03-14 (www-xpath-comments@w3.org from January to March 2013)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 14 Mar 2013 09:34:44 -0600
To: James Clark <jjc@jclark.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, www-xpath-comments@w3.org
Message-Id: <E760CDF1-FB38-4AC4-9D8E-6E65E6730B24@blackmesatech.com>

Thank you, James.  I agree with some of what you say, and disagree with some.

On Mar 14, 2013, at 7:52 AM, James Clark wrote:

> 
> 
> Although I appreciate Michael's work on formalizing the XPath 1.0 data model, I do not think that at this stage a major rewrite of the XPath 1.0 data model is a good idea.  

I agree.  

That's why I proposed small fixes to repair errors in the definition of the data model, and
not a major rewrite.  

> I would suggest that, after nearly 14 years, an extremely conservative policy should be adopted towards changes: changes should be made only when there is a genuine error that is manifested in discrepancies between implementations or inconsistencies between implementations and the spec.

The nature of the errors in the definition of the data model is that they amount
to discrepancies within the spec.  The rest of XPath 1.0 assumes that every 
instance of the data model, as defined in section 5, will have certain properties.
It is the job of section 5 to ensure that that is so, and in the spec as written 
the properties in question are not in fact guaranteed.  

These discrepancies are unlikely to show up in implementations of XPath 1.0
as a whole, since implementors are likely to be guided by the assumptions
manifest elsewhere in the spec more than by the details of the data model
definition.  They will, however, show up in any attempt to implement, or formalize,
the XPath 1.0 data model by itself.  That is how I became aware of the errors
in the first place.
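The kind of stand-alone check described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `Node` class and `check_invariants` function that come from neither XPath 1.0 nor any library. It tests a small sample of tree properties XPath 1.0 relies on: there is a parentless root, every child points back to its parent, and an attribute node has its element as parent without being one of that element's children. It does not reproduce the specific errors identified in the change proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps Python's identity semantics: two Node objects are the same
# node only if they are the same object, never merely equal-looking ones.
# repr=False avoids recursing through parent/child cycles when printing.
@dataclass(eq=False, repr=False)
class Node:
    kind: str                          # "root", "element", "attribute", "text", ...
    name: Optional[str] = None
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    attributes: List["Node"] = field(default_factory=list)


def add_child(parent: Node, child: Node) -> Node:
    child.parent = parent
    parent.children.append(child)
    return child


def add_attribute(element: Node, attr: Node) -> Node:
    attr.parent = element            # the element is the attribute's parent...
    element.attributes.append(attr)  # ...but the attribute is not a child
    return attr


def check_invariants(root: Node) -> List[str]:
    """Report violations of a few illustrative tree properties."""
    problems: List[str] = []
    if root.kind != "root" or root.parent is not None:
        problems.append("the tree must start at a parentless root node")
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            problems.append(f"{node.kind} {node.name!r} is reachable twice")
            continue
        seen.add(node)
        for child in node.children:
            if child.parent is not node:
                problems.append(f"{child.kind} {child.name!r} has the wrong parent")
            if child.kind in ("root", "attribute"):
                problems.append(f"{child.kind} {child.name!r} appears as a child")
            stack.append(child)
        for attr in node.attributes:
            if node.kind != "element":
                problems.append(f"non-element {node.kind} carries attributes")
            if attr.kind != "attribute" or attr.parent is not node:
                problems.append(f"attribute {attr.name!r} is malformed")
    return problems


root = Node("root")
doc = add_child(root, Node("element", "doc"))
add_attribute(doc, Node("attribute", "id"))
add_child(doc, Node("text"))
print(check_invariants(root))    # []

broken = Node("root")
el = add_child(broken, Node("element", "doc"))
attr = add_attribute(el, Node("attribute", "id"))
el.children.append(attr)         # an attribute wrongly listed as a child
print(check_invariants(broken))  # ["attribute 'id' appears as a child"]
```

A checker like this only reports what the construction rules actually guarantee; where the prose leaves a property unstated, the code has to either assume it or leave it unchecked, which is where gaps in a definition tend to surface.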

> The change proposal claims that it was a goal of XPath 1.0 for the data model to be defined without dependencies on XML 1.0.  I find this claim bizarre given that XML 1.0 is referenced normatively and the data model definition is full of references to XML 1.0.

If there was no intent to define the data model without dependencies on XML 1.0, then at least half the
text in section 5 is pointless, unnecessary repetition of things that are obvious from the XML spec.
The choice seems to be between reading the spec as having a coherent goal which in some important
details it failed to achieve, and reading the spec as given to garrulous irrelevancies.

> The change proposal seems to be claiming that XPath 1.0 is full of bugs in need of correction because it does not meet a goal that it never had.
> 
> The change proposal also claims that it is a goal of XPath 1.0 that the data model be defined formally.  This is clearly not the case.  

By any mathematical standard, the prose of XPath 1.0 would count as informal.  But that is also
true of the prose in the change proposal.

Compared with other specs, I think the data model section of XPath 1.0 is more explicit and
formal than most.

> XPath 1.0 does not make the slightest attempt to be formal.  Rather it aims to be succinct and readily understandable.  The level of formality in the data model definition is similar to that of the rest of the spec and of companion specs (XML 1.0, XML Namespaces, XSLT 1.0).  It is also virtually impossible to be really rigorous about the construction of the data model from the XML document without specifying this in the XML spec itself: for each syntax production the XML spec would need to explain how the corresponding data model was constructed.

I see no connection between the formality of an exposition and the title of the document
in which it appears.  The XML spec is not formal about (for example) identity criteria for
elements, because nothing in the XML spec appeals to element identity (at least, not 
for the cases it leaves indeterminate).  The XPath 1.0 spec does need to be determinate
about element node identity, and it seems bizarre to me to suggest that it could not be
more precise or careful without changes to the text of the XML 1.0 spec.

> 
> I am also not convinced that in many cases the proposed wording changes are in fact improvements.  If the WG does decide to go ahead with this change, I can make some more detailed comments.  But for the moment, I would just mention a couple of points.
> 
> XPath 1.0 does not constrain the root node to have exactly one element child. In the case where the data model is constructed from an XML document, there will of course be exactly one child.  But in other cases (e.g. querying into a DOM DocumentFragment) it would be unhelpful to impose such a restriction.  (XPath 1.0 is generally fairly loose -- for example, it does not define conformance -- so as to provide maximum flexibility to referencing specs.)

Thank you for this clarification.

The XML spec seems, then, to be normative for the description of the data model, except for the
parts of it that don't apply.  On this view, the XPath 1.0 spec is readily understandable only for
readers gifted with a certain degree of clairvoyance.
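The two situations James distinguishes can be seen with Python's standard `xml.dom.minidom`. This is a small sketch that shows only the tree shapes involved. Parsing a document yields exactly one document element, and two top-level elements are rejected as not well-formed, while a DOM `DocumentFragment` may hold several element children. It does not show how any particular XPath engine maps a fragment onto a root node.

```python
from xml.dom import Node, minidom
from xml.parsers.expat import ExpatError

# A parsed XML document has exactly one document element.
doc = minidom.parseString("<doc><a/><b/></doc>")
top_level = [n for n in doc.childNodes if n.nodeType == Node.ELEMENT_NODE]
print(len(top_level))  # 1

# Two top-level elements do not form a well-formed document, so parsing fails.
try:
    minidom.parseString("<a/><b/>")
    parsed_two = True
except ExpatError:
    parsed_two = False
print(parsed_two)  # False

# A DocumentFragment, by contrast, may hold several element children.
frag = doc.createDocumentFragment()
frag.appendChild(doc.createElement("a"))
frag.appendChild(doc.createElement("b"))
frag_elements = [n for n in frag.childNodes if n.nodeType == Node.ELEMENT_NODE]
print(len(frag_elements))  # 2
```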

> 
> The reason why the spec uses terminology like "There is an element node for every element" instead of referencing particular productions is because of entity expansion.  For example, given
> 
> <!DOCTYPE doc [
> <!ENTITY e "<x>foo</x>">
> ]>
> <doc>&e;&e;</doc>
> 
> I am comfortable with saying (somewhat vaguely) that there are three elements.  

The problem is that there is nothing in the XML spec or the Infoset spec that could be
used to argue that there are three elements here, instead of two.  Many people
are comfortable saying that there are three elements here, but a count of two elements
is equally compatible with the XML specification.
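For comparison, here is how one particular processor handles James's example: Python's standard `xml.etree.ElementTree`, which is built on expat. It expands `&e;` twice and builds three distinct element objects, two of which have identical content. This shows one implementation's behavior, consistent with the common reading; it does not by itself settle what the XML or Infoset specs require, which is the point at issue.

```python
import xml.etree.ElementTree as ET

SOURCE = """<!DOCTYPE doc [
<!ENTITY e "<x>foo</x>">
]>
<doc>&e;&e;</doc>"""

root = ET.fromstring(SOURCE)
elements = list(root.iter())        # the doc element plus every descendant element
print([el.tag for el in elements])  # ['doc', 'x', 'x']

first_x, second_x = root.findall("x")
print(first_x is second_x)          # False: two distinct element objects
print(first_x.text, second_x.text)  # foo foo
```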

> I am much less comfortable saying that there are three occurrences of the "element" production (in fact, I would say it is clear that there are only two occurrences of the "element" production).

On the contrary; after entity expansion we have a sequence of character types
matching the document production of the XML spec, but we do not necessarily
have a sequence of character tokens matching the document production.  In
the sequence of character types, there are clearly three occurrences of strings
(sequences of character types) matching the element production, even though there
are only two such string-types.  That is the difference between a string type and
an occurrence of a string type. 
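The distinction between an occurrence and a type can be made concrete with a toy scanner over the expanded text. This is a deliberately simplified sketch under stated assumptions: `expand` and `element_occurrences` are hypothetical helpers, the entity table is hard-coded rather than read from the DTD, and only attribute-free start and end tags are recognized, so it is far from the full XML `element` production. Each occurrence is identified by its position in the character sequence, each type by its content alone.

```python
import re

# Hypothetical, hard-coded entity table standing in for the DTD's internal subset.
ENTITIES = {"e": "<x>foo</x>"}
BODY = "<doc>&e;&e;</doc>"

# Toy tag pattern: attribute-free start tags <name> and end tags </name> only.
TAG = re.compile(r"<(/?)(\w+)>")


def expand(text, entities):
    """Naive replacement of &name; references (toy only, no nesting)."""
    for name, value in entities.items():
        text = text.replace(f"&{name};", value)
    return text


def element_occurrences(text):
    """Return (start, end) spans of substrings matching the toy element form."""
    stack, spans = [], []
    for m in TAG.finditer(text):
        if m.group(1):  # end tag: close the most recently opened element
            start, name = stack.pop()
            assert name == m.group(2), "mismatched tags"
            spans.append((start, m.end()))
        else:           # start tag: remember where this occurrence begins
            stack.append((m.start(), m.group(2)))
    return sorted(spans)


expanded = expand(BODY, ENTITIES)
spans = element_occurrences(expanded)
occurrences = [expanded[s:e] for s, e in spans]
print(spans)                  # [(0, 31), (5, 15), (15, 25)]
print(len(occurrences))       # 3 occurrences
print(len(set(occurrences)))  # 2 distinct string types
```

The two `<x>foo</x>` spans are equal as strings but sit at different offsets, which is exactly the difference between one string type and two occurrences of it.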

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************
Received on Thursday, 14 March 2013 15:35:16 UTC