RE: Saxonica comments on XProc Last-Call WD, sections 5-7 from Michael Kay on 2007-09-22 (public-xml-processing-model-comments@w3.org from September 2007)

From: Michael Kay <mike@saxonica.com>
Date: Sat, 22 Sep 2007 23:19:20 +0100
To: "'Michael Kay'" <mike@saxonica.com>, <public-xml-processing-model-comments@w3.org>
Message-ID: <015701c7fd66$9f5f6870$6401a8c0@turtle>
One correction: in comment 6 I wrote:

> restricting it to use an XSLT 1.0 match pattern (which can't reference any
> variables)

It slipped my memory that in XSLT 1.0 this restriction applies only to match
patterns appearing in the match attributes of xsl:template and xsl:key. Match
patterns appearing elsewhere (for example in xsl:number) can indeed contain
variable references. But it remains unclear why XProc wants to use match
patterns here rather than XPath select expressions.

Michael Kay
Saxonica Limited 

> -----Original Message-----
> From: Michael Kay [mailto:mike@saxonica.com] 
> Sent: 22 September 2007 22:59
> To: 'public-xml-processing-model-comments@w3.org'
> Subject: Saxonica comments on XProc Last-Call WD, sections 5-7
> 
> 1. Editorial, section 5.1.1. It's not clear when reading the 
> proforma for p:input here that there are other proformas for 
> the same element elsewhere.
> 
> 2. Technical/Editorial. Section 5.1.2. I don't really know if 
> this is a technical problem or an editorial one. The 
> parameter mechanism seems extremely confusing. Perhaps it 
> just needs to be explained better, perhaps the design needs 
> to be improved. I don't really understand it well enough 
> after several readings to know.
> 
> 3. Technical. Section 5.7.3. The default namespace bindings 
> for option and parameter values seem horribly ad-hoc. This is 
> trying to make intelligent choices, but I fear it will just 
> be confusing. Also, I don't see how option 2 can work. It 
> seems to suggest that you have to evaluate the select 
> expression to find some nodes, whose namespaces then form the 
> context for that select expression. This can't be right, I 
> must have misunderstood something. Overall, the whole 
> namespace handling here is horribly messy. Perhaps messiness 
> and namespaces are inevitable bedfellows, but I think this 
> area deserves further thought.
> 
> 4. Clarification. Section 5.13, para 5. Need to explain "any 
> other processing". Other than what? Does validation here 
> include both schema and DTD validation? (I'm not sure this is 
> practical: some parsers always do DTD validation if there is 
> a reference to a DTD.) Why are p:document and p:load distinct?
> 
> 5. Technical (Requirements). Section 7.1.1 et al. It's not 
> clear to me that it's desirable for XProc to define 
> fine-grained update operations like this. It seems to be 
> crossing the boundary from a pipeline processor to 
> yet-another-transformation-language. I think these operations 
> can be adequately performed by invoking XSLT or XQuery 
> (especially XQuery with updates), and that is the approach 
> that should be taken.
> 
> 6. Clarification. Section 7.1.5. The spec says that delete 
> deletes, but it needs to explain that this is a deletion of 
> the subtree rooted at a selected node. If we're going to 
> provide fine-grained updating like this, restricting it to 
> use an XSLT 1.0 match pattern (which can't reference any 
> variables) seems to severely limit its utility. (Also, the 
> phrase "the resulting document with the deletions" seems 
> clumsy; "after the deletions" would seem better.)
> 
> 7. Typos. Section 7.1.6. Bullet 4, "criteria ... is" [->are]. 
> Last-but-one para, "each ... has ... when they appear" 
> [->when it appears]. Final para, "attributes ... is" [->are].
> 
> 8. Technical. Section 7.1.12. It is not clear why 
> p:label-elements differs from other similar step types by 
> taking a select attribute rather than a match attribute. 
> There seems no logical reason why insert, delete, etc. 
> shouldn't all take a select attribute.
> 
> 9. Clarification. Section 7.1.13. What exactly is 
> "namespace-aware DTD validation"? I thought DTDs were never 
> namespace aware.
> 
> 10. Technical. Section 7.1.18. Rename is under-specified. 
> There have been considerable efforts in the XQuery WG to 
> specify a workable rename operation. The questions are (a) 
> how to deal with the case where the new name of an attribute 
> is the same as that of an existing attribute, (b) whether to 
> add or remove namespaces, (c) what to do about namespace prefixes.
> 
> 11. Technical. Section 7.1.19. Replace. The functionality 
> seems to be a subset of Viewport. Is a separate step type 
> really needed? (This is also true for Delete).
> 
> 12. Terminology. Section 7.1.25. Unescape markup. This seems 
> a rather convoluted name for the operation usually called 
> parsing. Also, the options "encoding" and "charset" seem 
> poorly named, since the value of charset is what one would 
> normally call an encoding.
> 
> 13. Ambiguity. Section 7.1.26, last para, "may not" => "might not".
> 
> 14. Technical. Section 7.1.27, Wrap. This only appears to be 
> well-defined in the case of element nodes. Text nodes, PIs 
> and comments will work, except for the group-adjacent provision.
> 
> 15. Technical/Political. Section 7.1.30, XSLT. What happens 
> if the stylesheet is an XSLT 2.0 stylesheet but is not a 
> valid XSLT 1.0 stylesheet? It seems very short-sighted for a 
> new W3C specification to mandate an obsolete version of 
> another W3C specification. Within a couple of years there may 
> well be environments that do not support an XSLT 1.0 
> processor, which will make it difficult/expensive/impossible 
> to implement a conformant XProc processor. The same applies 
> to XPath throughout. I would think that a better solution 
> here is to define a single step type XSLT, with a version 
> parameter 1.0 or 2.0, having an implementation-defined 
> default, and to say that an XProc processor must support 
> either version 1.0 or 2.0 or both.
> 
> 16. Technical. Section 7.1.30, XSLT. How can the 
> transformation use a non-XML output method such as "text" if 
> its serialization parameters on xsl:output are ignored?
> 
> 17. Technical. Section 7.2.2 Schematron. It doesn't feel right 
> to me to treat assertion failures as errors. Perhaps there 
> should be an assert-valid attribute as in XML Schema Validate.
> 
> 18. Nomenclature. Section 7.2.3 XML Schema Validate. I have 
> already remarked that p:validate-xml-schema seems a poorly 
> chosen name for a step that validates an instance. 
> 
> 19. Technical. Section 7.2.3 XML Schema Validate. "Set of 
> schemas" in para 4 should be "Set of schema documents". 
> Processors should be allowed to obtain schema components from 
> sources other than these schema documents if available. It's 
> not clear how the validation is carried out in terms of the 
> various options provided in the XML Schema spec to initiate 
> validation. It's desirable to allow an initial element 
> declaration or type to be nominated, so that you can test not 
> only that the document is valid but that it is valid against 
> a particular element declaration or type. It's desirable to 
> allow an option to indicate whether xsi:schemaLocation 
> attributes within the instance should be used or not.
> 
> 20. Technical. Section 7.2.3 XML Schema Validate. Leaving it 
> implementation-defined whether PSVI annotations can be passed 
> down the pipeline seems an interoperability nightmare. Better 
> for users to say whether they expect this or not, and for a 
> dynamic error to occur if it's requested but not supported.
> 
> 21. Technical. Section 7.2.4 XQuery. (a) You need to say much 
> more about the static and dynamic context of the query. (b) 
> Many queries will operate on a single document; supplying 
> this as the default collection seems clumsy. (c) When the 
> query returns elements selected from the source document, it 
> seems wrong to wrap these in document nodes, which entails 
> copying the elements and losing their identity and 
> parentage. That perhaps suggests a different mode of running 
> XQuery, in which it is used as an alternative to XPath for 
> selecting nodes rather than transforming them into new 
> nodes. (d) It seems odd to fail if the query returns things 
> other than elements; why not apply the sequence normalization 
> rules from section 2 of the serialization spec to the result? 
> (e) Taking the text node descendants of <c:query> to form the 
> query seems a really bad idea. If there are elements present, 
> as in <c:query><result>for $x in 1 to 10 return 
> <br/></result></c:query>, you are going to get some very 
> hard-to-understand error messages, and sometimes you will 
> actually construct a syntactically correct query that's 
> different from the one the user wrote. I think it's better to 
> allow an XML representation of an XQuery, which can be 
> defined as follows: take the subtree rooted at the c:query 
> element; serialize it; then unescape any character or entity 
> references appearing in text nodes that occur either (i) as 
> children of c:query, or (ii) between curly braces (but not 
> within quotes), and treat the result as an XQuery 1.0 query. 
> This allows, for example, <c:query>if (x &lt; 3) then <a/> 
> else <b/></c:query>.
> 
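A minimal Python sketch of the representation proposed in comment 21(e), offered as an illustration under stated assumptions rather than a definitive algorithm. The helper name query_from_element is hypothetical. The sketch assumes that the query is the content of c:query (not the c:query tags themselves) and that constructed elements are in no namespace; it does not handle XQuery's doubled-brace escapes, comments, or CDATA sections. Because ElementTree decodes character and entity references while parsing, "unescaping" a text node amounts to emitting its decoded characters, and everything else is re-escaped on output.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def _content(text, state):
    """Re-serialize element content. ElementTree has already decoded
    references, so emitting a character raw is the "unescape" step; text
    outside {...}, or inside a string literal, is re-escaped."""
    out = []
    for ch in text:
        if state["depth"] == 0:
            # Literal element content: keep escaped; "{" opens an expression.
            if ch == "{":
                state["depth"] = 1
            out.append(escape(ch))
        elif state["quote"]:
            # String literal inside an enclosed expression: keep escaped.
            if ch == state["quote"]:
                state["quote"] = None
            out.append(escape(ch))
        else:
            # Rule (ii): enclosed expression outside quotes, emit unescaped.
            if ch in "\"'":
                state["quote"] = ch
            elif ch == "{":
                state["depth"] += 1
            elif ch == "}":
                state["depth"] -= 1
            out.append(ch)
    return "".join(out)


def _element(elem):
    # Assumption: constructed elements use no namespaces, so elem.tag is a
    # plain name rather than ElementTree's "{uri}local" form.
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    if elem.text is None and len(elem) == 0:
        return f"<{elem.tag}{attrs}/>"
    state = {"depth": 0, "quote": None}  # each element's content starts fresh
    parts = [f"<{elem.tag}{attrs}>", _content(elem.text or "", state)]
    for child in elem:
        parts.append(_element(child))
        parts.append(_content(child.tail or "", state))  # tail is our content
    parts.append(f"</{elem.tag}>")
    return "".join(parts)


def query_from_element(query):
    """Hypothetical helper: build the XQuery text from a parsed c:query."""
    # Rule (i): text children of c:query are emitted fully unescaped.
    parts = [query.text or ""]
    for child in query:
        parts.append(_element(child))
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    doc = ET.fromstring(
        '<c:query xmlns:c="http://www.w3.org/ns/xproc-step">'
        "if (x &lt; 3) then <a/> else <b/></c:query>"
    )
    print(query_from_element(doc))  # if (x < 3) then <a/> else <b/>
```

Applied to the problem example in (e), <c:query><result>for $x in 1 to 10 return <br/></result></c:query>, the sketch yields <result>for $x in 1 to 10 return <br/></result>: a direct element constructor with literal content, rather than a query assembled from text nodes alone.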
> 22. Technical. Section 7.2.5 xslt2. I have already commented 
> on the relationship between 1.0 and 2.0. (a) A stylesheet 
> creating multiple result documents will allocate each of them 
> a URI. It's not clear how the processor can distinguish the 
> result documents on the basis of their URIs. Nor is it clear 
> how one would apply different serialization to different 
> result documents. Perhaps there should be an option to cause 
> secondary result documents to be serialized and written to 
> the relevant disk location rather than being returned on the 
> secondary output port. (b) there's no explicit provision to 
> run the stylesheet without a principal input document. (c) it 
> seems that only strings can be supplied as input parameters. 
> (At the very least, these should be treated as untypedAtomic 
> so they are implicitly converted to the required type. But 
> really, there's a need to supply any value that can be 
> yielded by an XPath 2.0 expression.). (d) the option 
> "allow-collections" seems poorly named. Setting this to false 
> does not disallow use of collections in the stylesheet. In 
> fact, if the stylesheet is written to use collections, one 
> might want to set this option to false to avoid interfering with this.
> 
> 23. Technical. Section 7.3. doctype-public is a public 
> identifier (so-called), not a URI.
> 
> 24. Typo. Section 7.3 "must support be supported"
> 
> Michael Kay
> Saxonica Limited
Received on Saturday, 22 September 2007 22:19:40 UTC