Saxonica comments on XProc Last-Call WD, sections 5-7 from Michael Kay on 2007-09-22 (public-xml-processing-model-comments@w3.org from September 2007)

From: Michael Kay <mike@saxonica.com>
Date: Sat, 22 Sep 2007 22:58:45 +0100
To: <public-xml-processing-model-comments@w3.org>
Message-ID: <015201c7fd63$bf313050$6401a8c0@turtle>
1. Editorial. Section 5.1.1. It's not clear, when reading the proforma for
p:input here, that there are other proformas for the same element elsewhere.

2. Technical/Editorial. Section 5.1.2. I don't really know if this is a
technical problem or an editorial one. The parameter mechanism seems
extremely confusing. Perhaps it just needs to be explained better, perhaps
the design needs to be improved. I don't really understand it well enough
after several readings to know.

3. Technical. Section 5.7.3. The default namespace bindings for option and
parameter values seem horribly ad-hoc. This is trying to make intelligent
choices, but I fear it will just be confusing. Also, I don't see how option
2 can work. It seems to suggest that you have to evaluate the select
expression to find some nodes, whose namespaces then form the context for
that select expression. This can't be right; I must have misunderstood
something. Overall, the whole namespace handling here is horribly messy.
Perhaps messiness and namespaces are inevitable bedfellows, but I think this
area deserves further thought.

4. Clarification. Section 5.13, para 5. Need to explain "any other
processing". Other than what? Does validation here include both schema and
DTD validation? (I'm not sure this is practical: some parsers always do DTD
validation if there is a reference to a DTD.) Why are p:document and p:load
distinct?

5. Technical (Requirements). Section 7.1.1 et al. It's not clear to me that
it's desirable for XProc to define fine-grained update operations like this.
It seems to be crossing the boundary from a pipeline processor to
yet-another-transformation-language. I think these operations can be
adequately performed by invoking XSLT or XQuery (especially XQuery with
updates), and that is the approach that should be taken.

6. Clarification. Section 7.1.5. The spec says that delete deletes, but it
needs to explain that this is a deletion of the subtree rooted at a selected
node. If we're going to provide fine-grained updating like this, restricting
it to use an XSLT 1.0 match pattern (which can't reference any variables)
seems to severely limit its utility. (Also, the phrase "the resulting
document with the deletions" seems clumsy; "after the deletions" would be
better.)

7. Typos. Section 7.1.6. Bullet 4, "criteria ... is" [->are]. Last-but-one
para, "each ... has ... when they appear" [->when it appears]. Final para,
"attributes ... is" [->are].

8. Technical. Section 7.1.12. It is not clear why p:label-elements differs
from other similar step types by taking a select attribute rather than a
match attribute. There seems to be no logical reason why insert, delete, etc.
shouldn't all take a select attribute.

9. Clarification. Section 7.1.13. What exactly is "namespace-aware DTD
validation"? I thought DTDs were never namespace aware.

10. Technical. Section 7.1.18. Rename is under-specified. There have been
considerable efforts in the XQuery WG to specify a workable rename
operation. The questions are (a) how to deal with the case where the new
name of an attribute is the same as that of an existing attribute, (b)
whether to add or remove namespaces, (c) what to do about namespace
prefixes.
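To make question (a) concrete, here is a minimal sketch using Python's xml.etree.ElementTree purely as an illustration (it is not XProc), where attributes behave as a name-to-value map. A naive rename silently overwrites the existing attribute. The helper rename_attribute is hypothetical and treats the clash as an error, which is only one of the policies the spec would have to choose between; questions (b) and (c) about namespaces and prefixes are not addressed here.

```python
import xml.etree.ElementTree as ET


def naive_rename(elem, old, new):
    # Move the value under the new name; an existing `new` attribute is lost.
    if old in elem.attrib:
        elem.attrib[new] = elem.attrib.pop(old)


def rename_attribute(elem, old, new):
    """Hypothetical rename that refuses to clobber an existing attribute."""
    if old not in elem.attrib or old == new:
        return
    if new in elem.attrib:
        raise ValueError(f"attribute {new!r} already exists on <{elem.tag}>")
    elem.attrib[new] = elem.attrib.pop(old)


e = ET.fromstring('<item id="1" ref="2"/>')
naive_rename(e, "ref", "id")
print(e.attrib)  # {'id': '2'}; the original id="1" has vanished
```

Other policies (the renamed attribute wins, the existing one wins) are equally easy to code; the point is that the spec has to pick one.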

11. Technical. Section 7.1.19. Replace. The functionality seems to be a
subset of Viewport. Is a separate step type really needed? (This is also
true for Delete).
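The claim that Replace (and Delete) are special cases of Viewport can be illustrated with a small model. This is a sketch only: it uses Python's xml.etree.ElementTree rather than XProc, an element name stands in for the XSLT match pattern, and the functions viewport, replace and delete are hypothetical. Note that deletion removes the whole subtree rooted at each matched element, which is also the point made about section 7.1.5.

```python
import copy
import xml.etree.ElementTree as ET


def viewport(parent, tag, subpipeline):
    """Replace each outermost descendant element named `tag` with the list
    of elements returned by subpipeline(matched_element)."""
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag != tag:
            viewport(child, tag, subpipeline)  # keep looking deeper
            i += 1
            continue
        # ElementTree stores the text after an element in its .tail, so
        # detach it first and re-attach it after the replacement nodes.
        tail, child.tail = child.tail, None
        results = subpipeline(child)
        parent.remove(child)
        for offset, node in enumerate(results):
            parent.insert(i + offset, node)
        i += len(results)
        if tail:
            if i > 0:
                prev = parent[i - 1]
                prev.tail = (prev.tail or "") + tail
            else:
                parent.text = (parent.text or "") + tail


def replace(root, tag, replacement):
    # Replace = a viewport whose subpipeline ignores its input.
    def fresh_copy(_matched):
        node = copy.deepcopy(replacement)
        node.tail = None
        return [node]
    viewport(root, tag, fresh_copy)


def delete(root, tag):
    # Delete = a viewport whose subpipeline returns nothing, so the
    # entire subtree rooted at each match disappears.
    viewport(root, tag, lambda _matched: [])


doc = ET.fromstring("<doc>a<x>1</x>b<y><x>2</x></y>c</doc>")
replace(doc, "x", ET.Element("z"))
print(ET.tostring(doc, encoding="unicode"))  # <doc>a<z />b<y><z /></y>c</doc>
```

Once viewport exists, replace and delete are each a one-line choice of subpipeline, which is the substance of the question.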

12. Terminology. Section 7.1.25. Unescape markup. This seems a rather
convoluted name for the operation usually called parsing. Also, the options
"encoding" and "charset" seem poorly named, since the value of charset is
what one would normally call an encoding.
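Stripped of its name, the operation is: take the string value of the element and parse it as XML. The following is a minimal Python sketch of that reading, assuming a hypothetical unescape_markup function whose encoding and charset parameters mirror the step's option names, and assuming (as the option names suggest) that "encoding" names a transfer encoding such as base64 while "charset" carries what is normally called the character encoding.

```python
import base64
import xml.etree.ElementTree as ET


def unescape_markup(wrapper, encoding=None, charset="utf-8"):
    """Hypothetical sketch: parse the text content of `wrapper` as XML.

    `encoding` is a transfer encoding (only "base64" is handled here);
    `charset` is the character encoding used to decode the bytes.
    """
    text = "".join(wrapper.itertext())
    if encoding == "base64":
        text = base64.b64decode(text).decode(charset)
    elif encoding is not None:
        raise ValueError(f"unsupported encoding {encoding!r}")
    return ET.fromstring(text)  # i.e. ordinary XML parsing


escaped = ET.fromstring("<doc>&lt;p&gt;hi&lt;/p&gt;</doc>")
print(unescape_markup(escaped).tag)  # p
```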

13. Ambiguity. Section 7.1.26, last para, "may not" => "might not".

14. Technical. Section 7.1.27, Wrap. This only appears to be well-defined in
the case of element nodes. Text nodes, PIs and comments will work, except
for the group-adjacent provision.

15. Technical/Political. Section 7.1.30, XSLT. What happens if the
stylesheet is an XSLT 2.0 stylesheet but is not a valid XSLT 1.0 stylesheet?
It seems very short-sighted for a new W3C specification to mandate an
obsolete version of another W3C specification. Within a couple of years
there may well be environments that do not support an XSLT 1.0 processor,
which will make it difficult/expensive/impossible to implement a conformant
XProc processor. This also applies throughout to XPath. I would think that a
better solution here is to define a single step type XSLT, with a version
parameter 1.0 or 2.0, having an implementation-defined default, and to say
that an XProc processor must support either version 1.0 or 2.0 or both.

16. Technical. Section 7.1.30, XSLT. How can the transformation use a
non-XML output method such as "text" if its serialization parameters on
xsl:output are ignored?

17. Technical. Section 7.2.2, Schematron. It doesn't feel right to me to treat
assertion failures as errors. Perhaps there should be an assert-valid
attribute as in XML Schema Validate.

18. Nomenclature. Section 7.2.3 XML Schema Validate. I have already remarked
that p:validate-xml-schema seems a poorly chosen name for a step that
validates an instance. 

19. Technical. Section 7.2.3 XML Schema Validate. "Set of schemas" in para 4
should be "Set of schema documents". Processors should be allowed to obtain
schema components from sources other than these schema documents if
available. It's not clear how the validation is carried out in terms of the
various options provided in the XML Schema spec to initiate validation. It's
desirable to allow an initial element declaration or type to be nominated,
so that you can test not only that the document is valid but that it is
valid against a particular element declaration or type. It's desirable to
allow an option to indicate whether xsi:schemaLocation attributes within the
instance should be used or not.

20. Technical. Section 7.2.3 XML Schema Validate. Leaving it
implementation-defined whether PSVI annotations can be passed down the
pipeline seems an interoperability nightmare. Better for users to say
whether they expect this or not, and for a dynamic error to occur if it's
requested but not supported.

21. Technical. Section 7.2.4 XQuery. (a) You need to say much more about the
static and dynamic context of the query. (b) Many queries will operate on a
single document; supplying this as the default collection seems clumsy. (c)
It seems wrong when the query returns elements selected from the source
document to wrap these in document nodes, which entails copying the elements
and losing their identity and parentage: though that perhaps suggests a
different mode of running XQuery in which it is used as an alternative to
XPath for selecting nodes rather than transforming them to new nodes. (d) It
seems odd to fail if the query returns things other than elements; why not
apply the sequence normalization rules from section 2 of the serialization
spec to the result? (e) Taking the text node descendants of <c:query> to
form the query seems a really bad idea: if there are elements present, as in
<c:query><result>for $x in 1 to 10 return <br/></result></c:query>, then you
are going to get some very hard-to-understand error messages, and sometimes
you will actually construct a syntactically-correct query that's different
from the one the user wrote. I think it's better to allow an XML
representation of an XQuery, which can be defined as follows: take the
subtree rooted at the c:query element; serialize it; then unescape any
character or entity references appearing in text nodes that occur either (i)
as children of c:query, or (ii) between curly braces (but not within
quotes), and treat the result as an XQuery 1.0 query. This allows for
example <c:query>if (x &lt; 3) then <a/> else <b/></c:query>.
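The difference between the two readings of <c:query> can be shown with a short Python sketch using xml.etree.ElementTree. The function names are hypothetical, and the namespace URI bound to the c: prefix in the example is an assumption. The first function concatenates text-node descendants; the second implements only clause (i) of the proposal above, serializing the content of c:query while leaving text that is a direct child of c:query unescaped. Clause (ii), unescaping inside curly braces but not inside quotes, needs a small scanner and is not attempted here.

```python
import xml.etree.ElementTree as ET


def text_descendants_query(query_elem):
    # The draft's rule as read here: concatenate all descendant text nodes.
    return "".join(query_elem.itertext())


def serialized_query(query_elem):
    """Clause (i) only: serialize the children of c:query, writing text
    that is a direct child of c:query without escaping."""
    parts = [query_elem.text or ""]
    for child in query_elem:
        tail, child.tail = child.tail, None  # tostring() would emit the tail, escaped
        parts.append(ET.tostring(child, encoding="unicode"))
        child.tail = tail
        parts.append(tail or "")  # direct-child text: left unescaped
    return "".join(parts)


C = "http://www.w3.org/ns/xproc-step"  # assumed namespace for the c: prefix
q1 = ET.fromstring(f'<c:query xmlns:c="{C}"><result>for $x in 1 to 10 return <br/></result></c:query>')
print(repr(text_descendants_query(q1)))  # 'for $x in 1 to 10 return '
print(serialized_query(q1))  # <result>for $x in 1 to 10 return <br /></result>

q2 = ET.fromstring(f'<c:query xmlns:c="{C}">if (x &lt; 3) then <a/> else <b/></c:query>')
print(serialized_query(q2))  # if (x < 3) then <a /> else <b />
```

The first reading turns the user's element constructor into a truncated FLWOR expression; the second reproduces what was written. ElementTree emits <a /> rather than <a/>, which is still a valid XQuery direct element constructor.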

22. Technical. Section 7.2.5 xslt2. I have already commented on the
relationship between 1.0 and 2.0. (a) A stylesheet creating multiple result
documents will allocate each of them a URI. It's not clear how the processor
can distinguish the result documents on the basis of their URIs. Nor is it
clear how one would apply different serialization to different result
documents. Perhaps there should be an option to cause secondary result
documents to be serialized and written to the relevant disk location rather
than being returned on the secondary output port. (b) There's no explicit
provision to run the stylesheet without a principal input document. (c) It
seems that only strings can be supplied as input parameters. (At the very
least, these should be treated as untypedAtomic so they are implicitly
converted to the required type. But really, there's a need to supply any
value that can be yielded by an XPath 2.0 expression.) (d) The option
"allow-collections" seems poorly named. Setting this to false does not
disallow use of collections in the stylesheet. In fact, if the stylesheet is
written to use collections, one might want to set this option to false to
avoid interfering with this.

23. Technical. Section 7.3. doctype-public is a public identifier
(so-called), not a URI.

24. Typo. Section 7.3 "must support be supported"

Michael Kay
Saxonica Limited
Received on Saturday, 22 September 2007 21:59:00 UTC