Re: Match Pattern Proposal from Jeni Tennison on 2006-11-03 (public-xml-processing-model-wg@w3.org from November 2006)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Fri, 03 Nov 2006 09:36:16 +0000
To: public-xml-processing-model-wg@w3.org
Message-ID: <454B0D90.2000402@jenitennison.com>

Hi,

Norman Walsh wrote:
> On input, viewport, and for-each, it seems like we have three choices:
> 
> 1. Use 'select' semantics in every case.
> 2. Use 'match' semantics in every case.
> 3. Use 'select' semantics for some and 'match' for others.
> 
> I have a marginal preference for 1 or 2 on the basis that it's easier
> to explain to users. And I think select semantics are easier to explain
> and make more sense in the case of p:input, so I favor 1.
> 
> But Alex and Henry have both expressed a preference for match
> semantics at least on viewport and maybe on for-each.
> 
> What do others think?

I don't buy Alex's streamability arguments: there's a subset of XPath 
expressions that are streamable, and there's a subset of patterns that 
are streamable. Patterns don't automatically give you streamability. I 
don't think the ease of doing streamability analysis should be of high 
priority.

I'm more open to the usability arguments, but I think they can be quite 
subjective. For example, although Henry finds it easier to write "div" 
than "//div", the vast majority of XSLT newbies will naturally write
"//div" in their match patterns. In my experience, newbies in general
don't understand the difference between patterns and expressions, and
will tend to use expressions by default (and then wonder why they don't
work when they try to use one that isn't a valid pattern).

Looking at the distinction in XSLT, patterns are used when you are 
already looking at a specific subset of nodes. For example, the 'count' 
attribute on <xsl:number> holds a pattern because you're already only 
looking at the ancestors of the current node; in XSLT 2.0, the 
'group-starting-with' attribute on <xsl:for-each-group> holds a pattern 
because it's compared with the nodes in the selected sequence. The 
'match' on <xsl:template> only looks at the nodes that you've applied 
templates to.

It seems to me that in XProc, the subset of nodes we're examining for
matches is "all the nodes", which isn't really a subset. Given that,
for consistency with XSLT, I think that we should really be using an
expression.
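To make the distinction concrete, here is a rough model in Python, using the standard-library xml.etree.ElementTree rather than any XSLT or XProc processor. 'select' evaluates an expression once from the root; 'match' visits every node and tests it, with the pattern modelled as a plain predicate function. When the candidate set is every node in the document, select="//div" and match="div" pick out the same nodes. The document and the helper names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Made-up document with nested and non-nested divs.
DOC = """<doc>
  <div id="a"><p/><div id="b"/></div>
  <section><div id="c"/></section>
</doc>"""

def select_descendants(root, name):
    """'select' semantics: evaluate an expression like //name from the
    root and return the resulting elements in document order."""
    return list(root.iter(name))

def match_all(root, pattern):
    """'match' semantics: visit every element and keep those the pattern
    accepts. The pattern is modelled as a predicate on one element."""
    return [el for el in root.iter() if pattern(el)]

root = ET.fromstring(DOC)
selected = select_descendants(root, "div")            # like select="//div"
matched = match_all(root, lambda el: el.tag == "div")  # like match="div"

print([el.get("id") for el in selected])  # ['a', 'b', 'c']
print(selected == matched)                # True
```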

I'm particularly concerned by the idea of for-each using a pattern, 
given that <xsl:for-each> uses an expression. I think that the 'select' 
on input and for-each should do the same thing, since they have the same 
semantic. I don't have strong views on viewport, since the 'select' 
there has a completely different semantic (and needs to be renamed anyway).

I'm also concerned that if we use patterns then users can no longer 
(easily) do some things that they might want to do, and I'd like to see 
some discussion of that. There are three things that you can do with an
expression that you can't easily do with a pattern: identify nodes with
a function (other than id() or key()), use axes other than child,
attribute and descendant-or-self, and use a positional predicate on a
whole node set. But each of these *can* be written as a pattern:

For example,

   <p:for-each>
     <p:input port="doc" ... select="rdf:resources(.)" />
   </p:for-each>

might return one document per "resource" in an RDF document. The
equivalent pattern is:

   *[count(.|rdf:resources(/)) = count(rdf:resources(/))]

Unless you have a fairly clever implementation it's going to be pretty 
computationally expensive, and it's not something that most users will 
be able to do without consulting FAQs.
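A rough Python model of the set-membership trick in that pattern, under stated assumptions: rdf:resources() is not a standard function, so the hypothetical resources() stand-in below simply returns every <res> element of a made-up document. The pattern accepts a node exactly when adding it to the node set leaves the count unchanged, and the model makes the cost visible, because the function is re-evaluated for every candidate node.

```python
import xml.etree.ElementTree as ET

# Made-up document: two "resources" and one unrelated element.
DOC = """<rdf><res id="r1"/><other/><res id="r2"/></rdf>"""

def resources(root):
    """Hypothetical stand-in for rdf:resources(/): every <res> element."""
    return list(root.iter("res"))

def union(a, b):
    """Node-set union: merge by identity, dropping duplicates."""
    seen, out = set(), []
    for node in list(a) + list(b):
        if id(node) not in seen:
            seen.add(id(node))
            out.append(node)
    return out

def matches_pattern(node, root):
    """Models *[count(.|rdf:resources(/)) = count(rdf:resources(/))]:
    the union keeps the same size only if node is already in the set.
    resources() is recomputed on every call, i.e. once per candidate."""
    s = resources(root)
    return len(union([node], s)) == len(s)

root = ET.fromstring(DOC)
hits = [el for el in root.iter() if matches_pattern(el, root)]
print([el.get("id") for el in hits])  # ['r1', 'r2']
```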

Another example is

   //dt/following-sibling::dd[1]

to get the first definition for each term in a definition list. Given 
that the context node for these expressions is always the root node, 
these are fairly easy to rewrite as patterns:

   dd[preceding-sibling::*[1][self::dt]]
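A quick check of that rewrite, again as a Python/ElementTree model rather than a real XPath engine (the sample list is made up and includes two consecutive <dt>s sharing one <dd>): the expression picks the first <dd> after each <dt>, and the pattern accepts each <dd> whose nearest preceding element sibling is a <dt>. Assuming the list contains only <dt> and <dd> children, the two agree.

```python
import xml.etree.ElementTree as ET

# Made-up definition list; "pear" and "quince" share one definition.
DOC = """<dl>
  <dt>apple</dt><dd id="d1"/><dd id="d2"/>
  <dt>pear</dt><dt>quince</dt><dd id="d3"/>
</dl>"""

root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in root.iter() for child in parent}

def element_siblings(el):
    """All element children of el's parent, in document order."""
    parent = parents.get(el)
    return list(parent) if parent is not None else [el]

def by_expression(root):
    """//dt/following-sibling::dd[1]: the first dd after each dt,
    deduplicated and returned in document order."""
    picked = set()
    for dt in root.iter("dt"):
        sibs = element_siblings(dt)
        later = sibs[sibs.index(dt) + 1:]
        dds = [el for el in later if el.tag == "dd"]
        if dds:
            picked.add(dds[0])
    return [el for el in root.iter() if el in picked]

def by_pattern(root):
    """dd[preceding-sibling::*[1][self::dt]]: each dd whose nearest
    preceding element sibling is a dt."""
    hits = []
    for dd in root.iter("dd"):
        sibs = element_siblings(dd)
        i = sibs.index(dd)
        if i > 0 and sibs[i - 1].tag == "dt":
            hits.append(dd)
    return hits

print([el.get("id") for el in by_expression(root)])  # ['d1', 'd3']
print(by_expression(root) == by_pattern(root))       # True
```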

Finally,

   (//div)[5]

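to get the fifth <div> element in the document. The equivalent pattern
is:

   div[count(preceding::div | ancestor::div) = 4]

(The ancestor::div term is needed because the preceding axis excludes
ancestors, so a <div> nested inside another <div> would otherwise be
miscounted.)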

All of these are pretty rare, I imagine.
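A Python/ElementTree model of the (//div)[5] example, on a made-up document with nested <div>s, shows why a counting pattern has to include ancestors: the XPath preceding axis excludes a node's ancestors, so counting only preceding::div misses the fifth <div> when it sits inside another <div>, while counting preceding::div together with ancestor::div finds it. The helper functions are illustrative, not a real XPath engine.

```python
import xml.etree.ElementTree as ET

# Made-up document: d2 and d3 sit inside d1, and d5 sits inside d4.
DOC = """<body>
  <div id="d1"><div id="d2"/><div id="d3"/></div>
  <div id="d4"><div id="d5"/></div>
  <div id="d6"/>
</body>"""

root = ET.fromstring(DOC)
order = list(root.iter())                    # elements in document order
position = {id(el): i for i, el in enumerate(order)}
parents = {child: parent for parent in root.iter() for child in parent}

def ancestors(el):
    """ancestor axis: parent, grandparent, ... up to the root element."""
    out = []
    while el in parents:
        el = parents[el]
        out.append(el)
    return out

def preceding(el):
    """preceding axis: elements before el in document order,
    excluding el's ancestors."""
    anc = {id(a) for a in ancestors(el)}
    return [n for n in order[:position[id(el)]] if id(n) not in anc]

def count_divs(nodes):
    return sum(1 for n in nodes if n.tag == "div")

divs = [el for el in order if el.tag == "div"]
fifth = divs[4]                              # (//div)[5]

# div[count(preceding::div) = 4]
preceding_only = [d for d in divs if count_divs(preceding(d)) == 4]
# div[count(preceding::div | ancestor::div) = 4]; the two axes never
# overlap, so concatenating the lists gives the union.
with_ancestors = [d for d in divs
                  if count_divs(preceding(d) + ancestors(d)) == 4]

print(fifth.get("id"))                       # d5
print(preceding_only)                        # []
print([d.get("id") for d in with_ancestors]) # ['d5']
```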

We might want to look ahead: if we were choosing between XPath *2.0*
expressions and XSLT *2.0* patterns, which would we choose? XPath 2.0
adds more functions and operators that return nodes but aren't allowed
in patterns. Would we consider it reasonable for users to do:

   /root/* except /root/head

for example? I think we would, and I think we would find it hard to 
change to allow this later on if we stuck with patterns now.
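For illustration, a Python/ElementTree model of what /root/* except /root/head returns on a made-up document: the nodes of the left operand that are not in the right operand, compared by identity. For this particular case the predicate pattern /root/*[not(self::head)] accepts the same elements, which the model also checks; the helper name except_nodes is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up document with two <head> children.
DOC = "<root><head/><body/><foot/><head/></root>"
root = ET.fromstring(DOC)

def except_nodes(left, right):
    """Models XPath 2.0 'except': the nodes of left that are not in right,
    compared by identity and without duplicates. Assumes left is already
    in document order, as it is here."""
    drop = {id(n) for n in right}
    seen, out = set(), []
    for n in left:
        if id(n) not in drop and id(n) not in seen:
            seen.add(id(n))
            out.append(n)
    return out

children = list(root)             # /root/*
heads = root.findall("head")      # /root/head
rest = except_nodes(children, heads)

# The predicate pattern /root/*[not(self::head)], tested child by child.
by_predicate = [el for el in children if el.tag != "head"]

print([el.tag for el in rest])    # ['body', 'foot']
print(rest == by_predicate)       # True
```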

A final minor concern is that if we use patterns then we introduce a 
dependency on XSLT, and I'm not sure we want to do that.

In summary, input and for-each have the same semantic and should use an 
expression, in my view. Viewport has a different semantic, so I'd be 
happy for it to use a pattern if there were a good argument for it to do 
so, but I haven't yet heard one.

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Friday, 3 November 2006 09:36:44 UTC