Re: ANNOUNCEMENT: RDFStyles: alternative to XSLT for RDF from Graham Klyne on 2003-10-29 (www-rdf-interest@w3.org from October 2003)

From: Graham Klyne <gk@ninebynine.org>
Date: Wed, 29 Oct 2003 13:12:01 +0000
To: Emmanuel Pietriga <epietriga@yahoo.fr>
Cc: www-rdf-interest@w3.org
Message-Id: <5.1.0.14.2.20031029110811.00b86b48@127.0.0.1>
I'm responding in the spirit of continued exploration, not because I have 
any particular point I wish to make...

At 14:02 28/10/03 +0100, Emmanuel Pietriga wrote:
>Graham Klyne wrote:
>
>>Indeed so.  But I still think your idea is a good one to investigate.
>>What follows are some disjointed ramblings, which you may well choose to 
>>ignore.
>>I'm no expert on XSLT, but it seems that the behaviour of 
>><xsl:apply-templates /> is to use some context from the template that 
>>invokes it, and use that in performing a further query.
>
>Aside from explicit (optional) parameters, a template called by an 
>apply-templates instruction does not get any context from the calling 
>template. Unless by context you mean the fact that it will be applied to a 
>(limited) subset of nodes (essentially, but not necessarily, children of 
>the context node in the calling template).

Yes, that's pretty much what I meant.

>>In the case of a tree query, the subsequent query may be quite clearly 
>>scoped.  In the case of a graph query, I think the scope is not so 
>>constrained because the graph may contain loops back to "higher" notes.
>
>True. That's why I was mentioning the need to define the semantics of an 
>apply-tempaltes select="*" equivalent as it cannot be transfered to the 
>case of a graph straightforwardly.

Quite so.

>>One touchstone might be this:  given an RDF graph that follows a strictly 
>>tree pattern (and RDF graphs used to represent frame-like information do 
>>tend to follow such a pattern) then the [default?] behaviour should be 
>>similar to normal XSLT behaviour.
>>So, if one imagines an RDF graph that has a single "source node", and 
>>which contains a strict branching structure, how might this work?  The 
>>initial query (I think of RDFpath as likely being RDF-query-like) must 
>>somehow select (by name, or otherwise) the source node, and then describe 
>>paths from that node, or related to that node.  For each matching query, 
>>is it possible to identify a new node that serves a similar role to the 
>>original "source node"?
>You mean what is called "context node" in standard XPath/XSLT ?

Well, in my half-baked ramblings, I was thinking that the context might be 
a collection nodes, but that was the general idea.

>I believe Sean Palmer addresses this issue in his proposal [1] (I like 
>this one btw ; it would be nice if it were extended to support selection 
>predicates modeling constraints on literal datatypes).
>
>[1] http://infomesh.net/2003/rdfpath/

I find myself wondering if that proposal couldn't be simplified.  I haven't 
given it enough attention to have a clear view.

>>Then consider the ways in which an RDF graph may be not-tree-structured:
>>- There is no single source node:
>
>s/source/root ?

Sort-of.  The term "root" had rather strong tree-associations that I was 
trying to avoid.  But then I blew it...

>>maybe the root context must be considered to be a set of nodes?  E.g.  I 
>>have found that graph queries might start with something like "find all 
>>nodes of a given type", and starting from these, paths through the graph 
>>can be traced by subsequent queries.
>
>In TreeHugger [2], the root is considered to be the RDF document. Children 
>of the root are all nodes in the graph that are the subject of one (or 
>more) statement. What you offer here seems to be a refinment of the same 
>thing (by adding constraints on the members of this set).
>
>[2] http://rdfweb.org/people/damian/treehugger/

Aha!  I was trying to avoid any particular association with the 
document/statement/graph structure, but rather was contemplating the notion 
of descent in the context of a particular query.  In this view, if there is 
a "root", then it is the entire graph (since that is what an initial query 
must be applied to), and subsequent sub-queries may be contextually 
constrained in some way by the results of previous queries.

I think the treehugger approach you describe requires that you accept that 
graph nodes may be duplicated as immediate descendents of the root, and 
also re-appear further down in the tree.

Here's a pathological graph structure I dreamed up:

   :n1a :p12 :n2a ; :p12 :n2b .
   :n1b :p12 :n2a ; :p12 :n2b .
   :n2a :p23 :n3a ; :p23 :n3b .
   :n2b :p23 :n3a ; :p23 :n3b .
   :n3a :p31 :n1a ; :p31 :n1b .
   :n3b :p31 :n1a ; :p31 :n1b .

(To visualize the structure, think of an octahedron, with n1*, n2*, n3* 
pairs of nodes being at opposing vertices.)

The point about this is that it's utterly undistinguished with respect to 
each of the nodes.  You could start at any node and find the same pattern 
of onward graph paths (modulo property renaming).  There is no 
non-arbitrary way to treat this as a tree:  which node should be *the* root?

In practice, I agree that useful RDF data isn't like this, and there is 
often a reasonably preferred way to present the structure tree-wise.  But I 
think there will always be cases in which the choice of which nodes to 
treat as leaves and which to treat as roots of subtree becomes quite 
arbitrary.  And I think such arbitrariness could lead to problems 
(different choices leading to different results).

What I am contemplating is that it is the query itself that provides the 
additional information that determines the selection of the next nodes 
along a path, rather than trying to suggest a structure that is independent 
of a query.

>>- branches may recombine, so that leaf nodes are shared between 
>>branches.  This might still be processed as if it were a tree, if the 
>>duplication thus engendered is not harmful.
>
>It might be harmful. I think it is very dependant on what the 
>transformation designer wants to do. I'm not sure about the best way to 
>address this issue.

I tend to agree...  which is why I was thinking of the (initial) query as 
supplying some context.

>>- the graph may be cyclic.  One view might be to treat this as an 
>>infinite tree.  (Something like this happens for the description of 
>>recursive functions in functional programming language implementation.)
>
>Could you give more information about this?

Ummm... I'm stretching my experience here, but let's see ...

Consider a function represented as a graph:

    sum a b c = (a+b)+c

    [sum a b c] ---> (+) ---> (+) ---> [a]
                         \        \
                          --> [c]  --> [b]

I'm trying to illustrate here how the function maps out to a structure that 
can be evaluated by traversal of the graph.  This case is easy enough, the 
function maps out to a tree, and the traversal is straightforward.  But 
consider a recursive function:

    fac n | n==1      = 1
          | otherwise = fac (n-1) * n

    [fac n] ---> (==1?) ---> [n]
        \            \  \
         \            \  --> [1]
          \            \
           \             --> (*) ---> [n]
            \     (n-1)     /
             --<---------<--

Now we have a loop.  (This isn't exactly how implementations work -- I'm 
just trying to illustrate how recursion may lead to looped 
structures.)  The recursive nature of the definition is slightly clearer if 
lambda notation is used (using \ for lambda):

     fac = (\f \n . if n == 1 then 1 else f (n-1) * n) fac

Now we're treating functions as "first class" values.  To what value should 
'fac' be bound?  The answer is the solution of the above equation.  How do 
we know there is such a solution?  It turns our that there is a way of 
"de-recursing" the equation by introducing some thing variously known as a 
"fixpoint" operator, or "Y" combinator, but application of this operator 
creates an "infinite" function definition.  Something like (borrowing from 
an email by Paul Hudak [1]):

"""
   Y = \h.(\x.h(x x))(\x.h(x x))

which, you will note, is not recursive, yet has the property that Y f = f 
(Y f), so that it is in fact a fixpoint generator.
"""

[1] http://haskell.org/pipermail/haskell-cafe/2003-October/005389.html

Then the value for fac in the above is:

   fac = Y (\f \n . if n == 1 then 1 else f (n-1) * n )

(which is a non-recursive definition for fac).

And it even works!  Observe:

    Y h ..>  (\x.h (x x)) (\x.h (x x))
        ..>  h ( ((\x.h (x x)) (\x.h (x x)) )

which leads to an infinite regress, which is the "infinite tree" to which I 
referred in my previous message.  *Except* that the value of f is not 
needed when the function does not recurse.  Taking h from the above 
definition of fac:

    h = (\f \n . if n == 1 then 1 else f (n-1))

then:

    Y h 1 ..> (\f \n . if n == 1 then 1 else f (n-1)) (Y h) 1
          ..> if 1 == 1 then 1 else (Y h) (1-1) * 1
          ..> 1   (the value of (Y h) (1-1) not being needed)

    Y h 2 ..> (\f \n . if n == 1 then 1 else f (n-1) * n) (Y h) 2
          ..> if 2 == 1 then 1 else (Y h) (2-1) * 2
          ..> (Y h) 1 * 2

where we can evaluate (Y h) 1 as above.

All of the above depends upon a "non-strict" evaluation of function 
parameters:  if we were forced to evaluate the parameter (Y h) before 
starting to evaluate the function (\f \n . if n == 1 then 1 else f (n-1)) 
to which it is is passed as an argument, then we would be defeated by the 
infinite regress in Y.

>>Another view might be to use some kind of dictionary to avoid chasing 
>>round loops -- such a dictionary might represent a new graph querying 
>>context (is this similar to tabling of predicates in Prolog derivatives 
>>like XSB?)

Why did I mention tabling?  I see it as a possible strategy for achieving 
non-strict evaluation.   If we have a table of values of (Y h) n, then an 
empty slot in the table can act as a placeholder for any particular 
value.  Only when that value is required to we need to expand the 
definition of (Y h) for that case.   So we can ignore the infinite regress 
in Y as long as the recursive function terminates.

This is all very abstract, and may seem far removed from graph querying.  I 
happen to have been deep in functional programming for the past few months 
so it colours the way I see things.  But if you find this interesting, 
there's more background at [2].

[2] http://en.wikipedia.org/wiki/Y_combinator

>>Set against these complications is the fact that the primitive relations 
>>in an RDF graph are simpler than child relations in an XML tree:  there 
>>are no attributes vs child elements to worry about, with different rules 
>>for ordering and duplication.  There are no special properties that 
>>demand treatment differently from other properties (other than possibly 
>>for optimizations?).
>
>I do not find this element/attr difference to be difficult to handle in 
>XPath/XSLT. But it is indeed good not to have this additional complexity 
>to add to an RDFPath language.

Yes.

#g
--

PS.  I just spotted an email with a remarkably direct definition of the 
fixpoint operator in Haskell (generalized to lists of mutually recursive 
functions):

[[
 > fix':: [[a->b]->a->b] -> [a->b]
 > fix' fl = self_apply (\pp -> map ($pp) fl)
 >
 > self_apply f = f g where g = f g
]]
-- http://haskell.org/pipermail/haskell-cafe/2003-October/005394.html


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
Received on Wednesday, 29 October 2003 08:52:43 UTC