Re: [TF-PP] Scoping the design space from Paul Gearon on 2009-09-15 (public-rdf-dawg@w3.org from July to September 2009)

From: Paul Gearon <gearon@ieee.org>
Date: Tue, 15 Sep 2009 12:17:00 -0400
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <a25ac1f0909150917n2a0a0abje52a2f0a895b6f8d@mail.gmail.com>
On Tue, Sep 15, 2009 at 8:38 AM, Seaborne, Andy <andy.seaborne@hp.com> wrote:
> Property paths task force.
> http://www.w3.org/2009/sparql/wiki/TaskForce:PropertyPaths
>
> My preference for discussions is on-list so everyone can join in as they see fit - if that is going to get in the way of the must-do features, then we may need to replan but I can see it as being appropriate because there are interactions with entailment (and others).
>
> My suggestion for how to proceed is to first have some email discussion about scoping then have a telecon moving towards choosing a strawman and create issues against that.  But I suggest we first work on whether certain major topics are in or out first.
>
>
> There are a number of systems that that provide implementation experience of paths.
>  ARQ [1], SPARQLer [2], PSPARQL [3], Corese [4]
> There has been some theory work as well.
>
> And from these are some scope issues to be considered such as:
>
> + variables in paths
> + being able to work with the length of the path
> + being able to work with the path found
>
> (and I'm sure there are others, large and small.)

Muglara currently has transitive predicates, but it's a general
mechanism that I've been looking to apply to other things, such as
path discovery (for small graphs, since it's log(n) in time, but EXP
in space). So I'm interested in this.

> If we take some kind of path matching as a baseline (no variables, no values, no path-valued variables), what are the implications of these issues over that baseline?
>
>
> Variables in paths: what happens about ?s ?p* ?o when there are no ?p ?
> This could also lead to easily written very expensive query executions.

It depends on the definition. Are you looking for transitive closure
for each predicate on its own, or allowing ?p to be anything between
bound values for ?s and ?o? I'm guessing the former and not the latter
(since you can't really specify that ?s and ?o have to be bound, and
even so, if the path is too long, then it's a problem with exponential
complexity).

As for simply allowing for paths of single predicates from one value
to another, you run some very real risks of blowing out on time or
space if the endpoints are not bound. Also for unbound values of ?s
and ?o you'll get every statement in the system (pathlength of just
1), which seems counter to what is probably desired (though correct).

> Having access to the length of the path needs to say what happens for loops and for multiple paths between the same two graph nodes.

I would suggest the minimum length for each, though it would appear to
be computationally hazardous (depending on the algorithm).

> Path-values variables enable functions on the path in FILTERs and SELECT expressions but need to work with results format.

I *think* I see what you're talking about. Can you provide an example please?

> Personally, I think path-valued variables are a step too far at the moment but if there is a way to leave that open for the future then that would be ideal.  I don’t understand the implication of path lengths yet - ARQ's property paths match once per end point pair how ever found.

I believe I know how to change our system. I think we'll be going to
linear time, with exp space (ie. the same memory we're using
currently, but it will take longer to get there while we count up the
distance).

> I do think having a general matching construct like
>
>        PATH(?s, pathpattern, ?o)
>
> which allows additional features in the future (c.f MATCH in Corese - maybe PATH(?pathVar, ?s, pathpattern, ?o)) but the inline syntax is also very convenient.  Maybe the inline syntax is an abbreviation for PATH(something).
>
> Comments, Thoughts, Suggestions?

When I first did this I found the hardest part was the syntax. I had
written the algorithm and it tested just fine, and I expected to hook
it into TQL easily, only to discover that it took me days to get
something I could work with that people didn't hate. Personally, I
still hate the TQL syntax (though the functionality is great).
Unfortunately, the syntax I chose looks a little like:
  PATH(?s predicate ?o)

:-)

Regards,
Paul Gearon
Received on Tuesday, 15 September 2009 16:17:37 UTC