Re: [TF-PP] Scoping the design space from Luke Wilson-Mawer on 2009-09-23 (public-rdf-dawg@w3.org from July to September 2009)

From: Luke Wilson-Mawer <luke.wilson-mawer@garlik.com>
Date: Wed, 23 Sep 2009 10:47:39 +0100
To: Paul Gearon <gearon@ieee.org>
CC: "Seaborne, Andy" <andy.seaborne@hp.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4AB9EEBB.8000002@garlik.com>
Paul Gearon wrote:
> On Tue, Sep 15, 2009 at 8:38 AM, Seaborne, Andy <andy.seaborne@hp.com> wrote:
>   
>> Property paths task force.
>> http://www.w3.org/2009/sparql/wiki/TaskForce:PropertyPaths
>>
>> My preference for discussions is on-list so everyone can join in as they see fit - if that is going to get in the way of the must-do features, then we may need to replan but I can see it as being appropriate because there are interactions with entailment (and others).
>>
>> My suggestion for how to proceed is to first have some email discussion about scoping then have a telecon moving towards choosing a strawman and create issues against that.  But I suggest we first work on whether certain major topics are in or out first.
>>
>>
>> There are a number of systems that that provide implementation experience of paths.
>>  ARQ [1], SPARQLer [2], PSPARQL [3], Corese [4]
>> There has been some theory work as well.
>>
>> And from these are some scope issues to be considered such as:
>>
>> + variables in paths
>> + being able to work with the length of the path
>> + being able to work with the path found
>>
>> (and I'm sure there are others, large and small.)
>>     
>
> Muglara currently has transitive predicates, but it's a general
> mechanism that I've been looking to apply to other things, such as
> path discovery (for small graphs, since it's log(n) in time, but EXP
> in space). So I'm interested in this.
>
>   
>> If we take some kind of path matching as a baseline (no variables, no values, no path-valued variables), what are the implications of these issues over that baseline?
>>
>>
>> Variables in paths: what happens about ?s ?p* ?o when there are no ?p ?
>> This could also lead to easily written very expensive query executions.
>>     
>
> It depends on the definition. Are you looking for transitive closure
> for each predicate on its own, or allowing ?p to be anything between
> bound values for ?s and ?o? I'm guessing the former and not the latter
> (since you can't really specify that ?s and ?o have to be bound, and
> even so, if the path is too long, then it's a problem with exponential
> complexity).
>   
I've probably misunderstood what ?s ?p* ?o means, but I have the 
following comments nonetheless. 

If we say that we are only looking for the transitive closure of each 
predicate on its own, I don't see how we gain any advantage by allowing 
the use of variables.  Can't we just write most (interesting) queries in 
a form like the one below?

SELECT ?s ?ob1 ?ob2 WHERE {
    {
        ?s :somePredicate* ?ob1 .

    } UNION {

        ?s :someOtherPredicate* ?ob2 .

    }
}

Naively (ignoring complexity problems), allowing ?p to take any value 
seems to produce more interesting use cases, especially in untidy real 
world data, where people might use different predicates interchangably, e.g:

SELECT ?s ?ob1 ?ob2 ?p WHERE {
       ?s ?p* ?y .
       FILTER(?p = <:somePredicate> || ?p =<:someSimilarPredicate> ) .
}

Although I don't understand how the many different values of ?p can be 
projected here.
> As for simply allowing for paths of single predicates from one value
> to another, you run some very real risks of blowing out on time or
> space if the endpoints are not bound. Also for unbound values of ?s
> and ?o you'll get every statement in the system (pathlength of just
> 1), which seems counter to what is probably desired (though correct).
>
>   
>> Having access to the length of the path needs to say what happens for loops and for multiple paths between the same two graph nodes.
>>     
>
> I would suggest the minimum length for each, though it would appear to
> be computationally hazardous (depending on the algorithm).
>
>   
>> Path-values variables enable functions on the path in FILTERs and SELECT expressions but need to work with results format.
>>     
>
> I *think* I see what you're talking about. Can you provide an example please?
>
>   
Is this the same as the problem of projecting ?p when it has many values?
>> Personally, I think path-valued variables are a step too far at the moment but if there is a way to leave that open for the future then that would be ideal.  I don’t understand the implication of path lengths yet - ARQ's property paths match once per end point pair how ever found.
>>     
>
> I believe I know how to change our system. I think we'll be going to
> linear time, with exp space (ie. the same memory we're using
> currently, but it will take longer to get there while we count up the
> distance).
>   
With the constraint of allowing ?p to take any values between ?s and ?o, 
I don't see what extra expressivity variables offer.
>> I do think having a general matching construct like
>>
>>        PATH(?s, pathpattern, ?o)
>>
>> which allows additional features in the future (c.f MATCH in Corese - maybe PATH(?pathVar, ?s, pathpattern, ?o)) but the inline syntax is also very convenient.  Maybe the inline syntax is an abbreviation for PATH(something).
>>
>> Comments, Thoughts, Suggestions?
>>     
>
> When I first did this I found the hardest part was the syntax. I had
> written the algorithm and it tested just fine, and I expected to hook
> it into TQL easily, only to discover that it took me days to get
> something I could work with that people didn't hate. Personally, I
> still hate the TQL syntax (though the functionality is great).
> Unfortunately, the syntax I chose looks a little like:
>   PATH(?s predicate ?o)
>
> :-)
>
> Regards,
> Paul Gearon
>
>
Received on Wednesday, 23 September 2009 09:48:15 UTC