Re: Description and thoughts behind option 6 (part 1 of 2) from Matthew Perry on 2012-03-30 (public-rdf-dawg@w3.org from January to March 2012)

From: Matthew Perry <matthew.perry@oracle.com>
Date: Fri, 30 Mar 2012 09:36:20 -0400
To: public-rdf-dawg@w3.org
Message-ID: <4F75B6D4.4020808@oracle.com>
Hi Andy,

Thanks for writing this up.

The more I think about this simplified version of property paths the more I like it. I think it covers 99% of the real world use cases and eliminates the fundamental complexity problems we had with unbound counting operators. A future WG can always add the { } operators to support counting semantics. The reduced feature set should ease some of the burden on implementers and also be easier for users to understand.

In my opinion, the complexity (syntax and expressiveness) of property paths is starting to get out of hand with simultaneous support for counting and non-counting semantics. It's hard for me to imagine how to explain all these variations to non-expert users. If we add too many features to property paths now, it's very hard to remove them later. Commercial software vendors have to support a release for a very long time.

Cheers,
Matt

On 3/30/2012 5:25 AM, Andy Seaborne wrote:
> This message is one of two.
>
> This first message has a description of why option 6 is the way it is and then describes the definitions.
>
> The second message gives the core of the definitions in a more formal, but not completely formal, way with some explanations.
>
> Read this message; optionally read the other one.
>
> == Option 6
>
> This takes the F&R uses cases as natural user expectations and so to suggest the path operators needed and their semantics.
>
> { ?list rdf:rest*/rdf:first ?member } is a good illustration.
>
> If the list is (1 2 1), we want the results to be
> (?member=1, ?member=2, ?member=1) i.e. 1 twice.  So "/" is not distinct.
>
> rdf:rest* is the same for either distinct or counting * on a well-formed list.  It's a chain, you get the same results either way.
>
> { :X rdf:type/rdfs:subClassOf* ?C }
>
> Here, {:X rdf:type ?V } is only going to yield distinct answers because an RDF graph is a set of triples.  You can not get duplicate values for ?V. The distinct/counting issue for "/" does not matter here.
>
> Like foaf:knows*, this use of rdfs:subClassOf* is more about which terms can be reached, not how many times.  So a definition that does  distinct result, which can be a lot faster, is all that's needed.
>
> So, we get:
>
> 6.A:   /, |, ! as there are in 2LC.
> 6.B:   *, +, ? are non-counting
> 6.C:   No DISTINCT
> 6.D:   No {} forms: {n}, {n,m}, {n,}, {,m}
>
> It is less than other proposals - you can construct use cases which it does not cover directly.  The question for the WG is whether it's better to leave those to a future WG.
>
> == Description
>
> There's the core of the formal definitions in the other message.  Here, we describe the operators.
>
> 6.A are done by transformations but it's useful to also have definitions as well (this is true of 2LC as well, but it didn't have them) because we end up talking about "evaluating a path".
>
> { X :p/:q Y } is { X :p ?V . ?v :q Y }
> etc
>
> 6.C and 6.D are quite easy.  Remove text from document.
>
> 6.B:
>
> For the remaining operators, we start by defining
>
>    operator(X path ?var)
>
> and then define
>
>    operator(?var1 path ?var2)
>
> by considering every node in the graph for ?var1.  So the core definition is operator(X path ?var) for some given RDF term (graph node) x.
>
> It's all based on a function PathPoints(X, P, N) as the set of points reached from X by exactly N steps of path P.  As this is a definition, we're not worrying about the fact there may several ways to do that; the result is a set.
>
> { x path? ?V } is the set of bindings (?V=y) where y is zero or one steps from x.  ?V is bound to each element of the set:
>
>    PathPoints(x P 0) set-union PathPoints(x P 1)
>
> { x path* ?V } is the set of bindings (?V=y) where y is zero or more steps from x -- all PathPoints(X, P, N) for N >= 0
>
> { x path+ ?V } is the set of bindings (?V=y) where y is one or more steps from x -- all PathPoints(X, P, N) for N >= 1
>
> To help implementers, there is an non-normative description of a simple function that will calculate path* -- it's a modified form of ALP from 2LC -- and it's much faster.  It's not the only way but it is a short illustration of an algorithm for path*.
>
>     Andy
>
Received on Friday, 30 March 2012 13:36:52 UTC