Description and thoughts behind option 6 (part 1 of 2) from Andy Seaborne on 2012-03-30 (public-rdf-dawg@w3.org from January to March 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 30 Mar 2012 10:25:22 +0100
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4F757C02.3010202@epimorphics.com>

This message is one of two.

This first message has a description of why option 6 is the way it is 
and then describes the definitions.

The second message gives the core of the definitions in a more formal, 
but not completely formal, way with some explanations.

Read this message; optionally read the other one.

== Option 6

This takes the F&R uses cases as natural user expectations and so to 
suggest the path operators needed and their semantics.

{ ?list rdf:rest*/rdf:first ?member } is a good illustration.

If the list is (1 2 1), we want the results to be
(?member=1, ?member=2, ?member=1) i.e. 1 twice.  So "/" is not distinct.

rdf:rest* is the same for either distinct or counting * on a well-formed 
list.  It's a chain, you get the same results either way.

{ :X rdf:type/rdfs:subClassOf* ?C }

Here, {:X rdf:type ?V } is only going to yield distinct answers because 
an RDF graph is a set of triples.  You can not get duplicate values for 
?V. The distinct/counting issue for "/" does not matter here.

Like foaf:knows*, this use of rdfs:subClassOf* is more about which terms 
can be reached, not how many times.  So a definition that does  distinct 
result, which can be a lot faster, is all that's needed.

So, we get:

6.A:   /, |, ! as there are in 2LC.
6.B:   *, +, ? are non-counting
6.C:   No DISTINCT
6.D:   No {} forms: {n}, {n,m}, {n,}, {,m}

It is less than other proposals - you can construct use cases which it 
does not cover directly.  The question for the WG is whether it's better 
to leave those to a future WG.

== Description

There's the core of the formal definitions in the other message.  Here, 
we describe the operators.

6.A are done by transformations but it's useful to also have definitions 
as well (this is true of 2LC as well, but it didn't have them) because 
we end up talking about "evaluating a path".

{ X :p/:q Y } is { X :p ?V . ?v :q Y }
etc

6.C and 6.D are quite easy.  Remove text from document.

6.B:

For the remaining operators, we start by defining

    operator(X path ?var)

and then define

    operator(?var1 path ?var2)

by considering every node in the graph for ?var1.  So the core 
definition is operator(X path ?var) for some given RDF term (graph node) x.

It's all based on a function PathPoints(X, P, N) as the set of points 
reached from X by exactly N steps of path P.  As this is a definition, 
we're not worrying about the fact there may several ways to do that; the 
result is a set.

{ x path? ?V } is the set of bindings (?V=y) where y is zero or one 
steps from x.  ?V is bound to each element of the set:

    PathPoints(x P 0) set-union PathPoints(x P 1)

{ x path* ?V } is the set of bindings (?V=y) where y is zero or more 
steps from x -- all PathPoints(X, P, N) for N >= 0

{ x path+ ?V } is the set of bindings (?V=y) where y is one or more 
steps from x -- all PathPoints(X, P, N) for N >= 1

To help implementers, there is an non-normative description of a simple 
function that will calculate path* -- it's a modified form of ALP from 
2LC -- and it's much faster.  It's not the only way but it is a short 
illustration of an algorithm for path*.

 Andy

Received on Friday, 30 March 2012 09:25:55 UTC