
Re: Comments on SPARQL 1.1 Property Paths

From: Ivan Mikhailov <imikhailov@openlinksw.com>
Date: Wed, 03 Feb 2010 22:55:21 +0600
To: Douglas Reid <dreid@bbn.com>
Cc: public-rdf-dawg-comments@w3.org
Message-Id: <1265216121.11404.3066.camel@octo.iv.dev.null>

Hello Douglas,

We support a limited form of property paths in OpenLink Virtuoso, so we've collected some practical experience.

Nobody ever asked us about anything except a path with fixed predicate names and a transitive path with a single predicate.
Even the most competent users did not worry about it, including those who wrote really sophisticated queries where triple patterns form chains of length 10 and more.
There are two strong reasons for that in two areas of SPARQL use:

Use Case 1.
With "physical RDF", SPARQL is painful for debugging when there's a real mess in real data.
So people dump values of intermediate variables in these chains to see where the connection goes wrong,
they comment and uncomment triple patterns in order to find the one that breaks cardinality or just makes the result set empty due to a typo in a graph/predicate IRI.
When the query is finally functional, they do not want to rewrite it one more time to get nice path notation.
Only trivial path expressions survive.

Use Case 2.
With RDB2RDF, SPARQL coding is less sensitive to trivial typos, because the SPARQL compiler knows the location of the data in use and the existing predicates.
As a result, path expressions are in wide use, especially when path predicates have primary keys of structures as subjects and members of these structures as objects.
But there's little need for complicated path expressions, because the complexity can be buried in the mapping ruleset. If there is a frequent need for
( foaf:knows | ex:meetOnce | ex:workedWith | ex:employed )
then a mapping can be extended with custom:inContactWith that will represent a union of all four.

Moreover, it is not unusual to see a specific database table that actually keeps all custom:inContactWith rows,
while the mappings for foaf:knows, ex:meetOnce, ex:workedWith and ex:employed just filter rows of that table by the value of a "contact type" column that should be equal to, say, 1 for "knows", 2 for "meetOnce", etc.
In this case, extending the mapping with custom:inContactWith is better than a path of four, because a plain SQL select is better than an SQL union of four branches with filters.
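
The point about a plain select versus a union of four filtered branches can be sketched with Python's standard sqlite3 module. This is only an illustration under assumptions: the table name `contact`, its columns, the sample rows and the codes 3 and 4 for "workedWith" and "employed" are all hypothetical, and the SQL is a guess at the general shape a mapping compiler might produce, not what Virtuoso actually generates.

```python
import sqlite3

# Hypothetical schema: one table keeps every contact, tagged by a type code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (person TEXT, other TEXT, contact_type INTEGER)")
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?)",
    [
        ("alice", "bob", 1),    # 1 = knows
        ("alice", "carol", 2),  # 2 = meetOnce
        ("bob", "dave", 3),     # 3 = workedWith (assumed code)
        ("carol", "alice", 4),  # 4 = employed (assumed code)
    ],
)

# A path of four alternatives: one filtered branch per predicate mapping.
union_sql = " UNION ALL ".join(
    f"SELECT person, other FROM contact WHERE contact_type = {t}"
    for t in (1, 2, 3, 4)
)

# A single custom:inContactWith mapping: a plain scan of the same table.
plain_sql = "SELECT person, other FROM contact"

union_rows = sorted(conn.execute(union_sql).fetchall())
plain_rows = sorted(conn.execute(plain_sql).fetchall())
```

Both queries return the same pairs here; the difference is that the second one reads the table once and needs no filters.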

I continuously write SPARQL extensions so I've tried to inflate the SPARQL spec with literally dozens of features.
As soon as some real application demonstrated an unavoidable lack of SPARQL expressivity, I extended the language and the compiler, terrorizing the rest of the Virtuoso team with "cascading" feature requests.
The result is that the grammar file for Virtuoso SPARQL is 4-5 times longer than the listing in the W3C spec.
But I do not want to extend the language with features that will require too much time to study and give too few benefits to a newbie.
E.g. you know regular expressions and use them as frequently as most developers.
Perl proves that they're good enough for most applications.
Most, but not all: say, an equation solver for nuclear physics would require much more than Perl may offer for matching texts against patterns.
For these purposes, the REFAL programming language exists, and it's perfect for this area.
The only problem is the time needed to learn it, so maybe 10000 people know that it exists, 1000 know it somehow and 100 have got paid at least once for writing a REFAL program.
I vote for the Perl way :)

Negation and even the [ a owl:ObjectProperty ] trick will not help much even for such a simple and practical case as a path with ?p such that ?p is _:1, _:2 etc., but not anything else.
BTW it's a trivial test for SPARUL maturity: in Virtuoso I can write a SPARQL MODIFY statement that inserts or removes a given element in a given position of an ordered list, shifting the tail; in pure SPARQL I cannot.

> I might want to discover whether or not ex:GeorgeWBush and ex:TonyBlair have a connection in the graph, ignoring the fact that they are both (rdf:type ex:Politician)s. 

So please do that without worrying about rdf:type. This requires an "any predicate" step in the path, but nothing more.
With "any predicate" available, use a combination of a potentially empty path that allows inverses and a final step that does not allow inverses.
I don't expect any commonly used predicate with ex:Politician as subject and Bush or Blair as object, so connect these two militarists with the path just described and it will work:

ex:GeorgeWBush (AnyPredicate | ^AnyPredicate)* ?proxy . ?proxy AnyPredicate ex:TonyBlair .
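
To see why the shared rdf:type does not count as a connection under this pattern, here is a minimal Python sketch of its semantics over a toy graph. All triples and IRIs other than the two names and rdf:type are made up for illustration, and `reachable_any_direction` and `proxies` are hypothetical helper names, not part of any SPARQL engine.

```python
from collections import deque

# Hypothetical toy graph as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:GeorgeWBush", "rdf:type", "ex:Politician"),
    ("ex:TonyBlair", "rdf:type", "ex:Politician"),
    ("ex:GeorgeWBush", "ex:attended", "ex:Summit2003"),
    ("ex:Summit2003", "ex:hostedGuest", "ex:TonyBlair"),
]


def reachable_any_direction(triples, start):
    """Nodes matched by (AnyPredicate | ^AnyPredicate)* from start, start included."""
    neighbours = {}
    for s, _p, o in triples:
        # Every triple may be walked forwards or backwards.
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def proxies(triples, start, end):
    """Bindings of ?proxy: reachable from start, with a forward triple into end."""
    reach = reachable_any_direction(triples, start)
    return sorted({s for s, _p, o in triples if o == end and s in reach})


found = proxies(TRIPLES, "ex:GeorgeWBush", "ex:TonyBlair")
only_types = proxies(TRIPLES[:2], "ex:GeorgeWBush", "ex:TonyBlair")
```

With the full toy graph, `found` contains only ex:Summit2003. ex:Politician is reachable from Bush, but no triple has it as subject and Blair as object, so with only the two rdf:type triples `only_types` is empty.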

I'd consider the use of variables in paths, but before that I'd consider
macro definitions that would add syntactic sugar to taste without putting
it into the language spec.

Best Regards,

Ivan Mikhailov
OpenLink Software
Received on Wednesday, 3 February 2010 16:55:55 GMT
