Re: ANN: LDPath - a path-based query language for querying the Linked Data Cloud from Sebastian Schaffert on 2011-12-02 (public-lod@w3.org from December 2011)

From: Sebastian Schaffert <sebastian.schaffert@salzburgresearch.at>
Date: Fri, 2 Dec 2011 10:17:15 +0100
To: Joshua Shinavier <josh@fortytwo.net>
Cc: Linking Open Data <public-lod@w3.org>, "semantic-web@w3.org >> semantic-web@w3.org" <semantic-web@w3.org>
Message-Id: <96391A11-3CE8-4BC8-A450-C1CAA80CAE26@salzburgresearch.at>
Hi Joshua,

On 01.12.2011 at 18:48, Joshua Shinavier wrote:

> 
> [...]
>> The statement regarding "more in line with Linked Data" is in comparison with SPARQL, which is too expressive to query distributed resources, especially when they are represented in the way Linked Data resources are represented.
> 
> 
> SQUIN [1] is a good example of using SPARQL to query the Web of Linked
> Data.  LinkedDataSail provides similar functionality.  Of course, not
> all SPARQL queries are appropriate for querying Linked Data, but it's
> certainly possible to write queries which are.  What with SPARQL 1.1
> Property Paths, it's hard to come up with simple path queries over
> Linked Data which you can't express in SPARQL.  

The point for me is not whether I can express a path query in SPARQL. Of course I can, because SPARQL is a very expressive and powerful language. However, as you say, not all SPARQL queries are appropriate for querying Linked Data. Worse, queries that build on the usual Linked Data link traversal can return misleading results, because the results will not be complete. Consider a query for all resources that link to http://dbpedia.org/resource/Berlin: it is easy to express in SPARQL, but returning the complete result is hard, because dereferencing that URI does not reveal every document elsewhere on the Web that points to it.
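
A minimal sketch of this incompleteness problem, in Python with a toy in-memory "web". Everything except the Berlin URI is invented for illustration: the example.org URIs, the property names, and the helpers `dereference`, `traverse` and `incoming` are assumptions, not part of any real tool.

```python
# Toy model of the Web of Linked Data: each URI dereferences to the
# triples published in its own document. All data here is invented.
BERLIN = "http://dbpedia.org/resource/Berlin"
GERMANY = "http://example.org/Germany"  # hypothetical
ALICE = "http://example.org/alice"      # hypothetical; no document links to it

WEB = {
    BERLIN: {(BERLIN, "country", GERMANY)},
    GERMANY: {(GERMANY, "capital", BERLIN)},
    ALICE: {(ALICE, "livesIn", BERLIN)},
}

def dereference(uri):
    """Return the triples served at `uri` (empty if nothing is published)."""
    return WEB.get(uri, set())

def traverse(seed):
    """Follow outgoing links from `seed` and collect every triple seen."""
    seen, todo, triples = set(), [seed], set()
    while todo:
        uri = todo.pop()
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in dereference(uri):
            triples.add((s, p, o))
            todo.append(o)
    return triples

def incoming(triples, target):
    """Subjects that link to `target` within the given triples."""
    return {s for s, p, o in triples if o == target and s != target}

# What link traversal starting at Berlin can discover ...
found = incoming(traverse(BERLIN), BERLIN)
# ... versus the answer over the whole (here: fully known) web.
complete = incoming(set().union(*WEB.values()), BERLIN)
```

In this toy setup `found` contains only the Germany resource, while `complete` also contains the alice resource, which traversal never reaches because no visited document links to it. On the real Web the complete answer cannot be computed this way at all.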

The danger is that if you give people SPARQL, they will use all of its constructs, not knowing that the results will not be correct or complete. The challenge is to design a language that is restricted enough not to yield misleading results for Linked Data, yet expressive enough to exploit its full potential, without duplicating what you can already do in the host language (like Java or Groovy).

Restricting to a path language is a very good approach because a path through a graph is a common idiom. SPARQL property paths alone are not enough, however, because they rely on the surrounding SPARQL environment for things like filters and tests. In LDPath we essentially used SPARQL property paths, but took the node test concept from XPath to provide the filtering functionality.
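
The combination of path steps and node tests can be sketched roughly as follows. This is plain Python over an invented in-memory graph, not LDPath syntax and not its implementation; the names `step`, `node_test` and `seq` are hypothetical.

```python
# Tiny in-memory graph of (subject, property, object) triples; data is invented.
GRAPH = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:acme"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:acme", "rdf:type", "foaf:Organization"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:acme", "foaf:name", "ACME"),
}

def step(prop):
    """Path step: follow `prop` from every node in the input set."""
    return lambda nodes: {o for s, p, o in GRAPH if p == prop and s in nodes}

def node_test(path, expected):
    """Node test: keep the nodes from which `path` reaches `expected`."""
    return lambda nodes: {n for n in nodes if expected in path({n})}

def seq(*parts):
    """Path composition: apply the parts from left to right."""
    def run(nodes):
        for part in parts:
            nodes = part(nodes)
        return nodes
    return run

# "Names of the persons that the start node knows."
names_of_known_persons = seq(
    step("foaf:knows"),
    node_test(step("rdf:type"), "foaf:Person"),
    step("foaf:name"),
)
result = names_of_known_persons({"ex:alice"})
```

Without the node test, the same path would also return the name of the organization. The test filters the intermediate node set without needing a surrounding query language.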


> IMO, some of the main
> selling points of a dedicated path language are a less restricted set
> of primitive functions, general-purpose control constructs like loops
> and recursion, and the ability to reuse simple queries to build up
> more complex ones, all of which add to the expressivity of the
> language.

I would not like to have too many control flow constructs in a path language. Of course, with Ripple you have a scripting language in mind, and your functional programming approach fits well with path constructs. But in my scenarios, the path language will always be embedded in some host language (Java, Groovy, Scala or maybe JavaScript) that already provides these control flow constructs. So why confuse the user with additional syntax for something that is already available?

That said, I really like the way you have designed Ripple, trying to provide as much expressivity as possible while still remaining within the bounds of Linked Data. As an independent language, this is extremely powerful. I did a similar thing in my PhD some years ago (but for XML), so I also find this approach very nice. :-)


> 
>> Regarding the comparison you have below: they indeed look very similar. :-)  Personally, however, I prefer the XPath-style because it is what people know from file systems and XPath. Our users typically don't yet have a deep understanding of RDF or of functional programming, so we want to stick to what they already know as much as possible.
> 
> 
> Even better, in my opinion, is a syntax which piggybacks on another,
> mainstream language.  One of the nice things about the Gremlin
> language I mentioned is that it comes with Groovy and Scala bindings,
> which makes it especially friendly for anyone familiar with either of
> those languages.  Ripple now has Clojure (Lisp) bindings.  That way,
> the user only needs to learn the abstract syntax of the language, and
> it allows the implementation to be more easily embedded in other
> applications.  In your case, if it's XPath your users are familiar
> with, perhaps there would be some benefit in making LDPath an actual
> subset of XPath.

It actually is - as far as it makes sense. However, XPath was developed with a different data model in mind - one where there is a defined document order, a tree structure, no cycles, backwards axes, a different way of identifying the language of an element, etc. But you are right, and if we have more time we will try to align the language more with XPath, especially the function sets they have there.

Creating a Groovy-like API (bindings, as you call them) could, however, also be an interesting exercise and useful in some situations. And it won't even need a parser … ;-)
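
As a rough idea of what such an API could look like, here is a sketch of a chained path builder in Python. The class `Path` and its methods `out`, `has` and `evaluate` are hypothetical, as is the data; no such API exists in LDPath as far as this message states.

```python
class Path:
    """Immutable path builder; every method returns a new Path."""

    def __init__(self, steps=()):
        self._steps = tuple(steps)

    def out(self, prop):
        """Append a step that follows `prop` to its objects."""
        return Path(self._steps + (("out", prop),))

    def has(self, prop, value):
        """Append a test that keeps nodes having `prop` = `value`."""
        return Path(self._steps + (("has", prop, value),))

    def evaluate(self, graph, start):
        """Run the path over a set of (s, p, o) triples from `start`."""
        nodes = {start}
        for kind, prop, *rest in self._steps:
            if kind == "out":
                nodes = {o for s, p, o in graph if p == prop and s in nodes}
            else:  # "has": filter the current node set
                nodes = {n for n in nodes if (n, prop, rest[0]) in graph}
        return nodes

# Invented example data.
GRAPH = {
    ("alice", "knows", "bob"), ("alice", "knows", "acme"),
    ("bob", "knows", "alice"),
    ("alice", "type", "Person"), ("bob", "type", "Person"),
    ("acme", "type", "Organization"),
    ("alice", "name", "Alice"), ("bob", "name", "Bob"),
    ("acme", "name", "ACME"),
}

known_person_names = Path().out("knows").has("type", "Person").out("name")

# Iteration over start nodes is left to the host language.
results = {start: known_person_names.evaluate(GRAPH, start)
           for start in ("alice", "bob")}
```

The path is built by ordinary method calls, so no parser is involved, and loops or conditionals around it come from the host language.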


Greetings,

Sebastian
-- 
| Dr. Sebastian Schaffert          sebastian.schaffert@salzburgresearch.at
| Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group          +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg
Received on Friday, 2 December 2011 09:18:18 UTC