Re: Thoughts on querying JSON-LD

On Mon, Jul 23, 2012 at 4:43 PM, Gregg Kellogg <gregg@greggkellogg.net> wrote:
> One consequence of using a connect/graphify/link-type API call to build an in-memory graph from JSON-LD, as an alternative to framing, is that it does not come with a query mechanism.
>
> Existing JSON Query Libraries
>
> There are some packages specifically designed for querying JSON structures:
>
> * Dojo JSON Query [1] allows for a simple query expression against JSON structures using some JSON extensions. For example:
>
>     var data = {foo: "bar"};
>     var results = dojox.json.query("$.foo", data);
>     // results -> "bar"
>
> This is very property-centric and definitely tree-based, so graph-like structures cannot be searched easily.
>
> * JSONPath [2] is used by JSON Query and allows XPath-like expressions for searching through JSON structures. For example, consider the following expression, where x is the root object (written $ in JSONPath itself):
>
>     x.store.book[0].title
>
>    or
>
>     x['store']['book'][0]['title']
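
To make the "tree-based" point concrete, here is a minimal Python sketch of what such a path amounts to. The sample data is made up for illustration (loosely modelled on the store/book example in [2]) and follow() is a hypothetical helper, not part of JSONPath or JSON Query.

```python
# Hypothetical sample data, loosely modelled on the store/book example in [2].
x = {
    "store": {
        "book": [
            {"title": "Sayings of the Century"},
            {"title": "Sword of Honour"},
        ]
    }
}

def follow(root, steps):
    """Walk a sequence of keys and indexes down a JSON tree, one step at a time."""
    node = root
    for step in steps:
        node = node[step]
    return node

# Equivalent of x.store.book[0].title
title = follow(x, ["store", "book", 0, "title"])
```

Every step descends from a parent to a child starting at the root. Nothing in this model lets you start at a node and ask which other nodes point to it, which is what a graph query needs.
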
>
> Because JSON-LD really represents a graph structure, graph query patterns make sense. SPARQL is based on the Basic Graph Pattern (BGP), along with an algebra that allows for more complicated queries. In theory, implementing BGP for JSON-LD would allow for full execution of SPARQL SELECT/ASK/DESCRIBE/CONSTRUCT operations in a JSON-LD-centric way.
>
> SPARQL Algebra [3]
>
> A SPARQL expression is typically translated into an abstract syntax, which can be represented using S-Expressions. For example, consider the following query:
>
>     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>     SELECT *
>     WHERE {
>         ?x foaf:name "Gregg Kellogg" ;
>            foaf:knows ?y .
>         ?y foaf:name ?n .
>     }
>
> This query might be used to find the names of all the people I know. In the SPARQL Algebra, this could be represented as follows:
>
>     (prefix ((foaf: <http://xmlns.com/foaf/0.1/>))
>      (bgp
>       (triple ?x foaf:name "Gregg Kellogg")
>       (triple ?x foaf:knows ?y)
>       (triple ?y foaf:name ?n)
>      )
>     )
>
> Such a query is essentially a BGP: a set of triple patterns with variables in some positions. Executing it is a matter of applying one pattern and then filtering the results with each successive pattern. Here, the first pattern binds ?x to the set of subjects having a foaf:name property with the value "Gregg Kellogg". The second pattern narrows those subjects to the ones having a foaf:knows property and binds the objects found to the ?y variable. The last pattern then looks for the subjects bound to ?y that have a foaf:name property and binds those values to the ?n variable.
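
The pattern-by-pattern evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples, the ex: identifiers and the function names are all made up, terms are plain strings (no distinction between IRIs and literals), and any string starting with "?" is treated as a variable.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data as (subject, predicate, object) triples.
triples = [
    ("ex:gregg", FOAF + "name", "Gregg Kellogg"),
    ("ex:gregg", FOAF + "knows", "ex:alice"),
    ("ex:gregg", FOAF + "knows", "ex:bob"),
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:bob", FOAF + "name", "Bob"),
    ("ex:carol", FOAF + "name", "Carol"),
]

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match_triple(pattern, triple, binding):
    """Return the binding extended so pattern matches triple, or None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result

def evaluate_bgp(patterns, triples):
    """Apply each pattern in turn, keeping only compatible bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_triple(pattern, triple, binding)) is not None
        ]
    return solutions

bgp = [
    ("?x", FOAF + "name", "Gregg Kellogg"),
    ("?x", FOAF + "knows", "?y"),
    ("?y", FOAF + "name", "?n"),
]
solutions = evaluate_bgp(bgp, triples)
names = sorted(s["?n"] for s in solutions)
```

With this data, names comes out as Alice and Bob; Carol has a name but is not reachable through foaf:knows, so the second pattern never binds ?y to her. A real engine would index the triples rather than scan the whole list for every binding.
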
>
> In JSON-LD, this could be expressed using a subject definition having variables and something similar to setting the framing "explicit inclusion flag" to false. For example, the first two patterns could be combined as follows:
>
>     {
>       "@id": "?x",
>       "foaf:name": "Gregg Kellogg",
>       "foaf:knows": "?y"
>     }
>
> After performing flattening and expansion, this could be fairly easily matched against the subject definitions within the top-level array using mechanisms similar to those in framing.
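
A minimal sketch of that matching step, assuming the data has already been flattened and expanded (the array below is hand-written, not produced by a JSON-LD processor). How variables would survive expansion of the pattern is an open question, so the pattern format here is a hypothetical simplification: a value is either a variable name or a concrete value object. Bindings are reduced to plain strings, which loses the IRI/literal distinction.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hand-written stand-in for flattened, expanded JSON-LD.
nodes = [
    {"@id": "ex:gregg",
     FOAF + "name": [{"@value": "Gregg Kellogg"}],
     FOAF + "knows": [{"@id": "ex:alice"}, {"@id": "ex:bob"}]},
    {"@id": "ex:alice", FOAF + "name": [{"@value": "Alice"}]},
    {"@id": "ex:bob", FOAF + "name": [{"@value": "Bob"}]},
]

# Hypothetical expanded form of the pattern above.
pattern = {
    "@id": "?x",
    FOAF + "name": {"@value": "Gregg Kellogg"},
    FOAF + "knows": "?y",
}

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def plain(value):
    """Reduce a value object to its IRI or literal value."""
    return value.get("@id", value.get("@value"))

def match_node(pattern, node):
    """Return every variable binding under which node satisfies pattern."""
    solutions = [{}]
    for key, want in pattern.items():
        if key == "@id":
            found = [node["@id"]]
        else:
            found = [plain(v) for v in node.get(key, [])]
        if is_var(want):
            # One solution per value, unless it conflicts with a prior binding.
            solutions = [{**s, want: f} for s in solutions for f in found
                         if s.get(want, f) == f]
        else:
            target = want if key == "@id" else plain(want)
            if target not in found:
                return []
    return solutions

results = [s for node in nodes for s in match_node(pattern, node)]
```

Only the ex:gregg node survives the foaf:name test, and it yields one solution per foaf:knows value.
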
>
> A second pass could run the third pattern:
>
>     { "@id": "?y", "foaf:name": "?n" }
>
> Typically, this would use the ?y bound variable to limit the set of subject definitions inspected, but this could also be done (less optimally) afterwards.
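
Using the bound ?y to limit the nodes inspected could look roughly like the sketch below. The first-pass solutions and the node array are written out by hand, and bind_names() is a hypothetical helper; the point is only that an index on "@id" lets the second pass go straight to the relevant subject definitions.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hand-written stand-in for flattened, expanded JSON-LD.
nodes = [
    {"@id": "ex:gregg",
     FOAF + "name": [{"@value": "Gregg Kellogg"}],
     FOAF + "knows": [{"@id": "ex:alice"}, {"@id": "ex:bob"}]},
    {"@id": "ex:alice", FOAF + "name": [{"@value": "Alice"}]},
    {"@id": "ex:bob", FOAF + "name": [{"@value": "Bob"}]},
    {"@id": "ex:carol", FOAF + "name": [{"@value": "Carol"}]},
]

# Assumed output of the first pass (bindings for ?x and ?y).
first_pass = [
    {"?x": "ex:gregg", "?y": "ex:alice"},
    {"?x": "ex:gregg", "?y": "ex:bob"},
]

# Index subject definitions by identifier so ?y can be looked up directly.
by_id = {node["@id"]: node for node in nodes}

def bind_names(solutions, by_id):
    """Extend each solution with ?n, visiting only the node bound to ?y."""
    results = []
    for solution in solutions:
        node = by_id.get(solution["?y"])
        if node is None:
            continue  # ?y has no subject definition in the data
        for value in node.get(FOAF + "name", []):
            if "@value" in value:
                results.append({**solution, "?n": value["@value"]})
    return results

results = bind_names(first_pass, by_id)
names = [r["?n"] for r in results]
```

The ex:carol node is never touched. The less optimal alternative would match the third pattern against every node and then discard the solutions whose ?y does not appear in the first pass.
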
>
> Alternatively, a more sophisticated query engine might create a nested query as follows:
>
>     {
>       "@id": "?x",
>       "foaf:name": "Gregg Kellogg",
>       "foaf:knows": {
>         "@id": "?y",
>         "foaf:name": "?n"
>       }
>     }
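
One way to see that the nested form carries the same information as the BGP is to unroll it back into triple patterns. The sketch below does that in Python; to_triple_patterns() is a hypothetical helper, it assumes every nested object has an "@id", and the innermost variable is written as ?n to line up with the SPARQL query.

```python
nested = {
    "@id": "?x",
    "foaf:name": "Gregg Kellogg",
    "foaf:knows": {
        "@id": "?y",
        "foaf:name": "?n",
    },
}

def to_triple_patterns(pattern):
    """Unroll a nested pattern into flat (subject, predicate, object) patterns."""
    subject = pattern["@id"]
    triples = []
    for key, value in pattern.items():
        if key == "@id":
            continue
        if isinstance(value, dict):
            # The nested object's @id is the object of this pattern and the
            # subject of the patterns nested inside it.
            triples.append((subject, key, value["@id"]))
            triples.extend(to_triple_patterns(value))
        else:
            triples.append((subject, key, value))
    return triples

patterns = to_triple_patterns(nested)
```

The result is the same three patterns as in the algebra form: (?x foaf:name "Gregg Kellogg"), (?x foaf:knows ?y) and (?y foaf:name ?n).
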
>
> Aside from the actual variable binding, this clearly relates to the existing framing algorithm, which returns a result set matching the stated form. Actual framing of the results is not required, however: the results should either directly identify subject definitions in the original dataset, which is presumably already connected as a graph, or create new results (similar to CONSTRUCT, or framing today) that can also be turned into a graph.
>
> As a consequence, building a complete SPARQL implementation on a JSON-LD data model, instead of a triple/quad store, becomes a distinct possibility.
>
> Gregg
>
> [1] http://dojotoolkit.org/reference-guide/1.7/dojox/json/query.html
> [2] http://goessner.net/articles/JsonPath/
> [3] http://www.w3.org/TR/sparql11-query/#sparqlQuery

Received on Tuesday, 24 July 2012 04:05:28 UTC