Re: Framing and Query from Gregg Kellogg on 2016-10-11 (public-linked-json@w3.org from October 2016)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Tue, 11 Oct 2016 12:03:53 -0700
To: George Svarovsky <gsvarovsky@idbs.com>
Cc: Markus Lanthaler <markus.lanthaler@gmx.net>, Linked JSON <public-linked-json@w3.org>
Message-Id: <FAA40BAD-11F3-4593-9452-ABD09FAAEE57@greggkellogg.net>
> On Oct 11, 2016, at 3:02 AM, George Svarovsky <gsvarovsky@idbs.com> wrote:
> 
> Hi Gregg, I'm glad to be here and I hope I can be of help.
> 
> I've taken the liberty of renaming this thread, and capturing the main recent salient points on this topic from the previous thread:
> 
> Gregg >>> Additionally, the Framing algorithm [2] has proven to be important, but work on the specification was never complete, and implementations have moved beyond what was documented in any case.
> Markus >> It is certainly handy but I'm not sure there's agreement on what exactly it should be. Initially it was just (or at least mostly) about re-framing an existing graph... I think what a lot of people (myself included) actually want and need is to query a graph and control the serialization of the result. Maybe we should start with a discussion on the role of framing!?
> George >> I have a particular interest in framing, and I concur with Markus that what I actually want is (some degree of) graph query.
> Gregg > I know there has been some discussion on more sophisticated querying, but I’m not aware of any specific proposals. And, for my part, it seems to me that SPARQL Construct pretty much handles these use cases, other than for named graphs. It seems to me that trying to do something very significant could easily be a rat-hole, but it’s worth a discussion.
>> 
>> Another possibility I considered at one point was a JSON-LD based query specification language that would parse to the SPARQL Abstract Algebra (or simply generate SPARQL syntax), with triples derived from the JSON-LD used as the implicit dataset. This is probably more constrained, and leaves the messy query bits to a mature specification. This is significant enough, that it probably requires a specification separate from framing, and presumes that it’s the SPARQL syntax that is the issue being addressed.
> 
> The first internal POC I did with JSON-LD included a JSON query specification language, very closely related to a number of JSON query syntaxes such as MongoDB, FreeBase, Backbone-ORM and TaffyDB. In common with these it was deliberately limited in its capabilities, particularly for joins (ironically); but it was heavily invested in JSON-LD, effectively being a super-set with query operators. It was intended to be backed by our native Oracle schema, but it actually found more traction as an API to JSON-LD in elasticsearch.
> 
> I can go into more detail on that if there's interest. But in the meantime, earlier this year another POC led me to using an actual Triplestore for the first time, and I spent some happy hours fighting with constructing SPARQL in Node.js. Long story short, I ended up doing precisely what you (Gregg) just suggested :) I've shared it on GitHub and NPM [1].

The fact that the data model for JSON-LD is, in fact, RDF, makes SPARQL a natural choice for doing queries. Of course, other graph query algorithms could be adapted, but I suspect we’ll run into impedance issues, given that many of these are Property Graph based, not RDF graph. Also, SPARQL gives the opportunity to include Entailment Regimes as part of the solution space. I would probably tend to start with a more limited mapping to SPARQL Query, though.

Your JSON-RQL looks similar to what I was thinking, but I think we probably need separate @construct and @where sections, similar to how SPARQL CONSTRUCT works.

GraphQL also looks interesting, and could be a natural for JSON-LD based on its syntax. However, I’m concerned that as we go through it, we’ll find things that don’t match up as well given the RDF data model. But, there’s no reason that we would need to choose a single query mechanism, and perhaps there’s room for both GraphQL- and SPARQL-based approaches.

>> I think there are several ways we could go:
>> 
>> 1) Improve framing based on the existing algorithms which provide some degree of manipulating and limiting the framed data based on existing relationships.
>> 2) Consider a way to include a variable syntax, and how this might be used for both matching and constructing data
> 
> While I'm a fan of query-by-example, I think in the general case there's too much complexity in interlacing the Query (pattern-matching existing relationships), with the Frame (the structure I want to return). Personally, I've always ended up separating these concerns in the syntax. However, I think it does come down to how powerful you want your query language to be. GraphQL [2] happily combines the two into one tree, because its query syntax is very limited, deliberately. Trying to do the full power of SPARQL in this way would surely be messy. But these languages have different, almost non-overlapping, sweet-spots--one is for building application APIs, the other for database APIs.

Indeed.

>> 3) Consider the implications of using SPARQL via de-serialization from JSON-LD to the RDF data model, performing a SPARQL query operation, and re-serializing back to JSON-LD and framing using some variation of the existing algorithms.
> 
> I'm not sure what you mean here. Can you elaborate?

My thought was to use SPARQL bouncing through RDF. Basically the following steps:

1) Specify query in SPARQL, perhaps using a JSON-LD inspired syntactic variation mapping to the SPARQL Algebra.
2) Turn the JSON-LD to be “framed” into RDF, and use as the dataset against which the SPARQL query (construct) is run.
3) Serialize the constructed RDF using the format of the @construct clause hinted at above, to frame the results.
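
As a rough illustration of step 2, here is a minimal Python sketch of running a CONSTRUCT-style query over triples. It assumes the JSON-LD has already been turned into RDF (for instance via the Deserialize JSON-LD to RDF algorithm in the JSON-LD API) and is held as plain (subject, predicate, object) tuples. The names is_var, match_bgp and construct are hypothetical, and the matcher is only a toy stand-in for a real SPARQL engine: basic graph patterns, no OPTIONAL, FILTER or named graphs. Re-serializing the result using the @construct frame (step 3) is left out.

```python
def is_var(term):
    """A term is a variable if it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match_bgp(patterns, triples, binding=None):
    """Yield every variable binding under which all patterns occur in triples."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for p_term, t_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            yield from match_bgp(rest, triples, candidate)


def construct(template, where, triples):
    """Instantiate the template once per solution of the where patterns."""
    out = set()
    for b in match_bgp(where, triples):
        for pattern in template:
            instantiated = tuple(b.get(t, t) for t in pattern)
            # As in SPARQL CONSTRUCT, skip triples with unbound variables.
            if not any(is_var(t) for t in instantiated):
                out.add(instantiated)
    return out


# Toy dataset standing in for the RDF derived from the JSON-LD input.
data = [
    ("ex:lib1", "rdf:type", "ex:Library"),
    ("ex:lib1", "ex:contains", "ex:book1"),
    ("ex:book1", "rdf:type", "ex:Book"),
    ("ex:book1", "dc:creator", "Plato"),
    ("ex:other", "rdf:type", "ex:Shelf"),
]
where = [
    ("?lib", "rdf:type", "ex:Library"),
    ("?lib", "ex:contains", "?book"),
    ("?book", "dc:creator", "?creator"),
]
template = [("?book", "dc:creator", "?creator")]

result = construct(template, where, data)
print(result)  # {('ex:book1', 'dc:creator', 'Plato')}
```

Note that the template and the where patterns share nothing but variable names, which is what lets the two be written separately.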

Just a wild shot at what this might look like:
{
  "@context": {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/vocab#"
  },
  "@construct": {
    "@id": "?lib",
    "@type": "ex:Library",
    "ex:contains": {
      "@id": "?book",
      "@type": "ex:Book",
      "dc:creator": "?creator",
      "?bp": "?bo",
      "ex:contains": {
        "@id": "?chapter",
        "@type": "ex:Chapter",
        "?cp": "?co"
      }
    }
  },
  "@where": {
    "@id": "?lib",
    "@type": "ex:Library",
    "ex:contains": {
      "@id": "?book",
      "@type": "ex:Book",
      "dc:creator": "?creator",
      "?bp": "?bo",
      "ex:contains": {
        "@id": "?chapter",
        "@type": "ex:Chapter",
        "?cp": "?co"
      }
    }
  }
}


The @construct part forms a frame, where objects are repeated as necessary based on subject matches. This roughly would translate to the following SPARQL Query:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ex: <http://example.org/vocab#>

CONSTRUCT {
  ?lib a ex:Library; ex:contains ?book .
  ?book a ex:Book; dc:creator ?creator; ?bp ?bo; ex:contains ?chapter .
  ?chapter a ex:Chapter; ?cp ?co .
}
WHERE {
  ?lib a ex:Library; ex:contains ?book .
  ?book a ex:Book; dc:creator ?creator; ?bp ?bo; ex:contains ?chapter .
  ?chapter a ex:Chapter; ?cp ?co .
}

Or, directly to the Algebra:

(prefix
 (
  (dc: <http://purl.org/dc/elements/1.1/>)
  (ex: <http://example.org/vocab#>))
 (construct
  (
   (triple ?lib a ex:Library)
   (triple ?lib ex:contains ?book)
   (triple ?book a ex:Book)
   (triple ?book dc:creator ?creator)
   (triple ?book ?bp ?bo)
   (triple ?book ex:contains ?chapter)
   (triple ?chapter a ex:Chapter)
   (triple ?chapter ?cp ?co))
  (bgp
   (triple ?lib a ex:Library)
   (triple ?lib ex:contains ?book)
   (triple ?book a ex:Book)
   (triple ?book dc:creator ?creator)
   (triple ?book ?bp ?bo)
   (triple ?book ex:contains ?chapter)
   (triple ?chapter a ex:Chapter)
   (triple ?chapter ?cp ?co)) ))
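
To make the mapping a bit more concrete, here is a minimal Python sketch of how a nested pattern like the one above might be flattened into triple patterns and printed as a CONSTRUCT query. The function names pattern_to_triples and to_sparql are hypothetical. The sketch assumes every node in the pattern carries an "@id", that values are either strings or nested nodes (no arrays or value objects), and that the @context holds only prefix-to-IRI mappings. It is nowhere near a full JSON-LD-to-SPARQL translation.

```python
def pattern_to_triples(node, triples=None):
    """Flatten a nested JSON-LD-style pattern into (s, p, o) triple patterns."""
    if triples is None:
        triples = []
    subject = node["@id"]
    for key, value in node.items():
        if key == "@id":
            continue
        if key == "@type":
            # "a" is the SPARQL shorthand for rdf:type.
            triples.append((subject, "a", value))
        elif isinstance(value, dict):
            # Nested node: link to it, then flatten its own properties.
            triples.append((subject, key, value["@id"]))
            pattern_to_triples(value, triples)
        else:
            triples.append((subject, key, value))
    return triples


def to_sparql(query):
    """Render the @construct and @where patterns as SPARQL CONSTRUCT text."""
    prefixes = "\n".join(
        f"PREFIX {prefix}: <{iri}>" for prefix, iri in query["@context"].items()
    )

    def block(name):
        return "\n".join(
            f"  {s} {p} {o} ." for s, p, o in pattern_to_triples(query[name])
        )

    return (
        f"{prefixes}\n\n"
        f"CONSTRUCT {{\n{block('@construct')}\n}}\n"
        f"WHERE {{\n{block('@where')}\n}}"
    )
```

Calling to_sparql on the JSON document above would emit one triple pattern per line rather than the `;` shorthand, and literal values would still need proper quoting, which this sketch does not attempt.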

Of course, in this case, the @construct and @where bits are symmetrical, and perhaps there’s a shortcut for this case, but in general, the @construct and @where are only related via variable bindings.
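
One possible shape for such a shortcut, purely as a sketch: treat a missing @construct as meaning "same as @where". The helper names normalize and variables below are hypothetical, and the defaulting rule is an assumption rather than anything agreed on. The variables helper also shows the one real constraint between the two sections, namely that variables used in @construct should be bound somewhere in @where.

```python
import copy


def variables(pattern):
    """Collect every '?name' variable used as a key or value in a pattern."""
    found = set()
    if isinstance(pattern, dict):
        for key, value in pattern.items():
            found |= variables(key) | variables(value)
    elif isinstance(pattern, list):
        for item in pattern:
            found |= variables(item)
    elif isinstance(pattern, str) and pattern.startswith("?"):
        found.add(pattern)
    return found


def normalize(query):
    """Return a copy in which a missing @construct defaults to @where."""
    normalized = dict(query)
    if "@construct" not in normalized:
        # Hypothetical shortcut for the symmetric case.
        normalized["@construct"] = copy.deepcopy(normalized["@where"])
    return normalized


def unbound(query):
    """Variables used in @construct that @where never binds."""
    q = normalize(query)
    return variables(q["@construct"]) - variables(q["@where"])
```

With this, a query containing only @where would behave like the symmetric example, and unbound(query) would return an empty set for it.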

Gregg

>> I’m certainly interested in hearing suggestions on other approaches, along with some use cases/examples.
> 
> [1] https://github.com/gsvarovsky/json-rql
> [2] http://graphql.org/
> 
> -----Original Message-----
> From: Gregg Kellogg [mailto:gregg@greggkellogg.net]
> Sent: 10 October 2016 23:51
> To: George Svarovsky <gsvarovsky@idbs.com>
> Cc: Markus Lanthaler <markus.lanthaler@gmx.net>; Linked JSON <public-linked-json@w3.org>
> Subject: Re: Reactivating the CG to work on updated versions of the specs
> 
>> On Oct 10, 2016, at 2:32 AM, George Svarovsky <gsvarovsky@idbs.com> wrote:
>> 
>> Hi Markus & Gregg & everyone
> 
> Hi George, glad to have you! Please consider joining the Community Group [1], which simplifies IP issues.
> 
>> I've worked with JSON-LD since 2013, for IDBS internal POC work, including prototype APIs and indexing in elasticsearch. I'd like to make it the lingua franca of our foundational APIs going forward. So although I'm not currently a 'heavy user', I'd like to become one! and I'd be very happy to be involved in the new wave of progress.
>> 
>> I have a particular interest in framing, and I concur with Markus that what I actually want is (some degree of) graph query. I have some thoughts, which I'll write out in a new thread.
> 
> I think there are several ways we could go:
> 
> 1) Improve framing based on the existing algorithms which provide some degree of manipulating and limiting the framed data based on existing relationships.
> 2) Consider a way to include a variable syntax, and how this might be used for both matching and constructing data
> 3) Consider the implications of using SPARQL via de-serialization from JSON-LD to the RDF data model, performing a SPARQL query operation, and re-serializing back to JSON-LD and framing using some variation of the existing algorithms.
> 
> I’m certainly interested in hearing suggestions on other approaches, along with some use cases/examples.
> 
>> Otherwise do let me know the best way I can help…
> 
> Excellent.
> 
>> George
>> 
>> George Svarovsky | Technical Director | IDBS gsvarovsky@idbs.com |
>> www.idbs.com | @gsvarovsky
> 
> Gregg
> 
> [1] https://www.w3.org/community/json-ld/participants
> 
>> -----Original Message-----
>> From: Markus Lanthaler [mailto:markus.lanthaler@gmx.net]
>> Sent: 10 October 2016 09:55
>> To: 'Linked JSON' <public-linked-json@w3.org>
>> Subject: RE: Reactivating the CG to work on updated versions of the
>> specs
>> 
>> It is great to see you taking the initiative on this Gregg!
>> 
>> On 30 Sep 2016 at 11:31, Gregg Kellogg wrote:
>>> JSON-LD 1.0 and JSON-LD API 1.0 have been out and successful for many years now.
>>> JSON-LD has succeeded beyond the wildest dreams of the CG, thanks to broad adoption.
>> 
>> Indeed!
>> 
>> 
>>> Additionally, the Framing algorithm [2] has proven to be important,
>>> but work on the specification was never complete, and implementations
>>> have moved beyond what was documented in any case.
>> 
>> It is certainly handy but I'm not sure there's agreement on what exactly it should be. Initially it was just (or at least mostly) about re-framing an existing graph... I think what a lot of people (myself included) actually want and need is to query a graph and control the serialization of the result. Maybe we should start with a discussion on the role of framing!?
>> 
>> 
>>> I think it’s time to get back to these documents to create a future
>>> 1.1 Community Group release of the specifications;
>> 
>> 1.1 sounds like minor tweaks to the existing official W3C specifications but some of the discussions and proposals I just saw go way beyond that. What do you consider to be in scope for 1.1?
>> 
>> 
>>> At this point, I’d be happy to see active engagement on the mailing
>>> list to move these issues forward; I’m prepared to do the heavy
>>> lifting on the specification documents, and to maintain tests and my
>>> own Ruby implementation to match. Hopefully, other implementors and
>>> heavy users can actively engage in making this happen (perhaps an
>>> hour a week). It may be that we’ll want to start up the bi-weekly calls we used to discuss and resolve on these issues prior to moving into the RDF WG.
>> 
>> I'd definitely like to help with this but unfortunately my spare cycles are quite limited.
>> 
>> 
>> Cheers,
>> Markus
>> 
>> 
>> --
>> Markus Lanthaler
>> @markuslanthaler
>> 
>> 
>> The content of this e-mail, including any attachments, is confidential and may be commercially sensitive. If you are not, or believe you may not be, the intended recipient, please advise the sender immediately by return e-mail, delete this e-mail and destroy any copies.
Received on Tuesday, 11 October 2016 19:04:29 UTC