Re: Extending the SPARQL Query Results JSON format for RDF* from Jeen Broekstra on 2020-08-04 (public-rdf-star@w3.org from August 2020)

From: Jeen Broekstra <jeen@fastmail.com>
Date: Wed, 05 Aug 2020 09:11:27 +1000
To: public-rdf-star@w3.org
Message-Id: <fedbfad0-3b69-4a76-be42-fdd6ed1266e7@www.fastmail.com>
On Tue, Aug 4, 2020, at 21:00, Andy Seaborne wrote:
> It's early days in the support for RDF* in Jena so it would be practical 
> to make changes to converge on common approaches.
> 
> Apache Jena supports the same results format except the keywords are 
> "subject" "predicate" and "object", not "s", "p", "o". It uses "triple".

That's a good idea I think, and something we may want to adopt in RDF4J as well. We've covered our behinds somewhat by labeling the entire thing "experimental", so we have a little leeway for potentially breaking changes. Regardless, it shows that convergence would be useful: we now have three implementations, all slightly different.
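To make the difference concrete, here is a minimal Python sketch of how a client could paper over the two key spellings until the formats converge. The overall term shape (`"type": "triple"` with a `"value"` object holding the three components) is an assumption based on the formats discussed in this thread, and `normalize_term` is a hypothetical helper, not part of Jena or RDF4J.

```python
# Assumed mapping between the short keys ("s", "p", "o") and the long keys
# ("subject", "predicate", "object") mentioned in this thread.
SHORT_TO_LONG = {"s": "subject", "p": "predicate", "o": "object"}


def normalize_term(term: dict) -> dict:
    """Return a copy of a result term with triple keys in long form.

    Non-triple terms (IRIs, literals, blank nodes) are copied unchanged.
    Nested triples are handled recursively.
    """
    if term.get("type") != "triple":
        return dict(term)
    return {
        "type": "triple",
        "value": {
            SHORT_TO_LONG.get(key, key): normalize_term(component)
            for key, component in term["value"].items()
        },
    }


# Example term using the short keys (shape assumed for illustration).
short_form = {
    "type": "triple",
    "value": {
        "s": {"type": "uri", "value": "http://example.org/alice"},
        "p": {"type": "uri", "value": "http://example.org/knows"},
        "o": {"type": "uri", "value": "http://example.org/bob"},
    },
}
long_form = normalize_term(short_form)
```

A term that already uses the long keys passes through unchanged, so the same helper works against either style of endpoint.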

How/where would be a good place to draft and publish something? Is the SPARQL 1.2 CG a reasonable place for this perhaps? Or do we want to keep RDF* separate from that discussion for now?

> The same design is in application/sparql-results+xml
> 
> RDF* is done within the existing content types.
> 
> There are pros and cons for new MIME types vs using existing MIME types.
> 
> I'm anticipating that some early RDF* usage will be adding some RDF* to 
> existing data as well as seeing new datasets using RDF*. Keeping the 
> existing data/apps working unchanged nudges in the direction of using 
> existing MIME types, which lowers the barrier for use. In the Jena 
> implementation if the RDF*/SPARQL* features are not used, they don't 
> have an observable performance impact.

It's not so much the performance I worry about, it's more a backward compatibility thing. Imagine an endpoint that starts adding RDF* annotations to its dataset, and multiple existing clients that query that endpoint. If you support the query result response by extending the existing content type, an existing client can suddenly start receiving a response it can't process on an existing query (after all, you can get back an RDF* annotation as a result even if your query is just regular SPARQL).

RDF4J currently handles this by only sending the extended syntax when a client explicitly accepts the new content type. If a client asks for "regular" JSON results, we instead encode any annotated triple in the result as an IRI (basically by base64-encoding the N-Triples representation of the statement and minting a URN out of it on the spot). The client may not be able to fully interpret this kind of result value, but at least it won't break the parser.
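As a rough illustration of that fallback, here is a minimal Python sketch. It is not RDF4J's actual code: the URN prefix, the extended content type name, the choice of URL-safe base64, and all function names are assumptions made for the example.

```python
import base64

# Hypothetical values chosen for illustration; not necessarily what RDF4J uses.
TRIPLE_URN_PREFIX = "urn:example:triple:"
EXTENDED_JSON_TYPE = "application/x-sparqlstar-results+json"


def triple_to_urn(ntriples_statement: str) -> str:
    """Mint a URN by base64-encoding the N-Triples form of a statement."""
    encoded = base64.urlsafe_b64encode(ntriples_statement.encode("utf-8"))
    return TRIPLE_URN_PREFIX + encoded.decode("ascii")


def urn_to_triple(urn: str) -> str:
    """Recover the N-Triples text from a URN minted by triple_to_urn."""
    if not urn.startswith(TRIPLE_URN_PREFIX):
        raise ValueError("not an encoded triple: " + urn)
    payload = urn[len(TRIPLE_URN_PREFIX):]
    return base64.urlsafe_b64decode(payload).decode("utf-8")


def client_accepts_extended(accept_header: str) -> bool:
    """True if the Accept header explicitly lists the extended content type."""
    media_types = [part.split(";")[0].strip() for part in accept_header.split(",")]
    return EXTENDED_JSON_TYPE in media_types


statement = '<http://example.org/a> <http://example.org/b> "c" .'
urn = triple_to_urn(statement)
```

A client that knows about the convention can call `urn_to_triple` to get the statement back; one that doesn't simply sees an opaque IRI, which any existing JSON results parser should handle.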

Given that client software will need to be updated anyway to *properly* do useful things with RDF* data in query results, the addition of a MIME-type seems little additional burden. 

> One use case that has arisen is wanting to manage the triples annotating 
> other triples separately from the data it refers to.  This is both to 
> help in data management and also to help with the modelling issues [1]

I don't follow how this relates to the syntax formats to be honest, but isn't that essentially what Separate Assertions (SA) mode gives you? In SA mode you could have the annotations in a separate named graph (or a separate database if you want) from the actual facts being annotated. 

> Jena can also read Eclipse RDF4J format result sets :-)

Showoff :)

Cheers,

Jeen
Received on Tuesday, 4 August 2020 23:12:04 UTC