Re: Extending the SPARQL Query Results JSON format for RDF* from Andy Seaborne on 2020-08-05 (public-rdf-star@w3.org from August 2020)

From: Andy Seaborne <andy@apache.org>
Date: Wed, 5 Aug 2020 15:08:42 +0100
To: public-rdf-star@w3.org
Message-ID: <8f2e2d0c-a16c-dea3-73d9-28201538f101@apache.org>

On 05/08/2020 00:11, Jeen Broekstra wrote:
> On Tue, Aug 4, 2020, at 21:00, Andy Seaborne wrote:
>> It's early days in the support for RDF* in Jena so it would be practical
>> to make changes to converge on common approaches.
>>
>> Apache Jena supports the same results format except the keywords are
>> "subject" "predicate" and "object", not "s", "p", "o". It uses "triple".
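A minimal sketch of reading such a binding follows. It assumes a term with "type": "triple" carries its three positions under "value", keyed by the "subject", "predicate" and "object" keywords mentioned above. The function name term_to_text and the sample IRIs are made up for illustration, and the literal escaping is only approximate.

```python
import json


def term_to_text(term):
    """Render one SPARQL JSON result term as N-Triples-like text."""
    kind = term["type"]
    if kind == "uri":
        return "<" + term["value"] + ">"
    if kind == "bnode":
        return "_:" + term["value"]
    if kind == "literal":
        # json.dumps is used here only as an approximate string escaper.
        text = json.dumps(term["value"], ensure_ascii=False)
        if "xml:lang" in term:
            return text + "@" + term["xml:lang"]
        if "datatype" in term:
            return text + "^^<" + term["datatype"] + ">"
        return text
    if kind == "triple":
        # Assumed layout: the three positions are nested under "value".
        inner = term["value"]
        parts = [term_to_text(inner[key])
                 for key in ("subject", "predicate", "object")]
        return "<< " + " ".join(parts) + " >>"
    raise ValueError("unsupported term type: " + repr(kind))


# Illustrative binding value; the IRIs are hypothetical.
sample = {
    "type": "triple",
    "value": {
        "subject": {"type": "uri", "value": "http://example/s"},
        "predicate": {"type": "uri", "value": "http://example/p"},
        "object": {"type": "literal", "value": "abc"},
    },
}

print(term_to_text(sample))
```

Because the function recurses on "triple", a triple term nested in the subject or object position is handled the same way.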
>
> That's a good idea I think, and something we may want to adopt in 
> RDF4J as well - we've covered our behinds somewhat by labeling the 
> entire thing "experimental", so we have a little leeway in 
> potentially-breaking changes. Anyway regardless, it shows a 
> convergence would be useful: we now have three implementations, all 
> slightly different.
>
> How/where would be a good place to draft and publish something? Is the 
> SPARQL 1.2 CG a reasonable place for this perhaps? Or do we want to 
> keep RDF* separate from that discussion for now?

Neutral.

>
>> The same design is in application/sparql-results+xml
>>
>> RDF* is done within the existing content types.
>>
>> There are pros and cons for new MIME types vs using existing MIME types.
>>
>> I'm anticipating that some early RDF* usage will be adding some RDF* to
>> existing data as well as seeing new datasets using RDF*. Keeping the
>> existing data/apps working unchanged nudged in the direction of using
>> existing MIME types, which lowers the barrier for use. In the Jena
>> implementation if the RDF*/SPARQL* features are not used, they don't
>> have an observable performance impact.
>
> It's not so much the performance I worry about, it's more a backward 
> compatibility thing. Imagine an endpoint that starts appending its 
> dataset with RDF* annotations, and multiple existing clients that 
> query that endpoint. If you support the query result response by 
> extending the existing content type, that existing client can suddenly 
> start receiving a response it can't process on an existing query 
> (after all you can get back an RDF* annotation as a result even if 
> your query is just regular SPARQL).
>
> RDF4J currently handles this by only sending the extended syntax when 
> a client explicitly accepts the new content-type. If a client asks for 
> "regular" json results, we instead encode any annotated triple in the 
> result as an IRI (basically by base 64-encoding the N-triples 
> representation of the statement and minting a urn out of it on the 
> spot). It may not be able to fully interpret this kind of result 
> value, but at least it won't break the parser.
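The fallback described above can be sketched roughly as below. This is an illustration of the idea only, not RDF4J's actual code: the prefix urn:example:triple:, the function names, and the choice of the URL-safe base64 alphabet are all assumptions.

```python
import base64

# Hypothetical URN prefix, chosen for illustration only.
URN_PREFIX = "urn:example:triple:"


def triple_to_urn(ntriples_text):
    """Encode the N-Triples text of a triple as an opaque IRI."""
    raw = ntriples_text.encode("utf-8")
    encoded = base64.urlsafe_b64encode(raw).decode("ascii")
    return URN_PREFIX + encoded


def urn_to_triple(iri):
    """Recover the N-Triples text from an IRI made by triple_to_urn."""
    if not iri.startswith(URN_PREFIX):
        raise ValueError("not an encoded triple: " + iri)
    encoded = iri[len(URN_PREFIX):]
    return base64.urlsafe_b64decode(encoded).decode("utf-8")


# Illustrative statement; the IRIs are hypothetical.
statement = '<http://example/s> <http://example/p> "abc"'
iri = triple_to_urn(statement)
print(iri)
print(urn_to_triple(iri) == statement)
```

A client that only knows the regular JSON results format sees an ordinary "uri" term, while an RDF*-aware client that recognises the prefix can decode it back to the statement.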

Related: for the java-typed URI, RDF* triple terms had to go under IRI / 
bnode because they can appear in the subject position. Yet they are 
conceptually literals.

>
> Given that client software will need to be updated anyway to 
> /properly/ do useful things with RDF* data in query results, the 
> addition of a MIME-type seems little additional burden.

Right, I don't disagree with that - there are pros and cons for either way.

I don't think it is always direct application-server; it can also be 
app-other software/library-server, where the intermediate software isn't 
aware of the app using/not-understanding RDF*. Even some libraries make application 
access to MIME type control quite difficult because they present a 
simplification and hide the MIME-foo to return a whatever-prog-language 
datastructure.

From experience, MIME types are only patchily understood by users. Some 
users deeply understand and care about the web aspects, some are data 
specialists who see it and a lot of HTTP as just a mechanism they have 
to use, more getting in the way, "just give me the data!". Which is fine 
- we can't expect everyone to know all the details of everything.

>> One use case that has arisen is wanting to manage the triples annotating
>> other triples separately from the data it refers to.  This is both to
>> help in data management and also to help with the modelling issues [1]
>
> I don't follow how this relates to the syntax formats to be honest, 
> but isn't that essentially what Separate Assertions (SA) mode gives 
> you? In SA mode you could have the annotations in a separate named 
> graph (or a separate database if you want) from the actual facts being 
> annotated.

Yes. This is what Ontotext GraphDB documentation says as well.

If the app wants assertion as well, feeding the parser output stream through 
a pipeline to assert the triple is easy - the reverse, SA from PG 
parsing, would not be. Ditto API implications.

I'd be interested to know what other implementations' experience is.

There are two places where SA/PG surfaces, and there is also a SPARQL* 
non-consequence matching grounded <<>>

I'll write some impl experience notes. The implementation is a fairly 
straightforward implementation as described in the paper and the 
consequences (e.g. VALUES) rolled out.

>> Jena can also read Eclipse RDF4J format result sets :-)
>
> Showoff :)

Reading documentation considered harmful?

>
> Cheers,
>
> Jeen

     Andy
Received on Wednesday, 5 August 2020 14:08:59 UTC