Re: RDF/JSON from Gregg Kellogg on 2013-04-10 (public-rdf-wg@w3.org from April 2013)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Wed, 10 Apr 2013 13:31:18 -0700
To: Arnaud Le Hors <lehors@us.ibm.com>
Cc: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Message-Id: <0ADDB9A0-8C54-4E97-B3F1-90B2EE13E3F5@greggkellogg.net>
On Apr 10, 2013, at 11:01 AM, Arnaud Le Hors <lehors@us.ibm.com> wrote:

> Hi all, 
> as a follow up to the discussion we started on today's call I'd like to share the following write-up one of my colleagues provided to explain why they want to be able to use RDF/JSON as well as JSON-LD. His argument is based on very practical experience programming with both formats. 
> 
> Gregg, I don't mean to ignore the response you sent me offline, maybe you can repost it here so that everyone can participate in the discussion. 

Sure, I've folded it in inline below.

> ----- 
> We like JSON-LD. One use-case we have for it is to store RDF data in a JSON database like MongoDB or CouchDB. We store in the database the RDF data serialized in JSON-LD. 

I have some slides on using JSON-LD with MongoDB: http://www.slideshare.net/gkellogg1/jsonld-and-mongodb.

> Although JSON-LD is working well for us in the database, it turns out to be very clumsy to work with in the programming languages we use, even in Javascript where there should be a natural fit. From a programming point of view we have found that RDF-JSON is much more convenient than JSON-LD. Here is an example to illustrate why. 

As mentioned below, I find that JSON-LD works well with Backbone.js: http://backbonejs.org.

> We have many documents that look like the following, in Turtle format. Assume this is the representation of the resource http://acme.com/resourceX 
> 
> @prefix dc: <http://purl.org/dc/terms/>. 
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>. 
> <http://acme.com/resourceX> 
>         dc:title "resource X"; 
>         … other triples here … 
>         dc:description "description of resource X". 
> <http://acme.com/collectionM> rdfs:member <http://acme.com/resourceX>. 
> 
> We have lots of these, for varying values of X and M (more Xs than Ms generally). 
> 
> My programming task is to extract the triples of resourceX if resourceX is an rdfs:member of CollectionM. 
> 
> Let's consider first an RDF-JSON representation of the above Turtle: 
> 
> {"http://acme.com/resourceX" : 
>     {"http://purl.org/dc/terms/title":"resource X", 
>     "http://purl.org/dc/terms/description":"description of resource X", 
>     ... other predicates ... 
>     }, 
>  "http://acme.com/collectionM" : 
>     {"http://www.w3.org/2000/01/rdf-schema#member" : {"@id" : "http://acme.com/resourceX"}} 
> } 
> 
> Here is the Javascript code to perform my programming task: 
> 
> result = {}; 
> if ('http://acme.com/collectionM' in representation) { 
>     subject = representation['http://acme.com/collectionM']['http://www.w3.org/2000/01/rdf-schema#member']['@id']; 
>     for (var predicate in representation[subject]) { 
>         result[predicate] = representation[subject][predicate]; 
>         } 
>     } 
> 
> Now here is the same resource in JSON-LD format: 
> 
> [ 
>     {"@id" : "http://acme.com/resourceX", 
>     "http://purl.org/dc/terms/title":"resource X", 
>     "http://purl.org/dc/terms/description":"description of resource X", 
>     ... other predicates ... 
>     }, 
>     {"@id": "http://acme.com/collectionM", 
>     "http://www.w3.org/2000/01/rdf-schema#member" : {"@id" : "http://acme.com/resourceX"} 
>     } 
> ] 
> 
> 
> I could have made this array the value of an '@graph' property – it would not have changed the example much. Here is the corresponding Javascript: 
> 
> result = {};
> for (var i = 0; i < representation.length; i++) {
>     var collectionNode = representation[i]; // iterate over array elements, not indices
>     if (collectionNode['@id'] == 'http://acme.com/collectionM') {
>         subject = collectionNode['http://www.w3.org/2000/01/rdf-schema#member']['@id'];
>         for (var j = 0; j < representation.length; j++) {
>             var subjectNode = representation[j];
>             if (subjectNode['@id'] == subject) {
>                 for (var predicate in subjectNode) {
>                     if (predicate != '@id') { // '@id' is not a predicate
>                         result[predicate] = subjectNode[predicate];
>                         }
>                     }
>                 break;
>                 }
>             }
>         break;
>         }
>     }
> 
> As you can see, this is much more complicated than what I have to write with RDF-JSON. 
> This is just one example, but in our experience it is typical – I have not made up an atypical example to make a point – and it doesn't actually matter if the language is Javascript, Python or Ruby. The essential difference derives from the following: 
> 
> 1.        You often know in the code what subject you are looking for, either as a constant or in the value of a variable. With RDF-JSON, you just index the structure with that key. With JSON-LD, you have to loop through the subjectNodes looking for the one whose '@id' matches your known subject. You could use fancier programming constructs, like select or reduce, to find the subjectNode you are looking for, but it still does not match the simplicity of a simple hash/dictionary access in RDF-JSON. 
> 2.        If you are looking for predicates, you have to filter out the '@id' entries, which are artifacts of the format and don't correspond to triples. 
> 
> We like JSON-LD and it has a use in our applications. However, it is not good for everything, and we are finding that a mixture of JSON-LD with RDF-JSON is much more useful than either one of them alone. (We are also using RDFa, but that is a separate story.) We do not think W3C should try to decide whether RDF-JSON or JSON-LD is better, and picking one as a winner. That would be very similar to trying to decide whether dictionary/hash/associative-array (RDF-JSON) is better than list/array (JSON-LD) and forcing a programming language to have only one of them. We'd like to see W3C recognize both, perhaps with some tutorial material that shows when to use each.

A feature you may have overlooked in JSON-LD is Data Indexing [1]. Data indexing does allow you to organize your data to use key-based access to values. Note that this addresses values of properties, rather than the subjects themselves. This is most useful when you might have several values that are separated by some context, such as the language of a literal.
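
As a rough illustration of data indexing, here is a minimal sketch. The document, its IRIs, and the "en"/"de" keys are made up for the example. It assumes a compacted document whose context declares the property with "@container": "@index", so the values of that property can be reached by key. The index keys are annotations only and do not turn into triples.

```javascript
// Hypothetical compacted JSON-LD document; IRIs and values are illustrative only.
const doc = {
  "@context": {
    "dc": "http://purl.org/dc/terms/",
    // "@container": "@index" marks the keys of "title" below as index labels.
    "title": { "@id": "dc:title", "@container": "@index" }
  },
  "@id": "http://acme.com/resourceX",
  "title": {
    "en": "resource X",
    "de": "Ressource X"
  }
};

// Key-based access to one value of the property, with no looping.
const englishTitle = doc.title["en"];
console.log(englishTitle);
```

This indexes the values of one property of a known node; it does not help with finding the node for a given subject IRI.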

However, what you're really getting to is the fact that subjects are all in separate objects within an array, making it impossible to index by subject IRI. Note that there is actually an internal format [2] that does just this, which is used as part of the flattening algorithm [3].
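
A subject index of this kind can also be built in application code with a single pass over the array. The sketch below is a simplification and not the node map generation algorithm from the API spec: indexBySubject is a hypothetical helper name, and it assumes flattened input in which each subject appears once and carries an "@id". The data is the example from the quoted message, with the elided predicates left out.

```javascript
// Build a plain object keyed by subject IRI from an array of node objects.
// Assumption: each subject occurs once in the array (flattened input).
function indexBySubject(nodes) {
  const index = {};
  for (const node of nodes) {
    if (typeof node["@id"] === "string") {
      index[node["@id"]] = node;
    }
  }
  return index;
}

const MEMBER = "http://www.w3.org/2000/01/rdf-schema#member";
const representation = [
  {
    "@id": "http://acme.com/resourceX",
    "http://purl.org/dc/terms/title": "resource X",
    "http://purl.org/dc/terms/description": "description of resource X"
  },
  {
    "@id": "http://acme.com/collectionM",
    [MEMBER]: { "@id": "http://acme.com/resourceX" }
  }
];

const index = indexBySubject(representation);
const result = {};
if ("http://acme.com/collectionM" in index) {
  const subject = index["http://acme.com/collectionM"][MEMBER]["@id"];
  for (const predicate of Object.keys(index[subject])) {
    if (predicate !== "@id") { // "@id" is not a predicate
      result[predicate] = index[subject][predicate];
    }
  }
}
console.log(result);
```

Once the index exists, the lookup has the same shape as the RDF/JSON version in the quoted message, apart from the "@id" filter.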

The group did spend time considering a concept called subject maps [4]. This would have created a structure very similar to RDF/JSON's, for example:

{
 "@context": {
   "ex": "http://example.com/",
   "name": "http://xmlns.com/foaf/0.1/name",
   "homepage": { "@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id" },
   "knows": { "@id": "http://xmlns.com/foaf/0.1/knows", "@container": "@id" }
 },
 "@id": "ex:Markus",
 "name": "Markus",
 "homepage": "http://www.markus-lanthaler.com/",
 "knows": {
   "ex:Niklas": {
     "name": "Niklas",
     "homepage": "http://neverspace.net/"
   }
 }
}

In this case, the node with IRI ex:Markus has a foaf:knows relationship with ex:Niklas, which has as its value the properties of Niklas. We debated this feature for some time, and resolved not to support it for JSON-LD 1.0 [5]. Ultimately, the feature didn't have enough support, but there were no technical grounds for excluding it (other than the principle of minimalism). With documents about to go out as Last Call (LC), we can't really add such a feature now. However, if this were to come in as an LC comment, and there were sufficient other reasons to do so, the documents could be taken back to the WG for a second LC. Obviously, no one really wants to do that, but if you have a use case that really demands it, I think it would be better to do this.

As an alternative, there are a number of follow-on ideas for JSON-LD which could be published in the form of a note. I could see a note on the use of "@container": "@id" to provide a subject map capability, but we'd need to consider the implications for expansion and compaction. I think this would be a more consistent means of providing such a feature, one which has a migration path to JSON-LD.

The mechanism that Niklas and I favored is about connecting a graph in memory. I find the existing array approach to describing nodes is very compatible with the Backbone.js collection interface. In my work, I treat the result set as an array of resources, which are turned into models. Then I have a process of replacing node references with in-memory references to the objects. This allows the entire graph to be explored within JavaScript. I have to say, using such mechanisms, I've found JSON-LD fairly easy to use in JavaScript.
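
The reference-replacement step can be sketched in plain JavaScript, without Backbone.js. This is only an illustration under stated assumptions: linkGraph is a hypothetical helper name, a node reference is taken to be an object whose only key is "@id", and the input nodes are modified in place.

```javascript
// Replace node references ({"@id": ...}) with the in-memory node objects they name.
// Returns a Map from subject IRI to node object. Mutates the input nodes.
function linkGraph(nodes) {
  const byId = new Map(nodes.map((node) => [node["@id"], node]));
  const isRef = (v) =>
    v !== null && typeof v === "object" && !Array.isArray(v) &&
    Object.keys(v).length === 1 && typeof v["@id"] === "string";
  // References to nodes that are not in the array are left untouched.
  const resolve = (v) => (isRef(v) && byId.has(v["@id"]) ? byId.get(v["@id"]) : v);

  for (const node of nodes) {
    for (const key of Object.keys(node)) {
      if (key.startsWith("@")) continue; // skip keywords such as "@id"
      const value = node[key];
      node[key] = Array.isArray(value) ? value.map(resolve) : resolve(value);
    }
  }
  return byId;
}

const RDFS_MEMBER = "http://www.w3.org/2000/01/rdf-schema#member";
const graph = linkGraph([
  { "@id": "http://acme.com/resourceX", "http://purl.org/dc/terms/title": "resource X" },
  { "@id": "http://acme.com/collectionM", [RDFS_MEMBER]: { "@id": "http://acme.com/resourceX" } }
]);

// The member is now the resource object itself, so the graph can be walked directly.
const member = graph.get("http://acme.com/collectionM")[RDFS_MEMBER];
console.log(member["http://purl.org/dc/terms/title"]);
```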

With Backbone.js, it's fairly trivial code to do such access, but it does of course create an in-memory index based on the identifiers. (Examples are at the end of the slide deck.)

> Regards. 
> --
> Arnaud  Le Hors - Software Standards Architect - IBM Software Group

Gregg

[1] http://json-ld.org/spec/latest/json-ld/#data-indexing
[2] http://json-ld.org/spec/latest/json-ld-api/#node-map-generation
[3] http://json-ld.org/spec/latest/json-ld-api/#flattening-algorithm
[4] https://github.com/json-ld/json-ld.org/issues/134
[5] http://json-ld.org/minutes/2012-09-04/#resolution-1
Received on Wednesday, 10 April 2013 20:31:49 UTC