Re: Potential Formal Object from DERI over JSON-LD from Gregg Kellogg on 2012-10-18 (public-rdf-wg@w3.org from October 2012)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Thu, 18 Oct 2012 14:59:12 -0400
To: Gavin Carothers <gavin@carothers.name>
CC: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Michael Hausenblas <michael.hausenblas@deri.org>, RDF WG <public-rdf-wg@w3.org>
Message-ID: <548D8EC0-0256-401E-9643-89271F16966D@greggkellogg.net>
On Oct 18, 2012, at 11:38 AM, Gavin Carothers <gavin@carothers.name> wrote:

> On Thu, Oct 18, 2012 at 11:21 AM, Gregg Kellogg <gregg@greggkellogg.net> wrote:
>> On Oct 18, 2012, at 7:02 AM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>> 
>>> There are two questions that I have continued to have about JSON-LD.
>>> 
>>> 1/ Is JSON-LD a serialization syntax for all RDF graphs?
>>> 2/ Is JSON-LD only a serialization syntax for RDF graphs?
>> 
>> TL;DR: Yes to 1., almost to 2.
>> 
>>> Could the interested parties state straight up their answers to these questions?
>>> 
>>> 
>>> The opinions below are mine alone.  I have included them here to give some
>>> rationale as to why I want answers to the above questions to be on record.
>>> 
>>> If the answer to the second question is true, i.e., every JSON-LD structure
>>> corresponds to an RDF graph and there is no more information in the JSON-LD
>>> structure, then it is obvious to me that JSON-LD work should go forward in the
>>> RDF WG.
>>> 
>>> If the answer to the first question is true, i.e, every RDF graph can be
>>> written as a JSON-LD structure and recovered from that structure unchanged,
>>> but not the second, then the situation is somewhat murky.  It seems to me that
>>> there should be some convincing argument why the RDF WG is recommending
>>> something larger than RDF, and the more there is in JSON-LD (ordering, etc.,
>>> etc.) the more convincing this argument has to be.  In this case it may be
>>> better to have some other status for the JSON-LD documents, or even for the
>>> RDF WG to simply point to the JSON-LD documents in one of its documents.
>>> 
>>> If neither is true, then I don't see any reason for the RDF WG to be interested
>>> in JSON-LD.
>> 
>> The answer to the first question is definitely true: JSON-LD can represent every RDF graph, and can represent any Dataset in a manner equivalent to TriG. Round-tripping from JSON-LD to the RDF abstract model and back can re-create exactly the same JSON-LD, with caveats for native representations of literals and corner cases of the @container: @language feature (discussed more below).
>> 
>> The same is true in the other direction: taking an arbitrary RDF graph (or dataset) and serializing it to JSON-LD will yield an equivalent RDF graph (or dataset), modulo native literal representations and BNode identifiers.
>> 
>> The difference in native literal representations (integer, double, boolean) is equivalent to Turtle representation issues.
>> 
>> The answer to the second question "Is JSON-LD only a serialization syntax for RDF graphs", strictly speaking is no.
>> 
>> * JSON-LD syntactically allows greater use of BNodes. A BNode can be used as a property, or as a graph name. This is a consequence of the allowed values being IRI, compact IRI or term, which allow BNodes. I would be fine with adding a caveat that such use is incompatible with the RDF data model, but syntactically restricting these values is more difficult in JSON-LD than Turtle. If necessary, I would also support strengthening the spec to normatively restrict this usage.
>> 
>> * JSON-LD includes other syntactic structures for representing information in a more convenient way for developers: specifically property generators and language containers. We've gone to some lengths to ensure that property generators are fully round-trippable through RDF (see [1] and [2]). Language containers were added recently, and to support full round-tripping within JSON-LD (that is, expand and re-compact), syntactic elements are added to node definitions to ensure that the compacted JSON-LD allocates objects to each language tag the same way they were originally expressed. This could be considered data-model information that is outside the RDF data model, but I consider it to be a minor syntactic convention specifically to deal with an odd corner case with language containers.
>> 
>> Basically, language containers are intended to support "." access to property values, when the values have language information. For example, consider the following:
>> 
>> {
>>  "@context": {
>>    "label": {
>>      "@id": "http://example.com/label",
>>      "@container": "@language"
>>    }
>>  },
>>  "@id": "http://buckingham.uk/queenie",
>>  "label": {
>>    "en": "The Queen",
>>    "de": "Die Königin"
>>  }
>> }
>> 
>> This is semantically equivalent to the following JSON-LD and Turtle:
>> 
>> [
>>  {
>>    "@id": "http://buckingham.uk/queenie",
>>    "http://example.com/label": [
>>      { "@value": "The Queen", "@language": "en" },
>>      { "@value": "Die Königin", "@language": "de" }
>>    ]
>>  }
>> ]
>> 
>> <http://buckingham.uk/queenie> <http://example.com/label">
>>  "The Queen"^^en, "Die Königin"^^de.
> 
> <http://buckingham.uk/queenie> <http://example.com/label>
>  "The Queen"@en, "Die Königin"@de .
> 
> Pedantic, but well, we are nothing if not pedantic. ;)

Not pedantic, just syntactically correct! Thanks.
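
To make the language-container expansion above concrete, here is a minimal Python sketch. It is only an illustration, not the JSON-LD expansion algorithm: the helper name expand_language_map is hypothetical, and it assumes the map holds plain strings (or lists of strings) keyed by language tag.

```python
def expand_language_map(language_map):
    """Turn {"en": "The Queen", ...} into a list of JSON-LD value objects.

    Hypothetical helper for illustration; assumes every entry is a
    string or a list of strings keyed by a language tag.
    """
    expanded = []
    for language, values in language_map.items():
        # A language key may hold a single string or a list of strings.
        if not isinstance(values, list):
            values = [values]
        for value in values:
            expanded.append({"@value": value, "@language": language})
    return expanded


label = {"en": "The Queen", "de": "Die Königin"}
expanded_label = expand_language_map(label)
```

Each entry of expanded_label corresponds to one language-tagged literal in the Turtle, e.g. "The Queen"@en.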

>> The corner case comes when someone uses the @container: @language form of a property, but adds a node definition as a value (perfectly reasonable):
>> 
>> {
>>  "@context": {
>>    "label": {
>>      "@id": "http://example.com/label",
>>      "@container": "@language"
>>    }
>>  },
>>  "@id": "http://buckingham.uk/queenie",
>>  "label": {
>>    "en": ["The Queen", {"@id": "http://example.com/the_queen"}],
>>    "de": ["Die Königin", {"@id": "http://example.de/die_königin"}]
>>  }
>> }
>> 
>> Representing this as Turtle yields the following:
>> 
>> <http://buckingham.uk/queenie> <http://example.com/label>
>>  "The Queen"@en, "Die Königin"@de, <http://example.com/the_queen>, <http://example.de/die_königin> .
>> 
>> Lost is the original association with the language key for each object reference. Of course, this is, at best, poor modeling, but is an important consideration when working strictly within the JSON-LD frame. The solution we've agreed to is to add syntactic information to the node references to retain the original language association:
>> 
>> [
>>  {
>>    "@id": "http://buckingham.uk/queenie",
>>    "http://example.com/label": [
>>      { "@value": "The Queen", "@language": "en" },
>>      { "@value": "Die Königin", "@language": "de" },
>>      {"@id": "http://example.com/the_queen", "@language": "en"},
>>      {"@id": "http://example.de/die_königin", "@language": "de"}
>>    ]
>>  }
>> ]
>> 
> 
> Huh, so that seems to be implying that resources can be tagged with a
> language? Yeah, that's clearly not RDF, and not RDF by rather a lot.
> Would it be reasonable to instead have rules that cause that to mean
> something like:
> 
> <http://example.de/die_königin> dc:language "de" .
> 
> I mean, I understand the need to keep track of the language of
> resources, that's just not how you do it in RDF.

Well, it's really intended to just be book-keeping, not really trying to impact the RDF data model.
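
As a rough sketch of what that book-keeping buys us, assuming a simplified expand/compact pair (both function names are hypothetical and neither is the normative JSON-LD algorithm), the extra @language key on a node reference is what lets re-compaction put each reference back under its original language key:

```python
def expand_with_bookkeeping(language_map):
    """Expand a language map, tagging node references with their key."""
    expanded = []
    for language, values in language_map.items():
        if not isinstance(values, list):
            values = [values]
        for value in values:
            if isinstance(value, dict):
                # Node reference: copy it and record the language key.
                expanded.append({**value, "@language": language})
            else:
                # Plain string: an ordinary language-tagged value object.
                expanded.append({"@value": value, "@language": language})
    return expanded


def compact_to_language_map(expanded):
    """Rebuild the language map from the expanded form."""
    language_map = {}
    for item in expanded:
        language = item["@language"]
        if "@value" in item:
            value = item["@value"]
        else:
            # Drop the book-keeping key to recover the node reference.
            value = {k: v for k, v in item.items() if k != "@language"}
        language_map.setdefault(language, []).append(value)
    return language_map


original = {
    "en": ["The Queen", {"@id": "http://example.com/the_queen"}],
    "de": ["Die Königin", {"@id": "http://example.de/die_königin"}],
}
round_tripped = compact_to_language_map(expand_with_bookkeeping(original))
```

Without the @language key on the node references, compact_to_language_map would have no way to decide which language key each reference belongs under.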

We did consider representations such as SKOS-XL, and there may be other ways we didn't explore. For my part, I was always pretty ambivalent about requiring JSON-LD round-tripping for features that really represent anti-patterns. I'd like to see the RDF WG give the JSON-LD task force some specific direction on how to deal with cases like this. It would be unfortunate if this means it's no longer a suitable solution for Drupal, but that may be the fallout. We could look for solutions that are more strongly RDF to see if this would work, for example, as Gavin suggests, if it expanded to the following instead:

[
 {
   "@id": "http://buckingham.uk/queenie",
   "http://example.com/label": [
     { "@value": "The Queen", "@language": "en" },
     { "@value": "Die Königin", "@language": "de" },
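     {"http://purl.org/dc/terms/language": "en", "http://www.w3.org/1999/02/22-rdf-syntax-ns#value": {"@id": "http://example.com/the_queen"}},
     {"http://purl.org/dc/terms/language": "de", "http://www.w3.org/1999/02/22-rdf-syntax-ns#value": {"@id": "http://example.de/die_königin"}}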
   ]
 }
]

This would be equivalent to the following RDF:

<http://buckingham.uk/queenie> <http://example.com/label>
  "The Queen"@en,
  "Die Königin"@de,
  [dc:language "en"; rdf:value <http://example.com/the_queen>], 
  [dc:language "de"; rdf:value <http://example.de/die_königin>].

It's more challenging to round-trip, as you can't distinguish between something expanded as stated JSON-LD, and something that came about because of the expansion rules for language containers. It also builds in dependencies on certain vocabularies into core JSON-LD processing, which I'm not happy with. However, it is better modeled and I think we could probably make this (or similar) work.
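
To illustrate the round-tripping problem, here is a small Python sketch under the assumption that a node reference in a language container expands to a blank node carrying dc:language and rdf:value. The helper expand_node_reference is hypothetical and only mirrors the RDF shown above:

```python
DC_LANGUAGE = "http://purl.org/dc/terms/language"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"


def expand_node_reference(node_reference, language):
    """Wrap a node reference in a blank node that records its language."""
    return {DC_LANGUAGE: language, RDF_VALUE: node_reference}


# Produced by the (assumed) language-container expansion rule.
generated = expand_node_reference({"@id": "http://example.com/the_queen"}, "en")

# Written directly by an author who never used a language container.
hand_written = {
    DC_LANGUAGE: "en",
    RDF_VALUE: {"@id": "http://example.com/the_queen"},
}

# The two are identical, so compaction cannot tell which one to fold
# back into a language container.
indistinguishable = generated == hand_written
```

Any compaction rule would have to either fold both forms back into the container or neither, which is the ambiguity I mean.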

Gregg

>> The @language addition to the node references contains no semantic information, but it does allow round-tripping from compact to expanded form and back again.
>> 
>> AFAIK, this is the only area where JSON-LD really extends the RDF data model. If we were to drop this, Drupal would not be able to use JSON-LD, but that may be a necessary concession if the WG insists on strict conformance to the RDF data model.
>> 
>> Gregg
>> 
>> [1] https://github.com/json-ld/json-ld.org/issues/133
>> [2] https://github.com/json-ld/json-ld.org/issues/159
>> 
>>> Peter F. Patel-Schneider
>>> 
>>> 
>>> 
>> 
>>
Received on Thursday, 18 October 2012 18:59:32 UTC