Re: JSON-LD should be an RDF syntax from David Booth on 2013-04-24 (public-rdf-comments@w3.org from April 2013)

From: David Booth <david@dbooth.org>
Date: Wed, 24 Apr 2013 10:52:03 -0400
To: Manu Sporny <msporny@digitalbazaar.com>
CC: public-rdf-comments@w3.org
Message-ID: <5177F193.70304@dbooth.org>

Hi Manu,

Thanks for your remarks.  I don't agree with all of them, and just for 
completeness I'll note in-line below which ones and why, but rather than 
focus on those details I think it would be better to discuss this at a 
higher level, because you brought up a very interesting point about 
potentially skolemizing blank nodes, and I think that raises the 
possibility of a different path for addressing the issue that JSON-LD 
should be an RDF syntax.

To my mind, the central problem that needs to be addressed is that, at 
present, the draft of JSON-LD
http://dvcs.w3.org/hg/json-ld/raw-file/default/spec/latest/json-ld/index.html
reads as an attempt to divorce Linked Data (and JSON-LD) from RDF.  This 
is evidenced in several places throughout the document.  For example, 
the definition of Linked Data in the introduction fails to mention RDF 
at all:
[[
Linked Data is a technique for creating a network of inter-connected 
data across different documents and Web sites. In general, Linked Data 
has four properties: 1) it uses IRIs to name things; 2) it uses HTTP 
IRIs for those names; 3) the name IRIs, when dereferenced, provide more 
information about the thing; and 4) the data expresses links to data on 
other Web sites.
]]

I suggest fixing that omission by inserting the words "based on RDF" 
into the first sentence, to read: "Linked Data is a technique, based on 
RDF, for creating a network of inter-connected data across different 
documents and Web sites."

The same sentiment of divorcing JSON-LD from RDF is evidenced in other 
places in the document as well, such as in phrases like "converted to 
RDF", and in the definition of a JSON-LD data model that is completely 
separate from the standard RDF data model, complete with parallel terms 
such as "blank node" and "blank node identifier".  Left as is, the world 
would have parallel and competing standards for Linked Data: those based 
on JSON-LD and its data model, blank nodes, etc., and those based on RDF 
and its data model, blank nodes, etc., because JSON-LD is *not* RDF.

One might claim that JSON-LD *can* be used as a serialization of RDF, 
and therefore JSON-LD *is* already based on RDF.  But that argument does 
not hold water, because that same claim can be made of *any* language! 
*Any* language can be viewed as a serialization of RDF, given an 
appropriate mapping.  Indeed, the whole purpose of GRDDL was to enable 
such mappings to be easily defined from XML and HTML.  Many people have 
defined mappings from CSV to RDF, and from many other things to RDF.  We 
do not need a JSON syntax that *can* be mapped to RDF.  We need a JSON 
syntax that *is* a standard serialization of RDF, based on the RDF data 
model and RDF semantics -- not a parallel (but subtly different) data 
model, terminology and semantics.   JSON-LD at present defines a 
parallel universe that looks confusingly similar to RDF -- even 
co-opting terms such as "blank node".

I am sympathetic to -- and fully support -- the goal of making JSON-LD 
easy for people to use, **without knowing anything more about RDF than 
what they learn about JSON-LD**.  But I also think it is critical that 
JSON-LD still be normatively based on RDF and grounded in the RDF data 
model and semantics.  And I think it is also pretty clear in the charter
http://www.w3.org/2012/ldp/charter
that the work of the group was intended to be **based on RDF** -- not 
"inspired by RDF" or "similar to RDF" or "addressing the same goals as 
RDF".   In other words, the LD working group should define a JSON-based 
"RDF serialization syntax", as the charter calls it.

Can the group achieve both of these aims?  I think so.  And I think one 
way to achieve it would be to define a normative mapping between the 
JSON syntax and the RDF abstract syntax, by using skolemization in 
places where prohibited blank nodes would otherwise appear, such as in 
the predicate position of an RDF triple.
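
As a rough illustration of the skolemization idea (a sketch only, not 
anything the draft specifies): the Python below rewrites blank nodes 
that appear in the predicate position as generated IRIs.  The tuple 
representation, the function names and the example.org base are all 
assumptions made for this example; the "/.well-known/genid/" path 
follows the skolem IRI convention described in RDF Concepts.

```python
# Sketch: skolemize blank nodes that RDF does not permit as predicates.
# Triples are plain (subject, predicate, object) tuples of strings, and a
# term starting with "_:" is treated as a blank node. Literals are shown as
# bare strings for brevity. All names here are hypothetical.

SKOLEM_BASE = "http://example.org/.well-known/genid/"  # assumed authority


def is_blank(term):
    return term.startswith("_:")


def skolemize_predicates(triples, base=SKOLEM_BASE):
    """Return new triples in which every blank node used as a predicate
    is replaced, everywhere it occurs, by the same skolem IRI."""
    # Collect the blank nodes that occur in the predicate position.
    offending = {p for (_, p, _) in triples if is_blank(p)}
    # Map each one to a stable IRI derived from its label.
    mapping = {b: base + b[2:] for b in offending}

    def fix(term):
        return mapping.get(term, term)

    return [(fix(s), fix(p), fix(o)) for (s, p, o) in triples]


example = [
    ("http://example.org/alice", "_:p1", "Bob"),
    ("_:p1", "http://www.w3.org/2000/01/rdf-schema#label", "knows"),
    ("_:b0", "http://example.org/name", "Carol"),
]
result = skolemize_predicates(example)
```

Blank nodes in positions that RDF already permits (such as the 
subject "_:b0" above) are left untouched, so only the otherwise 
illegal cases are affected.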

Specific suggestions:

1. Insert "based on RDF" into the definition of Linked Data, as explained 
above.

2. Define a *normative* bi-directional mapping of a JSON profile to and 
from the RDF abstract syntax, so that the JSON profile *is* a 
serialization of RDF, and is fully grounded in the RDF data model and 
semantics.

3. Use skolemized URIs in the normative mapping to prevent mapping JSON 
syntax to illegal RDF.

4. Make editorial changes to avoid implying that JSON-LD is not RDF. 
For example, change "Convert to RDF" to "Convert to Turtle" or perhaps 
"Convert to RDF Abstract Syntax".

5. Define normative names for, and clearly differentiate between, the 
JSON serialization of RDF and JSON-LD, such that JSON-LD *is* a JSON 
serialization of RDF, with additional constraints for Linked Data (such 
as URIs using the "http:" prefix, etc.).  They do not necessarily have to be 
defined in two separate documents.  They could be defined in a single 
document called "JSON-RDF and JSON-LD", for example.  People that use 
the JSON RDF serialization for purposes other than Linked Data need to 
be able to easily and clearly talk about that serialization *without* 
wrongly implying adherence to the additional Linked Data requirements 
imposed by JSON-LD, and *without* having to explain that those 
requirements can be ignored in this case.

If there is one thing we all should have learned from the Semantic Web, 
it is the value of assigning an unambiguous name to every important 
concept.  A JSON serialization of RDF is a *very* important concept and 
deserves its own unambiguous name, distinct from JSON-LD.

BTW, regarding the name "JSON-RDF", when I first read your response at
http://lists.w3.org/Archives/Public/public-rdf-comments/2013Mar/0036.html
saying "We couldn't use JSON-RDF because a variation on the name was 
already taken", I assumed you meant that RDF/JSON was a defunct 
proposal, and I was going to suggest that if it is defunct, then there 
would be little harm in noting it as defunct, and using the term 
"JSON-RDF".  But when I view the RDF/JSON document now at
https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-json/index.html#
I see it is dated "24 April 2013" and it says that it is a product of 
the RDF working group.  So what's going on?  Why is the W3C 
standardizing *two* potential JSON serializations of RDF?  How are they 
related or different?  If this is a W3C activity then these activities 
should be coordinated, *one* should be picked -- that's what standards 
are for -- and the name JSON-RDF can be used for that one.

6. Some small editorial fixes:

"Since JSON-LD is 100% compatible with JSON" would be better phrased as 
"Since JSON-LD is a restricted form of JSON", because saying that 
JSON-LD is compatible with JSON wrongly suggests that JSON-LD is *not* 
JSON, when in fact it is.

s/secrete agents/secret agents/

Thanks,
David Booth


On 04/09/2013 09:59 AM, Manu Sporny wrote:
> Apologies for the delayed response David, been slammed, responses to
> your responses below...
>
> On 03/26/2013 04:22 PM, David Booth wrote:
>>> In JSON-LD graph names can be IRIs or blank nodes whereas in RDF
>>> graph names have to be IRIs.
>>
>> Blank nodes already cause more grief than any other RDF feature.  I
>> do not think it makes sense to promote or condone the expansion of
>> their use.  Where are the compelling use cases for this?
>
> In JSON, there are implicit blank nodes everywhere, both as predicates,
> and as subject identifiers. So, the compelling use case is interpreting
> JSON as RDF. The other major compelling use case is to not require JSON
> developers to change their workflow by forcing them to give everything
> an IRI identifier.
>
> You may not know this, but I took your exact position when designing
> JSON-LD. The first version of JSON-LD did not support blank nodes and I
> took a pretty hard line stance because of the complexity that they
> introduce. However, experience showed us that this was the wrong
> position to take. There are many cases that we've hit during the
> development of JSON-LD where it became pretty obvious that having blank
> nodes would simplify the markup for the vast majority of Web developers.

I guess I was not clear enough.  I was specifically talking about the 
use of blank nodes as graph names -- not uses of blank nodes that are 
permitted in RDF.

>
>>> In JSON-LD properties can be IRIs or blank nodes whereas in RDF
>>> properties (predicates) have to be IRIs.
>>
>> Ditto.  Blank nodes already cause more grief than any other RDF
>> feature. I do not think it makes sense to promote or condone the
>> expansion of their use.  Where are the compelling use cases for
>> this?
>
> See above.

Again, I guess I was not clear enough.  I was specifically talking about 
the use of blank nodes as properties -- not uses of blank nodes that are 
permitted in RDF.

>
>>> In JSON-LD lists are part of the data model whereas in RDF they are
>>> part of a vocabulary, namely [RDF-SCHEMA].
>>
>> That would make JSON-LD *not* be a superset of RDF.  While I agree
>> with the goal of making lists easier to use in RDF -- I think that
>> would be great -- I think it is important not to deviate from the RDF
>> model.
>
> JSON-LD is a super-set of RDF. You can still express lists as rdf:first /
> rdf:rest statements in JSON-LD. When you convert JSON-LD native lists to
> RDF, the rdf:first / rdf:rest pattern is used.

Oh, okay.
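
For readers following along, here is roughly what that expansion looks 
like.  This is a minimal sketch under my own assumptions (the function 
name, the tuple representation and the generated "_:l0", "_:l1" labels 
are hypothetical), not the algorithm from the JSON-LD draft:

```python
# Sketch: expand a native list into the rdf:first / rdf:rest pattern.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_to_triples(items):
    """Return (head, triples) for a list of already-converted RDF terms.
    Blank node labels _:l0, _:l1, ... are generated for illustration."""
    if not items:
        # The empty list is represented by rdf:nil alone.
        return RDF + "nil", []
    triples = []
    for i, item in enumerate(items):
        node = "_:l%d" % i
        # The last cell points at rdf:nil; the others at the next cell.
        rest = "_:l%d" % (i + 1) if i + 1 < len(items) else RDF + "nil"
        triples.append((node, RDF + "first", item))
        triples.append((node, RDF + "rest", rest))
    return "_:l0", triples


head, triples = list_to_triples(["a", "b"])
```

Each list item costs two triples plus a blank node, which is why a 
native list syntax is so much more pleasant to write by hand.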

>
>>> The JSON-LD CG felt that these features were compelling enough to
>>> keep them in the specification in the hopes that RDF will
>>> eventually align with the data model. We tried to do this in a way
>>> that was acceptable to the RDF community.
>>
>> But failed "due to the colorful variety of opinions on the matter"?
>> How can you have it both ways, except by acknowledging that this
>> splinters the community?
>
> Both the JSON-LD CG and the RDF WG have agreed on a compromise. If there
> is agreement on the path forward, we are not splintering the community.
>
>>> The other reason that we define a data model in JSON-LD is to make
>>> it easier for developers to pick up on Linked Data concepts without
>>> having to climb the very steep learning curve brought about by
>>> having to read the myriad of RDF specifications.
>>
>> That sounds fine and useful *provided* that JSON-LD is consistent
>> with RDF.  At present it isn't.
>
> That's an all-or-nothing strategy. We have opted for a strategy of
> logical compromise and consensus. The consensus has led to a very simple
> to understand data model (JSON-LD) that is a gateway into RDF for Web
> developers. It solves a long-standing problem in the RDF community.

A conformance strategy *should* be all-or-nothing, or at least as close 
to it as it can be.  (Though I imagine you think it *is* as close as it 
can be!)

>
>>> What we did find consensus around was to allow JSON-LD to deviate
>>> in very specific ways in an attempt to gain some implementation
>>> insight as to whether or not these extensions to RDF were worth
>>> pursuing in RDF 2.0.
>>
>> Field experience can and should be obtained by *vendor* *extensions*
>> -- not by standardizing N competing RDF-like languages (even if N==2)
>> and letting those standards fight it out in the marketplace.  I do
>> not believe that it would be in the best interest of the RDF
>> community or the W3C to fracture the market by standardizing multiple
>> competing RDF-like languages.
>
> RDF isn't a language, it's a data model.

I disagree, but it is probably immaterial to this discussion.  Yes, RDF 
is a data model, but it is also a language, expressed in an abstract syntax.

> As for the languages, RDF
> already has a variety of syntaxes that have different data models -
> Microdata, N3, JSON-LD. That ship has already sailed, the important
> thing is to make sure that these extensions are created and defined in a
> way that can be folded back into the RDF data model if successful.

I don't know what you're claiming here, but to my mind that ship has 
*not* sailed.  Microdata was not intended to be an RDF serialization, 
though it can be transformed into conforming RDF.  But the fact that 
something can be transformed into RDF is not saying much, because 
*anything* can be transformed into RDF.  GRDDL provides a convenient way 
to transform XML into RDF, for example.

>
> For example, the native lists datatype in TURTLE and JSON-LD is now a
> strong indicator that the RDF 2.0 data model should probably have a
> native lists type.
>
>>> JSON-LD is about a JSON serialization for Linked Data. Linked Data
>>> typically asks that IRIs are dereferenceable so that more
>>> information can be gleaned from the identifiers. The spec doesn't,
>>> however, require that all IRIs used in JSON-LD are dereferenceable.
>>
>> Apologies, I was not clear.  The reason I said that a JSON
>> serialization of RDF should not require IRIs to be dereferenceable --
>> even as a "SHOULD" requirement -- is because I am distinguishing
>> between a serialization of RDF *in* *general* -- not specifically for
>> Linked Data -- and a serialization of RDF that is intended
>> specifically for Linked Data.  As my original comment goes on to say,
>> I think it is important to cleanly layer one spec on another, and
>> there are *many* non-LD RDF uses that would benefit from a JSON
>> serialization of RDF.
>
> I don't see how the JSON-LD specification prevents uses?
>
>>> We couldn't use JSON-RDF because a variation on the name was
>>> already taken:
>>
>> Sorry for being unclear.  My point was not so much about the name,
>> but about the concept of defining a JSON serialization of RDF *in*
>> *general* -- not just for Linked Data -- and then defining an LD
>> version on top of that.
>
> We had tried this approach at one point, but the "Linked Data" spec
> ended up being so small that we just folded it back into JSON-LD. There
> was no reason to have a tiny 10 page spec that just modified the
> underlying "JSON-RDF" mechanism by effectively re-writing portions of
> the spec. It would confuse Web developers and create much bouncing about
> between the JSON-RDF and JSON-LD specs.

A JSON-LD spec should not in any way *modify* the underlying JSON-RDF 
spec -- just cleanly layer on top of it.
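
To make "layer on top" concrete, here is a toy sketch in Python.  The 
function names and the tuple representation are hypothetical, and the 
checks are deliberately simplistic; the point is only that the Linked 
Data layer calls the general RDF layer unchanged and then adds its own 
advisory checks:

```python
# Sketch: a Linked Data check layered on top of a general RDF check.
# Terms are plain strings; "_:" marks a blank node. Names are hypothetical.


def rdf_errors(triples):
    """Base layer: report predicates that are blank nodes, which the
    RDF abstract syntax does not allow."""
    return ["blank node predicate: " + p
            for (_, p, _) in triples if p.startswith("_:")]


def linked_data_warnings(triples):
    """Linked Data layer: reuse the base check unchanged, then add
    advisory notes for subject IRIs that are not HTTP(S) IRIs."""
    notes = rdf_errors(triples)
    for (s, _, _) in triples:
        if not s.startswith(("http://", "https://", "_:")):
            notes.append("non-HTTP subject IRI: " + s)
    return notes


sample = [("urn:x:1", "http://example.org/p", "v")]
base_result = rdf_errors(sample)
ld_result = linked_data_warnings(sample)
```

An application that only cares about RDF can stop at the base layer 
and never needs to mention the Linked Data requirements at all.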

I'm not convinced that this "bouncing about" issue could not be 
addressed in other ways, such as through tutorials or even automatically 
transcluding portions of the JSON-RDF spec in the JSON-LD spec.

>
>>> What prevents these applications from using JSON-LD?
>>
>> If I have an RDF application, and I want it to accept a JSON
>> serialization of RDF, what must I tell my customers?  "It accepts
>> JSON-LD *except* that it does not support the following
>> non-standard-RDF features, ... blah blah blah ... and furthermore the
>> application does *not* expect all of your IRIs to be dereferenceable,
>> because this is merely an RDF application -- not a Linked Data
>> application -- but the W3C did not define a JSON serialization of
>> RDF, so we had to use JSON-LD instead."   Fail.
>
> That is an especially atrocious way to communicate with your customers. :)
>
> You don't have to tell your customers any of this. You just tell them
> that you accept JSON-LD and if you see a blank node in the graph or
> predicate position, you either use a database that supports that, or you
> skolemize if not.
>
>> It would be far better if I could simply say: "the app accepts
>> JSON-RDF" (where I'm using the term JSON-RDF to mean a JSON
>> serialization of RDF, but I don't really care if it is called
>> "JSON-RDF").  And it would be so easy for the working group to
>> instead simply define a JSON serialization *of* *RDF*, and then
>> define a Linked Data serialization on *top* of that.  This would
>> provide a clean layering, a clean separation of concerns: plenty of
>> upside, and almost no downside.
>
> I disagree, we tried this and it ended up being a terrible approach.
>
>>> We had explored this idea very early in the JSON-LD days and came
>>> to the conclusion that JSON developers don't work with their data
>>> in this way. That is, for the vast majority of the in-the-wild JSON
>>> markup we looked at, JSON developers did not use any sort of
>>> triple-based mechanism to work with their data. Rather, they used
>>> JSON objects - simple key-value pairs to work with their data. This
>>> design paradigm was the one that was used for JSON-LD because it
>>> was the one that developers were (and still are) using when they
>>> use JSON.
>>
>> The fact that developers don't use triples is completely irrelevant.
>
> If you think this, you are missing one of the core insights that the
> JSON-LD builds upon.
>
>> Developers are free to use any internal data representation they
>> want when they use RDF -- including hash tables, objects, whatever.
>
> Sure, but most of them just want something that works with their
> language of choice. JSON-LD just works with their language of choice.
>
> A common mistake that many smart developers and designers make is
> assuming that since it's fairly obvious what a proper data
> representation should look like that it's going to be just as obvious to
> most Web developers. The reality is that most Web developers just want
> something that works and don't want to think about the solution too
> deeply because they have a thousand other things related to their
> application that they have to think about.
>
> JSON-LD is about reducing the cognitive load placed on developers that
> want to build a Linked Data application (and effectively use RDF).
> Saying that developers are "free to use any internal data representation
> they want" ignores the fact that it places an unnecessary cognitive
> load on them when the choice doesn't need to be made by developers in
> the vast majority of cases.
>
>> The LDP working group can perfectly well define JSON-LD **in terms
>> of** a JSON serialization of RDF.  The difference between the two:
>> The JSON serialization of RDF would be just that -- a serialization
>> of RDF, just as Turtle or NTriples are serializations of RDF.
>
> But TURTLE has its own native list type, doesn't it?

Only the surface syntax -- not the model.

> So, it doesn't
> even fit your definition you use earlier in this e-mail. What about N3,
> which has Formulae and Literal subjects?

N3 is not a W3C standard.

>
>> Whereas JSON-LD would place further restrictions on that
>> serialization to specifically support the needs of the Linked Data
>> Platform, such as saying that every IRI SHOULD be de-referenceable to
>> information about the identified resource.
>
> If an application wants to break that suggestion, it can. That's why
> it's a SHOULD and not a MUST.
>
> -- manu
>
Received on Wednesday, 24 April 2013 14:52:35 UTC