Re: JSON-LD should be an RDF syntax from David Booth on 2013-05-21 (public-rdf-comments@w3.org from May 2013)

From: David Booth <david@dbooth.org>
Date: Tue, 21 May 2013 09:38:27 -0400
To: Manu Sporny <msporny@digitalbazaar.com>
CC: public-rdf-comments@w3.org
Message-ID: <519B78D3.1030900@dbooth.org>

Hi Manu,

What is the status of these comments?  I have not received any official 
response to them and I see nothing listed on the RDF issues list for 
them -- either open or closed:
http://www.w3.org/2011/rdf-wg/track/issues
Are these being tracked somewhere else?  Are they scheduled to be 
addressed later?

Also, one other point came to my attention since submitting those comments:

  - I notice that the RDF working group charter specifically *requires* 
that JSON-LD be a syntax for RDF.  Under "Required features", section 
2.2 states:
http://www.w3.org/2010/09/rdf-wg-charter.html#turtle
"Define and standardize a JSON Syntax for RDF".  To be blunt, JSON-LD is 
not a "syntax for RDF" unless it is *normatively* based on the RDF 
semantics.  Otherwise, it has no more official status as a "syntax for 
RDF" than any other format -- XML, CSV or whatever -- that *could* be 
mapped to RDF by someone's favorite mapping.  So apparently my comments 
below accord with the charter.

Thanks,
David

On 04/24/2013 10:52 AM, David Booth wrote:
> Hi Manu,
>
> Thanks for your remarks.  I don't agree with all of them, and just for
> completeness I'll note in-line below which ones and why, but rather than
> focus on those details I think it would be better to discuss this at a
> higher level, because you brought up a very interesting point about
> potentially skolemizing blank nodes, and I think that raises the
> possibility of a different path for addressing the issue that JSON-LD
> should be an RDF syntax.
>
> To my mind, the central problem that needs to be addressed is that, at
> present, the draft of JSON-LD
> http://dvcs.w3.org/hg/json-ld/raw-file/default/spec/latest/json-ld/index.html
>
> reads as an attempt to divorce Linked Data (and JSON-LD) from RDF.  This
> is evidenced in several places throughout the document.  For example,
> the definition of Linked Data in the introduction fails to mention RDF
> at all:
> [[
> Linked Data is a technique for creating a network of inter-connected
> data across different documents and Web sites. In general, Linked Data
> has four properties: 1) it uses IRIs to name things; 2) it uses HTTP
> IRIs for those names; 3) the name IRIs, when dereferenced, provide more
> information about the thing; and 4) the data expresses links to data on
> other Web sites.
> ]]
>
> I suggest fixing that omission by inserting the words "based on RDF"
> into the first sentence, to read: "Linked Data is a technique, based on
> RDF, for creating a network of inter-connected data across different
> documents and Web sites."
>
> The same sentiment of divorcing JSON-LD from RDF is evidenced in other
> places in the document as well, such as in phrases like "converted to
> RDF", and in the definition of a JSON-LD data model that is completely
> separate from the standard RDF data model, complete with parallel terms
> such as "blank node" and "blank node identifier".  Left as is, the world
> would have parallel and competing standards for Linked Data: those based
> on JSON-LD and its data model, blank nodes, etc., and those based on RDF
> and its data model, blank nodes, etc., because JSON-LD is *not* RDF.
>
> One might claim that JSON-LD *can* be used as a serialization of RDF,
> and therefore JSON-LD *is* already based on RDF.  But that argument does
> not hold water, because that same claim can be made of *any* language!
> *Any* language can be viewed as a serialization of RDF, given an
> appropriate mapping.  Indeed, the whole purpose of GRDDL was to enable
> such mappings to be easily defined from XML and HTML.  Many people have
> defined mappings from CSV to RDF, and from many other things to RDF.  We
> do not need a JSON syntax that *can* be mapped to RDF.  We need a JSON
> syntax that *is* a standard serialization of RDF, based on the RDF data
> model and RDF semantics -- not a parallel (but subtly different) data
> model, terminology and semantics.   JSON-LD at present defines a
> parallel universe that looks confusingly similar to RDF -- even
> co-opting terms such as "blank node".
>
> I am sympathetic to -- and fully support -- the goal of making JSON-LD
> easy for people to use, **without knowing anything more about RDF than
> what they learn about JSON-LD**.  But I also think it is critical that
> JSON-LD still be normatively based on RDF and grounded in the RDF data
> model and semantics.  And I think it is also pretty clear in the charter
> http://www.w3.org/2012/ldp/charter
> that the work of the group was intended to be **based on RDF** -- not
> "inspired by RDF" or "similar to RDF" or "addressing the same goals as
> RDF".   In other words, the LD working group should define a JSON-based
> "RDF serialization syntax", as the charter calls it.
>
> Can the group achieve both of these aims?  I think so.  And I think one
> way to achieve it would be to define a normative mapping between the
> JSON syntax and the RDF abstract syntax, by using skolemization in
> places where prohibited blank nodes would otherwise appear, such as in
> the predicate position of an RDF triple.
>
> Specific suggestions:
>
> 1. Insert "based on RDF" into the definition of Linked Data, as explained
> above.
>
> 2. Define a *normative* bi-directional mapping of a JSON profile to and
> from the RDF abstract syntax, so that the JSON profile *is* a
> serialization of RDF, and is fully grounded in the RDF data model and
> semantics.
>
> 3. Use skolemized URIs in the normative mapping to prevent mapping JSON
> syntax to illegal RDF.
>
> 4. Make editorial changes to avoid implying that JSON-LD is not RDF. For
> example, change "Convert to RDF" to "Convert to Turtle" or perhaps
> "Convert to RDF Abstract Syntax".
>
> 5. Define normative names for, and clearly differentiate between, the
> JSON serialization of RDF and JSON-LD, such that JSON-LD *is* a JSON
> serialization of RDF, with additional constraints for Linked Data (such
> as URIs using the "http:" prefix, etc.).  They do not necessarily have to be
> defined in two separate documents.  They could be defined in a single
> document called "JSON-RDF and JSON-LD", for example.  People that use
> the JSON RDF serialization for purposes other than Linked Data need to
> be able to easily and clearly talk about that serialization *without*
> wrongly implying adherence to the additional Linked Data requirements
> imposed by JSON-LD, and *without* having to explain that those
> requirements can be ignored in this case.
>
> If there is one thing we all should have learned from the Semantic Web,
> it is the value of assigning an unambiguous name to every important
> concept.  A JSON serialization of RDF is a *very* important concept and
> deserves its own unambiguous name, distinct from JSON-LD.
>
> BTW, regarding the name "JSON-RDF", when I first read your response at
> http://lists.w3.org/Archives/Public/public-rdf-comments/2013Mar/0036.html
> saying "We couldn't use JSON-RDF because a variation on the name was
> already taken", I assumed you meant that RDF/JSON was a defunct
> proposal, and I was going to suggest that if it is defunct, then there
> would be little harm in noting it as defunct, and using the term
> "JSON-RDF".  But when I view the RDF/JSON document now at
> https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-json/index.html#
> I see it is dated "24 April 2013" and it says that it is a product of
> the RDF working group.  So what's going on?  Why is the W3C
> standardizing *two* potential JSON serializations of RDF?  How are they
> related or different?  If this is a W3C activity then these activities
> should be coordinated, *one* should be picked -- that's what standards
> are for -- and the name JSON-RDF can be used for that one.
>
> 6. Some small editorial fixes:
>
> "Since JSON-LD is 100% compatible with JSON" would be better phrased as
> "Since JSON-LD is a restricted form of JSON", because saying that
> JSON-LD is compatible with JSON wrongly suggests that JSON-LD is *not*
> JSON, when in fact it is.
>
> s/secrete agents/secret agents/
>
> Thanks,
> David Booth
>
>
> On 04/09/2013 09:59 AM, Manu Sporny wrote:
>> Apologies for the delayed response David, been slammed, responses to
>> your responses below...
>>
>> On 03/26/2013 04:22 PM, David Booth wrote:
>>>> In JSON-LD graph names can be IRIs or blank nodes whereas in RDF
>>>> graph names have to be IRIs.
>>>
>>> Blank nodes already cause more grief than any other RDF feature.  I
>>> do not think it makes sense to promote or condone the expansion of
>>> their use.  Where are the compelling use cases for this?
>>
>> In JSON, there are implicit blank nodes everywhere, both as predicates,
>> and as subject identifiers. So, the compelling use case is interpreting
>> JSON as RDF. The other major compelling use case is to not require JSON
>> developers to change their workflow by forcing them to give everything
>> an IRI identifier.
>>
>> You may not know this, but I took your exact position when designing
>> JSON-LD. The first version of JSON-LD did not support blank nodes and I
>> took a pretty hard line stance because of the complexity that they
>> introduce. However, experience showed us that this was the wrong
>> position to take. There are many cases that we've hit during the
>> development of JSON-LD where it became pretty obvious that having blank
>> nodes would simplify the markup for the vast majority of Web developers.
>
> I guess I was not clear enough.  I was specifically talking about the
> use of blank nodes as graph names -- not uses of blank nodes that are
> permitted in RDF.
>
>>
>>>> In JSON-LD properties can be IRIs or blank nodes whereas in RDF
>>>> properties (predicates) have to be IRIs.
>>>
>>> Ditto.  Blank nodes already cause more grief than any other RDF
>>> feature. I do not think it makes sense to promote or condone the
>>> expansion of their use.  Where are the compelling use cases for
>>> this?
>>
>> See above.
>
> Again, I guess I was not clear enough.  I was specifically talking about
> the use of blank nodes as properties -- not uses of blank nodes that are
> permitted in RDF.
>
>>
>>>> In JSON-LD lists are part of the data model whereas in RDF they are
>>>> part of a vocabulary, namely [RDF-SCHEMA].
>>>
>>> That would make JSON-LD *not* be a superset of RDF.  While I agree
>>> with the goal of making lists easier to use in RDF -- I think that
>>> would be great -- I think it is important not to deviate from the RDF
>>> model.
>>
>> JSON-LD is a super-set of RDF. You can still express lists as rdf:first /
>> rdf:rest statements in JSON-LD. When you convert JSON-LD native lists to
>> RDF, the rdf:first / rdf:rest pattern is used.
>
> Oh, okay.
>
>>
>>>> The JSON-LD CG felt that these features were compelling enough to
>>>> keep them in the specification in the hopes that RDF will
>>>> eventually align with the data model. We tried to do this in a way
>>>> that was acceptable to the RDF community.
>>>
>>> But failed "due to the colorful variety of opinions on the matter"?
>>> How can you have it both ways, except by acknowledging that this
>>> splinters the community?
>>
>> Both the JSON-LD CG and the RDF WG have agreed on a compromise. If there
>> is agreement on the path forward, we are not splintering the community.
>>
>>>> The other reason that we define a data model in JSON-LD is to make
>>>> it easier for developers to pick up on Linked Data concepts without
>>>> having to climb the very steep learning curve brought about by
>>>> having to read the myriad of RDF specifications.
>>>
>>> That sounds fine and useful *provided* that JSON-LD is consistent
>>> with RDF.  At present it isn't.
>>
>> That's an all-or-nothing strategy. We have opted for a strategy of
>> logical compromise and consensus. The consensus has led to a very simple
>> to understand data model (JSON-LD) that is a gateway into RDF for Web
>> developers. It solves a long-standing problem in the RDF community.
>
> A conformance strategy *should* be all-or-nothing, or at least as close
> to it as it can be.  (Though I imagine you think it *is* as close as it
> can be!)
>
>>
>>>> What we did find consensus around was to allow JSON-LD to deviate
>>>> in very specific ways in an attempt to gain some implementation
>>>> insight as to whether or not these extensions to RDF were worth
>>>> pursuing in RDF 2.0.
>>>
>>> Field experience can and should be obtained by *vendor* *extensions*
>>> -- not by standardizing N competing RDF-like languages (even if N==2)
>>> and letting those standards fight it out in the marketplace.  I do
>>> not believe that it would be in the best interest of the RDF
>>> community or the W3C to fracture the market by standardizing multiple
>>> competing RDF-like languages.
>>
>> RDF isn't a language, it's a data model.
>
> I disagree, but it is probably immaterial to this discussion.  Yes, RDF
> is a data model, but it is also a language, expressed in an abstract
> syntax.
>
>> As for the languages, RDF
>> already has a variety of syntaxes that have different data models -
>> Microdata, N3, JSON-LD. That ship has already sailed, the important
>> thing is to make sure that these extensions are created and defined in a
>> way that can be folded back into the RDF data model if successful.
>
> I don't know what you're claiming here, but to my mind that ship has
> *not* sailed.  Microdata was not intended to be an RDF serialization,
> though it can be transformed into conforming RDF.   But the fact that
> something can be transformed into RDF is not saying much, because
> *anything* can be transformed into RDF.  GRDDL provides a convenient way
> to transform XML into RDF, for example.
>
>>
>> For example, the native lists datatype in TURTLE and JSON-LD is now a
>> strong indicator that the RDF 2.0 data model should probably have a
>> native lists type.
>>
>>>> JSON-LD is about a JSON serialization for Linked Data. Linked Data
>>>> typically asks that IRIs are dereferenceable so that more
>>>> information can be gleaned from the identifiers. The spec doesn't,
>>>> however, require that all IRIs used in JSON-LD are dereferenceable.
>>>
>>> Apologies, I was not clear.  The reason I said that a JSON
>>> serialization of RDF should not require IRIs to be dereferenceable --
>>> even as a "SHOULD" requirement -- is because I am distinguishing
>>> between a serialization of RDF *in* *general* -- not specifically for
>>> Linked Data -- and a serialization of RDF that is intended
>>> specifically for Linked Data.  As my original comment goes on to say,
>>> I think it is important to cleanly layer one spec on another, and
>>> there are *many* non-LD RDF uses that would benefit from a JSON
>>> serialization of RDF.
>>
>> I don't see how the JSON-LD specification prevents those uses?
>>
>>>> We couldn't use JSON-RDF because a variation on the name was
>>>> already taken:
>>>
>>> Sorry for being unclear.  My point was not so much about the name,
>>> but about the concept of defining a JSON serialization of RDF *in*
>>> *general* -- not just for Linked Data -- and then defining an LD
>>> version on top of that.
>>
>> We had tried this approach at one point, but the "Linked Data" spec
>> ended up being so small that we just folded it back into JSON-LD. There
>> was no reason to have a tiny 10 page spec that just modified the
>> underlying "JSON-RDF" mechanism by effectively re-writing portions of
>> the spec. It would confuse Web developers and create much bouncing about
>> between the JSON-RDF and JSON-LD specs.
>
> A JSON-LD spec should not in any way *modify* the underlying JSON-RDF
> spec -- just cleanly layer on top of it.
>
> I'm not convinced that this "bouncing about" issue could not be
> addressed in other ways, such as through tutorials or even automatically
> transcluding portions of the JSON-RDF spec in the JSON-LD spec.
>
>>
>>>> What prevents these applications from using JSON-LD?
>>>
>>> If I have an RDF application, and I want it to accept a JSON
>>> serialization of RDF, what must I tell my customers?  "It accepts
>>> JSON-LD *except* that it does not support the following
>>> non-standard-RDF features, ... blah blah blah ... and furthermore the
>>> application does *not* expect all of your IRIs to be dereferenceable,
>>> because this is merely an RDF application -- not a Linked Data
>>> application -- but the W3C did not define a JSON serialization of
>>> RDF, so we had to use JSON-LD instead."   Fail.
>>
>> That is an especially atrocious way to communicate with your
>> customers. :)
>>
>> You don't have to tell your customers any of this. You just tell them
>> that you accept JSON-LD and if you see a blank node in the graph or
>> predicate position, you either use a database that supports that, or you
>> skolemize if not.
>>
>>> It would be far better if I could simply say: "the app accepts
>>> JSON-RDF" (where I'm using the term JSON-RDF to mean a JSON
>>> serialization of RDF, but I don't really care if it is called
>>> "JSON-RDF").  And it would be so easy for the working group to
>>> instead simply define a JSON serialization *of* *RDF*, and then
>>> define a Linked Data serialization on *top* of that.  This would
>>> provide a clean layering, a clean separation of concerns: plenty of
>>> upside, and almost no downside.
>>
>> I disagree, we tried this and it ended up being a terrible approach.
>>
>>>> We had explored this idea very early in the JSON-LD days and came
>>>> to the conclusion that JSON developers don't work with their data
>>>> in this way. That is, for the vast majority of the in-the-wild JSON
>>>> markup we looked at, JSON developers did not use any sort of
>>>> triple-based mechanism to work with their data. Rather, they used
>>>> JSON objects - simple key-value pairs to work with their data. This
>>>> design paradigm was the one that was used for JSON-LD because it
>>>> was the one that developers were (and still are) using when they
>>>> use JSON.
>>>
>>> The fact that developers don't use triples is completely irrelevant.
>>
>> If you think this, you are missing one of the core insights that
>> JSON-LD builds upon.
>>
>>> Developers are free to use any internal data representation they
>>> want when they use RDF -- including hash tables, objects, whatever.
>>
>> Sure, but most of them just want something that works with their
>> language of choice. JSON-LD just works with their language of choice.
>>
>> A common mistake that many smart developers and designers make is
>> assuming that since it's fairly obvious what a proper data
>> representation should look like that it's going to be just as obvious to
>> most Web developers. The reality is that most Web developers just want
>> something that works and don't want to think about the solution too
>> deeply because they have a thousand other things related to their
>> application that they have to think about.
>>
>> JSON-LD is about reducing the cognitive load placed on developers that
>> want to build a Linked Data application (and effectively use RDF).
>> Saying that developers are "free to use any internal data representation
>> they want" ignores the fact that it places an unnecessary cognitive
>> load on them when the choice doesn't need to be made by developers in
>> the vast majority of cases.
>>
>>> The LDP working group can perfectly well define JSON-LD **in terms
>>> of** a JSON serialization of RDF.  The difference between the two:
>>> The JSON serialization of RDF would be just that -- a serialization
>>> of RDF, just as Turtle or NTriples are serializations of RDF.
>>
>> But TURTLE has its own native list type, doesn't it?
>
> Only the surface syntax -- not the model.
>
>> So, it doesn't
>> even fit the definition you use earlier in this e-mail. What about N3,
>> which has Formulae and Literal subjects?
>
> N3 is not a W3C standard.
>
>>
>>> Whereas JSON-LD would place further restrictions on that
>>> serialization to specifically support the needs of the Linked Data
>>> Platform, such as saying that every IRI SHOULD be de-referenceable to
>>> information about the identified resource.
>>
>> If an application wants to break that suggestion, it can. That's why
>> it's a SHOULD and not a MUST.
>>
>> -- manu
>>
Received on Tuesday, 21 May 2013 13:38:59 UTC