RE: Official response to RDF-ISSUE-132: JSON-LD/RDF Alignment from Markus Lanthaler on 2013-06-08 (public-rdf-comments@w3.org from June 2013)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Sat, 8 Jun 2013 14:28:29 +0200
To: "'public-rdf-comments'" <public-rdf-comments@w3.org>
Message-ID: <011601ce6443$af68f200$0e3ad600$@lanthaler@gmx.net>
On Friday, June 07, 2013 1:55 AM, David Booth wrote:
> On 05/21/2013 02:19 PM, Manu Sporny wrote:
>> Hopefully it is clear that the decision to leave "based on RDF" out of
>> the Linked Data definition was thoroughly and carefully considered. In
>> the end, the group decided not to tie RDF and Linked Data together
>> because it would be conflating a data publishing concept (Linked Data)
>> with an abstract data model (RDF).
>>
>> In the end, the group decided against tightly coupling Linked Data and
>> RDF because:
>>
>> 1. It would conflate two different concepts.
>
> It is extremely misleading to suggest that tightly coupling Linked Data 
> and RDF "conflates" two different concepts, when the fact is that Linked 
> Data -- in the established sense of the term -- is *based* on RDF.

IMHO, RDF != Linked Data. Nothing in RDF requires IRIs to be dereferenceable
- but of course you can use RDF to express Linked Data if you somehow
communicate out-of-band that those *identifiers* in there are also locators.


> It is clear from reading the JSON-LD group's discussion log
> http://json-ld.org/minutes/2011-07-04/#topic-3
> that the group wanted to avoid reference to RDF, and hence -- exceeding 
> its authority -- the group invented a new definition for "Linked Data" 
> to suit this purpose.  Some individuals even appear to have convinced 
> themselves that this new definition is the *real* definition of the 
> term!  It is not.

I think we are talking about the text in the (non-normative) introduction.
Let me quote it:

   Linked Data is a technique for creating a network of inter-connected
   data across different documents and Web sites. In general, Linked Data
   has four properties:
     1) it uses IRIs to name things;
     2) it uses HTTP IRIs for those names;
     3) the name IRIs, when dereferenced, provide more information about
        the thing; and
     4) the data expresses links to data on other Web sites.
   These properties allow data published on the
   Web to work much like Web pages do today. One can start at one piece
   of Linked Data, and follow the links to other pieces of data that are
   hosted on different sites across the Web.

Here are TBL's (current, 2009) Linked Data principles:

  1) Use URIs as names for things 
  2) Use HTTP URIs so that people can look up those names. 
  3) When someone looks up a URI, provide useful information,
     using the standards (RDF*, SPARQL) 
  4) Include links to other URIs. so that they can discover more things

So I think all we are arguing about here is the "(RDF*, SPARQL)" in (3),
right?
Now let's look at the original 2006 version of the Linked Data
principles as Kingsley proposed:

  1) Use URIs as names for things
  2) Use HTTP URIs so that people can look up those names.
  3) When someone looks up a URI, provide useful information.
  4) Include links to other URIs. so that they can discover more things.

http://web.archive.org/web/20061201121454/http://www.w3.org/DesignIssues/LinkedData.html

Surprisingly, exactly that "(RDF*, SPARQL)" remark was missing when the term
was coined. We can continue forever to argue about whether it is needed or
not. We can also argue whether it is possible to "provide useful
information" by using an abstract data model, i.e., RDF. When you
dereference a URI, you'll get back a representation which is in a concrete
syntax. So, it would be more correct to say 

  3) When someone looks up a URI, provide useful information,
     using a standard format which can be interpreted as RDF

Would that add any value given that you can interpret (convert) every format
to RDF? I doubt it. This group (myself included) is convinced that doing so
would scare off a large portion of the target group, i.e., average web
developers.


> The term "Linked Data" has a well-established meaning within semantic 
> web community.  The JSON-LD group would be *misleading* the public by 
> stating or implying that Linked Data is not necessarily based on RDF.

RDF is an abstract data model whereas Linked Data is a concept. Everything
can be expressed in RDF. In that paragraph we are describing the concept for
people not familiar with it. Clearly, the "semantic web community" is not
the intended target group of that paragraph. Not with the best will in the
world can I see how this is "misleading the public".


> If certain members of the JSON-LD group wish to re-architect Linked Data 
> and the Semantic Web to be based on JSON instead of RDF, they are free 
> to make that *proposal* on their own time, but that is *not* how Linked 
> Data and the Semantic Web are currently architected, and that is not 
> what the RDF working group was chartered to do.  The working group was 
> chartered to "Define and standardize a JSON Syntax for RDF . . . an RDF 
> serialization":
> http://www.w3.org/2011/01/rdf-wg-charter

We are definitely not trying to re-architect Linked Data. What we are trying
to do is to bring it to the masses. I think we agree that the semantic web
community you are talking about has a miserable track record for doing so.
Mentioning RDF in the first paragraph of the spec would certainly not help
us in that regard. Unfortunately, a lot of people simply stop listening when
they hear the three magic letters R D F.

We just try to explain the underlying principles to them in simple terms to
get them interested and motivated enough to read the rest. The end of the spec
makes JSON-LD's relationship to RDF crystal clear (IMO at least) and
contains a whole lot of examples for people from the semantic web community
already familiar with e.g. Turtle or RDFa. Those people don't need to read
the introduction, they know the basics already.


> Why does the definition of "Linked Data" matter so much?  Messaging 
> matters!  It can have a huge real-life impact.  (Colossal recent example 
> in politics: The messaging that President Bush used to justify starting 
> the Iraq war, which has ended up costing trillions of dollars and over 
> 100,000 civilians killed!)

I'll just ignore this remark.


> The coining of the term "Linked Data" by TimBL was the single most 
> important advance in messaging in the entire history of the Semantic 
> Web.  One of the biggest problems the Semantic Web had was the term 
> "Semantic Web" itself, because: (a) it is intimidating and confusing; 
> and (b) it is misleading, because people wrongly associate it with the 
> semantics of natural language processing.  It has been difficult over 
> the years to get the messaging simple and clear -- and the ugliness of 
> RDF/XML certainly didn't help -- and the term "Linked Data" helps 
> substantially.

Exactly, it is a marketing term. Dan wrote an excellent piece on that so I
won't rehash it here:

http://lists.w3.org/Archives/Public/www-archive/2012Oct/0119.html

The truth is that people strongly associate RDF with RDF/XML. In fact, it is
difficult to have conversations without conflating RDF the data model and
its serialization formats.


> If the JSON-LD spec were to adopt a definition of "Linked Data" that 
> differs in such a critical way from the established meaning of this 
> term, it would be misleading the public and would create confusion in 
> the community.

Sorry, but I just can't see how it is doing that.


> To be clear, the current resolution of this point is NOT satisfactory.
>
> A simple and neutral way to resolve this problem would be to just quote 
> TimBL's original definition of the term.  This is what other documents 
> have done, and would not require endless wordsmithing debates.  I 
> suggest doing that and linking to TimBL's original Linked Data document. 
> (Credit: thanks to Arnaud Le Hors for making this suggestion while we 
> were talking at SemTech.)

I suppose by "TimBL's original definition" you don't really mean the
original 2006 version, right?


>> 2. It is the group's experience that Web developers have an aversion to
>> RDF as a complex technology due to RDF/XML and other technologies that
>> do not represent the current RDF world. It doesn't matter if these
>> aversions are based on reality - the aversion exists, so we try to
>> downplay RDF as much as possible in the JSON-LD spec.
>
> I agree with the goal of keeping it simple for Web developers, but I 
> think the downplaying has gone to the point of hiding it, and that is 
> harmful.  If developers' view of RDF is going to change, they need to 
> know that it *is* RDF that they are using when they use JSON-LD. If

And you think that developers won't understand that from the last paragraph
in the introduction

   Developers that require any of the facilities listed above or
   need to serialize an RDF graph or dataset [RDF11-CONCEPTS] in a
   JSON-based syntax will find JSON-LD of interest.

or any of the sections specifically discussing the relationship of JSON-LD
and RDF?


> they see how easy it is to use JSON-LD, it will stand on its own merits, 
> even if it does say "RDF inside".  To my mind, the goal should not be to 
> *hide* the fact that JSON-LD is RDF, but to make JSON-LD 100%
> usable by those who do not wish to learn anything *else* about RDF -- 
> i.e., anything beyond what they learn in the JSON-LD spec.

That's exactly what we try to do. By "hiding RDF" we try to increase the
chances that they "see how easy it is to use JSON-LD" instead of giving up
after the first paragraph because of an aversion to RDF.


>> 3. There is no technical problem that is solved by referencing RDF in
>> the definition of Linked Data.
>
> No, but as explained above, it is a very important messaging issue.
>
>> 4. If we were to add RDF to the definition of Linked Data, there would
>> just be another set of objections to the inclusion of RDF in the
>> definition of Linked Data.
>
> Then those objections should be addressed head-on anyway, because the 
> term "Linked Data" has an important and well-established meaning in the 
> community, and that includes the fact that Linked Data is RDF. Otherwise 
> those who wish to divorce Linked Data from RDF will be misleading the 
> public when they talk about "Linked Data" and mean something else, or 
> they talk about "conflating" Linked Data with RDF, when in fact Linked 
> Data *is* RDF.

Linked Data != RDF. RDF without a single dereferenceable IRI is still valid
RDF but it certainly isn't Linked Data by any means.
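
To illustrate the distinction, here is a minimal sketch (Python, standard
library only). The function names and the sample IRIs are made up for
illustration. The triple below is perfectly valid RDF, yet none of its IRIs
is an HTTP IRI, so it already fails principle 2 of the lists quoted earlier.
Note that the check is only a necessary condition: an HTTP IRI may still fail
to dereference, and testing that would require actually fetching it.

```python
from urllib.parse import urlsplit


def is_http_iri(iri):
    """True if the IRI uses a scheme that can be looked up over HTTP(S)."""
    return urlsplit(iri).scheme.lower() in ("http", "https")


def uses_only_http_iris(triples):
    """True if every term of every triple is an HTTP(S) IRI."""
    return all(is_http_iri(term) for triple in triples for term in triple)


# Hypothetical data: valid RDF, but these are pure identifiers, not locators.
triples = [
    (
        "urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66",
        "urn:example:knows",
        "urn:uuid:7d444840-9dc0-11d1-b245-5ffdce74fad2",
    ),
]

print(uses_only_http_iris(triples))  # prints False
```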


>>> 2. Define a *normative* bi-directional mapping of a JSON profile to
>>> and from the RDF abstract syntax, so that the JSON profile *is* a
>>> serialization of RDF, and is fully grounded in the RDF data model and
>>> semantics.
>>
>> We already do this here:
>>
>> http://www.w3.org/TR/json-ld/#transformation-from-json-ld-to-rdf
>
> No, it doesn't.  That section explicitly says: "This section is 
> non-normative".

JSON-LD consists of two specs, the syntax spec and the algorithms and API
spec. The normative transformation to RDF can be found here:

http://www.w3.org/TR/json-ld-api/#convert-to-rdf-algorithm
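
To give a rough feel for what such a mapping does, here is a toy sketch in
Python. It is NOT the normative algorithm from the API spec: it assumes the
input is already expanded and flat, and it ignores blank nodes, @type, lists,
datatypes, language tags, and named graphs. The function name and the sample
data are hypothetical.

```python
def expanded_to_triples(nodes):
    """Toy mapping from flat, expanded JSON-LD node objects to triples.

    Simplification for illustration only; see the lead-in for what is ignored.
    """
    triples = []
    for node in nodes:
        subject = node["@id"]
        for prop, values in node.items():
            if prop.startswith("@"):
                continue  # skip keywords such as @id (and, in this toy, @type)
            for value in values:
                if "@id" in value:
                    obj = ("iri", value["@id"])
                else:
                    obj = ("literal", str(value["@value"]))
                triples.append((subject, prop, obj))
    return triples


# Hypothetical expanded JSON-LD document with one node object.
doc = [
    {
        "@id": "http://example.org/people/markus",
        "http://xmlns.com/foaf/0.1/name": [{"@value": "Markus"}],
        "http://xmlns.com/foaf/0.1/knows": [
            {"@id": "http://example.org/people/manu"}
        ],
    }
]

for triple in expanded_to_triples(doc):
    print(triple)
```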


>> There have been arguments in the past to specify an additional subset of
>> JSON-LD that is a direct mapping to the RDF Abstract Syntax, but no one
>> has provided a compelling technical reason to do so.
>>
>> Additionally, creating two profiles of JSON-LD could have worse
>> consequences than the ones you outline in your e-mails. For example,
>> some implementers may only implement the subset and not the full version
>> of JSON-LD, which would create a really bad interoperability problem.
>
> The issue here was about alignment: JSON-LD saying that URIs "SHOULD" 
> (RFC2119) be dereferenceable, while RDF makes no such requirement. 

Exactly, and still you argue that Linked Data === RDF.


> However, in discussing this at SemTech with Greg Kellogg and Arnaud, 
> Arnaud suggested that instead of defining a profile of JSON-LD that 
> drops the "SHOULD", it would be better to encourage RDF to *include* 
> such as a "SHOULD".  I think that's a great idea.

I tried that already, see: https://www.w3.org/2011/rdf-wg/track/issues/103


> To be clear, I withdraw my suggestion that a separate profile of JSON-LD 
> be defined.

Great.


>> The extra features in JSON-LD, such as blank nodes as graph names, are a
>> requirement for the Web Payments work as well as the RDF digital
>> signatures work. So, we can't remove them without causing damage to
>> those initiatives.
>>
>> If an author wants to use a version of JSON-LD that is fully grounded in
>> the RDF data model, they should not use the JSON-LD features listed in
>> those bullet points, or they should convert their non-RDF data to
>> something that RDF can understand (more on this below).
>
> It sounds like my suggestion to use skolemized URIs to avoid that 
> problem was not understood, so I'll try to clarify.  The point is to 
> ensure that JSON-LD is fully grounded in the RDF model, so that JSON-LD 
> truly *is* an RDF serialization.  To achieve that in cases where a naive 
> mapping from the JSON-LD syntax to the RDF model would produce a blank 
> node in a position that RDF does not allow, I was suggesting that the 
> JSON-LD spec *normatively* state that skolemized URIs MUST be used in 
> the RDF model in those places.  I'll explain more below about how those 
> skolem URIs are chosen.
>
>>
>>> 3. Use skolemized URIs in the normative mapping to prevent mapping
>>> JSON syntax to illegal RDF.
>>
>> This is already stated as an option in a normative section:
>>
>> http://www.w3.org/TR/json-ld/#relationship-to-rdf
>>
>> We do not make this mandatory because there are several other legitimate
>> ways to convert blank nodes to something that RDF can interpret. For
>> example: 1) normalizing and getting a hash identifier for the subgraph
>> attached to the blank node property or blank graph, 2) creating a
>> counter-based solution for blank node naming, 3) minting a new global
>> IRI for the blank node, 4) transforming to a data model that allows
>> blank node properties and blank graphs, etc. There is no single correct
>> approach.
>
> That doesn't matter, as my next comment below will explain.
>
>>
>> Additionally, skolemization will not work unless all systems exchanging
>> the skolem IRIs do so in a standard way, and there is currently no
>> standard way of skolemizing.
>
> *Some* standardization is needed, but it does not need to specify all 
> details about the skolem URI.  All that's really important is that: (a) 
> a skolem URI somehow be created; and (b) such skolem URIs can be 
> reliably *recognized* as skolem URIs.  Beyond that, it doesn't matter if 
> some implementations use counters, some using hashing techniques and 
> some use other techniques.

Another important aspect that's missing in your list above is that skolem
IRIs have to be unique and that's exactly what makes it so difficult to
create them in a distributed system.


> The RDF 1.1 spec does specify how skolem URIs can be created so that 
> they can be reliably recognized -- by use of the .well-known
> convention -- so the JSON-LD could reference this technique in 
> specifying how blank nodes are avoided in places where the RDF model 
> does not allow them.

The general problem I have with this approach is that a skolem IRI just
allows you to work around a limitation in a serialization format or a
concrete implementation. In RDF, the data model, it is still a blank node
(that's why it is important to be able to reliably recognize them). And
again we conflate the data model and the serialization formats.
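
A minimal sketch of what "reliably recognize" could look like in practice,
assuming the well-known "genid" path that RDF 1.1 Concepts suggests for
skolem IRIs. The function names are hypothetical, and using a random UUID to
get uniqueness is just one possible choice (my assumption, not something any
spec requires). The last function shows why recognizability matters: a
consumer can turn the skolem IRI back into a blank node.

```python
import uuid
from urllib.parse import urlsplit

# Path prefix of the well-known "genid" convention for skolem IRIs.
GENID_PATH = "/.well-known/genid/"


def mint_skolem_iri(authority):
    """Mint a fresh skolem IRI under the given authority, e.g. 'example.com'.

    Uniqueness here relies on a random UUID; counters or hashes are equally
    possible.
    """
    return "http://" + authority + GENID_PATH + uuid.uuid4().hex


def is_skolem_iri(iri):
    """True if the IRI follows the .well-known/genid convention."""
    parts = urlsplit(iri)
    return parts.scheme in ("http", "https") and parts.path.startswith(GENID_PATH)


def to_blank_node_label(iri):
    """Map a recognized skolem IRI back to a blank node label."""
    if not is_skolem_iri(iri):
        raise ValueError("not a skolem IRI: " + iri)
    return "_:" + urlsplit(iri).path[len(GENID_PATH):]


iri = mint_skolem_iri("example.com")
print(iri, is_skolem_iri(iri), to_blank_node_label(iri))
```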


> To be clear, the current resolution of this point is NOT satisfactory. 
> Please further consider the suggestion of requiring skolem URIs in those 
> circumstances.

Are skolem IRIs blank nodes or not according to you? If so, how does it help
to require them?


>>> 4. Make editorial changes to avoid implying that JSON-LD is not RDF.
>>>   For example, change "Convert to RDF" to "Convert to Turtle" or
>>> perhaps "Convert to RDF Abstract Syntax".
>>
>> The group agrees with changing the title of the section to "Convert to
>> RDF Abstract Syntax".
>
> Thank you.  But there are several other places also where the wording 
> implies that JSON-LD is not RDF.  Appendix C is rife with them. I 
> started to list them, but immediately ran into the problem that this 
> section -- particularly the part before C.1 -- needs to be rewritten 
> once JSON-LD is actually a normative serialization of RDF, and is fully 
> grounded in the RDF model.

JSON-LD is not RDF. Neither is Turtle. Both are serialization formats with a
mapping to RDF, an abstract data model.


> The whole discussion of the JSON-LD data model as distinct from the RDF 
> data model also suggests that JSON-LD is not RDF.  It is also confusing 
> to define a JSON-LD data model in addition to a JSON-LD document's RDF 
> data model.  This confusion will be eliminated by making JSON-LD a 
> normative serialization of RDF, fully grounded in the RDF model.

We added the Data Model section since the RDF WG asked us to do so. I don't
see compelling reasons to revisit that decision.


Cheers,
Markus


--
Markus Lanthaler
@markuslanthaler
Received on Saturday, 8 June 2013 12:29:08 UTC