RE: Official response to RDF-ISSUE-132: JSON-LD/RDF Alignment from Markus Lanthaler on 2013-06-09 (public-rdf-comments@w3.org from June 2013)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Sun, 9 Jun 2013 11:58:08 +0200
To: "'public-rdf-comments'" <public-rdf-comments@w3.org>
Message-ID: <007201ce64f7$d8de9a90$8a9bcfb0$@lanthaler@gmx.net>
On Sunday, June 09, 2013 7:57 AM, Peter Ansell wrote:
On 9 June 2013 04:30, Markus Lanthaler wrote:
On Saturday, June 08, 2013 5:13 PM, Sven R. Kunze wrote:
>>> My preliminary conclusion: <<<It's just the same as we can do the same
>>> with both.>>>
>>> Correct me if I am wrong but having "native literals" does not make
>>> any difference as most RDF tools and JSON-LD tools will give me a
>>> variable filled with a datatype specific to the programming language I
>>> use.
>> 
>> Well, you might not care, but other people care :-P JSON-LD has e.g.
>> native numbers and (probably more interesting) lists. In RDF
>> everything is an opaque string that can only be interpreted, i.e.,
>> converted to a number in your programming language, if you understand
>> the data type. So to speak, JSON-LD has a built-in data type for
>> numbers.
> 
> I think you are not fully comprehending the importance of numeric
> precision for an RDF-compatible data format if you are referring to
> this as a feature. If the JSON numeric datatypes were compatible with
> the XMLSchema numeric datatypes then there would be no issue, but
> supporting this just to save a few quotes is not going to help in the
> long term. All of the main programming models, except notably
> Javascript/JSON, understands the XMLSchema numeric datatypes.

I think you misunderstood what I was trying to get across, probably because I didn't make it clear enough. So let me try again. Sven was wondering why a separate data model is required and couldn't see how "native literals" make a difference. I was trying to explain that RDF does not have any built-in native types but represents everything as a string alongside a datatype - including numbers. For historical reasons, RDF is tightly coupled to the XSD types. Now the problem is that a JSON number can't be mapped 1:1 to any of the existing XMLSchema types if, at the same time, off-the-shelf JSON parsers are used. The reason is that a JSON number has unlimited range and precision but parsers don't - most of them parse numbers into native 64-bit floating point numbers or integers. Most of them don't even complain if the number is too large to fit.
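
To make the parser problem concrete, here is a minimal sketch using Python's standard json module. The document and its keys ("amount", "big") are made up for illustration. Python happens to keep large integers exact, so the sketch converts to a float to mimic parsers that store every number as a 64-bit double, as JavaScript does.

```python
import json
from decimal import Decimal

# Hypothetical document; both numbers are perfectly valid JSON.
doc = '{"amount": 0.10000000000000000001, "big": 9007199254740993}'

# Off-the-shelf parsing: the real number becomes a 64-bit float,
# and the extra digits are dropped without any warning.
native = json.loads(doc)
assert native["amount"] == 0.1

# 2**53 + 1 cannot be represented as a double. Python keeps the integer
# exact, so float() is used here to show what a double-only parser stores.
assert native["big"] == 2**53 + 1
assert float(native["big"]) == float(2**53)

# Keeping the precision needs a non-default parser setting, which is the
# kind of custom parsing a dedicated number datatype would force on everyone.
exact = json.loads(doc, parse_float=Decimal)
assert exact["amount"] == Decimal("0.10000000000000000001")
```

The default call never raises an error, which is exactly why the loss is easy to miss.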

This is one of the issues that caused most pain in the development of JSON-LD. Every design has its own pros and cons:

a) We could have minted a new datatype "JsonNumber". That would not only mean that most RDF tools wouldn't be able to interpret it but also require custom JSON parsers to extract the number without losing any precision. So while theoretically the best option, it is the worst in practice.

b) We could forbid native JSON numbers in JSON-LD. This would eliminate the mapping problems because no mapping would be necessary. Unfortunately, it would also mean that JSON-LD wouldn't be idiomatic JSON anymore. I think the effect this would have on adoption would be unacceptable.

c) We find a tradeoff between those two extremes. That's what we tried to do. We map JSON numbers to XSD datatypes and vice versa and explain clearly what the consequences are. We also have a mechanism to represent XSD numbers as typed literals (i.e., JSON strings) if precision or lexical representation matters.

So, as you can see, this is a tradeoff. We are not doing this to "save a few quotes" as you claim.
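
Option c) can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the conversion algorithm from the spec: literal_for is a hypothetical helper, the document keys are made up, and repr() is used instead of the canonical xsd:double lexical form. It only shows the two paths: a native JSON number gets an XSD datatype assigned, while a typed literal carries its lexical form as a JSON string and therefore survives parsing untouched.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def literal_for(value):
    """Hypothetical helper: return (lexical form, datatype IRI) for a value."""
    if isinstance(value, dict) and "@value" in value:
        # Typed literal: the lexical form is a JSON string, so no parser
        # ever touches its digits.
        return value["@value"], value["@type"]
    if isinstance(value, bool):  # check bool first, it is a subclass of int
        return ("true" if value else "false"), XSD + "boolean"
    if isinstance(value, int):
        return str(value), XSD + "integer"
    if isinstance(value, float):
        # Simplification: not the canonical xsd:double lexical form.
        return repr(value), XSD + "double"
    raise TypeError("unsupported value: %r" % (value,))


doc = json.loads("""
{
  "price": 0.10000000000000000001,
  "exactPrice": {
    "@value": "0.10000000000000000001",
    "@type": "http://www.w3.org/2001/XMLSchema#decimal"
  }
}
""")

# Native number: convenient, but already rounded by the JSON parser.
assert literal_for(doc["price"]) == ("0.1", XSD + "double")

# Typed literal: precision and lexical representation are preserved.
assert literal_for(doc["exactPrice"]) == ("0.10000000000000000001", XSD + "decimal")
```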


>>> I really do not care of the serialization as the work on the data
>>> is done in code.
>>> So, it gives me the impression that JSON-LD is just another
>>> serialization for RDF... But why a different data model?
>> 
>> Because it allows things that RDF (currently) doesn't, i.e., blank
>> nodes as graph names and predicates.
> 
> For what it is worth, if blank node predicates were useful, then they
> would have made it into RDF in the past. They are too fragile, in
> terms of consistent data interpretation for round-tripping and
> merging, to be widely deployed, and creating a new format just to
> allow them and a few other syntactic sugars is not going to help
> Linked Data, as the entire concept is based around interoperability.
> If you can't reliably merge and manipulate Linked Data from different
> locations then your Linked Data is sub-optimal.

That's somewhat beside the point. You could say that in JSON (not JSON-LD) every property is a blank node. It only has locally valid semantics. As you rightly pointed out, this can't be used to reliably merge data. In JSON-LD you map such properties to IRIs to make their semantics explicit (and globally valid).

Sometimes, however, it is not possible or not desirable to map all properties to IRIs. We decided to drop such properties when transforming JSON-LD documents because we can't recognize which properties are the same and which are not. So that's exactly the problem you describe. The point now is that people would like to convert only parts of their documents to Linked Data but at the same time transform them without losing the other properties during the transformation. Blank node identifiers used as properties allow us to achieve that.

Here's a practical example I discovered a while ago on Twitter which tries to map GeoJSON to JSON-LD:
  http://pleiades.stoa.org/places/628932/json

You can see that a whole lot of properties are mapped to bnode ids in the context. This feature makes it possible to migrate JSON data gradually to Linked Data.
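
The effect can be sketched with a toy key expansion. This is a rough illustration, not the JSON-LD expansion algorithm: expand_keys is a hypothetical helper and the document below is made up (it is not taken from the Pleiades data). Properties mapped in the context, whether to an IRI or to a bnode id, are kept; unmapped properties are dropped.

```python
def expand_keys(doc):
    """Hypothetical helper: keep only the properties the context maps."""
    context = doc.get("@context", {})
    return {context[key]: value for key, value in doc.items() if key in context}


doc = {
    "@context": {
        "title": "http://purl.org/dc/terms/title",  # globally valid semantics
        "bbox": "_:bbox",  # bnode id: local semantics only, but not lost
    },
    "title": "Example place",
    "bbox": [12.0, 41.0, 13.0, 42.0],
    "note": "not mapped at all",
}

expanded = expand_keys(doc)

# The IRI-mapped and the bnode-mapped properties both survive.
assert expanded == {
    "http://purl.org/dc/terms/title": "Example place",
    "_:bbox": [12.0, 41.0, 13.0, 42.0],
}

# The unmapped property is dropped during the transformation.
assert "note" not in expanded
```

Later on, "_:bbox" can be replaced by a proper IRI in the context without touching the data itself.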


> Blank nodes as graph names are neither here nor there, as they haven't
> actually been widely used for anything other than backups or trivial
> purposes so far, as RDF so far allows their removal or modification
> without consequence. They are in an entirely different situation to
> blank node predicates that must be reliably round-tripped for the
> entire RDF Abstract Model to be consistently used.

Well, AFAICT this is still being discussed in the RDF WG and the most recent resolution is that datasets can use blank nodes as graph names:
  https://www.w3.org/2013/meeting/rdf-wg/2013-05-15#resolution_2


> The draft of the Web Payments specification, where a single JSON-LD
> profile forms the only possible serialisation and all of the
> transformations are defined on the JSON graph, and not on the RDF
> Abstract Model, seems to make it quite clear that JSON-LD is destined
> to be its own ecosystem and compatibility with RDF is only intended to
> be for migration. [...]

I'm not familiar enough with the Web Payments work to comment on this. I'll leave this to others. I can't follow your reasoning though why that "make[s] it quite clear that JSON-LD is destined to be its own ecosystem and compatibility with RDF is only intended to be for migration"? There's no JSON-LD police that enforces how it has to be used. I'm happy with the adoption JSON-LD has seen so far and am certainly not going to blame people for doing so. While I know that the people behind Web Payments know very well what they are doing, it is very reasonable to introduce such a disruptive approach gradually.

Once it is deployed widely enough, people will see that programming against a specific structure is counter-productive and that it isn't necessary anymore because everything has unambiguous, globally valid semantics.

Funnily enough, I recently wrote a paper titled "Model Your Application Domain, Not Your JSON Structures" which criticizes exactly what you are describing. Here's the link if you are interested:
  http://m.lanthi.com/wsrest2013-paper


> If the JSON-LD group
> were serious about RDF compatibility, they would at require that any
> valid JSON-LD document be accepted by Web Payments (and future
> standards) for them to be able to reuse the specification in another
> W3C publication. The ideal would be to allow any RDF serialisation,
> based on content negotiation.

Yeah, it is exactly this kind of sledgehammer approach that helped adoption of semantic web technologies so much in the past.

 
>>> > JSON-LD is not RDF. Turtle is neither. Both are serialization formats
>>> > with a mapping to RDF, an abstract data model.
>>>
>>> Thank you for that clarification!!! Finally, I got it. But why the
>>> heck isn't the spec mentioning it?
>>
>> Quoting the spec (Relationship to RDF):
>> 
>>    Summarized these differences mean that JSON-LD is capable of serializing
>>    any RDF graph or dataset and most, but not all, JSON-LD documents can be
>>    directly transformed to RDF.
>> 
>> Isn't that saying exactly the same?
> 
> I think you underestimate the number of documents that will have
> issues translating JSON numeric datatypes to RDF and back. Considering
> that the first specification to reuse JSON-LD is the very number-
> sensitive Web Payments draft, it seems a little naive to be brushing
> off that issue.

So you would rather have something which maps perfectly but won't be adopted? I certainly wouldn't. I'm willing to bet that the amount of JSON data out there exceeds that of RDF by several orders of magnitude. Apparently the problem doesn't exist in practice or can be worked around easily.


>>> But as long as am I not able to make clear
>>> statements of how everything fits together, I feel like an idiot
>>> talking drivel.
>> 
>> The goal of standardization is to find a compromise that can be
>> accepted by all involved participants. We worked long and hard on this
>> and I think we found a compromise which overall makes all of us
>> equally happy (or unhappy for that matter).
>
> The compromise so far seems to be heading towards JSON-LD not being
> released by the W3C RDF Working Group as the authors have not
> committed to conforming with the main abstract specification produced
> by the working group, and they have not conformed to the charter of
> the working group to create "a JSON serialisation of the RDF Abstract
> Model".

I think we made it crystal clear many, many times that that's not the case. That being said, I'm proud of the group working so hard to find the best possible compromise between theoretical purity and usefulness in practice. Just blindly serializing the abstract model in JSON doesn't bring any advantages. I know I can't convince you of that but I hope that you acknowledge the successes JSON-LD already has. I'm convinced that a large part of that is due to the fact that it feels like idiomatic JSON.


>> What we are doing right now is minor-wordsmithing without any
>> technical result at the end of a very long process. It is important to
>> have these discussions but at some point it just doesn't make much
>> sense anymore to rehash the same discussions over and over again. It
>> just has to be accepted that there are different opinions.
> 
> Different opinions are what makes workable long-term standards more
> useful than the results of a typical committee driven process. If
> there are continued discussions without suitable conclusions then it
> may mean there is an underlying issue.

What we two are discussing here is something completely different from what I was discussing with both David and Sven. 


>> People are waiting for this technology to become a proper standard.
>> Some are waiting for a long time and I think most of us deeply
>> involved in this standardization process are exhausted after all the
>> hard work we've done over the years. It's time to ship.
>
> If you can't ship your product before the W3C RDF Working Group rubber
> stamps this particular format then you are in trouble and your
> business model is flawed. If on the other hand you had defined your
> application directly on RDF you would not have this delay, as users
> would have a number of stable, well-defined, backup formats to
> communicate using both now, and in the future when JSON-LD stabilises
> as "the JSON serialisation of RDF".

I acknowledge your opinion but think reality works slightly differently.


> I am not saying that JSON-LD is a bad standard, but relating it to RDF
> (even non-normatively as it is currently in the main JSON-LD
> specification) brings with it a responsibility to maintain the
> existing community. Even more so since you chose to promote and
> develop it so close to the RDF specification. If it were any other
> format developed outside of the RDF working group, and specifically to
> not be an RDF serialisation then we wouldn't be commenting this way,
> as it would have its own ecosystem and that would be fine. We may even
> promote the mapping of JSON-LD back to the abstract RDF model with
> partial interoperability instead of criticising it for not even having
> the goal of reasonable interoperability.

It is always easy to criticize a specific solution. We've worked very hard for a couple of years on this. All the development was completely open and we welcomed everyone. We brought the work into the RDF WG a year ago. Nevertheless, there were almost no contributions from RDF WG members.

If you have specific proposals to make, make them now. As you very well know, we'll discuss all of them. But just criticizing design decisions without providing new insights or superior solutions doesn't bring us forward and honestly I have better things to do than to write the same mails over and over again.


Cheers,
Markus



--
Markus Lanthaler
@markuslanthaler
Received on Sunday, 9 June 2013 09:58:45 UTC