- From: Peter Ansell <ansell.peter@gmail.com>
- Date: Mon, 10 Jun 2013 17:36:33 +1000
- To: Markus Lanthaler <markus.lanthaler@gmx.net>
- Cc: public-rdf-comments <public-rdf-comments@w3.org>
- Message-ID: <CAGYFOCT=AW+fK-oiC1-TO6E8+-yuQespJSFtedkc7qybby_Tdw@mail.gmail.com>
On 9 June 2013 19:58, Markus Lanthaler <markus.lanthaler@gmx.net> wrote:
> On Sunday, June 09, 2013 7:57 AM, Peter Ansell wrote:
> On 9 June 2013 04:30, Markus Lanthaler wrote:
> On Saturday, June 08, 2013 5:13 PM, Sven R. Kunze wrote:
>
> >>> My preliminary conclusion: <<<It's just the same as we can do the same with both.>>>
> >>> Correct me if I am wrong but having "native literals" does not make any difference as most RDF tools and JSON-LD tools will give me a variable filled with a datatype specific to the programming language I use.
> >>
> >> Well, you might not care, but other people care :-P JSON-LD has e.g. native numbers and (probably more interesting) lists. In RDF everything is an opaque string that can only be interpreted, i.e., converted to a number in your programming language, if you understand the data type. So to speak, JSON-LD has a built-in data type for numbers.
> >
> > I think you are not fully comprehending the importance of numeric precision for an RDF-compatible data format if you are referring to this as a feature. If the JSON numeric datatypes were compatible with the XMLSchema numeric datatypes then there would be no issue, but supporting this just to save a few quotes is not going to help in the long term. All of the main programming models, except notably Javascript/JSON, understand the XMLSchema numeric datatypes.
>
> I think you misunderstood what I was trying to bring across because I probably didn't make it clear enough. So let me try again. Sven was wondering why a separate data model is required and couldn't see how "native literals" make a difference. I was trying to explain that RDF does not have any built-in native types but represents everything as a string alongside a datatype - including numbers. Due to historic reasons, RDF is tightly coupled to the XSD types.
> Now the problem is that a JSON number can't be mapped 1:1 to any of the existing XMLSchema types if, at the same time, off-the-shelf JSON parsers are used. The reason is that a JSON number has unlimited range and precision but parsers don't - most of them parse numbers into native 64bit floating point numbers or integers. Most of them don't even complain when the number is too large to fit.
>
> This is one of the issues that caused the most pain in the development of JSON-LD. Every design has its own pros and cons:
>
> a) We could have minted a new datatype "JsonNumber". That would not only mean that most RDF tools wouldn't be able to interpret it but also require custom JSON parsers to extract the number without losing any precision. So while theoretically the best option, it is the worst in practice.
>
> b) We could forbid native JSON numbers in JSON-LD. This would eliminate the mapping problems because no mapping would be necessary. Unfortunately, it would also mean that JSON-LD wouldn't be idiomatic JSON anymore. I think the effect this would have on adoption would be unacceptable.

If JSON-LD were aiming to be an RDF serialisation, and not just an acceptable target for migration of RDF to the format that "everyone" (currently) uses, then this may not be viewed in the same way. It definitely seems that being idiomatic JSON is the main goal of JSON-LD, making it all the more confusing whether the RDF WG has fully anticipated the choice to go with JSON-LD as their "JSON serialisation of the RDF Abstract Model".

> c) We find a tradeoff between those two extremes. That's what we tried to do. We map JSON numbers to XSD datatypes and vice versa and explain clearly what the consequences are. We also have a mechanism to represent XSD numbers as typed literals (i.e., JSON strings) if precision or lexical representation matters.
>
> So, as you can see, this is a tradeoff. We are not doing this to "save a few quotes" as you claim.
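The precision problem behind options (b) and (c) is easy to reproduce with an off-the-shelf parser. A minimal Python sketch (the `price` and `qty` keys are invented for illustration):

```python
import json
from decimal import Decimal

# Off-the-shelf parsers map JSON numbers with a fractional part to 64-bit
# doubles, silently discarding digits beyond ~15-17 significant figures
# along with the original lexical form.
doc = '{"price": 19.999999999999999999, "qty": 1.10}'
parsed = json.loads(doc)
assert parsed["price"] == 20.0                               # rounded to the nearest double
assert json.dumps({"qty": parsed["qty"]}) == '{"qty": 1.1}'  # lexical form "1.10" is gone

# Python's json module can opt out via parse_float, but that is a
# parser-specific hook, not something a JSON-LD processor can rely on.
exact = json.loads(doc, parse_float=Decimal)
assert exact["price"] == Decimal("19.999999999999999999")

# The workaround in option (c): carry the value as a typed literal (a JSON
# string plus an XSD datatype), which every JSON parser preserves verbatim.
literal = {"@value": "19.999999999999999999",
           "@type": "http://www.w3.org/2001/XMLSchema#decimal"}
```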
I don't think the authors of the specification are in the best position to fully reflect on the significance of having a partial mapping between what will be two distinct valid uses of JSON-LD. The people in the best position to comment on the significance may be commercial users of a numeric-sensitive application such as Web Payments, which is being driven solely by JSON-LD and hence has no fallback position.

> >>> I really do not care of the serialization as the work on the data is done in code. So, it gives me the impression that JSON-LD is just another serialization for RDF... But why a different data model?
> >>
> >> Because it allows things that RDF (currently) doesn't, i.e., blank nodes as graph names and predicates.
> >
> > For what it is worth, if blank node predicates were useful, then they would have made it into RDF in the past. They are too fragile, in terms of consistent data interpretation for round-tripping and merging, to be widely deployed, and creating a new format just to allow them and a few other syntactic sugars is not going to help Linked Data, as the entire concept is based around interoperability. If you can't reliably merge and manipulate Linked Data from different locations then your Linked Data is sub-optimal.
>
> That's somewhat beside the point. You could say that in JSON (not JSON-LD) every property is a blank node. It only has locally valid semantics. As you rightly pointed out, this can't be used to reliably merge data. In JSON-LD you map such properties to IRIs to make their semantics explicit (and globally valid).
>
> Sometimes, however, it is not possible or not desired to map all properties to IRIs. We decided to drop such properties when transforming JSON-LD documents because we can't recognize which properties are the same and which are not. So that's exactly the problem you describe.
> The point now is that people would like to convert only parts of their documents to Linked Data but at the same time transform them without losing the other properties during the transformation. Blank nodes allow us to achieve that.

All properties should be easily mapped to IRIs. Even if the IRIs are temporary they are still easier to work with than blank nodes. I don't accept that there are properties which cannot be mapped to IRIs. Linked Data is not perfect (whether it is JSON-LD or RDF), but it will be uninterpretable in both cases in the long term if it relies on blank nodes to identify properties. By design, blank nodes ensure that there is no easy way to reuse the item, which implies that there would be no way to reuse the Linked Data in a consistent fashion if documents are merged or split, as the same property would have different definitions in each of the merged documents. It seems that encouraging everyone to always use idiomatic JSON, which is not directly reusable without JSON-LD annotations, as you say, is more important for JSON-LD than ensuring some basic principles of useful "Linked Data".

> Here's a practical example I discovered a while ago on Twitter which tries to map GeoJSON to JSON-LD: http://pleiades.stoa.org/places/628932/json
>
> You see that a whole lot of properties are mapped to bnode ids in the context. This feature allows JSON data to be migrated gradually to Linked Data.

I don't agree that JSON-LD structures should be designed to enable partial migration while being expected to be round-tripped using the JSON-LD algorithms. If people want to use the JSON-LD algorithms they should expect not to have access to any properties that they have not mapped after the transformations are complete.
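The behaviour under discussion (unmapped properties dropped, bnode-mapped properties kept with local-only meaning) can be sketched with a toy stand-in; this is not the real JSON-LD expansion algorithm, and the keys and IRIs are invented for illustration:

```python
def expandish(document):
    """Toy stand-in for JSON-LD expansion (NOT the real algorithm): keys the
    context maps to an IRI or a blank node id are kept under that identifier;
    completely unmapped keys are dropped, as the real algorithms drop them."""
    ctx = document.get("@context", {})
    return {ctx[k]: v for k, v in document.items()
            if k != "@context" and k in ctx}

doc = {
    "@context": {
        "title": "http://purl.org/dc/terms/title",  # globally valid mapping
        "reprPoint": "_:b0",  # bnode id: survives, but with local-only meaning
    },
    "title": "An example place",
    "reprPoint": [27.1, 37.0],
    "internal_note": "no mapping at all, so this key vanishes",
}

expanded = expandish(doc)
assert expanded == {
    "http://purl.org/dc/terms/title": "An example place",
    "_:b0": [27.1, 37.0],
}
# The objection above in one line: "_:b0" here and "_:b0" in another
# document are distinct properties by definition, so merged documents
# cannot be reconciled on those properties.
```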
Anyone who is partially taking up JSON-LD will still be processing their data internally as JSON in their applications anyway, and hence only supporting a single JSON serialisation of the document, so there should not need to be cases in practice where the data is called JSON-LD before they finish the migration.

> > Blank nodes as graph names are neither here nor there, as they haven't actually been widely used for anything other than backups or trivial purposes so far, as RDF so far allows their removal or modification without consequence. They are in an entirely different situation to blank node predicates, which must be reliably round-tripped for the entire RDF Abstract Model to be consistently used.
>
> Well, AFAICT this is still being discussed in the RDF WG and the most recent resolution is that datasets can use blank nodes as graph names: https://www.w3.org/2013/meeting/rdf-wg/2013-05-15#resolution_2

I don't mind either way, as I do not preserve dataset identifiers in my applications anyway, given that I rely on a quad store with one position for internal use only, and not a quin store. The older drafts of N-Quads allowed blank nodes for datasets anyway, even if the W3C draft does not.

> > The draft of the Web Payments specification, where a single JSON-LD profile forms the only possible serialisation and all of the transformations are defined on the JSON graph, and not on the RDF Abstract Model, seems to make it quite clear that JSON-LD is destined to be its own ecosystem and compatibility with RDF is only intended to be for migration. [...]
>
> I'm not familiar enough with the Web Payments work to comment on this. I'll leave this to others. I can't follow your reasoning though why that "make[s] it quite clear that JSON-LD is destined to be its own ecosystem and compatibility with RDF is only intended to be for migration"? There's no JSON-LD police that enforces how it has to be used.
> I'm happy with the adoption JSON-LD has got so far and am for sure not going to blame people for doing so. While I know that the people behind Web Payments know very well what they are doing, it is very reasonable to introduce such a disruptive approach gradually.

The first W3C publication that is expected to reuse JSON-LD should be a valid use case for discussion during the construction of the standard, as it could highlight these usability issues before they are set in concrete.

> Once it is deployed widely enough, people will see that programming against a specific structure is counter-productive and that it isn't necessary anymore because everything has unambiguous, globally valid semantics.

The Web Payments specification, as it read the last time I looked at it in the last few weeks, emphasised the exact keys and @context that are to be expected for valid documents. I may have misunderstood the significance of that though. Hopefully people don't see the support for partial migration as anything other than that. Ideally the partial migration parts should be non-normative, to discourage the perception that partial solutions are acceptable as Linked Data without further changes.

> Funnily enough, I recently wrote a paper titled "Model Your Application Domain, Not Your JSON Structures" which criticizes exactly what you are describing. Here's the link if you are interested: http://m.lanthi.com/wsrest2013-paper
>
> > If the JSON-LD group were serious about RDF compatibility, they would at least require that any valid JSON-LD document be accepted by Web Payments (and future standards) for them to be able to reuse the specification in another W3C publication. The ideal would be to allow any RDF serialisation, based on content negotiation.
>
> Yeah, it is exactly this kind of sledgehammer approach that helped adoption of semantic web technologies so much in the past.
Sarcasm aside, informed users should not be conflating difficulties with RDF/XML with difficulties with the RDF Abstract Model anymore, so the past may be irrelevant from that perspective. If JSON-LD is afraid that references to the scary three letters "R", "D" and "F" will ensure the downfall of the format among hipster web developers, then it is quite out of place discussing the format on public-*r*d*f*-comments. By defining different profiles of JSON-LD, users are still going to have similar difficulties with manually identifying what the meaning of a particular document is unless it is in their expected profile. If they already need to perform standard transformations with their black-box JSON-LD translator to get a JSON object into their expected profile, so that they can proceed with JSON queries outside of the JSON-LD engine, then there is no difference to them if their JSON-LD transformations transparently worked on other RDF formats also, say Turtle for example. If the ability of the RDF Abstract Model to be interpreted independently of document formats is a sledgehammer then so be it. I am fine with a sledgehammer myself, as it is more powerful than getting stuck in one of the JSON-LD profiles.

> >>> > JSON-LD is not RDF. Turtle is neither. Both are serialization formats with a mapping to RDF, an abstract data model.
> >>>
> >>> Thank you for that clarification!!! Finally, I got it. But why the heck isn't the spec mentioning it?
> >>
> >> Quoting the spec (Relationship to RDF):
> >>
> >> Summarized these differences mean that JSON-LD is capable of serializing any RDF graph or dataset and most, but not all, JSON-LD documents can be directly transformed to RDF.
> >>
> >> Isn't that saying exactly the same?
> >
> > I think you underestimate the number of documents that will have issues translating JSON numeric datatypes to RDF and back.
> > Considering that the first specification to reuse JSON-LD is the very number-sensitive Web Payments draft, it seems a little naive to be brushing off that issue.
>
> So you would rather have something which maps perfectly but won't be adopted? I certainly don't. I'm willing to bet that the amount of JSON data out there exceeds that of RDF by some orders of magnitude. Apparently the problem doesn't exist in practice or can be worked around easily.

Web Payments may be the first time that people have relied on the JSON native numeric datatypes for precise numerical representations. If anyone has pre-existing examples of people relying on their exact semantics for financial or scientific applications then feel free to pipe up and show how the native datatypes are being used for sensitive computations. I really don't understand the obsession with saying that the amount of JSON data exceeds RDF. Has JSON ever been used as a long-term data format with many consumers and many producers of the same document format? All of the JSON APIs that I have worked with define their own formats, and consumers write their applications to work with a single producer. That makes their use of the format moot IMO for long-term users. RDF Linked Data producers have an entirely different long-term goal from the outset. The JSON-LD specification needs to decide whether it will aim to be everything to commercial web API producers who have no intention of sharing documents between themselves, or stay compatible with the existing RDF Linked Data producers to enable round-tripping of data through other formats.

> >>> But as long as I am not able to make clear statements of how everything fits together, I feel like an idiot talking drivel.
> >>
> >> The goal of standardization is to find a compromise that can be accepted by all involved participants.
> >> We worked long and hard on this and I think we found a compromise which overall makes all of us equally happy (or unhappy for that matter).
> >
> > The compromise so far seems to be heading towards JSON-LD not being released by the W3C RDF Working Group, as the authors have not committed to conforming with the main abstract specification produced by the working group, and they have not conformed to the charter of the working group to create "a JSON serialisation of the RDF Abstract Model".
>
> I think we made it crystal clear many, many times that that's not the case. That being said, I'm proud of the group working so hard to find the best possible compromise between theoretical pureness and usefulness in practice. Just blindly serializing the abstract model in JSON doesn't bring any advantages. I know I can't convince you of that fact but I hope that you acknowledge the successes JSON-LD already has. I'm convinced that a large part of that is due to the fact that it feels like idiomatic JSON.

I don't agree that idiomatic JSON is as advantageous as you say it is. If users are going to take up a JSON serialisation of RDF it will be because they recognise the advantages of the RDF model, independent of exactly how their document is produced or represented by them as the original producers. Implying that they should be able to continue only accepting idiomatic JSON, because otherwise no one will use JSON-LD, seems to go directly against the support for different profiles in the current JSON-LD draft. If this were a goal, then JSON-LD should only support a single authoritative serialisation of each JSON object, say the existing serialisation from each of the many different producers, to ensure full backwards compatibility. That would enable producers to continue using idiomatic JSON and never need to modify their infrastructure.
Instead, they are required to understand the massive JSON-LD specification, which IMO is only complex because it aims to support everyone except for RDF users. The retort that is continually used is that RDF users are virtually non-existent and can safely be ignored after the RDF WG rubber-stamps the attempt at supporting RDF in, for the most part, one direction only, due to migration issues.

> >> What we are doing right now is minor wordsmithing without any technical result at the end of a very long process. It is important to have these discussions but at some point it just doesn't make much sense anymore to rehash the same discussions over and over again. It just has to be accepted that there are different opinions.
> >
> > Different opinions are what make workable long-term standards more useful than the results of a typical committee-driven process. If there are continued discussions without suitable conclusions then it may mean there is an underlying issue.
>
> What we two are discussing here is something completely different from what I was discussing with both David and Sven.
>
> >> People are waiting for this technology to become a proper standard. Some are waiting for a long time and I think most of us deeply involved in this standardization process are exhausted after all the hard work we've done over the years. It's time to ship.
> >
> > If you can't ship your product before the W3C RDF Working Group rubber-stamps this particular format then you are in trouble and your business model is flawed. If, on the other hand, you had defined your application directly on RDF you would not have this delay, as users would have a number of stable, well-defined backup formats to communicate with both now, and in the future when JSON-LD stabilises as "the JSON serialisation of RDF".
>
> I acknowledge your opinion but think reality works slightly differently.
It is fine to base an experimental commercial application on a draft format, although advertising it as applicable to others for their applications may be a little premature. However, getting worried that the process is being taken off track by legitimate comments relating to its essential overall goal as an interoperable RDF serialisation, without doing something about it (namely, withdrawing JSON-LD from the RDF WG and finding a new target), doesn't seem to be working.

> > I am not saying that JSON-LD is a bad standard, but relating it to RDF (even non-normatively, as it is currently in the main JSON-LD specification) brings with it a responsibility to maintain the existing community. Even more so since you chose to promote and develop it so close to the RDF specification. If it were any other format developed outside of the RDF working group, and specifically not to be an RDF serialisation, then we wouldn't be commenting this way, as it would have its own ecosystem and that would be fine. We may even promote the mapping of JSON-LD back to the abstract RDF model with partial interoperability instead of criticising it for not even having the goal of reasonable interoperability.
>
> It is always easy to criticize a specific solution. We've worked very hard for a couple of years on this. All the development was completely open and we welcomed everyone. We brought the work into the RDF WG a year ago. Nevertheless, there were almost no contributions from RDF WG members.

When JSON-LD was brought into the RDF WG, presumably to fast-track the standardisation process, it should have been clearly understood by the authors that JSON-LD, as the alternative to any other existing or possible serialisation of RDF in JSON, must be fully compatible with the abstract RDF model. Denying that now doesn't really give you any credit in the RDF community.
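The one-way nature of the transformations can be sketched in a few lines; this is a contrived illustration of the round-tripping concern, not the actual JSON-LD algorithms:

```python
# Converting a typed literal to a native JSON number ("compaction"
# direction) and back ("expansion" direction) loses the original lexical
# form, so a general round trip through idiomatic JSON is not exact.
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"

def to_native(literal):
    # RDF typed literal -> native JSON value
    if isinstance(literal, dict) and literal.get("@type") == XSD_INT:
        return int(literal["@value"])  # the lexical form is discarded here
    return literal

def to_literal(value):
    # native JSON value -> RDF typed literal
    if isinstance(value, int):
        return {"@value": str(value), "@type": XSD_INT}
    return value

original = {"@value": "007", "@type": XSD_INT}  # valid xsd:integer lexical form
round_tripped = to_literal(to_native(original))
assert round_tripped == {"@value": "7", "@type": XSD_INT}  # "007" is gone
```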
Ensuring that there is no general compatibility may actually give you credit with people who are allergic to RDF, as they won't need to be afraid of having to work with the existing RDF Linked Data web; they will have their own JSON Linked Data web that supports one-way transformations, in general, but will not generally be integrated back into RDF applications for that reason.

> If you have specific proposals to make, make them now. As you very well know, we'll discuss all of them. But just criticizing design decisions without providing new insights or superior solutions doesn't bring us forward and honestly I have better things to do than to write the same mails over and over again.

My proposal is to ensure interoperability with RDF. As you seem to imply at [1], modifying the specification to state that it supports the compatibility from JSON-LD back to RDF seems to be non-negotiable at this point. As long as RDF data can be consistently converted *into* non-native-numeric JSON-LD, that is being viewed as good enough by the authors of the specification. The specific points that I am interested in have already been responded to, i.e., blank node predicates (negative), blank node datatypes (positive, [2]), and JSON native numeric datatypes versus XMLSchema datatypes (negative). If the JSON-LD authors consider RDF to have effectively had no impact at all up to this point, as their justification for not ensuring compatibility, then there is not much left to say in this forum. If the RDF WG members received the same impression, it is not surprising that they have not been focusing on the issue up to this point.
My advice to the other participants of the RDF WG who have not yet voiced their collective opinions would be to clearly respond to the stalemated discussion points regarding full RDF compatibility, to clarify to potential users, such as Sven, whether JSON-LD is still intended to be published by the RDF WG as an interoperable serialisation of the RDF Abstract Model.

Good luck,

Peter

[1] https://github.com/json-ld/json-ld.org/issues/254#issuecomment-19156384
[2] https://github.com/json-ld/json-ld.org/issues/257
Received on Monday, 10 June 2013 07:37:02 UTC