Re: Understanding of JSON-LD values from Sven R. Kunze on 2013-06-13 (public-rdf-comments@w3.org from June 2013)

From: Sven R. Kunze <sven.kunze@informatik.tu-chemnitz.de>
Date: Thu, 13 Jun 2013 17:04:42 +0200
To: Markus Lanthaler <markus.lanthaler@gmx.net>
Cc: 'public-rdf-comments' <public-rdf-comments@w3.org>
Message-ID: <20130613170442.Horde.09sxUdLuqDTdtckcpxfjVg5@mail.tu-chemnitz.de>
Quoting Markus Lanthaler <markus.lanthaler@gmx.net>:

> On Thursday, June 13, 2013 2:29 AM, Peter Ansell wrote:
> On 13 June 2013 07:49, Sven R.Kunze wrote:
>>> Good evening everybody,
>>>
>>> in a former discussion, I mentioned that the purpose of “native
>>> literals” in the JSON-LD data model is not clear to me. And it still
>>> is not.
>>>
>>> Markus wrote: “JSON-LD has e.g. native numbers and (probably more
>>> interesting) lists. In RDF everything is an opaque string that can only
>>> be interpreted, i.e., converted to a number in your programming
>>> language, if you understand the data type. So to speak, JSON-LD has a
>>> built-in data type for numbers.”
>>>
>>> So, what is the advantage of that? Shouldn’t every RDF graph lib
>>> provide a way to parse the literals with a datatype native to the
>>> programming language one uses?
>
> Sven, I don't really understand what you are trying to achieve with  
> your questions (and the way you frame and time them) but I'm  
> nevertheless trying to answer them in the most objective way I can  
> -- especially since you mention my name explicitly.

I am trying to achieve clarity.
I suppose this is the default way to ask. What other way is there to
raise a question than on this mailing list?
I am asking now because JSON-LD came to my attention only recently and I am
interested in particular use cases that could take advantage of JSON-LD.


> JSON has a "native" representation for numbers. If we were to  
> prohibit the use of that feature it would make no sense at all to  
> define a syntax based on JSON.

That is completely wrong. You are arguing about the serialization, which is
native to JSON, but that has absolutely nothing to do with the data model
(http://json-ld.org/spec/latest/json-ld/#data-model).


>> From reading your previous mails I understand that you don't care  
>> about serialization syntaxes at all because your libraries take  
>> care of everything. The fact however is that we never (ever) talk  
>> about programming languages in any of the RDF specifications. They  
>> just don't matter (in the sense you are framing your question).

Of course they don't matter. That's the advantage of an abstract
model. You do not need to take care of these details and can get started.


>> It's an implementation detail. Actually you are kind of  
>> contradicting yourself because you want native numbers in your
>> programming environment but not in the serialization syntax.

I never said I do not want them in the serialization of 'my' data, and
that is for one single reason:

The form of the serialization is not important at all. So whether
JSON-LD has quotes around the values plus a datatype, or a
human-friendlier serialization as N3 has, is just not important.

Tools create data in whatever formats they can. Users write down data in
whatever formats they see fit. But it's actually all the same.

And again this has absolutely nothing to do with the JSON-LD data model.


>> The biggest advantage of JSON - and thus the main reason of its  
>> success - is that there's no impedance mismatch between the  
>> serialization format and the native representation in your  
>> programming environment. I'm not aware of any language which can't  
>> parse JSON into a native representation.
>
>
>>> Of one drawback, I could easily think of: it’s confusing as it mixes
>>> up serialization and abstract model. What is so bad of having only
>>> *ONE* value for the number 42 instead of two?
>
> I do not follow. How does it mix them up? Because there are no
> quotes around the number? Because there's no explicit datatype? What  
> about Turtle's "native" numbers?

Turtle is a serialization, just as JSON-LD is. So there might be quotes,
there might be none.

But Turtle maps both to one single value, whereas JSON-LD treats
these two serializations as two different values in its data model.
That makes no sense to me, as 42 is 42. There are no two 42s.
The concept of 42 is unique.
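
To illustrate the Turtle side of this point, here is a minimal sketch in
Python. It assumes that the prefix xsd is bound to the XML Schema
namespace and only handles the two spellings discussed here.
parse_integer_token is a hypothetical helper for illustration, not part
of any RDF library.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def parse_integer_token(token):
    """Map a Turtle integer token to a (lexical form, datatype IRI) pair.

    Hypothetical helper: it only understands the bare shorthand (42) and
    the explicitly typed form ("42"^^xsd:integer).
    """
    bare = re.fullmatch(r"[+-]?[0-9]+", token)
    if bare:
        # Shorthand: the token itself is the lexical form.
        return (token, XSD_INTEGER)
    typed = re.fullmatch(r'"([+-]?[0-9]+)"\^\^xsd:integer', token)
    if typed:
        # Explicit form: the lexical form is the text between the quotes.
        return (typed.group(1), XSD_INTEGER)
    raise ValueError("unsupported token: " + token)


shorthand = parse_integer_token("42")
explicit = parse_integer_token('"42"^^xsd:integer')
print(shorthand == explicit)
```

Both spellings end up as the same literal, so nothing downstream of the
parser can tell which one was written.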


>>> The standard RDF data model only has *ONE* value for it whereas the
>>> JSON-LD model suggests *TWO*, namely the ‘native value’ and the
>>> datatyped-string value. Correct me if I get something wrong.
>
> I don't know what you mean by value but even in RDF there's a  
> difference between the lexical representation and the "value"... and  
> some systems may not even get to see the real "value" because they  
> don't understand the datatype. That's what I meant by "opaque  
> strings".

That's correct. But this also ensures flexibility by allowing
different datatypes.
However, I would consider an RDF tool that is not able to interpret the
most basic XML Schema datatypes not ready for production.


>>> Another question that arises when having two different 42 (is that
>>> even possible?) is the fact of how to work with them. Are they
>>> considered equal (in the mathematical sense)? Can I add/subtract/...
>>> “42”^^xsd:integer and 42? What are the results: 84 or
>>> “84”^^xsd:integer?
>>>
>>> In order to refer to Markus’ statement: “In RDF everything is an opaque
>>> string that can ....” <<< that is not quite true as JSON data itself
>>> is only an opaque string, too, that only a JSON parser is able to
>>> understand.
>
> No, JSON has, just as RDF, a data model. It happens to support  
> numbers of infinite range and precision. RDF doesn't by itself.

RDF doesn't even support numbers. It supports value spaces. Who says
there cannot be a datatype that defines a value space for the kind
of numbers you mentioned?

And to repeat myself: JSON-LD data is just a string. Hence, it has
the same limitations and possibilities as N3 and the like.

The RDF data model and the JSON-LD data model actually shouldn't care  
and RDF doesn't. But JSON-LD does, which confuses me.


> It relies on datatypes which define how such an "opaque string" can  
> be interpreted. An RDF library has to know how to interpret the XSD  
> types to be able to infer that "42"^^xsd:integer == 42, the same is  
> true for other datatypes.

The same is true for JSON and for JSON-LD.
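
A minimal sketch of what I mean, in Python. The JSON text is a plain
string until a parser interprets it, just as a typed literal is a plain
string until a datatype mapping is applied. literal_value and CONVERTERS
are hypothetical names for illustration, and Python's int() and float()
only approximate the XSD lexical rules.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def parse_xsd_boolean(lexical):
    # xsd:boolean allows exactly these four lexical forms.
    return {"true": True, "1": True, "false": False, "0": False}[lexical]


# Hypothetical table of the datatypes this sketch "understands".
CONVERTERS = {
    XSD + "integer": int,
    XSD + "double": float,
    XSD + "boolean": parse_xsd_boolean,
}


def literal_value(lexical, datatype):
    """Return the native value, or the opaque string for unknown datatypes."""
    converter = CONVERTERS.get(datatype)
    if converter is None:
        return lexical
    return converter(lexical)


json_text = "42"  # still only a string of characters
from_json = json.loads(json_text)
from_rdf = literal_value("42", XSD + "integer")
print(from_json == from_rdf)

# A datatype the sketch does not know stays an opaque string.
print(literal_value("42", "http://example.org/unknownType"))
```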



>>> Other example: in N3, you can write false as a shortcut for
>>> “false”^^xsd:boolean.
>
> So? You can do the same in Turtle and JSON-LD.

My initial question was not about the JSON-LD syntax, but about the JSON-LD data model.


>>> Having said this, I do not quite understand why there is a need for
>>> such ‘native values’ in the data model when it’s just a serialization
>>> issue which on its own is perfectly valid as it simplifies a lot. But
>>> on the data model side, it’s more than questionable.
>
> I do not understand this question at all. What is "more than  
> questionable"? The fact that we allow developers to use JSON-native
> numbers and booleans? Or is it the fact that JSON numbers don't map  
> 1:1 to XSD types?

My problem is: do they match at all? Is this actually the same graph?

<<<
{
   "@context":
   {
     "x":
     {
       "@id": "b",
       "@type": "http://www.w3.org/2001/XMLSchema#integer"
     }
   },
   "@id": "a",
   "x": "4"
}
>>>

<<<
{
   "@id": "a",
   "b": 4
}
>>>

Or aren't they, because the two 4s are considered to be different concepts?
(Let's assume that a and b are URIs.)

As you can see, my question is not related to any concrete syntax but
to the JSON-LD data model itself, which I need to understand in order
to handle/design/consume/create JSON-LD data properly.
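
A rough sketch of how the comparison could be checked mechanically, in
Python. to_triples is a hypothetical and heavily simplified stand-in for
a JSON-LD to RDF conversion, not the algorithm from the API spec. It
assumes a flat node object, and it assumes that a native JSON integer is
turned into an xsd:integer literal.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def to_triples(doc):
    """Turn a flat JSON-LD node object into (s, p, lexical, datatype) tuples.

    Simplified for illustration: no nesting, no lists, no IRI expansion.
    """
    context = doc.get("@context", {})
    subject = doc["@id"]
    triples = set()
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # skip keywords such as @context and @id
        term = context.get(key, {})
        if isinstance(term, str):
            term = {"@id": term}
        predicate = term.get("@id", key)
        coerced = term.get("@type")
        if isinstance(value, bool):
            lexical = "true" if value else "false"
            datatype = coerced or XSD + "boolean"
        elif isinstance(value, int):
            # Assumption: native integers become xsd:integer literals.
            lexical = str(value)
            datatype = coerced or XSD + "integer"
        else:
            lexical = str(value)
            datatype = coerced or XSD + "string"
        triples.add((subject, predicate, lexical, datatype))
    return triples


typed_doc = {
    "@context": {"x": {"@id": "b", "@type": XSD + "integer"}},
    "@id": "a",
    "x": "4",
}
native_doc = {"@id": "a", "b": 4}

print(to_triples(typed_doc) == to_triples(native_doc))
```

Under these assumptions both documents produce the same single triple.
Whether the data model itself regards the two 4s as the same value is
exactly what I am asking.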


> If that's your concern then the answer is actually quite trivial.  
> Look at RFC4627, numbers are of infinite precision and range but  
> off-the-shelf parsers do have limited precision and range.  
> Unfortunately the exact range and precision is not specified so the  
> best we can do is to map them to the best matching types - and  
> sometimes that means that rounding errors may occur. Have a look at
>   http://json-ld.org/spec/latest/json-ld-api/#data-round-tripping
> that should explain it.

That part is perfectly reasonable and indeed necessary.
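
A small Python illustration of the rounding issue, using only the
standard json module. The sample number is made up for the example. With
the default settings the parser maps the JSON number to a binary double
and the trailing digits are lost. Passing parse_float=Decimal keeps them.

```python
import json
from decimal import Decimal

# A JSON number with more digits than a binary double can hold.
text = '{"value": 1.00000000000000000001}'

# Default behaviour: the number becomes a Python float (IEEE 754 double).
as_float = json.loads(text)["value"]

# Alternative: ask the parser to build Decimal objects instead.
as_decimal = json.loads(text, parse_float=Decimal)["value"]

print(as_float == 1.0)
print(as_decimal)
```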


>>> In order to state it more clearly: 1.  When both a ‘native value’ and
>>> a ‘typed-literal value’ refer to the very same entity, I do not see
>>> the purpose of introducing ‘native values’ as syntactic sugar belongs
>>> to the syntax part and not to the abstract model part.
>
> See above. Unfortunately there's no 1:1 mapping.
>
>
>>> 2.  When they
>>> don’t, the above mentioned questions should be answered clearly within
>>> the spec.
>
> They are, I think
>   http://json-ld.org/spec/latest/json-ld-api/#data-round-tripping
>
>
>> From my understanding, the JSON-LD-API spec [1] (as they are
>> intentionally not normatively referring to either RDF or XMLSchema in
>> the JSON-LD spec to reduce the learning curve for JSON-only
>> developers)
>
> The API spec normatively references both RDF Semantics and XML Schema
>
>
>> provides RDF transformation algorithms that are controlled
>> by the useNativeTypes setting [2] (which is not a field on
>> JsonLdOptions?? [3])
>
> Yeah, there's no API for that, just an algorithm. Defining an RDF  
> API would probably end up in another perma-thread accusing us of being
> overzealous
>
>
>> to determine whether to migrate numeric and
>> boolean datatypes between XMLSchema and JSON Native datatypes when
>> converting to RDF.
>
> No, when converting *from* RDF.
>
>
>
>> There should be no issues with the basic integer datatype that has the
>> same value space. The issues that I have been enquiring about recently
>> were in the double datatype. The reason that they are not syntactic
>> sugar, from my understanding, are that the value spaces are not
>> equivalent. I.e., you cannot represent some XMLSchema double and decimal
>> numbers in JSON Native.
>
> You can, but some off-the-shelf parsers might not be able to parse them.
>
>
>> The overarching goal of JSON-LD is to be completely compatible with
>> idiomatic JSON, and not RDF, so they must offer the ability for users
>> to use JSON Native types, even if that introduces round-tripping
>> issues.
>
> We support lossless round-tripping of JSON-LD to RDF and back.. but  
> in that case it won't be idiomatic JSON. All you have to do is to  
> set the use native types flag to false when serializing RDF as  
> JSON-LD -- and that's the default value by the way.
>
>
>> Although all RDF libraries will offer full support for the
>> commonly used XMLSchema datatypes, JSON-LD is focused on avoiding any
>> dependencies on RDF or XMLSchema libraries due to a feared backlash by
>> JSON developers if they do.
>
> That's just wrong. IMO it wouldn't make any sense to provide a  
> JSON-based syntax if you can't use it as idiomatic JSON.
>
>
>> JSON developers are notorious for their
>> hatred of anything XML, and (possibly by extension) RDF due to the
>> historical link between RDF and RDF/XML.
>
> I'm not going to comment on this one.
>
>
>> The difference with N3 and Turtle is that their native valuespaces
>> are based on XMLSchema datatypes, so there are no issues with
>> conversion to RDF Abstract Model for N3/Turtle/other RDF users who
>> virtually universally are using XMLSchema to represent numeric data
>> internally and in their serialisations
>
> No, the difference is that there already exist JSON parsers for  
> virtually every programming language. That's not the case for N3 and  
> Turtle. The parsers that are being/have been built for N3 and
> Turtle are being/have been built exactly for that purpose. I think  
> the majority of the group just tries to ignore that fact. We are not  
> starting at a clean slate. We have developed JSON-LD by considering  
> the current JSON ecosystem. We started with implementations. We had  
> a test suite from the very beginning. The specification was a result  
> of our experiences.
>
>
>> Would it be useful to add a note to RDF-2-JSON-LD transformers that
>> they MAY leave xsd:double values as non-native if they can determine
>> that the transformation would not be lossless, even if the
>> useNativeTypes flag is set to true?
>
> IMO no, if a user sets the use native types flag to true she  
> expresses her intentions quite clearly. Why should we ignore that?
>
>
>
> --
> Markus Lanthaler
> @markuslanthaler
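
For reference, a sketch in Python of the "use native types" behaviour
described in the quoted text above, for the direction from RDF to
JSON-LD. literal_to_jsonld is a hypothetical helper that only covers
integer, double and boolean; it is a simplification and not the
algorithm from the API spec.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal_to_jsonld(lexical, datatype, use_native_types=False):
    """Convert one RDF literal into a JSON-LD value object (simplified)."""
    if use_native_types:
        try:
            if datatype == XSD + "boolean" and lexical in ("true", "false"):
                return {"@value": lexical == "true"}
            if datatype == XSD + "integer":
                return {"@value": int(lexical)}
            if datatype == XSD + "double":
                # May round: the lexical form can hold more digits
                # than a binary double.
                return {"@value": float(lexical)}
        except ValueError:
            pass  # not convertible, fall through to the typed form
    # With the flag off (the default), keep lexical form and datatype.
    return {"@value": lexical, "@type": datatype}


print(literal_to_jsonld("4", XSD + "integer"))
print(literal_to_jsonld("4", XSD + "integer", use_native_types=True))
```

With the flag off, the lexical form and the datatype survive unchanged,
which is what makes the lossless round trip possible.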


-- 
Sven R. Kunze
Chemnitz University of Technology
Department of Computer Science
Distributed and Self-organizing Systems Group
Straße der Nationen 62
D-09107 Chemnitz
Germany
E-Mail: sven.kunze@informatik.tu-chemnitz.de
WWW: http://vsr.informatik.tu-chemnitz.de/people/kunze
Phone: +49 371 531 33882
Received on Thursday, 13 June 2013 15:05:13 UTC