Re: RDF-ISSUE-129 Re: json-ld-api: change proposal for handling of xs:integer from Gregg Kellogg on 2013-05-13 (public-rdf-wg@w3.org from May 2013)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Mon, 13 May 2013 07:24:38 -0700
To: Sandro Hawke <sandro@w3.org>
Cc: Manu Sporny <msporny@digitalbazaar.com>, W3C RDF WG <public-rdf-wg@w3.org>
Message-Id: <0716DEB1-77FF-441C-AA94-CD9ED93BDAD9@greggkellogg.net>
On May 13, 2013, at 4:36 AM, Sandro Hawke <sandro@w3.org> wrote:

> [this is really two related issues -- one about xs:integer, the other about xs:double, in JSON-LD.]
> 
> On 05/12/2013 09:45 PM, Manu Sporny wrote:
>> On 05/10/2013 06:31 PM, Sandro Hawke wrote:
>>> I believe, by saying in situations where there might be a loss, one
>>> MUST NOT convert to a number.
>> We didn't do this because the range for a JSON number isn't defined
>> anywhere.
>> 
>>> It's true we don't know exactly when there might be a loss, but after
>>> talking with Markus, I'm pretty confident that using the range of
>>> 32-bit integers will work well.
>> ... except that most systems support 64-bit numbers, and we'd be
>> hobbling those systems. :/
> 
> Yes, but I'm not sure the demand is *that* great for efficient handling of integers outside the range of 32 bits. We're hobbling their handling of numbers in the range of +- (2^31...2^53), for the most part.
> 
> But yes, there is a tradeoff of efficiency against correctness.
> 
> I can't help wondering how the JSON standards community thinks about this.  It seems like a huge problem when transmitting JSON to not know if bits will be dropped from your numbers because the receiving system is using a different-from-expected representation of numbers.

The point of being able to use native numbers in JSON is that they are much more convenient for JSON developers than strings, which might still need to be evaluated. But it is impossible to do this for every possible integer. I think that restricting this to 32 bits is reasonable, given the limitations of important JSON parsers, but requiring the use of BigInteger-like libraries should be considered.
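
As a rough illustration of the 32-bit rule (only a sketch, in Python; the helper name and the choice to fall back to the expanded form are my assumptions, not spec text):

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def integer_to_json(lexical):
    """Hypothetical helper: native number when it fits in 32 bits,
    otherwise keep the literal in expanded form."""
    value = int(lexical)
    if INT32_MIN <= value <= INT32_MAX:
        return value
    # Out of range: leave the lexical form untouched so nothing is lost.
    return {"@value": lexical, "@type": XSD_INTEGER}

print(integer_to_json("42"))
print(integer_to_json("2147483648"))
```

Small values come back as plain numbers; anything outside the 32-bit range stays a typed string, which is the inconvenience being traded for safety.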

>> We might want to put in guidance that moves the decision to the
>> processor (it can detect when a conversion would result in data loss).
>> Perhaps it should be up to the implementation to determine when data
>> could be lost.
> 
> The problem is:
> 
> step 1:  64-bit server pulls data out of its quadstore and serializes it as JSON-LD
> step 2:  Server sends that JSON-LD to client
> step 3:  32-bit client uses that data.
> 
> If the server is using native json numbers, and some number is in the 2^31...2^53 range, then the client will silently parse out the wrong number. That's a pretty bad failure mode. I'm not sure whether people will react by:
> 
>  - not using native json numbers for that range (as I'm suggesting)
>  - insisting that clients handle json numbers the same as the server does (somehow)
>  - not using native json numbers at all
>  - not using json-ld at all
> 
> I suspect if we give no guidance, then we'll find ourselves at the later options.

I prefer the second option, but could live with the first.
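
To make the failure mode concrete, here is a small Python sketch. I can't easily reproduce a 32-bit parser here, so this shows the same kind of silent change one step higher, past 2^53, using `parse_int=float` as a stand-in for a parser that only has doubles. The document and its value are made up.

```python
import json

text = '{"id": 9007199254740993}'  # 2^53 + 1, made-up value

exact = json.loads(text)                   # Python ints are arbitrary precision
lossy = json.loads(text, parse_int=float)  # stand-in for a double-only parser

print(exact["id"])       # 9007199254740993
print(int(lossy["id"]))  # 9007199254740992
```

No error is raised in the second case; the number simply comes out different.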

>>> I'd also add:
>>> 
>>> "1"^^xs:int            // not native since it's 'int' not 'integer'
>>> "01"^^xs:integer       // not native since it's not in canonical form
>> +1
>> 
>>> These rules will make xs:integer data round tripping through JSON-LD
>>> perfectly lossless, I believe, on systems that can handle at least
>>> 32 bit integers.
>> Yeah, but I'm still concerned about the downsides of limiting the number
>> to 32-bits, especially since most of the world will be using 64-bit
>> machines from now on.
> 
> Another option is to say JSON-LD processors MUST retain at least 53 bits of precision on numbers (my second option above), but Markus tells me PHP compiled for 32-bit hardware, and some C JSON parsers, won't do that.

Likely, languages with these limitations have some kind of BigInteger implementation; if so, we could consider using the 64-bit space.

>> I do agree that we might be able to change the text to ensure that
>> precision loss isn't an issue, and I do agree with you that it's
>> definitely worth trying to prevent data loss.
>> 
>> Tracking the issue here:
>> 
>> http://lists.w3.org/Archives/Public/public-rdf-wg/2013May/0136.html
>> 
>>> On a related topic, there's still the problem of xs:double.  I don't
>>> have a good solution there.   I think the only way to prevent
>>> datatype corruption there is to say don't use native number when the
>>> value happens to be an integer.
>> I don't quite understand, can you elaborate a bit more? Do you mean,
>> this would be an issue?
>> 
>> "234.0"^^xsd:double --- fromRDF() ---> JsonNumber(234)
> 
> Yes.
> 
> This manifests as a problem if:
> 
> step 1: server produces a JSON-LD document using native numbers
> step 2: client receives the data, converts it to RDF
> step 3: client merges it with (or compares it to) another copy of the data from another source, or passes it on to someone else who might
> 
> If "234.0"^^xsd:double occurred in that data, it'll appear as 234 in the JSON-LD document, and in step 2 the client will instead add "234"^^xs:integer to its database.  Now,  when it merges with another copy of the data, or does a diff, or issues a PATCH back to change that data -- unless all the other data paths also use JSON-LD with native numbers -- the data will be split, with two copies, two triples, or something.
> 
> The database with one triple:
> 
> :alice :age "7.0"^^xs:double.
> 
> will quickly turn into a database with two triples:
> 
> :alice :age "7.0"^^xs:double.
> :alice :age "7"^^xs:integer.
> 
> At an application layer that's probably okay -- it's the same number after all -- but for the infrastructure it's a real problem. Something trying to do graph sync will view it as a change.    If :age is a functional property, then an OWL reasoner will flag this data as internally inconsistent (because in OWL, the integer 1 and the floating point number 1 are not the same number -- grumble if you want, that was not a decision made lightly).
> 
> Brainstorming....
> 
>  Option 0: leave as-is.   RDF data cannot be faithfully transmitted through JSON-LD if 'use native numbers' is turned on.
> 
>  Option 1: in converting RDF to JSON-LD, processors MUST NOT use native json numbers for xs:double literals whose values happen to be integers.  Leave them in expanded form.
> 
>  Option 2: in converting between RDF and JSON-LD, processors SHOULD handle the JSON content as a *string* not an object.  When they serialize as double, they SHOULD make sure the representation includes a decimal point.  When they parse, they should map numbers with a decimal point back to xs:double.   Also, when they parse, they should notice numbers that are too big for the local integer representation and keep them in a string form.
> 
> FWIW, I hate all of these options.   I can't even decide which I hate the least.   Seriously hoping someone has a better idea....

The point of having the useNativeTypes flag is to address these issues. Hobbling all implementations to guarantee no data loss goes against the whole point of using a JSON representation in the first place; the format is optimized for applications.

Any JSON-LD processor can faithfully transform from other RDF formats by turning off the useNativeTypes option; the only things to consider are whether this guidance needs to be made more prominent and whether we should change the default for that option.

Option 0 preserves the intent of the format the best, but developers should be aware that this convenience and utility come with the possibility of round-tripping errors.
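
The round-tripping error in question looks roughly like this (a Python sketch; `from_rdf_native` and `to_rdf` are hypothetical stand-ins for the conversion steps with native numbers turned on, not the spec algorithms):

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

def from_rdf_native(lexical, datatype):
    """Hypothetical RDF -> JSON step with native numbers on."""
    if datatype == XSD + "double":
        value = float(lexical)
        # Many JSON serializers write 234.0 as 234; model that here.
        return int(value) if value.is_integer() else value
    if datatype == XSD + "integer":
        return int(lexical)
    return {"@value": lexical, "@type": datatype}

def to_rdf(value):
    """Hypothetical JSON -> RDF step: the datatype is guessed from the value."""
    if isinstance(value, int):
        return (str(value), XSD + "integer")
    # Not the canonical xs:double lexical form; good enough for the sketch.
    return (repr(value), XSD + "double")

original = ("7.0", XSD + "double")
round_tripped = to_rdf(from_rdf_native(*original))
print(original)
print(round_tripped)
```

The literal goes in as an xs:double and comes back as an xs:integer, which is the second :age triple from Sandro's example.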

Option 1 is much more inconvenient for developers, as their code now needs to branch on whether the value is a string or hash, rather than just count on its being a number.
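
The extra branch under Option 1 would look something like this in application code (a Python sketch; the function name and the sample documents are made up):

```python
def age_as_number(value):
    """Hypothetical consumer code: cope with both shapes of the same value."""
    if isinstance(value, dict):
        # Expanded form, e.g. {"@value": "7.0", "@type": "...#double"}
        return float(value["@value"])
    # Native JSON number
    return value

native = {"age": 7.5}
expanded = {"age": {"@value": "7.0",
                    "@type": "http://www.w3.org/2001/XMLSchema#double"}}

print(age_as_number(native["age"]))
print(age_as_number(expanded["age"]))
```

Every place that reads a numeric property needs this check, since whether a given double arrives as a number or a hash depends on its value.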

Option 2 places more of a burden on processor developers. In Ruby, I'd need to always use custom datatypes for numbers to carry around the original lexical representation, but this could be easily lost through intermediate operations. I'd also need a custom JSON parser and serializer to ensure that the serialized form is represented properly. That's not worth it, IMO.
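
For comparison, the parsing half of Option 2 can be sketched in Python, whose standard `json` module happens to expose hooks that receive the lexical form of each number. The hook functions and the 32-bit cutoff below are my assumptions for illustration; the serializing half, and keeping the lexical form alive through intermediate operations, are not addressed.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"

def typed_int(lexical):
    """Hypothetical hook: keep integers too big for 32 bits as typed strings."""
    n = int(lexical)
    if -(2**31) <= n < 2**31:
        return n
    return {"@value": lexical, "@type": XSD + "integer"}

def typed_double(lexical):
    """Hypothetical hook: a decimal point or exponent means xs:double."""
    return {"@value": lexical, "@type": XSD + "double"}

text = '{"age": 7.0, "big": 9007199254740993, "n": 7}'
doc = json.loads(text, parse_int=typed_int, parse_float=typed_double)
print(doc)
```

Here "age" keeps its "7.0" lexical form and double type, "big" stays a string, and only "n" becomes a native number.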

Gregg

>     -- Sandro
> 
>> 
>> -- manu
> 
>
Received on Monday, 13 May 2013 14:25:05 UTC