RDF-ISSUE-129 Re: json-ld-api: change proposal for handling of xs:integer from Sandro Hawke on 2013-05-13 (public-rdf-wg@w3.org from May 2013)

From: Sandro Hawke <sandro@w3.org>
Date: Mon, 13 May 2013 07:36:10 -0400
To: Manu Sporny <msporny@digitalbazaar.com>
CC: W3C RDF WG <public-rdf-wg@w3.org>
Message-ID: <5190D02A.6000106@w3.org>
[this is really two related issues -- one about xs:integer, the other 
about xs:double, in JSON-LD.]

On 05/12/2013 09:45 PM, Manu Sporny wrote:
> On 05/10/2013 06:31 PM, Sandro Hawke wrote:
>> I believe, by saying in situations where there might be a loss, one
>> MUST NOT convert to a number.
> We didn't do this because the range for a JSON number isn't defined
> anywhere.
>
>> It's true we don't know exactly when there might be a loss, but after
>> talking with Markus, I'm pretty confident that using the range of
>> 32-bit integers will work well.
> ... except that most systems support 64-bit numbers, and we'd be
> hobbling those systems. :/

Yes, but I'm not sure the demand is *that* great for efficient handling 
of integers outside the 32-bit range.  We're hobbling their handling of 
numbers in the range of +- (2^31...2^53), for the most part.

But yes, there is a tradeoff of efficiency against correctness.

I can't help wondering how the JSON standards community thinks about 
this.  It seems like a huge problem when transmitting JSON to not know 
if bits will be dropped from your numbers because the receiving system 
is using a different-from-expected representation of numbers.

> We might want to put in guidance that moves the decision to the
> processor (it can detect when a conversion would result in data loss).
> Perhaps it should be up to the implementation to determine when data
> could be lost.

The problem is:

step 1:  64-bit server pulls data out of its quadstore and serializes it 
as JSON-LD
step 2:  Server sends that JSON-LD to client
step 3:  32-bit client uses that data.

If the server is using native json numbers, and some number is in the 
2^31...2^53 range, then the client will silently parse out the wrong 
number.    That's a pretty bad failure mode.    I'm not sure whether 
people will react by:

   - not using native json numbers for that range (as I'm suggesting)
   - insisting that clients handle json numbers the same as the server 
does (somehow)
   - not using native json numbers at all
   - not using json-ld at all

I suspect if we give no guidance, then we'll find ourselves at the latter 
options.
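
To make the failure mode in steps 1-3 concrete, here is a minimal Python sketch. The `client_parse` function and the `visits` property are hypothetical. The sketch assumes a client that stores every integer in a signed 32-bit slot and wraps on overflow. Real 32-bit parsers differ (some clamp, some fall back to floats, some raise an error), so treat the wraparound as one illustrative behavior, not a description of any particular implementation.

```python
import json

def to_int32(n):
    """Wrap an arbitrary integer into the signed 32-bit range (assumed client behavior)."""
    return (n + 2**31) % 2**32 - 2**31

def client_parse(text):
    """Hypothetical 32-bit client: parse JSON, squeezing every integer into 32 bits."""
    return json.loads(text, parse_int=lambda token: to_int32(int(token)))

# Steps 1 and 2: the 64-bit server serializes a value just above 2^31 as a native number.
wire = json.dumps({"@id": ":alice", "visits": 2**31 + 5})

# Step 3: the client parses without any error, but ends up with a different number.
received = client_parse(wire)
print(wire)
print(received["visits"])
```

The point is that nothing in the exchange signals a problem: the document is valid JSON on both ends, and only the value differs.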

>> I'd also add:
>>
>> "1"^^xs:int              // not native since it's 'int' not 'integer'
>> "01"^^xs:integer         // not native since it's not in canonical form
> +1
>
>> These rules will make xs:integer data round tripping through JSON-LD
>> perfectly lossless, I believe, on systems that can handle at least
>> 32 bit integers.
> Yeah, but I'm still concerned about the downsides of limiting the number
> to 32-bits, especially since most of the world will be using 64-bit
> machines from now on.

Another option is to say JSON-LD processors MUST retain at least 53 bits 
of precision on numbers (my second option above), but Markus tells me 
PHP compiled for 32-bit hardware, and some C JSON parsers, won't do that.
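
The integer rules discussed so far (native only for xs:integer, only in canonical lexical form, only within the 32-bit range) could be sketched roughly as follows. The function name `to_jsonld_value` and the `limit` parameter are assumptions made for illustration and are not taken from the JSON-LD API text. Literals that fail any check stay in expanded form.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Canonical xs:integer lexical form: no leading zeros, no "+" sign, no "-0".
CANONICAL_INTEGER = re.compile(r"(0|-?[1-9][0-9]*)")

def to_jsonld_value(lexical, datatype, limit=2**31):
    """Return a native int only when the round trip is lossless, else expanded form."""
    if datatype == XSD + "integer" and CANONICAL_INTEGER.fullmatch(lexical):
        n = int(lexical)
        if -limit <= n < limit:
            return n
    # Anything else keeps its lexical form and datatype untouched.
    return {"@value": lexical, "@type": datatype}

print(to_jsonld_value("1", XSD + "integer"))           # native
print(to_jsonld_value("1", XSD + "int"))               # expanded: 'int', not 'integer'
print(to_jsonld_value("01", XSD + "integer"))          # expanded: not canonical
print(to_jsonld_value("2147483648", XSD + "integer"))  # expanded: outside 32 bits
```

Under these assumptions, the 53-bit alternative would amount to calling the same function with `limit=2**53`, which is only safe if every processor really retains that much precision.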

> I do agree that we might be able to change the text to ensure that
> precision loss isn't an issue, and I do agree with you that it's
> definitely worth trying to prevent data loss.
>
> Tracking the issue here:
>
> http://lists.w3.org/Archives/Public/public-rdf-wg/2013May/0136.html
>
>> On a related topic, there's still the problem of xs:double.  I don't
>> have a good solution there.   I think the only way to prevent
>> datatype corruption there is to say don't use native number when the
>> value happens to be an integer.
> I don't quite understand, can you elaborate a bit more? Do you mean,
> this would be an issue?
>
> "234.0"^^xsd:double --- fromRDF() ---> JsonNumber(234)

Yes.

This manifests as a problem if:

step 1: server produces a JSON-LD document using native numbers
step 2: client receives the data, converts it to RDF
step 3: client merges it with (or compares it to) another copy of the 
data from another source, or passes it on to someone else who might

If "234.0"^^xsd:double occurred in that data, it'll appear as 234 in the 
JSON-LD document, and in step 2 the client will instead add 
"234"^^xs:integer to its database.  Now,  when it merges with another 
copy of the data, or does a diff, or issues a PATCH back to change that 
data -- unless all the other data paths also use JSON-LD with native 
numbers -- the data will be split, with two copies, two triples, or 
something.

The database with one triple:

:alice :age "7.0"^^xs:double.

will quickly turn into a database with two triples:

:alice :age "7.0"^^xs:double.
:alice :age "7"^^xs:integer.

At an application layer that's probably okay -- it's the same number 
after all -- but for the infrastructure it's a real problem. Something 
trying to do graph sync will view it as a change.    If :age is a 
functional property, then an OWL reasoner will flag this data as 
internally inconsistent (because in OWL, the integer 1 and the floating 
point number 1 are not the same number -- grumble if you want, that was 
not a decision made lightly).
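
The split can be reproduced with a small Python sketch. The helpers `from_rdf` and `to_rdf` here are hypothetical stand-ins, not the JSON-LD API algorithms: `from_rdf` deliberately mimics a serializer that prints an integral double without a decimal point, and `to_rdf` picks the datatype from the shape of the JSON number alone.

```python
import json

def from_rdf(lexical, datatype):
    """RDF literal -> JSON text, mimicking a serializer that prints 7.0 as 7."""
    value = float(lexical) if datatype == "xs:double" else int(lexical)
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # the decimal point, and with it the datatype, is lost here
    return json.dumps(value)

def to_rdf(text):
    """JSON text -> (lexical form, datatype), chosen from the number's shape."""
    value = json.loads(text)
    if isinstance(value, int):
        return (str(value), "xs:integer")
    return (repr(value), "xs:double")

# The database starts with one triple.
store = {(":alice", ":age", "7.0", "xs:double")}

# The triple travels through JSON-LD with native numbers and is merged back in.
wire = from_rdf("7.0", "xs:double")
store.add((":alice", ":age") + to_rdf(wire))

for triple in sorted(store):
    print(triple)
```

Because the store is a set of distinct triples, the returning copy does not collapse into the original; it is added alongside it.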

Brainstorming....

   Option 0: leave as-is.   RDF data cannot be faithfully transmitted 
through JSON-LD if 'use native numbers' is turned on.

   Option 1: in converting RDF to JSON-LD, processors MUST NOT use 
native json numbers for xs:double literals whose values happen to be 
integers.  Leave them in expanded form.

   Option 2: in converting between RDF and JSON-LD, processors SHOULD 
handle the JSON content as a *string* not an object.  When they 
serialize as double, they SHOULD make sure the representation includes a 
decimal point.  When they parse, they should map numbers with a decimal 
point back to xs:double.   Also, when they parse, they should notice 
numbers that are too big for the local integer representation and keep 
them in a string form.
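
For comparison, Options 1 and 2 might look something like the Python sketch below. All function names and the `INT_LIMIT` value are assumptions made for illustration. The Option 2 part relies on the `parse_int` and `parse_float` hooks of Python's standard `json.loads`, which receive each number as the original string token, so the decimal point is still visible when the datatype is chosen.

```python
import json
import math

def option1_double(lexical):
    """Option 1: use a native number only for finite, non-integral xs:double values."""
    value = float(lexical)
    if math.isfinite(value) and not value.is_integer():
        return value
    return {"@value": lexical, "@type": "xs:double"}  # leave in expanded form

INT_LIMIT = 2**31  # assumed size of the local integer representation

def option2_int(token):
    """Option 2: keep integers that do not fit locally in string form."""
    n = int(token)
    if -INT_LIMIT <= n < INT_LIMIT:
        return n
    return {"@value": token, "@type": "xs:integer"}

def option2_float(token):
    """Option 2: a token with a decimal point (or exponent) maps back to xs:double."""
    return {"@value": token, "@type": "xs:double"}

def option2_parse(text):
    return json.loads(text, parse_int=option2_int, parse_float=option2_float)

parsed = option2_parse('{"age": 7.0, "count": 7, "big": 9007199254740993}')
print(option1_double("7.0"), option1_double("7.5"))
print(parsed)
```

On the serializing side of Option 2, Python's `json.dumps(7.0)` happens to emit `7.0` with the decimal point; serializers in other environments may not, which is where the extra care would be needed.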

FWIW, I hate all of these options.   I can't even decide which I hate 
the least.   Seriously hoping someone has a better idea....

      -- Sandro

>
> -- manu
>
Received on Monday, 13 May 2013 11:36:18 UTC