- From: Sandro Hawke <sandro@w3.org>
- Date: Mon, 13 May 2013 07:36:10 -0400
- To: Manu Sporny <msporny@digitalbazaar.com>
- CC: W3C RDF WG <public-rdf-wg@w3.org>
[this is really two related issues -- one about xs:integer, the other
about xs:double, in JSON-LD.]

On 05/12/2013 09:45 PM, Manu Sporny wrote:
> On 05/10/2013 06:31 PM, Sandro Hawke wrote:
>> I believe, by saying in situations where there might be a loss, one
>> MUST NOT convert to a number.
> We didn't do this because the range for a JSON number isn't defined
> anywhere.
>
>> It's true we don't know exactly when there might be a loss, but after
>> talking with Markus, I'm pretty confident that using the range of
>> 32-bit integers will work well.
> ... except that most systems support 64-bit numbers, and we'd be
> hobbling those systems. :/

Yes, but I'm not sure the demand is *that* great for efficient handling
of integers outside the range of 32 bits.  We're hobbling their handling
of numbers in the range of +- (2^31...2^53), for the most part.  But
yes, there is a tradeoff of efficiency against correctness.

I can't help wondering how the JSON standards community thinks about
this.  It seems like a huge problem when transmitting JSON to not know
whether bits will be dropped from your numbers because the receiving
system is using a different-from-expected representation of numbers.

> We might want to put in guidance that moves the decision to the
> processor (it can detect when a conversion would result in data loss).
> Perhaps it should be up to the implementation to determine when data
> could be lost.

The problem is:

step 1: 64-bit server pulls data out of its quadstore and serializes
        it as JSON-LD
step 2: server sends that JSON-LD to the client
step 3: 32-bit client uses that data

If the server is using native JSON numbers, and some number is in the
2^31...2^53 range, then the client will silently parse out the wrong
number.  That's a pretty bad failure mode.  I'm not sure whether people
will react by:

  - not using native JSON numbers for that range (as I'm suggesting)
  - insisting that clients handle JSON numbers the same way the server
    does (somehow)
  - not using native JSON numbers at all
  - not using JSON-LD at all

I suspect if we give no guidance, then we'll find ourselves at the
latter options.

>> I'd also add:
>>
>>    "1"^^xs:int       // not native since it's 'int' not 'integer'
>>    "01"^^xs:integer  // not native since it's not in canonical form
> +1
>
>> These rules will make xs:integer data round tripping through JSON-LD
>> perfectly lossless, I believe, on systems that can handle at least
>> 32-bit integers.
> Yeah, but I'm still concerned about the downsides of limiting the number
> to 32-bits, especially since most of the world will be using 64-bit
> machines from now on.

Another option is to say JSON-LD processors MUST retain at least 53 bits
of precision on numbers (my second option above), but Markus tells me
PHP compiled for 32-bit hardware, and some C JSON parsers, won't do that.

> I do agree that we might be able to change the text to ensure that
> precision loss isn't an issue, and I do agree with you that it's
> definitely worth trying to prevent data loss.
>
> Tracking the issue here:
>
> http://lists.w3.org/Archives/Public/public-rdf-wg/2013May/0136.html
>
>> On a related topic, there's still the problem of xs:double.  I don't
>> have a good solution there.  I think the only way to prevent
>> datatype corruption there is to say don't use native number when the
>> value happens to be an integer.
> I don't quite understand, can you elaborate a bit more? Do you mean,
> this would be an issue?
>
> "234.0"^^xsd:double --- fromRDF() ---> JsonNumber(234)

Yes.
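To make both failure modes concrete, here's a minimal sketch -- in
TypeScript, only because JSON numbers there are IEEE doubles; it isn't
taken from any actual JSON-LD processor:

    // Silent precision loss: a 64-bit server can emit this integer natively,
    // but a parser that stores numbers as doubles quietly alters it at 2^53
    // (a 32-bit parser would already be in trouble at 2^31).
    const big = JSON.parse("9007199254740993");    // 2^53 + 1 on the wire
    console.log(big);                              // 9007199254740992 -- off by one, no error

    // Datatype corruption: the value of "234.0"^^xsd:double has no ".0" once
    // it's a native number, so a naive toRDF() has nothing to go on but the
    // lexical form and tags it as an integer.
    const fromRdf = 234.0;                         // value of "234.0"^^xsd:double
    const wire = JSON.stringify(fromRdf);          // "234"
    const parsed = JSON.parse(wire);               // 234
    const datatype = Number.isInteger(parsed) ? "xs:integer" : "xs:double";
    console.log(wire, datatype);                   // 234 xs:integer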
This manifests as a problem if:

step 1: server produces a JSON-LD document using native numbers
step 2: client receives the data, converts it to RDF
step 3: client merges it with (or compares it to) another copy of the
        data from another source, or passes it on to someone else who
        might

If "234.0"^^xsd:double occurred in that data, it'll appear as 234 in the
JSON-LD document, and in step 2 the client will instead add
"234"^^xs:integer to its database.  Now, when it merges with another
copy of the data, or does a diff, or issues a PATCH back to change that
data -- unless all the other data paths also use JSON-LD with native
numbers -- the data will be split, with two copies, two triples, or
something.  The database with one triple:

   :alice :age "7.0"^^xs:double .

will quickly turn into a database with two triples:

   :alice :age "7.0"^^xs:double .
   :alice :age "7"^^xs:integer .

At an application layer that's probably okay -- it's the same number,
after all -- but for the infrastructure it's a real problem.  Something
trying to do graph sync will view it as a change.  If :age is a
functional property, then an OWL reasoner will flag this data as
internally inconsistent (because in OWL, the integer 1 and the floating
point number 1 are not the same number -- grumble if you want, but that
was not a decision made lightly).

Brainstorming....

Option 0: leave as-is.  RDF data cannot be faithfully transmitted
through JSON-LD if 'use native numbers' is turned on.

Option 1: in converting RDF to JSON-LD, processors MUST NOT use native
JSON numbers for xs:double literals whose values happen to be integers.
Leave them in expanded form.

Option 2: in converting between RDF and JSON-LD, processors SHOULD
handle the JSON content as a *string*, not an object.  When they
serialize a double, they SHOULD make sure the representation includes a
decimal point.  When they parse, they should map numbers with a decimal
point back to xs:double.  Also, when they parse, they should notice
numbers that are too big for the local integer representation and keep
them in a string form.  (Roughly sketched below.)

FWIW, I hate all of these options.  I can't even decide which I hate the
least.  Seriously hoping someone has a better idea....

      -- Sandro

>
> -- manu
>
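A rough sketch of what Option 2 might look like in practice -- again
TypeScript, and the function names are made up for illustration; they
aren't anything from the JSON-LD spec or an existing processor:

    // Serializing: build the number's lexical form ourselves instead of
    // letting the JSON library choose.  Force a decimal point onto xs:double
    // values so a round trip can't mistake them for integers (ignoring
    // NaN/INF, which aren't JSON numbers anyway), and fall back to the
    // string form for integers the local representation can't hold.
    function emitDouble(value: number): string {
      const s = String(value);
      return /[.eE]/.test(s) ? s : s + ".0";         // 7 -> "7.0", 0.5 -> "0.5"
    }

    function emitInteger(lexical: string): number | string {
      const n = Number(lexical);
      return Number.isSafeInteger(n) ? n : lexical;  // too big: keep the string
    }

    // Parsing: a decimal point or exponent means xs:double, otherwise xs:integer.
    function datatypeOf(token: string): string {
      return /[.eE]/.test(token) ? "xs:double" : "xs:integer";
    }

    console.log(emitDouble(7));                       // "7.0"
    console.log(emitInteger("9007199254740993"));     // "9007199254740993" (string)
    console.log(datatypeOf("7.0"), datatypeOf("7"));  // xs:double xs:integer

Of course the parsing side only works if the processor can get at the
raw token before its JSON library has already turned it into a number --
which is exactly the "handle the content as a string" part, and not
every library makes that easy.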
Received on Monday, 13 May 2013 11:36:18 UTC