- From: Sandro Hawke <sandro@w3.org>
- Date: Mon, 13 May 2013 15:48:47 -0400
- To: Markus Lanthaler <markus.lanthaler@gmx.net>
- CC: 'W3C RDF WG' <public-rdf-wg@w3.org>
Just a couple short comments inline, for now.

On 05/13/2013 01:20 PM, Markus Lanthaler wrote:
> On Monday, May 13, 2013 11:25 AM, Gregg Kellogg wrote:
>> On May 13, 2013, at 4:36 AM, Sandro Hawke <sandro@w3.org> wrote:
>>
>>> [this is really two related issues -- one about xs:integer, the
>>> other about xs:double, in JSON-LD.]
>>>
>>> On 05/12/2013 09:45 PM, Manu Sporny wrote:
>>>> On 05/10/2013 06:31 PM, Sandro Hawke wrote:
>>>>> I believe, by saying in situations where there might be a loss,
>>>>> one MUST NOT convert to a number.
>>>> We didn't do this because the range for a JSON number isn't
>>>> defined anywhere.
> Right. JSON-LD the data format doesn't have this issue, as it has an
> unlimited value space. So it's really just problematic for systems
> converting those strings (even the things without quotes are strings
> on the wire) to numbers.
>
>>>>> It's true we don't know exactly when there might be a loss, but
>>>>> after talking with Markus, I'm pretty confident that using the
>>>>> range of 32-bit integers will work well.
>>>> ... except that most systems support 64-bit numbers, and we'd be
>>>> hobbling those systems. :/
> And the problem is still there for 16-bit or 8-bit systems. That
> might not matter much in practice, but in a couple of years the
> 32-bit limit won't matter anymore -- just as 16-bit and 8-bit don't
> matter much anymore today.
>
>>> Yes, but I'm not sure the demand is *that* great for efficient
>>> handling of integers outside the range of 32 bits. We're hobbling
>>> their handling of numbers in the range of +-(2^31...2^53), for the
>>> most part.
>>>
>>> But yes, there is a tradeoff of efficiency against correctness.
>>>
>>> I can't help wondering how the JSON standards community thinks
>>> about this. It seems like a huge problem when transmitting JSON to
>>> not know whether bits will be dropped from your numbers because the
>>> receiving system is using a different-from-expected representation
>>> of numbers.
> Typically, large numbers are represented as strings. Twitter ran
> into that problem when their Tweet IDs crossed 53 bits a couple of
> years ago. They now serialize each ID as both a number and a string
> -- id and id_str -- see:
>
> https://dev.twitter.com/docs/twitter-ids-json-and-snowflake
>
> In JSON-LD we have a way to add a type to such a string-number, so
> that shouldn't be a big problem.
>
>> The point of being able to use native numbers in JSON is that this
>> is much more convenient for JSON developers to use than strings,
>> which might still need to be evaluated. But it is impossible to do
>> this for every possible integer. I think that restricting this to
>> 32 bits is a reasonable restriction, given the limitations of
>> important JSON parsers, but requiring the use of BigInteger-like
>> libraries should be considered.
> We need to distinguish between the data format (the thing on the
> wire) and processors. On the wire, the range and precision are
> unlimited. Processors converting that to some native type of course
> have limitations, but as Gregg said, that limit can be stretched
> quite far these days... even though it makes implementations much
> more complicated, as off-the-shelf JSON parsers don't do this (yet).
> PHP, for example, allows large numbers to be parsed into strings so
> that nothing is lost (except the information that it was a number
> and not a string).

Losing the fact that it was a number and not a string == corrupted
data.
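
(To make that failure mode concrete, a minimal JavaScript sketch --
the "id" key and the typed-string shape below are illustrative only:)

    // 2^53 + 1 is not representable in an IEEE-754 double, so a
    // JavaScript client silently parses the wrong value:
    var parsed = JSON.parse('{"id": 9007199254740993}');
    console.log(parsed.id);  // 9007199254740992 -- off by one

    // The typed-string workaround keeps the lexical form intact
    // while still declaring the datatype:
    var typed = { "@value": "9007199254740993",
                  "@type": "http://www.w3.org/2001/XMLSchema#integer" };
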
>>>> We might want to put in guidance that moves the decision to the
>>>> processor (it can detect when a conversion would result in data
>>>> loss). Perhaps it should be up to the implementation to determine
>>>> when data could be lost.
> That would be my preferred solution.
>
>>> The problem is:
>>>
>>> step 1: 64-bit server pulls data out of its quadstore and
>>> serializes it as JSON-LD
>>> step 2: server sends that JSON-LD to client
>>> step 3: 32-bit client uses that data.
>>>
>>> If the server is using native JSON numbers, and some number is in
>>> the 2^31...2^53 range, then the client will silently parse out the
>>> wrong number. That's a pretty bad failure mode. I'm not sure
>>> whether people will react by:
>>>
>>> - not using native JSON numbers for that range (as I'm suggesting)
>>> - insisting that clients handle JSON numbers the same way the
>>>   server does (somehow)
>>> - not using native JSON numbers at all
>>> - not using JSON-LD at all
>>>
>>> I suspect that if we give no guidance, we'll find ourselves at the
>>> latter options.
> I don't agree with that reasoning. JSON has exactly the same problem,
> and I haven't heard of people stopping using it because of that.
> Yeah, in some cases it might be better to serialize numbers as
> strings, but in contrast to JSON, JSON-LD allows you to add a
> datatype -- so it won't be an opaque string as in JSON.
>
>> I prefer the second option, but could live with the first.
>>
>>>>> I'd also add:
>>>>>
>>>>> "1"^^xs:int       // not native since it's 'int' not 'integer'
>>>>> "01"^^xs:integer  // not native since it's not in canonical form
>>>> +1
> So we are just converting numbers in *canonical* lexical form? Would
> be fine with that.

If we want perfect round-tripping, yes, we have to convert only
numbers which happen to be in canonical form.

>>>>> These rules will make xs:integer data round-tripping through
>>>>> JSON-LD perfectly lossless, I believe, on systems that can
>>>>> handle at least 32-bit integers.
>>>> Yeah, but I'm still concerned about the downsides of limiting the
>>>> number to 32 bits, especially since most of the world will be
>>>> using 64-bit machines from now on.
> Me too... and in a couple of years the same will be true about
> 64-bit.
>
>>> Another option is to say JSON-LD processors MUST retain at least
>>> 53 bits of precision on numbers (my second option above), but
>>> Markus tells me PHP compiled for 32-bit hardware, and some C JSON
>>> parsers, won't do that.
> -1, that will make it impossible to implement conformant JSON-LD
> processors on certain platforms.
>
>> Likely, languages with these limitations have some kind of
>> BigInteger implementation; if so, we could consider using the
>> 64-bit space.
>>
>>>> I do agree that we might be able to change the text to ensure
>>>> that precision loss isn't an issue, and I do agree with you that
>>>> it's definitely worth trying to prevent data loss.
>>>>
>>>> Tracking the issue here:
>>>>
>>>> http://lists.w3.org/Archives/Public/public-rdf-wg/2013May/0136.html
>>>>
>>>>> On a related topic, there's still the problem of xs:double. I
>>>>> don't have a good solution there. I think the only way to
>>>>> prevent datatype corruption there is to say don't use a native
>>>>> number when the value happens to be an integer.
>>>> I don't quite understand, can you elaborate a bit more? Do you
>>>> mean this would be an issue?
>>>>
>>>> "234.0"^^xsd:double --- fromRDF() ---> JsonNumber(234)
>>> Yes.
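
(A sketch of what the "canonical form only" rule might look like in a
processor -- the function name and the 32-bit cutoff here are
illustrative, not from any spec text:)

    // Convert an xsd:integer literal to a native JSON number only if
    // its lexical form is canonical and the value fits in 32 bits.
    function toNativeInteger(lexical) {
      var canonical = /^(0|-?[1-9][0-9]*)$/.test(lexical);
      var n = Number(lexical);
      if (canonical && n >= -2147483648 && n <= 2147483647) {
        return n;    // safe on 32-bit consumers, round-trips losslessly
      }
      return null;   // caller keeps the expanded @value/@type form
    }
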
> "234.0"^^xsd:double --- fromRDF() ---> JsonNumber(234) --> toRDF > "234"^^xsd:integer > > >>> Option 0: leave as-is. RDF data cannot be faithfully transmitted >> through JSON-LD if 'use native numbers' is turned on. > That's what the flag is for. I'm wondering how other RDF libraries handle > that!? For example, what happens if you call Jena's getInt() with an integer >> 32bit? Will it throw an exception? > http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/rdf/model/ > Literal.html > > >>> Option 1: in converting RDF to JSON-LD, processors MUST NOT use >> native json numbers for xs:double literals whose values happen to be >> integers. Leave them in expanded form. > That would be a very weird and surprising behavior for most users. > > >>> Option 2: in converting between RDF and JSON-LD, processors SHOULD >> handle the JSON content as a *string* not an object. When they >> serialize as double, they SHOULD make sure the representation includes >> a decimal point. When they parse, they should map numbers with a >> decimal point back to xs:double. Also, when they parse, they should >> notice numbers that are too big for the local integer representation >> and keep them in a string form. > Isn't that exactly what useNativeTypes = false does? > > >>> FWIW, I hate all of these options. I can't even decide which I hate >> the least. Seriously hoping someone has a better idea.... >> >> The point of having the useNativeTypes flag is to address these issues, >> hobbling the implementations for all implementations to guarantee no >> data loss goes against the whole point of using a JSON representation >> in the first place; the format is optimized for applications, > I think we should keep in mind that we are primarily designing a data > format. The data has none of these issues as numbers can be of arbitrary > size and precision. The problem manifests itself when those numbers are > converted to some native representation. You have the same problem > anywhere.. plain-old JSON, XML, etc. I think we should just add a note or > something highlighting the problem and explaining the various approaches to > avoid it. > > >> Any JSON-LD processor can faithfully transform from other RDF formats >> by turning off the useNativeTypes option; the only thing to consider is >> if this guidance needs to be made more pro intent and if we should >> consider changing the default for that option. > +1.. don't care much about the default value. > > >> Option 0 preserves the intent of the format the best, but developers >> should be aware that, for the sake of convenience and utility, >> developers should recognize the possibility of round-tripping errors. > +1, that's how JSON has been successfully used for years. > > >> Option 1 is much more inconvenient for developers, as their code now >> needs to branch if the value is a string or hash, rather than just >> count on its being a number. > -1, very unintuitive behavior > > >> Option 2 places more of a burden on processor developers. In Ruby, I'd >> need to always use custom datatypes for numbers to carry around the >> original lexical representation, but this could be easily lost through >> intermediate operations. I'd also need a custom JSON parser and >> serializer to ensure that the serialized form is represented properly, >> not worth it IMO. > Just use useNativeTypes = false if you want that behavior. Requiring > implementers to write their own JSON parsers is not an option in my opinion. 
Sorry, the flag doesn't really help in the scenario I provided above,
where someone is serving JSON-LD to an unknown client. I would argue
this is the expected, majority use case. (I guess the other one that
might be common is reading RDF and converting it to JSON-LD for
internal use, as a kind of API to the data?) And in this use case, if
the server sets useNativeTypes=true, then for some data values the
client will get the wrong RDF data value and/or the wrong RDF
datatype.

In other words, with useNativeTypes turned on, JSON-LD is not a
faithful RDF syntax. Given that -- which you all seem invested in --
maybe we should go all the way and convert all RDF numeric literals
to native JSON numbers. That makes the lossy-but-convenient
conversion even more convenient, and lossy in a less surprising way.
Rather than weirdly having *some* doubles turned into integers in the
rdf->json->rdf round trip, we'd just have (I propose) EVERY numeric
literal turned into an xs:double. Certainly that's what pretty much
every JavaScript coder would want/expect with useNativeTypes=true.

I'd also suggest we say that people SHOULD NOT publish JSON-LD with
JSON-native numbers in it unless they're fine with them being
understood in a platform-dependent way.

      -- Sandro

>
> --
> Markus Lanthaler
> @markuslanthaler
>
Received on Monday, 13 May 2013 19:48:55 UTC