Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

On Jul 6, 2008, at 10:55 PM, Rob Shearer wrote:

>>> XSD offers a lexical syntax for points that happen to lie on the  
>>> real number line
>>
>> It offers several and we're free to define one for owl:real. If we  
>> use any decimal notation, we have exactness problems (e.g., 1/3),  
>> but decimal is very user friendly. So, I was thinking that the  
>> valid syntax for a real would be decimal floating points and  
>> ratios of integers. We could include scientific notation as well.
>
> Why on earth would the OWL group come up with their own syntax for  
> encoding numbers?

I'm presuming we're sticking with the basic xsd framework. So types  
have a lexical space and a value space. So, owl:real has a value  
space of the reals. But what should the lexical space be? I'd propose  
that at least the union of the xsd numeric types' lexical spaces be  
the lexical space for our new type. I would add additional syntax for  
exact rationals (such as 1/3). The first part is isomorphic to your  
proposal about xsd syntax, I believe.
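
To make the lexical space/value space split concrete, here is a minimal Python sketch of such a mapping. The function name `parse_owl_real` and the two regular expressions are assumptions made for illustration, not part of any spec; they accept integer ratios plus decimal and scientific notation, and map each accepted string to the exact rational it denotes.

```python
import re
from fractions import Fraction

# Hypothetical lexical forms for owl:real: integer ratios, plus decimal and
# scientific notation (roughly the finite xsd numeric forms; INF and NaN are
# deliberately left out because they are not real numbers).
_RATIO = re.compile(r"[+-]?\d+/\d+")
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?")


def parse_owl_real(lexical: str) -> Fraction:
    """Map a lexical form to the exact rational it denotes."""
    if _RATIO.fullmatch(lexical) or _DECIMAL.fullmatch(lexical):
        try:
            return Fraction(lexical)
        except ZeroDivisionError:
            raise ValueError(f"zero denominator: {lexical!r}") from None
    raise ValueError(f"not in the lexical space: {lexical!r}")


# Different lexical forms, same point in the value space.
print(parse_owl_real("1") == parse_owl_real("1.0e0"))  # True
print(parse_owl_real("1/3") * 3)                       # 1
```

Many strings map to one value here, which is the usual many-to-one relationship between a lexical space and a value space.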

> The XSchema guys have already done that, and people have  
> implemented parsers for their spec. If there's going to be a syntax  
> for rationals or algebraics, then that seems to be right up their  
> alley.

They don't seem interested, alas.

>>> But my main point is that users have no interest in the "holes"  
>>> introduced by the xsd:float value space: providing them access to  
>>> a value space of numbers representable in float representation is  
>>> not useful, and could lead to lots of confusion, particularly if  
>>> users could easily use such a space "by accident".
>>
>> Well, you'll get exactness holes with binary or decimal notation,  
>> regardless of density issues.
>
> I thought I had made my proposal clear on this: the value space  
> does not have holes.

Sure.

> The representations supported for particular values are not  
> sufficient to address all the points in that space, but the space  
> itself does *not* have holes.

I just meant that there are things you can't write down exactly in  
decimal, such as 1/3. That's all.

>>> I don't know what you mean by "lexical space of the reals".
>>
>> XSD datatypes have a lexical space (e.g., the syntax) and a value  
>> space. You are suggesting, I thought, that we adopt a value space  
>> that is the reals and something about using xsd syntax (i.e.,  
>> lexical spaces) for the syntax.
>
> For the syntax of particular values. I keep trying to stress that  
> value spaces should be kept separate from the syntax used for  
> particular values.

Sure. But that's true in XSD as well. From what I can tell, you want  
all the literals that have "xsd:float" (to pick an example) to map to  
(a subset) of the reals (as the value space) and constrain/enable  
certain syntax. So "1.0"^^xsd:float would be a syntax error.

>> XSD offers exact syntax only for binary and decimals (I believe  
>> it's exact for binary). I was wondering what sort of lexical space  
>> you want.
>
> XSD offers a well-defined mapping from lexical representation to  
> IEEE floats.

Yes. I just hadn't checked the spec, hence my hesitation.

> XSD defines an *exact* value for each valid lexical representation.  
> You may not like the way the mapping is defined (because the value  
> of "1.1e0^^xsd:float" on the real number line is not equal to the  
> value of "1.1^^xsd:decimal"),

No that's fine.

> but there is no imprecision whatsoever about what each string  
> represents.

You've got the wrong string. I only hedged because I hadn't looked  
and I don't like to speak with certainty without looking. My point was  
only that there are numbers which cannot be exactly represented in  
binary or in decimal.
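
The difference between the two literals quoted above can be checked directly. The following Python sketch (the helper name `float32_value` is an assumption for illustration) rounds a lexical form to the nearest IEEE binary32 value and compares the exact rational result with the decimal reading of the same digits.

```python
import struct
from fractions import Fraction


def float32_value(lexical: str) -> Fraction:
    """Exact rational value of the binary32 float nearest to `lexical`.

    Goes through a Python double first, which matches direct rounding to
    binary32 for ordinary inputs like the one used here.
    """
    as_float32 = struct.unpack("<f", struct.pack("<f", float(lexical)))[0]
    return Fraction(as_float32)


float_point = float32_value("1.1e0")  # value of "1.1e0"^^xsd:float
decimal_point = Fraction("1.1")       # value of "1.1"^^xsd:decimal

print(float_point)                   # 9227469/8388608
print(float_point == decimal_point)  # False
```

Both literals denote exact points on the real number line; they are just different points, because 11/10 has no finite binary expansion.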

> I am satisfied with the work the XSchema group did on floating- 
> point lexical representations.
>
>>> But implementations should allow users to specify particular  
>>> points in that value space using the lexical representations for  
>>> `xsd:float` and `xsd:int` values.
>>
>> So you want a very broad lexical space for our real type, i.e.,  
>> "1", "1.0",  and "12.78e-2".
>
> No. I want `real` to be a value space with no lexical connotations.

I'd be surprised if we could get consensus on abandoning the lexical  
space/value space language and understanding. It's pretty deeply  
embedded into RDF.

> I want to be able to specify a particular point in this value space  
> using a string such as "1.0e0^^xsd:float".

Yeah, I'm kinda against that. But I would support "1.0e0^^owl:real".

> The XSD lexical forms are not "the lexical space for reals". There  
> is no such thing as "the lexical space for reals".

Bravo! ;)

> There is such a thing as "the space of lexical representations  
> which a conformant implementation must support for particular  
> values in the real value space", but this space is much smaller  
> than the real value space.

Our initial proposal for owl:real is to support, for the syntax, pairs  
of integers with the second being non-zero (i.e., standard fraction  
syntax for rationals) and, for the value space, (at least) the  
algebraic reals. If you don't have equations or special constants, you  
can't address the irrationals or transcendentals anyway. We are  
aiming to support some classes of equation, but only with rational  
constants.

>> If we want exactness for the rationals, we need either to allow  
>> repeating (e.g., 0.333repeating) (usually done with a macron) or  
>> fraction syntax (e.g., 1/3).
>
> I don't intend to support exactness for rationals. A conformant  
> implementation should only be required to provide exact support for  
> `xsd:int` and `xsd:float` values.

I don't think that would fly.
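
As an illustration of why decimal notation alone does not give exactness for rationals, here is a small Python sketch. It uses the standard `decimal` and `fractions` modules; the 28-digit precision is an arbitrary choice for the example, and any finite precision shows the same effect.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

with localcontext() as ctx:
    ctx.prec = 28  # any finite precision shows the same effect
    third_decimal = Decimal(1) / Decimal(3)  # truncated expansion of 1/3
    decimal_roundtrip = third_decimal * 3

third_ratio = Fraction(1, 3)  # fraction syntax keeps 1/3 exact
ratio_roundtrip = third_ratio * 3

print(decimal_roundtrip == 1)  # False
print(ratio_roundtrip == 1)    # True
```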

[snip]
>> Given that more and more languages (e.g., Java) now bundle a  
>> decimal type with their core libraries, I'm not so clear on the  
>> first.
>
> I'm not sure Java is an example of "more and more languages". In  
> fact it is the flagship "you only ever need one language" proposal.

I picked Java because it didn't have it for a long time and now it  
does. To pick another example, Python now has a bundled decimal  
class. Both of these are quite recent additions to popular languages.  
SQL supports it.  Visual Basic seems to.

> And even in super-OO Java you have to program differently if you're  
> going to play with polymorphic numbers than you would if you stuck  
> to ints and floats.
>
> I'd like to write a distributed OWL reasoner in Erlang. But  
> Javascript and C are perhaps more persuasive counterexamples to  
> your argument.

Javascript is a bit odd in not supporting integers either :) There  
are high quality decimal libraries for C++ (e.g., from IBM) and the  
committee is considering decimal support  
(<http://open-std.org/JTC1/SC22/WG21/>).

>> I'd like to hear more about the second.
>
> The most efficient bignum and decimal libraries are an order of  
> magnitude slower than corresponding int and float calculations.  
> Hardware is good with ints and floats.

Sure, but I wouldn't have thought that this would be a significant  
factor. Obviously, if the user writes really big or really small  
numbers, you have to deal with them anyway. If you only have  
user-defined types (no equations), then the operations (and the number  
thereof) are pretty limited (inclusion and cardinality testing). I'm a bit  
skeptical that it makes a huge practical difference. Perhaps because  
it doesn't come up too much.

Also, perhaps I misrecall, but don't you want arbitrarily sized floats?

"""For the restriction "forall R `xsd:float`" I simply bounded the  
real number line at the min and max values of floats. Still a dense,  
infinite number line, but with bounds. I hated this usage, however,  
and would prefer if it became illegal."""

So you did bound...but you "hate it"? Which, the bounds? the  
universal quantifier?

Implementations could always throw a warning or error if they hit a  
too large number.
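
A rough sketch of what such a check might look like, in Python. The function name `check_float_bounds` and the `strict` flag are hypothetical; the only fixed fact used is the largest finite IEEE binary32 value, (2^24 - 1) * 2^104.

```python
import warnings
from fractions import Fraction

# Largest finite binary32 value: (2**24 - 1) * 2**104.
FLOAT32_MAX = Fraction((2**24 - 1) * 2**104)


def check_float_bounds(value: Fraction, strict: bool = False) -> bool:
    """Return True if `value` lies within [-FLOAT32_MAX, FLOAT32_MAX].

    Out-of-range values raise in strict mode and only warn otherwise.
    """
    if -FLOAT32_MAX <= value <= FLOAT32_MAX:
        return True
    message = f"{value} lies outside the xsd:float bounds"
    if strict:
        raise OverflowError(message)
    warnings.warn(message)
    return False
```

The value being tested is kept as an exact rational, so the bounded number line stays dense between the two bounds, as in the quoted description.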

>>> ---thus a vocabulary for what it means to "support" a numeric xsd  
>>> type for particular values would be useful.
>>
>> This is what we're after. Anything we spec will be tightly  
>> specced. At the moment, we only have required and optional as  
>> modalities of support. I think supporting various levels of  
>> precision  (or variant mapping) would be quite hard to understand.
>
> But presumably you're making clear that implementations which  
> implement some "optional" functionality, but do so in a way which  
> contradicts the optional semantics, are non-compliant.

That's always a problem with optional :(

> If so, then specifying what support for additional lexical  
> representations means (i.e. exact) would make clear that a product  
> which parsed `xsd:decimal` but internally converted to floating  
> point would not "support `xsd:decimal`" by the terms of the OWL 2.0  
> spec.

They can convert as long as the observable behavior is the same.

> The implementors could always claim "partial support", however.

If they are going to vary in observable ways, I would prefer that  
they would make that clear in documentation and by giving warnings. A  
"strict" mode would also be quite welcome to me as a user.
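
One way the difference can become observable is in a cardinality test. The Python sketch below (the helper `count_distinct` is an assumption for illustration) counts distinct values among two decimal literals, first keeping them exact and then mimicking an implementation that converts to binary doubles on parsing.

```python
from fractions import Fraction


def count_distinct(literals, exact=True):
    """Count distinct values among decimal literals.

    exact=True keeps the xsd:decimal values as rationals; exact=False
    mimics an implementation that converts to binary doubles on parsing.
    """
    convert = Fraction if exact else float
    return len({convert(lexical) for lexical in literals})


literals = ["0.1", "0.10000000000000001"]

print(count_distinct(literals))               # 2
print(count_distinct(literals, exact=False))  # 1
```

The two literals are different decimals but round to the same double, so the converted version would answer a cardinality question differently.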

>> On this model, users would just have to decide between integers  
>> and reals. We could have quite a wide lexical space for reals (and  
>> even for integers, i.e., allow 1.0 to mean the integer 1).
>
> I'm getting really confused what you're talking about---constants  
> appearing in XML and RDF OWL 2.0 files should be typed; there's no  
> need at all to guess the type based on syntax.
>
> And of course "1.0e0^^xsd:float" and "1^^xsd:integer" are exactly  
> the same point on the real number line.

Sure.

But I was talking about owl:real. It seems reasonable to allow  
"1.0e0^^owl:real" and "1^^owl:real". (xsd:integer could be a subtype  
of owl:real as well).

>> But "0.1"^^xsd:float would not be required, but also we wouldn't  
>> change the meaning along the lines you suggest (we'd just be  
>> silent about it). It's fairly simple to migrate old ontologies to  
>> the new one with a simple converter. If enough implementations did  
>> it silently, that would be information for a future group.
>
> No idea what this means. But I'm guessing I disagree with it.

Me too :)

Cheers,
Bijan.

Received on Sunday, 6 July 2008 23:03:22 UTC