Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

On Jul 6, 2008, at 8:07 PM, Rob Shearer wrote:

>>> Most importantly, I do not think there is necessarily a direct  
>>> correlation between the lexical representations used to represent  
>>> particular values and the value spaces in which those particular  
>>> values live. I.e. users want to be able to specify particular  
>>> values within the `real` value space using `xsd:float`,
>>
>> You mean the type name or the lexical syntax (e.g., "12.78e-2")?
>
> XSD offers a lexical syntax for points that happen to lie on the  
> real number line

It offers several, and we're free to define one for owl:real. If we
use any decimal notation, we have exactness problems (e.g., 1/3), but
decimal is very user-friendly. So I was thinking that the valid
syntax for a real would be decimal numerals and ratios of
integers. We could include scientific notation as well.
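As a rough illustration of that lexical space, here is a Python sketch that maps decimal, scientific, and ratio notation to exact rational values. The grammar, the name `parse_real`, and the use of `fractions.Fraction` as the value representation are assumptions for illustration only, not anything the WG has specified.

```python
import re
from fractions import Fraction

# Hypothetical owl:real lexical space (an assumption, not a spec):
# signed decimals with an optional exponent, or ratios of integers.
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?")
_RATIO = re.compile(r"([+-]?\d+)/(\d+)")


def parse_real(lexical: str) -> Fraction:
    """Map a lexical form to the exact rational number it denotes."""
    if _DECIMAL.fullmatch(lexical):
        # Fraction reads decimal and exponent notation exactly,
        # without a detour through binary floating point.
        return Fraction(lexical)
    match = _RATIO.fullmatch(lexical)
    if match:
        denominator = int(match.group(2))
        if denominator == 0:
            raise ValueError(f"zero denominator in {lexical!r}")
        return Fraction(int(match.group(1)), denominator)
    raise ValueError(f"not in the lexical space: {lexical!r}")


print(parse_real("12.78e-2"))  # 639/5000
print(parse_real("1/3") * 3)   # 1
```

Note that "1" and "1.0" land on the same value, which is exactly the separation between lexical form and value being discussed.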

> ---that's what I suggest using it for. The easiest approach is that  
> xsd names on their own are not valid "datatypes"; particular values  
> encoded using xsd, however, are (because particular values are  
> single-element value spaces).
>
>> I'm personally more comfortable with allowing the latter than  
>> pushing "xsd:float" as a synonym for the real value space. Your  
>> mileage obviously varies.
>>
>>> but they do *not* have any interest in use of the `xsd:float`  
>>> value space.
>>
>> Some do at least to the extent of wanting NaN (and perhaps -0).  
>> I'd personally prefer not to shove them into the real type  
>> (certainly NaN; I suppose we could make our reals the affine reals  
>> and handle +inf).
>
> I'd endorse including only one zero, but I agree there's an issue  
> with NaN.

And the infinities, though we could always go for the affine real line.
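The special values in question are easy to see in any IEEE 754 implementation. This Python sketch (Python floats are IEEE 754 doubles on mainstream platforms; xsd:float is the single-precision variant with the same kinds of special values) shows the two zeros, NaN's self-inequality under IEEE comparison, and the infinities. It says nothing about how XSD or OWL define equality on these values.

```python
import math

pos_zero, neg_zero = 0.0, -0.0
nan = float("nan")

# Two zeros: equal under comparison, but the sign bit tells them apart.
zeros_compare_equal = pos_zero == neg_zero
zeros_have_distinct_signs = math.copysign(1.0, pos_zero) != math.copysign(1.0, neg_zero)

# Under IEEE 754 comparison, NaN is unequal to everything, itself included.
nan_equals_itself = nan == nan

# The infinities sit above and below every finite value, much as they
# would in the affine extension of the real line.
infinities_bound_finite = -math.inf < -1e308 and 1e308 < math.inf

print(zeros_compare_equal, zeros_have_distinct_signs, nan_equals_itself, infinities_bound_finite)
# True True False True
```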

> My principled stand is that it's inconsistent (a value space of  
> size zero), but I'd definitely want to analyze the use cases to see  
> who loses important functionality from that decision.
>
> But my main point is that users have no interest in the "holes"  
> introduced by the xsd:float value space: providing them access to a  
> value space of numbers representable in float representation is not  
> useful, and could lead to lots of confusion, particularly if users  
> could easily use such a space "by accident".

Well, you'll get exactness holes with binary or decimal notation,  
regardless of density issues.
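A small Python sketch of both kinds of exactness hole, using only the standard library: 1/10 has no finite binary expansion, and 1/3 has no finite decimal expansion, so any fixed-precision decimal type must round it.

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def binary_tenth_is_exact() -> bool:
    # Fraction(float) recovers the exact value of the double nearest 0.1,
    # which is not 1/10.
    return Fraction(0.1) == Fraction(1, 10)


def decimal_third_round_trips(precision: int = 28) -> bool:
    # At any fixed decimal precision, 1/3 is rounded, so multiplying
    # back by 3 gives 0.999...9 rather than 1.
    with localcontext() as ctx:
        ctx.prec = precision
        third = Decimal(1) / Decimal(3)
        return third * 3 == 1


print(binary_tenth_is_exact(), decimal_third_round_trips())  # False False
```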

> That's the situation we've fallen into with floats in OWL 1.0.
>
>>> Thus we've got two orthogonal concepts which happen to coincide  
>>> for strings and integers but not for real numbers.
>>>
>>> My proposed solution would be to use brand-new OWL names for all  
>>> value spaces, but use xsd syntax to specify particular values.
>>
>> Could you say what you think the lexical space of the reals should  
>> include?
>
> I don't know what you mean by "lexical space of the reals".

XSD datatypes have a lexical space (i.e., the syntax) and a value
space. You are suggesting, I thought, that we adopt a value space
that is the reals and use xsd syntax (i.e., xsd lexical spaces) to
write particular values. XSD offers exact syntax only for binary
floating point and decimals (I believe it's exact for binary). I was
wondering what sort of lexical space you want.
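To make the "exact for binary" point concrete: every finite binary float is a dyadic rational, so an xsd:float lexical form can be read as naming a single point on the real line. A minimal Python sketch, where `float32_point` is a hypothetical helper and single precision is emulated with `struct` (so rare double-rounding cases are not handled, and Python's `float()` accepts some forms XSD does not):

```python
import math
import struct
from fractions import Fraction


def float32_point(lexical: str) -> Fraction:
    """Exact real number named by an xsd:float-style lexical form (sketch)."""
    value = float(lexical)
    if math.isnan(value) or math.isinf(value):
        # NaN and INF are not points on the real line.
        raise ValueError(f"{lexical!r} does not denote a real number")
    # Round the double to the nearest single-precision float.
    single = struct.unpack("<f", struct.pack("<f", value))[0]
    # A finite binary float is m / 2**k, so this conversion is exact.
    return Fraction(single)


point = float32_point("0.1")
print(point)                     # 13421773/134217728
print(point == Fraction(1, 10))  # False
```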

> I don't propose defining the reals lexically;

Sure.

> I propose defining the value space mathematically.

Well, of course. But that's what XSD does as well. The decimals are a  
well defined mathematical set.

> But implementations should allow users to specify particular points  
> in that value space using the lexical representations for  
> `xsd:float` and `xsd:int` values.

So you want a very broad lexical space for our real type, e.g., "1",
"1.0", and "12.78e-2". If we want exactness for the rationals, we
need either repeating-decimal syntax (e.g., 0.333... with the
repeating digits marked, usually by a macron) or fraction syntax
(e.g., 1/3).
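Repeating decimals map to exact rationals via the geometric-series identity. A Python sketch, assuming a hypothetical ASCII notation in which the repeating block is written in parentheses (e.g., "0.(3)" for 0.333..., "1.2(45)" for 1.24545...), since a macron is awkward in plain text; `parse_repeating` is likewise a hypothetical name:

```python
import re
from fractions import Fraction

# Hypothetical notation: the repeating block goes in parentheses.
_REPEATING = re.compile(r"([+-]?)(\d+)\.(\d*)\((\d+)\)")


def parse_repeating(lexical: str) -> Fraction:
    match = _REPEATING.fullmatch(lexical)
    if not match:
        raise ValueError(f"not a repeating decimal: {lexical!r}")
    sign, whole, fixed, repetend = match.groups()
    # Non-repeating part, read exactly as a finite decimal.
    value = Fraction(f"{whole}.{fixed}" if fixed else whole)
    # A repetend R of k digits after f fixed digits contributes
    # R / (10**f * (10**k - 1)), the sum of the geometric series.
    value += Fraction(int(repetend), 10 ** len(fixed) * (10 ** len(repetend) - 1))
    return -value if sign == "-" else value


print(parse_repeating("0.(3)"))    # 1/3
print(parse_repeating("1.2(45)"))  # 137/110
```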

> I expect most implementations will also support points represented  
> as `xsd:double` and `xsd:long` as well.

You mean their syntax, i.e., their lexical space.

(Sorry for using the XSD terminology, but I think it's a bit clearer  
if we stick to it for the moment.)

> I do *not* think a conformant implementation should have to deal
> with arbitrary points represented as `xsd:decimal` (since the vast  
> majority of users don't need the extra representational power, and  
> there is substantial implementation burden and performance penalty  
> for dealing with such values correctly).

Given that more and more languages (e.g., Java) now bundle a decimal  
type with their core libraries, I'm not so clear on the first. I'd  
like to hear more about the second.
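For a sense of what a bundled decimal type buys, here is a Python sketch using the standard `decimal` module as an analogue of Java's `java.math.BigDecimal`. It only shows that exact decimal arithmetic is readily available; it says nothing about the performance penalty, which would need measuring.

```python
from decimal import Decimal

# Binary doubles: 0.1 + 0.2 lands on a different double than 0.3.
binary_sum_matches = 0.1 + 0.2 == 0.3

# Standard-library decimal arithmetic: these inputs add exactly.
decimal_sum_matches = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")

print(binary_sum_matches, decimal_sum_matches)  # False True
```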

>> At least, as a first cut? (It seems decimal, scientific, and  
>> rational notation would all be useful, the first two for common  
>> ways of writing and the third for full coverage of the rationals.)
>
> The WG should consider that some implementations might allow lots  
> of xsd syntaxes but lose precision on some of them (allow use of  
> `xsd:decimal` in ontology files for user convenience, but convert  
> them to floats during parsing)

Obviously, this can cause serious interoperability problems, so I'm
inclined against it at first blush.
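A Python sketch of how such an interoperability problem could surface: a check like "value greater than 0.1" gives different answers depending on whether the reasoner keeps decimals exact or converts them to single-precision floats while parsing. The function names are hypothetical, and the float32 conversion is emulated with `struct`.

```python
import struct
from fractions import Fraction


def to_float32(lexical: str) -> float:
    # Hypothetical "imprecise" parser: keep only the nearest float32.
    return struct.unpack("<f", struct.pack("<f", float(lexical)))[0]


def greater_exact(a: str, b: str) -> bool:
    # Reasoner with full xsd:decimal support: compare exact values.
    return Fraction(a) > Fraction(b)


def greater_float32(a: str, b: str) -> bool:
    # Reasoner that converted both literals to float32 during parsing.
    return to_float32(a) > to_float32(b)


print(greater_exact("0.1000000001", "0.1"))    # True
print(greater_float32("0.1000000001", "0.1"))  # False: both round to the same float32
```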

> ---thus a vocabulary for what it means to "support" a numeric xsd  
> type for particular values would be useful.

This is what we're after. Anything we spec will be tightly specced.  
At the moment, we only have required and optional as modalities of  
support. I think supporting various levels of precision (or variant
mappings) would be quite hard to understand.

> My big concern here is that an ontology will be developed and  
> tested with a reasoner with "full" `xsd:decimal` support but then  
> when it's used with an implementation with "imprecise"  
> `xsd:decimal` support everything goes pear-shaped.

That would be bad :) There could be subtler problems if people mapped  
decimal syntax to binary in variant ways (i.e., which float do you  
take 0.1 to?)
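A Python sketch of that question: rounding the decimal 0.1 to nearest and rounding it toward zero pick different doubles. The function names are hypothetical, and `math.nextafter` requires Python 3.9 or later.

```python
import math
from fractions import Fraction


def nearest_double(lexical: str) -> float:
    # Round to nearest (ties to even): int/int division in Python is
    # correctly rounded, and Fraction's float conversion relies on it.
    return float(Fraction(lexical))


def truncated_double(lexical: str) -> float:
    # Round toward zero: step back one ulp if rounding to nearest
    # overshot the exact value in magnitude.
    exact = Fraction(lexical)
    candidate = float(exact)
    if abs(Fraction(candidate)) > abs(exact):
        candidate = math.nextafter(candidate, 0.0)
    return candidate


print(truncated_double("0.1") < nearest_double("0.1"))  # True
```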

> Spitting out warnings during parsing isn't a great solution...
>
> And of course some implementations might offer additional value  
> spaces as well, but I'd like the spec to make it very clear that  
> this is a very different thing than the above. For one thing, I'd  
> suggest outlawing any use of names within the xsd namespace for  
> value spaces, even spaces implementors have added as extensions.  
> "Support for `xsd:decimal`" should mean `xsd:decimal` syntax for  
> points on the real number line and nothing else.

This doesn't seem likely. Existing implementations already do  
different things with different xsd types. It'll be very hard to get
buy-in from the RDF community. It seems like a more likely strategy
is to fix a (required) set of OWL types (or core types) which are  
easy to understand and robust with respect to intuitive behavior, and  
leave the more specialized types for future people to standardize.

On this model, users would just have to decide between integers and
reals. We could have quite a wide lexical space for reals (and even
for integers, i.e., allow 1.0 to mean the integer 1). Support for
"0.1"^^xsd:float would not be required, but we also wouldn't change
its meaning along the lines you suggest (we'd just be silent about
it). It's fairly easy to migrate old ontologies to the new types with
a simple converter. If enough implementations did that silently, that
would be useful information for a future group.
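A converter of that kind could be as small as the following Python sketch over N-Triples lines. Everything in it is an assumption for illustration: the function name, the regex-based handling (no real RDF parser), and especially the target IRI for owl:real, which this thread does not fix. It rewrites xsd:float and xsd:double literals to the new real type, deliberately reading "0.1" as the exact number the author wrote rather than the float it used to round to, and leaves NaN and the infinities untouched for a human to decide.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
# Assumed IRI for the new real type; not fixed anywhere in this thread.
OWL_REAL = "http://www.w3.org/2002/07/owl#real"

# Typed float/double literals in N-Triples syntax.
_TYPED = re.compile(r'"([^"\\]*)"\^\^<' + re.escape(XSD) + r'(float|double)>')
_NOT_REAL = {"NaN", "INF", "-INF", "+INF"}


def migrate_line(line: str) -> str:
    """Hypothetical migration of one N-Triples line to the new real type."""
    def rewrite(match):
        lexical = match.group(1)
        if lexical.strip() in _NOT_REAL:
            # Not real numbers: leave the original literal as-is.
            return match.group(0)
        return f'"{lexical}"^^<{OWL_REAL}>'
    return _TYPED.sub(rewrite, line)


old = '<http://ex.org/a> <http://ex.org/p> "0.1"^^<http://www.w3.org/2001/XMLSchema#float> .'
print(migrate_line(old))
```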

Thanks again.

Cheers,
Bijan.

Received on Sunday, 6 July 2008 20:55:00 UTC