Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

On 2 Jul 2008, at 20:09, Boris Motik wrote:

> Hello,
>
> In light of our discussion at today's teleconf, I would like to put  
> forward a new proposal for resolving the dilemma regarding the
> real vs. float vs. double problem.
>
> 1. We make all of xsd:float, xsd:double, and owl:real disjoint.

Agreed.

> 2a. We make the value space of xsd:float as containing the  
> following things:
>
> - the special values NaN, +inf, and -inf

Agreed.

> - the continuous range of real numbers between the real number min  
> = -(2^24-1)*2^(-149) and the real number max = (2^24-1)*2^104,
> inclusive.

We should separate this out. I think there's a case to be made for  
keeping them finite.

> 2b. We make the set of constants of xsd:float exactly the same as  
> in XML Schema. Thus, the lexical representations are INF, -INF,
> NaN, and all strings of the form "manEex", where "man" is the  
> decimal mantissa and "ex" is an integer.

Sounds good.

> By the way, I think that XML Schema specification (Section 3.2.4)
> is incomplete here: we should additionally say what happens in the
> case of constants "manEex" for which man*10^ex is *NOT
> REPRESENTABLE* in the form m*2^e matching the conditions of
> xsd:float. The problem is that, since we are dealing with a logical
> language, we need to specify how rounding is handled.

Agreed.

> We have two options:
> - we disqualify such constants as syntactically incorrect, or
> - we say that such constants stand for the nearest rounded float.

I prefer the former.
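
To make the first option concrete, here is a rough sketch (not a
proposal for spec text) of what "syntactically incorrect" could mean
for a finite constant. It assumes the XML Schema 1.0 description of
xsd:float as m*2^e with |m| < 2^24 and -149 <= e <= 104. The function
name is_exact_xsd_float is made up for illustration, and INF, -INF and
NaN are deliberately out of scope.

```python
from decimal import Decimal
from fractions import Fraction

# Assumed bounds, taken from the XML Schema 1.0 description of xsd:float:
# value = m * 2**e with |m| < 2**24 and -149 <= e <= 104.
MANTISSA_LIMIT = 2 ** 24
MIN_EXP, MAX_EXP = -149, 104


def is_exact_xsd_float(lexical: str) -> bool:
    """Return True if a finite literal like "1.5E3" is exactly m * 2**e.

    Hypothetical helper: it rejects any constant that would need rounding.
    """
    value = Fraction(Decimal(lexical))  # exact rational value of the literal
    if value == 0:
        return True
    p, q = abs(value.numerator), value.denominator
    if q & (q - 1):
        # The denominator is not a power of two, so no m * 2**e can match.
        return False
    k = q.bit_length() - 1             # q == 2**k
    t = (p & -p).bit_length() - 1      # number of trailing zero bits in p
    m0, e0 = p >> t, t - k             # value == m0 * 2**e0 with m0 odd
    e = min(e0, MAX_EXP)               # move excess exponent into the mantissa
    if e < MIN_EXP:
        return False                   # would need an exponent below -149
    return (m0 << (e0 - e)) < MANTISSA_LIMIT


print(is_exact_xsd_float("0.5"))       # True
print(is_exact_xsd_float("0.1"))       # False: 1/10 is not a dyadic rational
print(is_exact_xsd_float("16777217"))  # False: 2**24 + 1 needs 25 bits
```

Under the second option the same three constants would all be legal,
and "0.1" and "16777217" would denote their nearest floats instead.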

> 3. We do the similar thing for xsd:double.

Agreed, except for the continuity (see my comment on 2a).

> 4. We would disallow the "pattern" facet for all numeric datatypes.

Agreed.

> In this way, we've stayed true to the XML Schema specification in  
> the sense that all datatypes are disjoint, and we provide for the
> special values. The only departure from XML Schema is that we  
> officially need not worry about the discreteness of xsd:float and
> xsd:double during reasoning. I find this last point quite  
> important: what is the point in producing a spec that nobody will
> implement in its entirety?

But this is already true now, yes? One can define arbitrarily large but
finite datatypes using ranges of integers. Indeed, I wouldn't be
surprised if it were pretty easy to do with strings as well.

Given that that's the case, I think we should leave it to be correct  
with respect to the type and just warn people. All things being  
equal, I'd rather restrict them syntactically than mess with their  
semantics. The only reason to use a float or a double type in a  
complex way is when you care about the discreteness. I'd rather defer  
all that aspect to people who, e.g., want to deal with logical float  
based equations (e.g., using intervals, or what have you).
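
As an illustration of why the discreteness is observable, the sketch
below counts the float values in a closed interval, which is the kind
of cardinality a reasoner would have to respect under the discrete
reading. It assumes IEEE 754 single precision as the model for
xsd:float, treats -0 and +0 as one point, and ignores NaN and the
infinities. The helper names are invented for this example.

```python
import struct


def float32_ordinal(x: float) -> int:
    """Map a finite single-precision value to an order-preserving integer."""
    # Round x to single precision and read back its 32-bit pattern.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    magnitude = bits & 0x7FFFFFFF
    # For negative values a larger magnitude means a smaller number.
    return -magnitude if bits & 0x80000000 else magnitude


def count_float32_between(lo: float, hi: float) -> int:
    """Count the single-precision values v with lo <= v <= hi."""
    return float32_ordinal(hi) - float32_ordinal(lo) + 1


print(count_float32_between(1.0, 2.0))  # 8388609, i.e. 2**23 + 1
```

So a restriction to [1.0, 2.0] behaves like an integer range with
8388609 elements, whereas under the continuous reading the same range
has infinitely many members.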

It's not a happy situation we're in, so I'm inclined to be  
conservative with regard to the semantics even at the cost of  
practical implementation of everything wrt those types.

> Let me know how you feel about this.

I'm really hesitant to specify a variant semantics in the absence of  
more concrete understanding of the use space. In a sense, we're faced  
with some legacy stuff. I'd prefer not to fix it too much in advance  
of a lot more knowledge. If it were the only case where we had large  
finite ranges of concrete values, I would be more amenable. Since we  
can define such ranges anyway, I don't see that this is *so* much worse.  
(It's a bit worse as it's built in, so requires less effort for  
people to hit.)

Cheers,
Bijan.

Received on Friday, 4 July 2008 13:56:42 UTC