Re: A possible structure of the datatype system for OWL 2 (related to ISSUE-126) from Bijan Parsia on 2008-07-10 (public-owl-wg@w3.org from July 2008)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Thu, 10 Jul 2008 16:42:59 +0100
To: Rob Shearer <rob.shearer@comlab.ox.ac.uk>
Cc: Michael Smith <msmith@clarkparsia.com>, public-owl-wg@w3.org
Message-Id: <CACCB0C7-C8B7-4CE5-9426-F110E1B9E20B@cs.man.ac.uk>
On 10 Jul 2008, at 15:31, Rob Shearer wrote:

>> I was concerned about the large number of possible constants (e.g.,
>> "0.01") for which there is not a directly corresponding value in the
>> float value space.  [2] indicates to me that for some of those, there
>> are alternative lexical to value space mappings, which could cause
>> problems.
>
> Let's be a bit more explicit about the exact problems we're  
> facing, though. Suppose the decimal strings a and b encode numeric  
> values, and that IEEE-754-compliant rounding model X would  
> interpret both as the floating-point value c, while a different  
> IEEE-754-compliant rounding model Y would encode a as c but would  
> encode b as the (different) floating-point value d. Then the class:
>
> 	(forall R (= "a"^^xsd:float)) and (exists R (= "b"^^xsd:float))
>
> Would be satisfiable in implementation X but not in implementation Y.
>
> Two notes on this:
>
> 1. You obviously need ambiguous values for this problem to arise,  
> and the IEEE spec only allows ambiguity for pretty wacky numbers.  
> Anything representable (not just represented, but representable) in  
> decimal as ±M × 10^{±N} for M < 10^9 and N < 14 is unambiguous (M <  
> 10^17 and N < 28 for double-precision), so users need to type a lot  
> of digits before they start experiencing the problem.

That's nice. I knew that for integers you had to get quite large  
before exactness became an issue.
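
To make the integer point concrete, here is a minimal sketch in Python. The helper name to_float32 is mine, and it goes via a double first, so treat it as an illustration of single-precision rounding rather than of any particular xsd:float parser:

```python
import struct
from fractions import Fraction


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Integers up to 2**24 survive the trip through single precision unchanged ...
assert to_float32(float(2**24)) == 2**24
# ... but 2**24 + 1 does not; it is rounded to a neighbouring value.
assert to_float32(float(2**24 + 1)) != 2**24 + 1

# "0.01" has no exact binary counterpart in either precision.
assert Fraction(to_float32(0.01)) != Fraction(1, 100)
assert Fraction(0.01) != Fraction(1, 100)
```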

I wasn't able to get to the spec online...do you have a pointer?

> 2. A single ambiguous value does not in itself produce a problem.  
> Problems only arise in ontologies which use one ambiguous value, as  
> well as other values within the range of IEEE-754-legal  
> interpretations of that value. Given that the IEEE-754-legal range  
> is quite small (a limited error is allowed only in the least  
> significant digit of the destination type, which I read as the last  
> bit of the float), this issue is unlikely to arise very often.
>
> I agree that this situation kind of sucks, but I don't see any  
> viable alternatives. The non-viable ones I can come up with are:
>
> 1. Disallow floating-point numbers entirely. This seems like a non- 
> starter---the vast vast majority of scientific data makes use of  
> such numbers.
> 2. Only allow unambiguous floating-point representations. This  
> seems a clear violation of the XSchema semantics. What is more, the  
> implementation burden seems high; libraries usually don't have this  
> functionality. I wouldn't know how to implement this, and I  
> strongly suspect many implementors would either not implement it at  
> all or get it wrong.

For integers it's pretty easy as it's just a size bound, yes?

I need to poke into the float case to see if we could characterize it  
with a similar number-of-digits bound. It seems like that's what you said above.
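
If the bound really is a digits-like thing, a check could be as small as the sketch below. The limits (M < 10^9, N < 14) are simply the single-precision figures quoted above, which I have not verified against the spec, and the function name within_bounds is made up for illustration:

```python
from decimal import Decimal


def within_bounds(lexical: str, max_m: int = 10**9, max_n: int = 14) -> bool:
    """True if the decimal string equals +/-M x 10^(+/-N) with M < max_m, N < max_n.

    The default limits are the single-precision figures quoted in this
    thread; they are an assumption here, not a checked reading of IEEE 754.
    """
    d = Decimal(lexical)
    if not d.is_finite():
        return False  # INF and NaN need separate handling
    _sign, digits, exp = d.as_tuple()
    m = int("".join(map(str, digits)))
    if m == 0:
        return True
    # Strip trailing zeros to get the smallest possible M.
    while m % 10 == 0:
        m //= 10
        exp += 1
    # An exponent that is too large can be traded for extra digits in M.
    if exp >= max_n:
        shift = exp - (max_n - 1)
        if shift >= len(str(max_m)):
            return False  # M could never stay below max_m
        m *= 10**shift
        exp = max_n - 1
    return m < max_m and exp > -max_n
```

With these limits, "0.01" and "123456789" pass, while "1234567891" and "1e-14" do not.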

> 3. Impose explicit rounding rules above and beyond IEEE-754. This  
> would break every floating point implementation in existence. I  
> would encourage Oxford to object to such a model, and I can't  
> imagine such a proposal passing a vote of the AC.
>
> While I'd love to hear any "real" solutions, this does seem like  
> the kind of thing appropriate for a lint-like tool.

Perhaps we can distinguish between document conformance and processor  
conformance? It's not my favorite thing to do, but we could say that  
documents can have arbitrary precision/size and that the mapping is  
always exact, but then require processors to support a more minimal  
level. Then I guess we just throw warnings (or use lint). It seems  
like this will arise either rarely (people won't generally jot these  
down in their camera ontology) or in specialized cases where the user  
should go for a tool that goes beyond the minimum. That seems  
reasonable. (If enough ontologies exceed the minimum, that provides  
clear pressure for convergence as long as the semantics is  
unambiguous).

> It's a very rare and complex situation that would be hard to  
> enshrine in the specification, but a tool could easily warn users  
> of potentially problematic use, and the tool could even easily  
> repair this use by rewriting values to unambiguous values.

Right. A "coerce to decimal" or whatever refactoring command.
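
Roughly, and only as a sketch of what such a command might do (the function name and the output syntax are my own choices), it could retype the literal while keeping its exact decimal value. Since xsd:decimal lexical forms do not allow exponent notation, the value is written out in plain positional form:

```python
from decimal import Decimal


def coerce_to_decimal(lexical: str) -> str:
    """Rewrite a float-style lexical form as an xsd:decimal literal.

    Hypothetical refactoring helper: the exact decimal value of the string
    is kept, so no IEEE-754 rounding choice is involved.
    """
    d = Decimal(lexical)
    if not d.is_finite():
        raise ValueError("INF and NaN have no xsd:decimal counterpart")
    # The "f" format expands any exponent into plain positional digits.
    return f'"{format(d, "f")}"^^xsd:decimal'


print(coerce_to_decimal("1.5e3"))
print(coerce_to_decimal("1e-3"))
```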

Cheers,
Bijan.
Received on Thursday, 10 July 2008 15:40:43 UTC