Re: A possible structure of the datatype system for OWL 2 (related to ISSUE-126) from Rob Shearer on 2008-07-10 (public-owl-wg@w3.org from July 2008)

From: Rob Shearer <rob.shearer@comlab.ox.ac.uk>
Date: Thu, 10 Jul 2008 15:31:57 +0100
To: Michael Smith <msmith@clarkparsia.com>
Cc: public-owl-wg@w3.org
Message-Id: <1A7509EC-9453-4A84-AC79-9565E60C6C61@comlab.ox.ac.uk>

> I was concerned about the large number of possible constants (e.g.,
> "0.01") for which there is not a directly corresponding value in the
> float value space.  [2] indicates to me that for some of those, there are
> alternative lexical to value space mappings, which could cause  
> problems.

Let's be a bit more explicit about the exact problems we're facing,
though. Suppose the decimal strings a and b encode numeric values, and
that IEEE-754-compliant rounding model X would interpret both as the
floating-point value c, while a different IEEE-754-compliant rounding
model Y would encode a as c but would encode b as the (different)
floating-point value d. Then the class:

	(forall R (= "a"^^xsd:float)) and (exists R (= "b"^^xsd:float))

would be satisfiable in implementation X but not in implementation Y.
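
To make the scenario concrete, here is a minimal Python sketch. It is
only an illustration: to_float32 and class_satisfiable are names made
up for this example, and the rounding shown is whatever the local
Python runtime does (decimal string to double, then double to single),
which stands in for one particular "model X". The class is satisfiable
exactly when both literals denote the same float value, because every
R-filler must equal a and some R-filler must equal b.

```python
import struct

def to_float32(literal: str) -> float:
    """Map a decimal literal to a float32 value using the local runtime's rounding."""
    # Assumption: going through a Python double first is acceptable for illustration.
    return struct.unpack("<f", struct.pack("<f", float(literal)))[0]

def class_satisfiable(a: str, b: str, parse=to_float32) -> bool:
    # (forall R (= a)) and (exists R (= b)) requires a and b to denote one value.
    return parse(a) == parse(b)

# Two distinct lexical forms that collapse to the same float32 value.
same = class_satisfiable("0.1", "0.100000001")
# Two lexical forms that land on neighbouring float32 values.
different = class_satisfiable("0.1", "0.10000001")
```

Passing a different parse function models a different implementation;
the problem case is a pair (a, b) for which two compliant parsers
disagree on the result.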

Two notes on this:

1. You obviously need ambiguous values for this problem to arise, and  
the IEEE spec only allows ambiguity for pretty wacky numbers. Anything  
representable (not just represented, but representable) in decimal as  
±M × 10^{±N} for M < 10^9 and N < 14 is unambiguous (M < 10^17 and N <  
28 for double-precision), so users need to type a lot of digits before  
they start experiencing the problem.

2. A single ambiguous value does not in itself produce a problem.  
Problems only arise in ontologies which use one ambiguous value, as  
well as other values within the range of IEEE-754-legal  
interpretations of that value. Given that the IEEE-754-legal range is  
quite small (a limited error is allowed only in the least significant  
digit of the destination type, which I read as the last bit of the  
float), this issue is unlikely to arise very often.
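
The bounds in note 1 can be checked mechanically. The following is a
rough Python sketch, not a vetted implementation: is_unambiguous and
LIMITS are hypothetical names, the bounds are simply the ones quoted
above, and "representable" is read as "there exist some M and N within
the bounds", so trailing zeros and shifted exponents are allowed.

```python
from decimal import Decimal

# Bounds quoted in note 1: M < max_m and N < max_n.
LIMITS = {"float": (10**9, 14), "double": (10**17, 28)}

def is_unambiguous(literal: str, kind: str = "float") -> bool:
    """True if literal equals +/-M x 10^(+/-N) for some M, N within the bounds."""
    max_m, max_n = LIMITS[kind]
    value = Decimal(literal)
    if not value.is_finite():
        return True  # assumption: INF and NaN are outside the scope of this check
    _, digits, exp = value.as_tuple()
    coeff = int("".join(map(str, digits)))
    if coeff == 0:
        return True
    # Strip trailing zeros so coeff is the smallest possible M.
    while coeff % 10 == 0:
        coeff //= 10
        exp += 1
    # Try every way of moving powers of ten from the exponent into M.
    m, n = coeff, exp
    while m < max_m:
        if abs(n) < max_n:
            return True
        m *= 10
        n -= 1
    return False
```

For example, "0.01" passes, "1e20" passes because it can be rewritten
as 10^7 × 10^13, and "0.1234567891" fails for single precision because
it needs ten significant digits.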

I agree that this situation kind of sucks, but I don't see any viable  
alternatives. The non-viable ones I can come up with are:

1. Disallow floating-point numbers entirely. This seems like a non- 
starter---the vast vast majority of scientific data makes use of such  
numbers.
2. Only allow unambiguous floating-point representations. This seems a  
clear violation of the XML Schema semantics. What is more, the  
implementation burden seems high; libraries usually don't have this  
functionality. I wouldn't know how to implement this, and I strongly  
suspect many implementors would either not implement it at all or get  
it wrong.
3. Impose explicit rounding rules above and beyond IEEE-754. This  
would break every floating point implementation in existence. I would  
encourage Oxford to object to such a model, and I can't imagine such a  
proposal passing a vote of the AC.

While I'd love to hear any "real" solutions, this does seem like the  
kind of thing appropriate for a lint-like tool. It's a very rare and  
complex situation that would be hard to enshrine in the specification,  
but a tool could easily warn users of potentially problematic use, and  
the tool could even easily repair this use by rewriting values to  
unambiguous values.
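
As a rough idea of what the repair step might look like, here is a
Python sketch. Everything in it is an assumption made for
illustration: repair and to_float32 are made-up names, the tool commits
to the local runtime's interpretation of the original literal (which is
itself a choice among the legal ones), and it only guarantees at most
nine significant digits; very large or very small magnitudes can still
fall outside the exponent bound.

```python
import struct

def to_float32(literal: str) -> float:
    # Assumption: the local parser's rounding (via a Python double) is the intended one.
    return struct.unpack("<f", struct.pack("<f", float(literal)))[0]

def repair(literal: str) -> str:
    """Return the shortest scientific-notation literal with the same float32 value."""
    target = to_float32(literal)
    for precision in range(9):  # 1 to 9 significant digits
        candidate = format(target, f".{precision}e")
        if to_float32(candidate) == target:
            return candidate
    return literal  # fallback, e.g. for NaN, which never compares equal
```

A lint tool could apply repair only to literals it has already flagged
as potentially ambiguous, and report the rewrite to the user rather
than applying it silently.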

Other suggestions?

-rob

Received on Thursday, 10 July 2008 14:32:34 UTC