Where I am about floats, etc.

I'm now inclined toward having two primitive built-in datatypes:
string and some flavor of real. I'm pretty OK with allowing all of the
types derived or derivable (using facets we permit) from them in the
XSD scheme. (So, I'd definitely put in xsd:decimal, for example.) Any
additional ones need to be considered carefully and perhaps much  
later in the game. (Even something like anyURI would need a fair bit  
of attention. Alas :()

I think I'm specifically against including xsd:float and xsd:double
as types at all at this stage, and even against our speccing them out
as optional. (We shouldn't forbid them; just be silent.)

It's pretty evident that there's a fairly wide set of views on float  
and double, including a set of varyingly negative ones (or rather,  
ones that would like to effectively eliminate them). Given that we've
already had some thinkos about them and their various
characteristics, I don't think it's a good idea to try to meddle with
them. (Note that we've had varying, absolutely confident reports on
which bits were necessary; think of the discussion around NaN. I'm
not confident that we can get good data on what's "really wanted"
here, esp. since some of the ramifications aren't obvious.) Tweaking
their semantics somewhat randomly just seems like a very bad idea to  
me, and a waste of effort. I'd rather nail down the particulars of  
fewer types and get excellent, wide support for them. Working with  
implementors outside of the group on float and double support seems a  
better bet at this stage.

I recognize that this does not address some important use cases and  
makes OWL tougher for dealing with scientific data (in particular).  
But given that xsd1:float has some issues (e.g., 1 zero and only 1
NaN), it might be better all around to punt on it.

(A workaround would be to have a named, user-defined string type. It
wouldn't do syntax checking, but my understanding of the use case
(and it's a recollection because email search sucks hard) is that the
issue is transmission of data, not reasoning about it. Another
workaround would be to lobby implementors. Yet another workaround would
be to use the corresponding integers. None of these are thrilling.
However, if the use pattern is dump to one format and load into
another, they may be tolerable.

For this use case, I'd be a bit leery of xsd1:float, given its
quirks. I have no idea if scientific computing uses the various
distinct NaNs in some fashion, but it wouldn't surprise me. It seems
to be identified as important here:
<http://math.nist.gov/javanumerics/reports/jgfnwg-01.html>. Signaling
vs. non-signaling may be significant... I don't know.[1])
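
To make the "corresponding integers" workaround concrete, here is a
minimal Python sketch. It assumes that phrase means the raw IEEE 754
bit pattern of a double read as an unsigned 64-bit integer; the helper
names double_to_bits and bits_to_double are made up for illustration
and don't come from any spec.

```python
import struct

def double_to_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_to_double(n: int) -> float:
    """Inverse of double_to_bits: rebuild the double from its bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", n))[0]

# Dump side: plain integers, which an integer datatype can carry as-is.
values = [1.5, 0.1, -0.0, float("inf")]
dumped = [double_to_bits(v) for v in values]

# Load side: rebuild the doubles from the transmitted integers.
loaded = [bits_to_double(n) for n in dumped]
```

The cost is the one already noted: a reasoner sees only integers, so
comparing or doing arithmetic on them says nothing useful about the
floats (the unsigned order doesn't even match numeric order for
negative values). That's fine for dump-and-load, not for much else.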

Cheers,
Bijan.

P.S. This is a nice paper:
http://hal.archives-ouvertes.fr/docs/00/28/14/29/PDF/floating-point-article.pdf

[1]  I couldn't find a specific example, but
<http://docs.sun.com/app/docs/doc/800-7895/6hos0aou4?a=view> sez:

"""In IEEE 754, NaNs are often represented as floating-point numbers  
with the exponent emax + 1 and nonzero significands. Implementations  
are free to put system-dependent information into the significand.  
Thus there is not a unique NaN, but rather a whole family of NaNs.  
When a NaN and an ordinary floating-point number are combined, the  
result should be the same as the NaN operand. Thus if the result of a  
long computation is a NaN, the system-dependent information in the  
significand will be the information that was generated when the first  
NaN in the computation was generated. Actually, there is a caveat to  
the last statement. If both operands are NaNs, then the result will  
be one of those NaNs, but it might not be the NaN that was generated  
first."""

This suggests that picking a special NaN concept would limit the  
utility of transmitting scientific data via OWL.
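
A quick way to see the "family of NaNs" point is to build a couple of
NaNs by hand. The sketch below is Python and assumes the usual IEEE 754
binary64 layout. It sticks to quiet NaNs, since some hardware may alter
signaling ones when loading them. nan_with_payload and payload_of are
hypothetical helper names, not part of any library.

```python
import math
import struct

QUIET_NAN = 0x7FF8000000000000     # exponent all ones, quiet bit set
PAYLOAD_MASK = 0x0007FFFFFFFFFFFF  # the 51 significand bits below the quiet bit

def nan_with_payload(payload: int) -> float:
    """Build a quiet NaN carrying the given payload in its significand."""
    bits = QUIET_NAN | (payload & PAYLOAD_MASK)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def payload_of(x: float) -> int:
    """Read back the low 51 significand bits of a double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & PAYLOAD_MASK

a = nan_with_payload(1)
b = nan_with_payload(42)

both_nan = math.isnan(a) and math.isnan(b)   # both are NaN as far as IEEE goes
distinct = payload_of(a) != payload_of(b)    # yet they carry different bits
```

A datatype with exactly one NaN value would have to map a and b to the
same thing, so whatever the payload was meant to say is gone after a
round trip.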

Received on Sunday, 6 July 2008 22:01:13 UTC