Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

On Jul 5, 2008, at 1:58 AM, Dave Peterson wrote:

>> Towards the end of understanding the terminology, I've been trying  
>> to understand what the value space of XML Schema means, given that  
>> it doesn't mean what one would expect in a mathematical sense.
>
> I'll have to take exception to that.  I'm sure it doesn't mean what you
> would expect in a mathematical sense.  But it does very definitely mean
> what I would expect in a mathematical sense.  (Credentials:  Phd, U.C.
> Berkeley, 1965, primarily in Analysis and Foundations of Mathematics;
> Assistant Professor of Mathematics and Computer Science, and Associate
> Professor of Mathematics at various times during my career.)  So please
> don't generalize to an arbitrary "one" and imply that that's the only
> possible reasonable expectation.

I'm sorry for the overgeneralization and didn't mean to insult. It's  
just that as much as I think about it, I can't understand the idea  
that the value space of floats and the value space of decimal are  
disjoint. Fundamentally these represent some of the same real numbers  
and this isn't reflected in the spec. In addition, many numbers that
can be finitely expressed and calculated with, e.g. 1/3, find no place
in *any* of the value spaces. It is this sense of "mathematical" that
I was referring to.

I have looked at the functions and operators specification. I  
understand how you come to your previous points about different  
choice of equality, as the specification promotes decimal to float.  
As a matter of clarity, I probably would have called the comparison  
not "equality" but "equality as floats" and "equality as doubles".
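
As a rough illustration of the difference between "equality as floats" and "equality as doubles", here is a minimal Python sketch. It assumes that promotion means rounding the decimal to the nearest binary value of the target width. The helper names are hypothetical and are not taken from the Functions and Operators specification.

```python
import struct
from decimal import Decimal

def as_double(d: Decimal) -> float:
    """Round a decimal to the nearest IEEE 754 double."""
    return float(d)

def as_float(d: Decimal) -> float:
    """Round a decimal to single precision.

    Assumption: we go through a double first and let struct narrow it,
    which is good enough for this example but is not a direct
    decimal-to-single rounding in every corner case.
    """
    return struct.unpack("<f", struct.pack("<f", float(d)))[0]

a = Decimal("16777216")   # 2**24
b = Decimal("16777217")   # 2**24 + 1, not representable in single precision

equal_as_decimals = a == b
equal_as_doubles = as_double(a) == as_double(b)
equal_as_floats = as_float(a) == as_float(b)
```

Under these assumptions the two decimals are distinct as decimals and as doubles, yet compare equal once both are promoted to single precision, so the answer depends on which promotion the comparison uses.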

Considering the definition of equality, I would ask: Is that
something someone would do if they weren't constrained to use
floating point numbers? It is a perfectly reasonable thing to do if
you don't have any more expressive numeric types, just as it is a
perfectly reasonable thing to throw an exception when a
multiplication of integers exceeds the limit of the integer datatype.
However, we now have libraries that support arbitrary precision
integer and rational numbers. Floats can be promoted to the latter
without loss of precision, as can decimals. Again, the spec does not
address this, nor does it give any theoretical justification of how it
is even possible to do an exact (sometimes) promotion of a decimal
value to a float value if their value spaces are disjoint. Maybe
there's a way to make sense of this. I'm trying.
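
The point about exact promotion to rationals can be sketched with Python's standard `fractions` and `decimal` modules. This is only an illustration of the idea, using Python doubles as a stand-in for xsd:double. Nothing here is prescribed by XSD or by OWL.

```python
from decimal import Decimal
from fractions import Fraction

# Both a binary float and a decimal convert to an exact rational.
half_from_float = Fraction(0.5)
half_from_decimal = Fraction(Decimal("0.5"))

tenth_from_float = Fraction(0.1)             # the double nearest to 0.1
tenth_from_decimal = Fraction(Decimal("0.1"))  # exactly 1/10

# Compared as rationals, the shared value is recognised as shared...
same_half = half_from_float == half_from_decimal
# ...and the values that merely look alike are told apart.
same_tenth = tenth_from_float == tenth_from_decimal

# 1/3 is a perfectly good rational with no exact double counterpart.
third = Fraction(1, 3)
third_is_a_double = Fraction(float(third)) == third
```

Compared this way, 0.5 is the same number whether it arrived as a float or a decimal, while the float written "0.1" and the decimal 0.1 are different numbers, and no rounding is involved in deciding either case.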

To offer a concrete suggestion (I'll get to putting something into  
the bug tracker...), and speaking to the possibility of harmonizing  
the OWL specification and the XSD specification, something to  
consider would be to add xsd:real and xsd:rational. This could at  
least prevent the (strong) possibility of OWL defining those types  
itself. Personally, I think it would be cleaner to have all the  
numeric types handled in the XML Schema documents.  I realize that  
this might be a bit of work, but at least that work would have  
interested parties from both the OWL and XSD WGs.

I'd also consider reviewing the part of the spec that says:
> Should a derivation be made using a derivation mechanism that
> removes ·lexical representations· from the ·lexical space· to the
> extent that one or more values cease to have any ·lexical
> representation·, then those values are dropped from the ·value space·.

I've still no understanding of why that is a desirable thing to do,  
and we've discussed aspects that some might consider undesirable.
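
To make the quoted rule concrete, here is a toy Python model of it. The lexical space, the mapping, and the pattern are all hypothetical. The sketch only shows the mechanism of a value disappearing once its last lexical representation is removed.

```python
import re

# Hypothetical toy datatype: a handful of lexical forms mapped by int().
lexical_space = {"1", "01", "+1", "2", "02", "3"}
lexical_mapping = int

def value_space(lexicals):
    """The value space is the image of the lexical space under the mapping."""
    return {lexical_mapping(s) for s in lexicals}

def restrict(lexicals, pattern):
    """Keep only the lexical forms that fully match a pattern facet."""
    rx = re.compile(pattern)
    return {s for s in lexicals if rx.fullmatch(s)}

base_values = value_space(lexical_space)

# A pattern that admits only two-digit forms with a leading zero.
derived_lexicals = restrict(lexical_space, r"0[0-9]")
derived_values = value_space(derived_lexicals)

dropped_values = base_values - derived_values
```

In this toy model the value 3 is dropped from the derived type purely because its only lexical form, "3", no longer matches the pattern.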

>> Similarly, there seems to be missing an underlying type for the  
>> date types - although there is reference to timeOnTimeline, this  
>> value type is not surfaced in the type hierarchy.
>
> I'd very much like to hear how you'd do this; unlike the number datatypes,
> where I could envisage how to pull them together, I can't envisage a
> reasonable way for all the d/t datatypes to be derived from a universal
> one.  And I did try.

I had in mind subtyping the dates into those with and those without a  
timezone, and having each descend from a separate timeOnTimeline.
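
As a loose analogy for that split, Python's `datetime` already keeps timezoned ("aware") and non-timezoned ("naive") values on what are effectively separate timelines. The sketch below is only an analogy. The function name is hypothetical, and XSD's own ordering rules for such pairs are different.

```python
from datetime import datetime, timezone

def timeline_kind(dt: datetime) -> str:
    """Classify a value by which of the two hypothetical timelines it lives on."""
    return "timezoned" if dt.utcoffset() is not None else "non-timezoned"

aware = datetime(2008, 7, 5, 11, 25, 30, tzinfo=timezone.utc)
naive = datetime(2008, 7, 5, 11, 25, 30)

# Python refuses to order values drawn from the two different timelines.
try:
    naive < aware
    comparable = True
except TypeError:
    comparable = False
```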

>> One thought is whether a correct interpretation is more along
>> the lines of considering the value spaces as data structures.
>
> I'm curious what you mean by "data structure" here.  Reading on, it sounds
> like you mean various possible machine representations of the values.
> Let me assure you that that's not what is meant by a value space.  In
> fact, I can think of several extremely different-appearing representations
> of, for example, the integers, that are nonetheless isomorphic.  They
> are all potential machine representations of the values for the same
> datatype.  XSD does not have anything to say about machine representations,
> except to say that if an implementation has two different representations
> of the same value, it is obligated to generally treat them the same.

Again, it is trying to wrestle with the disjointness of float and  
decimal value spaces that is leading me to look for some explanation.  
While XSD does not explicitly speak about machine representation,  
that does not mean that those concepts do not (overly) influence the  
specification. To explain myself a bit further on this kind of
analysis: I spend a lot of time developing ontologies, and searching
for unspoken but operant knowledge and constraints, and then exposing
them, is a common aspect of this work.

What I specifically meant by data structure in this case was the
little data structure that is a floating point number, composed of
parts: integer mantissa, integer exponent, sign bit, plus some
symbol encodings. I compared that to integer, which doesn't have
these parts. However, decimal seems to necessarily be composed of
different kinds of parts.
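
The "little data structure" view can be sketched in Python by pulling an IEEE 754 double apart into the parts listed above. This is an illustration under the assumption that Python floats are IEEE 754 doubles. The function name is hypothetical, and none of this is how XSD defines its value spaces.

```python
import math
import struct
from decimal import Decimal

def float_parts(x: float):
    """Split a double into (sign bit, integer mantissa, integer exponent).

    For finite x the value is (-1)**sign * mantissa * 2**exponent.
    Special values are returned as (sign, "INF" or "NaN", None).
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exp_field = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exp_field == 0x7FF:                      # the special-symbol encodings
        return sign, "INF" if fraction == 0 else "NaN", None
    if exp_field == 0:                          # zero and subnormals
        return sign, fraction, -1074
    return sign, fraction | (1 << 52), exp_field - 1075

sign, mantissa, exponent = float_parts(-0.5)
rebuilt = (-1) ** sign * math.ldexp(mantissa, exponent)

# A decimal, by contrast, is made of a sign, decimal digits, and a power of ten.
decimal_parts = Decimal("12.50").as_tuple()
```

The float decomposes into a sign bit, a binary integer mantissa, and a power-of-two exponent, while the decimal decomposes into a sign, a tuple of decimal digits, and a power-of-ten exponent, which is the difference in "kinds of parts" referred to above.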

>> Another thought is that the value spaces are another aspect of  
>> lexical expression. This would account well for there being a  
>> difference between base64Binary and hexBinary, but not explain why  
>> these are not pattern facet restrictions on string.
>
> base64Binary and hexBinary are different because they use entirely different
> lexical mappings.  Different lexical mappings mean different datatypes.

But not disjoint value spaces.

> Except for our decision to paint the two value spaces different colors
> so we can tell them apart,

Why would one want to tell them apart? Why not consider a single
lexical mapping that has a disjunction? More than one lexical
representation can map to the same float, and likewise more than one
lexical representation can map to the same bit sequence.
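
A small Python sketch of that observation, using the standard library's hex and base64 decoders as stand-ins for the two lexical mappings (an assumption for illustration only, since the XSD lexical mappings are defined by the spec, not by these functions):

```python
import base64

# Two different lexical forms, one from each binary datatype.
from_hex = bytes.fromhex("48656C6C6F")
from_base64 = base64.b64decode("SGVsbG8=")

# Both decode to the same sequence of bits.
same_bits = from_hex == from_base64

# Likewise, several lexical forms map to the same float.
same_float = float("1.0") == float("1.00") == float("10E-1")
```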

> the value spaces of these two datatypes are
> the same.  (In this case, I suspect that the obvious equality across
> these two value spaces would not bother anyone.  But we weren't going
> to do that for some obvious datatype pairs and not others.)

It's the obviousness, and the spec's decision not to respect that
obviousness, that is my concern.

> They are not pattern-facet restrictions on string for the same reason that
> float and double are not pattern-facet restrictions on string.  The value
> spaces are different.  String values are character strings; the xxxBinary
> values are bit-strings.  Bits aren't characters.

Fair enough. My mistake.

>> Finally, I wonder if you have comments on a couple of other  
>> aspects of datatypes that appear in XML schema. Specifically, data  
>> types that are derived by list and time and date types. Clearly  
>> such concepts or similar are relevant to OWL given work on, e.g.  
>> workflow, or in spatial reasoning. Where do they fit into your
>> view of OWL class space?
>
> You both should definitely look up the latest Public Working Draft (a
> Last Call draft) for XSD.  I think it might clear up some of the questions,
> hopefully providing a better understanding or description of list
> datatypes and date/time datatypes.

Have been. Will be doing more.

-Alan

Received on Saturday, 5 July 2008 11:25:30 UTC