RE: ISSUE-126 (Revisit Datatypes): A proposal for resolution from Boris Motik on 2008-07-01 (public-owl-wg@w3.org from July 2008)

From: Boris Motik <boris.motik@comlab.ox.ac.uk>
Date: Tue, 1 Jul 2008 15:17:39 +0100
To: "'Michael Smith'" <msmith@clarkparsia.com>
Cc: "'OWL Working Group WG'" <public-owl-wg@w3.org>
Message-ID: <002101c8db85$37c6ae00$7212a8c0@wolf>
Hello,

I was not aware of (http://www.w3.org/TR/swbp-xsch-datatypes/); however, I believe that the recommendation in Section 3.2 is simply
wrong. The XML Schema specification (http://www.w3.org/TR/xmlschema-2/) defines the value spaces of datatypes in the following way:

- For xsd:integer: The value space of integer is the infinite set { ...,-2,-1,0,1,2,... }.

- For xsd:decimal: The value space of decimal is the set of numbers that can be obtained by multiplying an integer by a non-positive
power of ten, i.e., expressible as i x 10^-n where i and n are integers and n >= 0.

- For xsd:double: The basic value space of double consists of the values m x 2^e, where m is an integer whose absolute value is less
than 2^53, and e is an integer between -1075 and 970, inclusive. (xsd:float is defined analogously, with smaller bounds.)

Given these definitions, the value spaces of all these datatypes are just numbers, not pairs of the form (number,type). Therefore,
if we base the datatype system of OWL 2 on XML Schema, we have no other choice but to say that the value spaces are overlapping.
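
As a rough illustration of the definitions above, the three value spaces can be written as membership tests over exact rational
numbers. This is only a sketch: the function names are made up for this example and come from no specification, and the special
values (INF, -INF, NaN) are ignored.

```python
from fractions import Fraction

def in_integer(v: Fraction) -> bool:
    """xsd:integer: the set { ..., -2, -1, 0, 1, 2, ... }."""
    return v.denominator == 1

def in_decimal(v: Fraction) -> bool:
    """xsd:decimal: i x 10^-n, i.e. the reduced denominator divides a power of ten."""
    d = v.denominator
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1

def in_double(v: Fraction) -> bool:
    """The quoted double bounds: m x 2^e with |m| < 2^53 and -1075 <= e <= 970."""
    if v == 0:
        return True
    m, d = abs(v.numerator), v.denominator
    if d & (d - 1):                  # reduced denominator is not a power of two
        return False
    e = -(d.bit_length() - 1)        # v = m x 2^e
    while m % 2 == 0:                # make m odd, so that e is as large as possible
        m //= 2
        e += 1
    while e > 970 and m < 2**53:     # trade exponent for mantissa if e is too large
        m *= 2
        e -= 1
    return m < 2**53 and -1075 <= e <= 970

forty = Fraction(40)
tenth = Fraction(1, 10)
print(in_integer(forty), in_decimal(forty), in_double(forty))  # True True True
print(in_integer(tenth), in_decimal(tenth), in_double(tenth))  # False True False
```

The number 40 passes all three tests, which is the sense in which the value spaces overlap.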

Personally, I find this interpretation to be the only intuitive answer. After all, "40"^^xsd:integer and "40"^^xsd:float denote
one and the same number; hence, there should be no semantic difference between the two. If we say that these are not the same
numbers, we are introducing a totally artificial distinction between the integer 40 and the floating-point number 40, which is not
grounded in mathematics in any way. Thus, I cannot see a reason why (1) should not entail (2).

(1) eg:JeremyCarroll eg:ageInYears "40"^^xsd:integer .
(2) eg:JeremyCarroll eg:ageInYears "40"^^xsd:float .
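
The two readings of (1) and (2) can be sketched as follows. Under the overlapping reading a literal denotes a plain number; under
the disjoint reading it denotes, in effect, a (number, type) pair. The helper names are hypothetical, the lexical parsing is
deliberately loose (Python accepts forms that XML Schema does not), and INF/NaN are not handled.

```python
import struct
from fractions import Fraction

def value_of(lexical: str, datatype: str) -> Fraction:
    """Map a literal to the exact number it denotes (simplified lexical handling)."""
    if datatype == "xsd:integer":
        return Fraction(int(lexical))
    if datatype == "xsd:decimal":
        return Fraction(lexical)
    if datatype == "xsd:float":
        # Round to single precision by packing into 4 bytes (an approximation
        # of the XML Schema lexical-to-value mapping for xsd:float).
        single = struct.unpack("f", struct.pack("f", float(lexical)))[0]
        return Fraction(single)
    raise ValueError("unsupported datatype: " + datatype)

def same_overlapping(a, b) -> bool:
    """Overlapping value spaces: literals are equal iff they denote the same number."""
    return value_of(*a) == value_of(*b)

def same_disjoint(a, b) -> bool:
    """Disjoint value spaces: the datatype is part of the value."""
    return a[1] == b[1] and value_of(*a) == value_of(*b)

age_int = ("40", "xsd:integer")
age_float = ("40", "xsd:float")
print(same_overlapping(age_int, age_float))  # True
print(same_disjoint(age_int, age_float))     # False
```

Only the first reading lets (1) entail (2). Note that equal lexical forms are not enough in general: "0.1"^^xsd:decimal and
"0.1"^^xsd:float denote different numbers under either reading, because 0.1 is not exactly representable in binary.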

In fact, I see the alternative, treating the value spaces as disjoint, as really counterintuitive. Consider, for example, the following ontology:

(3) PropertyRange( a:weight xsd:decimal )
(4) PropertyAssertion( a:weight a:Bob "70"^^xsd:integer )

If we take the view that xsd:decimal and xsd:integer are disjoint, then the above ontology is inconsistent: (3) restricts the range
of a:weight to decimal numbers, but (4) uses an integer. I find this totally confusing: integers ARE decimal numbers!

Things get even worse in the case of xsd:integer and xsd:int. Imagine the following ontology:

(5) PropertyRange( a:weight xsd:integer )
(6) PropertyAssertion( a:weight a:Bob "70"^^xsd:int )

Again, under the disjointness view this ontology is inconsistent because of the strange typing condition on the property, and this
can be really confusing for users. Although not related to numbers, there are similar examples in the string domain. For example,
xsd:NMTOKEN is just a particular kind of string. The figure in Section 3 of (http://www.w3.org/TR/xmlschema-2/) strongly reflects
this intuition: xsd:NMTOKEN has been made subordinate to xsd:string.
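
The derivation hierarchy in that figure can be used directly for the range checks in (3)-(6). The sketch below encodes only the
handful of derivation steps needed here; the table and function names are illustrative, not taken from any OWL tool.

```python
# Base-type links from the XML Schema built-in datatype hierarchy
# (only the fragment relevant to the examples in this message).
BASE = {
    "xsd:int": "xsd:long",
    "xsd:long": "xsd:integer",
    "xsd:integer": "xsd:decimal",
    "xsd:NMTOKEN": "xsd:token",
    "xsd:token": "xsd:normalizedString",
    "xsd:normalizedString": "xsd:string",
}

def is_derived_from(datatype, ancestor) -> bool:
    """True if `datatype` equals `ancestor` or is derived from it by restriction."""
    while datatype is not None:
        if datatype == ancestor:
            return True
        datatype = BASE.get(datatype)
    return False

def satisfies_range(literal_datatype, range_datatype, disjoint) -> bool:
    """Does a literal of one datatype satisfy a PropertyRange on another?"""
    if disjoint:
        return literal_datatype == range_datatype
    return is_derived_from(literal_datatype, range_datatype)

# (3)-(4): range xsd:decimal, literal "70"^^xsd:integer
print(satisfies_range("xsd:integer", "xsd:decimal", disjoint=True))   # False
print(satisfies_range("xsd:integer", "xsd:decimal", disjoint=False))  # True
# (5)-(6): range xsd:integer, literal "70"^^xsd:int
print(satisfies_range("xsd:int", "xsd:integer", disjoint=False))      # True
```

Derivation is a sufficient condition only. With overlapping value spaces a literal can fall in a range it is not derived from
(for example "70"^^xsd:float and xsd:integer), so a full check has to compare values rather than type names.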

One final point is related to facets. xsd:int is really nothing other than xsd:integer restricted to a particular subset of
integers. Thus, xsd:int is in fact equivalent to

(7) DatatypeRestriction( xsd:integer
         minInclusive "-2147483648"^^xsd:int
         maxInclusive "2147483647"^^xsd:int )

In fact, this is essentially how XML Schema defines xsd:int (by restriction, via xsd:long). But then, treating the constant
"70"^^xsd:int as not being an instance of xsd:integer is really confusing.
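
Read as a membership test, the restriction in (7) is just the integer test plus two bounds. A minimal sketch, with names chosen
for this example only:

```python
from fractions import Fraction

INT_MIN = -2147483648   # minInclusive facet of xsd:int
INT_MAX = 2147483647    # maxInclusive facet of xsd:int

def in_integer(v: Fraction) -> bool:
    """xsd:integer: any whole number."""
    return v.denominator == 1

def in_int(v: Fraction) -> bool:
    """xsd:int: xsd:integer restricted by the two facets in (7)."""
    return in_integer(v) and INT_MIN <= v <= INT_MAX

seventy = Fraction(70)
print(in_int(seventy), in_integer(seventy))   # True True
print(in_int(Fraction(2147483648)))           # False
```

Every value accepted by in_int is accepted by in_integer by construction, so "70"^^xsd:int cannot fall outside xsd:integer.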


On the practical side, I can't see how overlapping value spaces would be more difficult to implement than disjoint ones. In my
ISWC paper I've presented an algorithm that deals with this issue, and I really can't see a possible source of implementation
difficulty.


To summarize, making value spaces of built-in datatypes non-overlapping seems unjustifiable from an intuitive, mathematical, and XML
Schema point of view. In fact, (http://www.w3.org/TR/swbp-xsch-datatypes/) presents no evidence at all as to why treating basic
datatypes as being disjoint should be more appropriate, and the choice seems rather arbitrary. In light of OWL 2, we should
therefore do the following:

- We should accept that this was a source of confusion in OWL 1. To correct this problem in OWL 2, we should do the right thing and
include an explicit description of the value spaces of all datatypes. In doing so, we should follow XML Schema as much as possible.

- We should say that (http://www.w3.org/TR/swbp-xsch-datatypes/) is an incorrect interpretation of the OWL 1 specification. After
all, this document was non-normative, so this may be acceptable.

Regards,

	Boris

> -----Original Message-----
> From: public-owl-wg-request@w3.org [mailto:public-owl-wg-request@w3.org] On Behalf Of Michael Smith
> Sent: 01 July 2008 14:05
> To: Boris Motik
> Cc: 'OWL Working Group WG'
> Subject: Re: ISSUE-126 (Revisit Datatypes): A proposal for resolution
> 
> 
> On Tue, 2008-07-01 at 13:15 +0100, Boris Motik wrote:
> > 1. We introduce the owl:real datatype with the value space of all real numbers. We don't add any
> > constants of the form owl:real.
>
> > 3. We leave xsd:integer and all the derived types as they are. Thus, "1"^^xsd:integer is still valid
> > as usual.
>
> > 5. We make xsd:decimal the subset of owl:real and we leave all constants as they are. Thus,
> > "1.0"^^xsd:decimal is still a valid constant in OWL.
> >
> > 6. We disallow the "pattern" facet on all numeric datatypes.
> >
> 
> The items above sound good.
> 
> > 2. We introduce the owl:rational datatype with the value space of all
> > rational numbers. We add a constant for each rational number
> > along the lines of http://www.w3.org/2007/OWL/wiki/OWL_Rational. Thus,
> > we would be able to write things such as "1/3"^^owl:rational
> > or even "1/1"^^owl:rational.
> 
> I haven't heard anyone request a datatype that is just rationals.  The
> proposal was to have rational constants, but over the real value space.
> If we constrain ourselves to rationals in the value space, we'd likely
> run into some of the numerics problems we're trying to avoid.
> 
> 
> > 4. We make xsd:double the subset of real numbers between the minimal double and the maximal double
> > number. We do the same for xsd:float. All constants are the same as in XML Schema; thus,
> > "1"^^xsd:float and "1.0"^^xsd:float are all the same constants. We disallow the NaN (not-a-number)
> > constant (allowing NaN would make xsd:double not a subset of owl:real).
> >
> > We are thus staying "almost" true to XML Schema, in that we have exactly the same set of constants
> > as in XML Schema. The main change is that, in order to facilitate simpler implementations, we make
> > the extension of xsd:double and xsd:float continuous rather than discrete.
> 
> On Jun 19 (in [1]) I mentioned that this is a change from best practice
> advice [2] and impacts existing implementations.  I asked for
> clarification on what the benefit of the change would be so we could
> evaluate this as a trade-off.
> 
> > As I explained in my last e-mail, two constants can be different even though they denote the same
> > value.
> 
> Constants that are different but which map to the same value -- this
> isn't a difference that is significant to a reasoner.  Do I
> misunderstand you?
> 
> > This elegantly solves the problems that Alan mentioned in his last e-mail. For example, if the
> > ontology initially contains "1.0"^^xsd:float, this would be read into a constant whose lexical
> > representation is "1.0" and whose URI is xsd:float. Thus, if you write the ontology back from the
> > structural spec, the constant would be written out as "1.0"^^xsd:float, and thus the form of the
> > ontology would be preserved.
> 
> This would only be true if one's tool supported some sort of literal
> round-trip guarantee.  The internal representation of literal constants
> is an efficiency trade-off over which I expect tool vendors would be
> making their own decisions.
> 
> > The only thing that changes really is that we'd say that the extensions of xsd:double and xsd:float
> > are continuous and not discrete, and we'd tweak them (e.g., by removing NaN) to make them subsets
> > of owl:real.
> 
> It seems that "supporting" xsd:float and xsd:double, but not allowing
> some of their permitted values is likely to confuse users.
> 
> --
> Mike Smith
> 
> Clark & Parsia
> 
> [1] http://lists.w3.org/Archives/Public/public-owl-wg/2008Jun/0149.html
> [2] http://www.w3.org/TR/swbp-xsch-datatypes/#sec-values
>
Received on Tuesday, 1 July 2008 14:19:13 UTC