RE: ISSUE-126 (Revisit Datatypes): The list of normative datatypes should be revisited from Boris Motik on 2008-06-19 (public-owl-wg@w3.org from June 2008)

From: Boris Motik <boris.motik@comlab.ox.ac.uk>
Date: Thu, 19 Jun 2008 08:49:54 +0100
To: "'Alan Ruttenberg'" <alanruttenberg@gmail.com>
Cc: "'OWL Working Group WG'" <public-owl-wg@w3.org>
Message-ID: <000a01c8d1e1$0fd14980$6c12a8c0@wolf>
Hello,

> -----Original Message-----
> From: public-owl-wg-request@w3.org [mailto:public-owl-wg-request@w3.org] On Behalf Of Alan Ruttenberg
> Sent: 19 June 2008 06:50
> To: Boris Motik
> Cc: 'OWL Working Group WG'
> Subject: Re: ISSUE-126 (Revisit Datatypes): The list of normative datatypes should be revisited
> 
> 
> Hi Boris,
> 
> While it is useful to have this proposal for the record, I think
> there needs to be some education within the working group in order to
> better understand the issues leading to this proposal, as well as the
> consequences of the solutions. For the excluded datatypes, we need to
> understand the impact on users and potential users.
> 
> I also think we need to clarify what being a supported datatype
> means.

In OWL 1, people often complained that different systems supported radically different subsets of XML Schema datatypes. Therefore,
we decided to include a more extensive list of datatypes in OWL 2 and make them normative; that is, each OWL 2 reasoner that
wants to have a compliance label should support all of them (with all the allowed facets).

> In particular it should be clarified whether this means that
> tools will simply reject ontologies that mention these types, even as
> annotation values,

I believe we should not make a distinction between what is supported in annotations and what is supported in class descriptions.
This is likely to cause confusion and is going to lead to a very complex specification. I believe we should carefully select the set
of datatypes that we think we can support in class descriptions, and we should make them normative. Clearly, nothing prevents tools
from supporting other datatypes and allowing them in class descriptions and/or annotations, but I'd leave this out of the spec.

> and whether there are possibly different levels of
> support (use/non-use of all facets or some facets, ability to use the
> datatypes within n-ary predicates) come to mind as examples where we
> might define a minimal level of support without eradicating the types
> completely. I am still working on understanding the technical issue:
> do I have it right that the issues really center around n-ary
> predicates? If there were no n-ary datatypes and no facets, would any
> of these datatypes be a problem?
> 

This issue has nothing to do with n-ary predicates; it addresses the current (unary) case.

OK, you might see the rounding problem as being related to possible future n-ary extensions: the rounding problem doesn't surface in
the current datatype system because there is no way to perform operations (such as addition) that might require rounding. This,
however, is not the core part of the proposal; all other comments relate to the datatype system as it currently is.
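To illustrate the rounding problem, here is a small Python sketch that evaluates a hypothetical n-ary predicate "x + 1 = x" (OWL 2, as discussed here, has no such predicates) once under binary32 (xsd:float) arithmetic and once under exact real arithmetic. The helper `to_f32` and the predicate are assumptions made for illustration only.

```python
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hypothetical n-ary predicate "x + 1 = x", evaluated two ways.
x = to_f32(2.0 ** 24)  # 16777216, exactly representable in binary32

# binary32 reading: 2**24 + 1 is exact in binary64 and is then rounded once
# to binary32; it is a tie, and round-half-to-even gives back 2**24.
print(to_f32(x + 1.0) == x)  # True: the predicate holds for this x

# Real-number reading: exact rational arithmetic, no rounding at all.
print(Fraction(x) + 1 == Fraction(x))  # False: the predicate never holds
```

With unary datatypes only, no such operation exists, which is why the issue does not arise in the current proposal.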

> I also have a rather basic question about the datatype formulation in
> the OWL semantics. Specifically, although I believe the xsd datatypes
> are considered disjoint, their value spaces intersect, i.e. the value
> space of positive integers intersects the value space of xsd:float.
> Is that correct?

Yes; hence, the datatypes that we have in the spec are not disjoint.

Please don't get confused by the fact that we have formalized datatypes as being disjoint in our paper. This is just for convenience
of presentation: we have assumed that we have one numeric datatype and that different subsets (such as integers) are modeled as
facets. We are not proposing to do this in the OWL 2 spec; this has been used in the paper merely as a convenience because it
allowed us to compartmentalize all problems related to numbers to a single datatype.

> Assuming that is the case, do cardinality
> restrictions apply at the value space level, i.e. is the cardinality
> of {"2"^^xsd:int, "2.0"^^xsd:float} 1 or 2 (assuming, as I believe is the
> case, that 2 can be exactly represented as a floating point value).

The cardinality should be one: in floating-point arithmetic, "2.0"^^xsd:float denotes the number 2, just like the constant "2"^^xsd:int.

Your question is, however, spot on: these types of things need to be clarified, and this is precisely what I wanted to
achieve with this issue.
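A minimal Python sketch of the value-space reading behind this answer, assuming finite xsd:float literals are mapped to binary32 values and then compared as exact rationals. The helper names `xsd_float_value` and `xsd_int_value` are hypothetical. The last line shows why such details need to be pinned down: under XML Schema semantics, "0.1"^^xsd:float does not denote exactly 1/10.

```python
import struct
from fractions import Fraction


def xsd_float_value(lexical: str) -> Fraction:
    """Exact value of a finite xsd:float literal, as a rational number.

    Assumption: the literal is parsed as binary64 and then rounded to
    binary32; this double rounding can, in rare cases, differ from a
    correctly rounded direct decimal-to-binary32 conversion.
    """
    f32 = struct.unpack("<f", struct.pack("<f", float(lexical)))[0]
    return Fraction(f32)


def xsd_int_value(lexical: str) -> Fraction:
    """Exact value of an xsd:int literal."""
    return Fraction(int(lexical))


values = {xsd_int_value("2"), xsd_float_value("2.0")}
print(len(values))  # 1: both literals denote the same number
print(xsd_float_value("0.1") == Fraction(1, 10))  # False: 0.1 is rounded
```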

> 
> Of your points below, 2 and 3 don't seem problematic from my point of
> view (internationalized string is missing from the list in 3, an
> oversight I presume). The others require more thought on my part.
> 

I haven't seen any ontology using, say, xsd:gYearMonth; therefore, I really believe that my first point should not be contentious.
Furthermore, there is nothing that would prevent people from implementing the remaining datatypes.

Regarding 4, I really don't believe that people would see a difference in the consequences in practice, but this would make the spec
much cleaner. Here is an example of what might go wrong. Imagine you have the following ontology:

(1) PropertyRange( R DatatypeRestriction( xsd:float minExclusive f1 maxExclusive f2 ) )
(2) PropertyRange( S DatatypeRestriction( xsd:float minExclusive f1 maxExclusive f2 ) )
(3) DisjointProperties( R S )
(4) ClassAssertion( SomeValuesFrom( R rdfs:Literal ) a )
(5) ClassAssertion( SomeValuesFrom( S rdfs:Literal ) a )

Now this ontology is satisfiable iff the data range DatatypeRestriction( xsd:float minExclusive f1 maxExclusive f2 ) contains two or
more floating point values. To determine this, you thus need to be able to determine whether there are at least two different values
between f1 and f2.

My first observation is that implementing this correctly is not trivial. Note that you can't simply subtract the two numbers; you
need to take into account that each floating point number is actually represented as m*2^e and then you need to do some really nasty
operations.

My second observation is that, in practice, nobody will care: f1 and f2 will be typically sufficiently apart so that there will be
plenty of numbers between them for you to choose from.

But then, we have a problem for the implementors: in practice, the precise inference will probably never be relevant, but they still
have to provide it because of a possible corner case. Note that, because xsd:float is discrete and finite, it is in principle
possible to choose f1 and f2 such that there is exactly one floating point number between them, which would then make the ontology
unsatisfiable. Thus, to have a 100% correct implementation, people will have to provide this nasty code.
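The corner case can be made concrete with a Python sketch for xsd:float only. It is one possible technique, not the method of any particular reasoner: it relies on the fact that, for IEEE 754 binary32 values, the bit pattern of a non-negative float grows with its value and the magnitude bits of a negative float grow with its absolute value, so counting the floats strictly between f1 and f2 reduces to integer subtraction. The sketch treats +0 and -0 as one value, rejects NaN, and assumes f1 and f2 arrive as Python floats that are rounded to binary32; all function names are hypothetical.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ordered_index(x: float) -> int:
    """Map a binary32 value to an int so that a < b iff index(a) < index(b).

    +0.0 and -0.0 both map to 0; NaN is rejected because it is unordered.
    """
    if math.isnan(x):
        raise ValueError("NaN has no position in the order")
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    magnitude = bits & 0x7FFFFFFF
    return -magnitude if bits & 0x80000000 else magnitude


def from_ordered_index(k: int) -> float:
    """Inverse of ordered_index (k = 0 yields +0.0)."""
    bits = k if k >= 0 else (-k) | 0x80000000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def floats_strictly_between(f1: float, f2: float) -> int:
    """Number of binary32 values v with f1 < v < f2 (0 if the interval is empty)."""
    lo = ordered_index(to_f32(f1))
    hi = ordered_index(to_f32(f2))
    return max(0, hi - lo - 1)


def example_is_satisfiable(f1: float, f2: float) -> bool:
    """Axioms (1)-(5): R and S are disjoint and a needs a value for each,
    so the restricted range must contain at least two distinct floats."""
    return floats_strictly_between(f1, f2) >= 2


f1 = 1.0
f2 = from_ordered_index(ordered_index(f1) + 2)  # two binary32 steps above 1.0
print(floats_strictly_between(f1, f2))  # 1
print(example_is_satisfiable(f1, f2))  # False
print(example_is_satisfiable(1.0, 2.0))  # True
```

A complete implementation would also have to cover xsd:double, inclusive facets, and combinations of several restrictions, which is where the effort grows.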

Hence my suggestion: let us modify the type system such that a precise implementation can be produced efficiently and such that it
provides the intuitive answers. In this example, this means that xsd:float would be treated as the set of all real numbers. This
makes it much easier to answer the above question: if f2 > f1, there are infinitely many real numbers between f1 and f2! Thus, in
all practical cases, we are getting the result that the users wanted, and we are not requiring the implementors to jump through
hoops.
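Under the proposed real-number reading, the same check needs no knowledge of binary representations. The Python sketch below assumes f1 and f2 are exact rationals (`fractions.Fraction`); the function names are hypothetical. It also constructs explicit witnesses, since n equally spaced points strictly inside a non-empty open interval are pairwise distinct.

```python
from fractions import Fraction


def interval_has_at_least(n: int, f1: Fraction, f2: Fraction) -> bool:
    """Under a real-number reading, the open interval (f1, f2) contains at
    least n values, for any n >= 1, exactly when f1 < f2."""
    return n <= 0 or f1 < f2


def witnesses(n: int, f1: Fraction, f2: Fraction) -> list[Fraction]:
    """n distinct rationals strictly between f1 and f2 (requires f1 < f2)."""
    if not f1 < f2:
        raise ValueError("the interval (f1, f2) is empty")
    step = (f2 - f1) / (n + 1)
    return [f1 + k * step for k in range(1, n + 1)]


def example_is_satisfiable_real(f1: Fraction, f2: Fraction) -> bool:
    """The example ontology, with xsd:float read as the real numbers:
    two distinct values are needed for the disjoint properties R and S."""
    return interval_has_at_least(2, f1, f2)


f1 = Fraction(1)
f2 = Fraction(1) + Fraction(2, 2**23)  # only one binary32 value lies between
print(example_is_satisfiable_real(f1, f2))  # True
print(witnesses(2, f1, f2))
```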

Regards,

	Boris


> Thanks,
> Alan
> 
> On Jun 18, 2008, at 3:23 PM, Boris Motik wrote:
> 
> >
> > Hello,
> >
> > So here is a proposal for resolving this issue.
> >
> > 1. We exclude xsd:time, xsd:date, xsd:gYearMonth, xsd:gYear,
> > xsd:gMonthDay, xsd:gDay, xsd:gMonth, and xsd:base64Binary from the
> > list of supported datatypes. Note that this doesn't preclude people
> > from implementing them (if they can figure out how to do this).
> >
> > 2. We define xsd:anyURI to be a subset of xsd:string.
> >
> > 3. We allow the "pattern" facet only on the following datatypes:
> > xsd:string, xsd:anyURI, xsd:normalizedString, xsd:token,
> > xsd:language, xsd:NMTOKEN, xsd:Name, and xsd:NCName.
> >
> > 4. We introduce a new owl:real datatype. This datatype would allow
> > for the following types of constants:
> >
> > - rational numbers written according to
> > http://www.w3.org/2007/OWL/wiki/OWL_Rational
> > - floating point numbers written in the format specified in the
> > definition of xsd:float and xsd:double in XML Schema
> > - decimal numbers written in the format specified in the
> > definition of xsd:decimal
> > - integer numbers written in the format specified in the
> > definition of xsd:integer and related datatypes
> >
> > Furthermore, we would make xsd:float and xsd:double (and possibly
> > xsd:decimal as well) synonyms for owl:real. This would be the only
> > deviation from the XML Schema datatype system: there, some very
> > large numbers are not members of xsd:float. I believe, though, that
> > this would not bother people in practice.
> >
> > Finally, we can include xsd:nonPositiveInteger, xsd:negativeInteger,
> > xsd:long, xsd:int, xsd:short, xsd:byte, xsd:nonNegativeInteger,
> > xsd:unsignedLong, xsd:unsignedInt, xsd:unsignedShort,
> > xsd:unsignedByte, and xsd:positiveInteger with the existing
> > semantics as usual.
> >
> > Regards,
> >
> > 	Boris
> >
> >
>
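A rough Python sketch of how constants of the proposed owl:real datatype (point 4) might be mapped to exact rationals, and how the integer datatypes could then be treated as subsets of owl:real. Assumptions: rationals are written as "n/d" (the syntax on the OWL_Rational wiki page may differ); INF, -INF, and NaN are simply rejected because they have no real-number value; all function and table names are hypothetical. Python's `Fraction` string parser is more permissive than the XML Schema lexical grammars (it also accepts, e.g., whitespace around "/"), so a real implementation would validate the lexical form first.

```python
from fractions import Fraction

# Special xsd:float/xsd:double values with no real-number counterpart.
_SPECIAL = {"INF", "+INF", "-INF", "NaN"}

# Hypothetical table: integer datatypes as ranges within owl:real
# (bounds from XML Schema Part 2; None means unbounded).
INTEGER_RANGES = {
    "xsd:byte": (-128, 127),
    "xsd:short": (-32768, 32767),
    "xsd:int": (-2**31, 2**31 - 1),
    "xsd:long": (-2**63, 2**63 - 1),
    "xsd:unsignedByte": (0, 255),
    "xsd:unsignedShort": (0, 65535),
    "xsd:unsignedInt": (0, 2**32 - 1),
    "xsd:unsignedLong": (0, 2**64 - 1),
    "xsd:nonNegativeInteger": (0, None),
    "xsd:positiveInteger": (1, None),
    "xsd:nonPositiveInteger": (None, 0),
    "xsd:negativeInteger": (None, -1),
}


def owl_real_value(lexical: str) -> Fraction:
    """Map a numeric constant to an exact rational (sketch).

    Integer, decimal, and float/double forms such as '2', '-0.5', '1.0E2'
    are read exactly, with no rounding to binary32/binary64.
    """
    text = lexical.strip()
    if text in _SPECIAL:
        raise ValueError(f"{lexical!r} has no value in owl:real")
    return Fraction(text)


def is_member(value: Fraction, datatype: str) -> bool:
    """Check whether an owl:real value lies in one of the integer datatypes."""
    lo, hi = INTEGER_RANGES[datatype]
    if value.denominator != 1:
        return False
    return (lo is None or value >= lo) and (hi is None or value <= hi)


print(owl_real_value("2.0") == owl_real_value("2"))  # True
print(owl_real_value("0.1") == Fraction(1, 10))  # True: no binary rounding
print(owl_real_value("1/3") * 3 == 1)  # True
print(is_member(owl_real_value("128"), "xsd:byte"))  # False
```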
Received on Thursday, 19 June 2008 07:51:27 UTC