Re: A possible structure of the datatype system for OWL 2 (related to ISSUE-126)

On Jul 9, 2008, at 5:58 PM, Boris Motik wrote:

> Hello,
>> -----Original Message-----
>> From: Alan Ruttenberg
>> Sent: 09 July 2008 15:18
>> To: Boris Motik
>> Cc: 'OWL Working Group WG'
>> Subject: Re: A possible structure of the datatype system for OWL 2  
>> (related to ISSUE-126)
>> On Jul 8, 2008, at 5:16 PM, Boris Motik wrote:
>>> Hello,
>>> 1. Datatype Map
>>> ----------------
>> I wonder if we should still use the term "datatype", as there will
>> likely be confusion with the xsd sense of datatype.
>>> A datatype map consists of the following things:
>>> - a set of datatypes
>>>   - each datatype provides a set of allowed facets
>>> - a possibly infinite set of constants (likely to be renamed to
>>> literals, but I'll stick to "constant" for the moment)
>>>   - each constant consists of a lexicalValue and a typeURI
>>>   - it is written as "lexicalValue"^^typeURI
>>> Each datatype DT is assigned a value space DT^D, which is just a
>>> nonempty set.
>> Is the implication that DT -> Value space DT^D, one to one?
> Yes. This is exactly the same as in the case of classes.

I don't understand the distinction between datatype and value space,
then. In XSD, there are two sorts of things because there is a need
to describe both the lexical space and the value space.

>> So we have type, DT, DT^D ?

Three types of things: "type" as in "typeURI", DT, and DT^D. So rather
than the situation as it is in XSD with datatypes, where there is a
lexical space and a value space (2 sorts of things), here we have 3
sorts of things.

Is this correct?
> I didn't really understand that.
>>> Each constant c is assigned a value c^D, which is just an object
>>> from the union of the value spaces of all datatypes.
>>> Thus, a datatype can be thought of as a class with a predefined
>>> extension.
>> I'm not sure explaining it this way is helpful - might confuse rather
>> than illuminate.
> I actually think this is the proper way of thinking about
> datatypes. Take, for example, owl:integer: you can think about it
> as one big, infinite nominal that contains all integers. Hence,
> owl:integer is a class in the sense that its interpretation
> contains things.
> The main difference between datatypes and classes is that, in the  
> case of datatypes, the interpretation is uniquely defined by the  
> datatype map.

To clarify, we've made efforts to educate users about the differences  
between instances and data. So in our *user facing documentation*,  
saying datatypes are like classes might not be the best choice.
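
For concreteness, the structure under discussion could be sketched
roughly as below. This is only an illustration, not part of the
proposal: the names Datatype, Constant and DatatypeMap, and the use of
Python membership predicates to stand in for (possibly infinite) value
spaces, are my own assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Datatype:
    """Syntactic name (DT) plus a predicate standing in for its value space DT^D."""
    uri: str
    contains: Callable[[Any], bool]        # membership test for DT^D
    facets: FrozenSet[str] = frozenset()   # names of the allowed facets


@dataclass(frozen=True)
class Constant:
    """A constant written "lexicalValue"^^typeURI."""
    lexical_value: str
    type_uri: str


@dataclass
class DatatypeMap:
    """Hypothetical container: datatypes plus per-typeURI interpreters."""
    datatypes: Dict[str, Datatype] = field(default_factory=dict)
    # typeURI -> function mapping a lexical value to its value c^D
    interpreters: Dict[str, Callable[[str], Any]] = field(default_factory=dict)

    def value_of(self, c: Constant) -> Any:
        return self.interpreters[c.type_uri](c.lexical_value)

    def in_value_space(self, c: Constant, datatype_uri: str) -> bool:
        return self.datatypes[datatype_uri].contains(self.value_of(c))


def _is_number(v: Any) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _is_integer(v: Any) -> bool:
    return isinstance(v, int) and not isinstance(v, bool)


# Illustrative entries only.
dmap = DatatypeMap()
dmap.datatypes["owl:number"] = Datatype("owl:number", _is_number)
dmap.datatypes["xsd:integer"] = Datatype(
    "xsd:integer", _is_integer, frozenset({"minInclusive", "maxInclusive"}))
dmap.interpreters["xsd:int"] = int  # a typeURI with no datatype of the same name

c = Constant("42", "xsd:int")
```

In this sketch "xsd:int" appears only as a typeURI and never as a
datatype, which is the kind of independence between the two sets that
the quoted proposal allows.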

>>> Note that this definition does not assume any relationship between
>>> the set of supported typeURIs (which determine the allowed
>>> constants) and the set of datatypes (which determine the allowed
>>> sets of values).
>> I think we should consider calling "typeURI" "lexicalFormURI" to
>> suggest the correct thinking, as people tend to equate "type" and
>> "class". (as with rdf:type)
> I agree.
>> Can we not simplify the above to: There are "Value spaces" and
>> "lexicalFormURI"s. I'm not seeing how having "Datatypes" as an
>> additional concept helps.
> There is a distinction, albeit a subtle one. "Datatype" is a  
> syntactic category; hence, you can put datatypes into property range
> axioms. "Value space" is a semantic category. Hence, you can't work  
> with value spaces at the level of a syntax; that is, you don't
> put *the set of all integers* into an ontology when you say  
> "xsd:integer is a range of P"; rather, you put xsd:integer (a  
> datatype),
> which acts as a moniker for its value space.

lexicalFormURIs are the syntax element, I would have thought. No need
for *another* syntax class.
If there is, then name it something that doesn't use the word "type",
so as not to have an unfortunate implication.

>>> 2. Allowed datatypes
>>> ---------------------
>>> Conformant OWL 2 implementations would be required to support the
>>> following base datatypes, each of whose value spaces would be
>>> disjoint:
>>> - owl:number - the value space is the set of all real numbers
>>> - xsd:string - the value space is the set of all Unicode strings in
>>> normal form C
>>> - owl:internationalizedString - the value space is the set of
>>> pairs of the form (string,langTag)
>>> - xsd:hexBinary - the value space is the set of all finite
>>> sequences of octets
>> I'm wondering whether we should simply say: OWL has the following
>> (following your later mail).
>> owl:Number
>> owl:CharacterString
>> owl:BitString
>> owl:Integer
>> We confuse the issue by using the xsd uris to name a different sort
>> of thing (an OWL value space, not an XSD:type)
> I personally don't mind renaming all datatypes to owl:*. I can see,  
> however, people might object, partially because of a backwards
> compatibility issue. After all, in OWL 1, you had xsd:string and  
> xsd:integer.

My proposal allows those to be used as well, in the appropriate  
context. But they are interpreted differently in OWL 2 and our  
documentation would explicitly note the difference in meaning when  
xsd:int is used in OWL, versus when it is used in XML Schema.

In my opinion it would be worse to suggest that it is the same thing,  
when it isn't.

>>> The following datatype would also be supported in OWL 2:
>>> - xsd:integer - the value space is the subset of the value space of
>>> owl:number containing all integers
>> See above.
>>> Finally, we might support the following "shortcut" datatypes, whose
>>> value spaces can be defined from the value spaces of the above
>>> mentioned datatypes using facets
>>> - various xsd:integer derivatives, such as xsd:int and xsd:long
>>> - various xsd:string derivatives, such as xsd:Name
>> In order to keep the design clean, I'd suggest that we define these
>> in the owl namespace. We can connect the xsd types to the owl  
>> version.
>> However: The use of e.g. xsd:string in restrictions is the common
>> idiom. I think we should document that some xsd datatypes, when used
>> in a restriction, are understood to mean certain owl value spaces.
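
To illustrate how a "shortcut" datatype could be defined from a base
value space using facets, here is a minimal sketch. It assumes the XSD
facet names minInclusive and maxInclusive; the helper name restrict and
the representation of value spaces as predicates are hypothetical.

```python
from typing import Any, Callable, Optional

ValueSpace = Callable[[Any], bool]  # membership predicate for a value space


def is_integer(v: Any) -> bool:
    """Stand-in for the value space of xsd:integer (bool is excluded)."""
    return isinstance(v, int) and not isinstance(v, bool)


def restrict(base: ValueSpace,
             min_inclusive: Optional[int] = None,
             max_inclusive: Optional[int] = None) -> ValueSpace:
    """Return the subset of `base` selected by the given facets."""
    def contains(v: Any) -> bool:
        if not base(v):
            return False
        if min_inclusive is not None and v < min_inclusive:
            return False
        if max_inclusive is not None and v > max_inclusive:
            return False
        return True
    return contains


# xsd:int and xsd:long as facet restrictions of xsd:integer.
xsd_int = restrict(is_integer, -2**31, 2**31 - 1)
xsd_long = restrict(is_integer, -2**63, 2**63 - 1)
```

No new base value space is introduced here; each shortcut is just a
subset of the integers.
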
>>> 3. Allowed constants
>>> ---------------------
>>> Conformant OWL 2 implementations are required to support the
>>> following constant types:
>>> - "nnn"^^xsd:int and all derivatives that fall within xsd:int - all
>>> such constants are to be interpreted as elements of owl:number
>>> - "aaEbb"^^xsd:float - all such constants save for NaN and +-inf
>>> are to be interpreted as elements of owl:number
>> Consider extending owl:number with these constants. We need some
>> interpretation of them if they are to remain intact when part of an
>> OWL file. These are, effectively, "promotion" rules.
> We can have owl:numberPlus (or owl:numberOnSteroids if you  
> prefer :-) that contains these guys as well.

>>> - "abc"^^xsd:string - interpreted as "abc"
>> As you later suggest, ("abc", null) or ("abc", ""). The latter
>> avoids the issue of what to do about the pattern facet on lang.
>>> - "abc"@langTag - interpreted as a pair ("abc",langTag)
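
The interpretation rules listed above could be sketched as a single
function. This is an assumption-laden illustration: the function name
interpret is hypothetical, exact rationals (Fraction) stand in for
elements of owl:number, and Python's int() and float() accept slightly
more than the XSD lexical forms and do not model single-precision
rounding.

```python
import math
from fractions import Fraction
from typing import Optional, Tuple, Union

Value = Union[Fraction, str, Tuple[str, str]]

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of xsd:int


def interpret(lexical: str,
              type_uri: Optional[str] = None,
              lang_tag: Optional[str] = None) -> Value:
    """Map a constant to its value, following the four rules quoted above."""
    if lang_tag is not None:
        # "abc"@langTag is interpreted as the pair ("abc", langTag)
        return (lexical, lang_tag)
    if type_uri == "xsd:string":
        return lexical
    if type_uri == "xsd:int":
        n = int(lexical)
        if not INT_MIN <= n <= INT_MAX:
            raise ValueError(f"{lexical!r} is outside the xsd:int range")
        return Fraction(n)
    if type_uri == "xsd:float":
        x = float(lexical)
        if math.isnan(x) or math.isinf(x):
            # NaN and +-inf are excluded from owl:number in the proposal
            raise ValueError(f"{lexical!r} has no value in owl:number")
        return Fraction(x)  # exact value of the parsed float
    raise ValueError(f"unsupported typeURI: {type_uri!r}")
```

For example, interpret("42", "xsd:int") and interpret("15E-1",
"xsd:float") both yield elements of the same numeric value space, even
though their typeURIs differ.
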
>>> 4. Discussion
>>> --------------
>>> The set of constants is chosen such that implementations don't need
>>> to support numbers with arbitrary precision, which might be quite
>>> cumbersome. In fact, implementations are only required to support
>>> 32 bit integers and single precision floating point numbers.
>> On today's hardware, I would set this to be 64 bit integers or even
>> 128 bit integers, and double precision float. Some machines don't
>> really have single float hardware, instead rounding from double  
>> float.
> I don't mind going up to 64 bit. 128 might be a bit too much (at
> least in Java -- a language in which many reasoners are
> implemented -- you don't have this).
The 128 bit code need only be invoked in case of an overflow exception.
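
A minimal sketch of that fallback pattern follows. Python integers
never overflow, so the 64 bit limit is simulated with an explicit range
check, and all function names are hypothetical. In Java, the fast path
might instead rely on Math.addExact, which throws ArithmeticException
on overflow.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def add_int64(a: int, b: int) -> int:
    """Fast path: simulate a 64 bit add that raises on overflow."""
    result = a + b
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError("64 bit overflow")
    return result


def add_wide(a: int, b: int) -> int:
    """Slow path: stands in for 128 bit or arbitrary-precision arithmetic."""
    return a + b


def add(a: int, b: int) -> tuple:
    """Return (sum, path), where path records which code path was taken."""
    try:
        return add_int64(a, b), "fast"
    except OverflowError:
        return add_wide(a, b), "wide"
```

The common case stays on the fast path, and the wider arithmetic is
paid for only when an overflow actually occurs.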

>>> There are efficient ways to represent these on virtually all  
>>> systems.
>>> The set of datatypes, however, allows one to refer to the sets of
>>> all integers and real numbers. This allows one to specify the
>>> ontology in a way that makes reasoning easy.
>>> Implementations are free to support other constants as well. Note
>>> that these extensions do not necessarily mean that we need new
>>> datatypes (i.e., new value spaces). For example, an implementation
>>> might choose to support arbitrary precision numbers via constants
>>> of the form "123.03"^^xsd:decimal. Note that the proposed list of
>>> datatypes already contains the appropriate value space for such
>>> constants (i.e., owl:number).
>> I think xsd:decimal should be considered a lexical form of  
>> owl:Number.
>>> The open issues are what to do with NaN and +-inf and with
>>> date-time datatypes.
> I think that, if we agree to the basic structure, we can easily  
> accommodate the remaining "extra" constants and datatypes.
> Regards,
> 	Boris
>> In the first case, I suggest above that owl:Number be
>> real+"NaN"+"-INF"+"+INF"
>> I'd also suggest that "-0" and "+0" be considered lexical forms of
>> the number 0.
>> For the date-time datatypes, I wonder whether it would work to  
>> define:
>> owl:Time (isomorphic to the reals)
>> owl:TimeZoneTime (also isomorphic to the reals)
>> There is one value space for all the lexical date-times that have a
>> time zone specified, and another value space for all the lexical
>> date-times. There would be no comparison possible between owl:Time
>> and owl:TimeZoneTime.
>> There would still be work necessary to determine whether the
>> repeating interval types, like monday, are feasible to implement.
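
As an analogy only, and not a claim about how OWL should define these,
Python's datetime module already behaves like two separate value
spaces: values with a time zone are ordered by the instant they denote,
while ordering a value with a time zone against one without raises an
error. Mapping these onto owl:TimeZoneTime and owl:Time is my own
assumption, and the helper name comparable is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Two lexical forms with time zones that denote the same instant.
with_tz = datetime(2008, 7, 9, 17, 38, 4, tzinfo=timezone.utc)
same_instant = datetime(2008, 7, 9, 13, 38, 4,
                        tzinfo=timezone(timedelta(hours=-4)))

# A date-time with no time zone specified.
without_tz = datetime(2008, 7, 9, 17, 38, 4)


def comparable(a: datetime, b: datetime) -> bool:
    """True if Python is willing to order a and b against each other."""
    try:
        a < b
        return True
    except TypeError:
        # raised when one value has a time zone and the other does not
        return False
```

Here with_tz and same_instant compare as equal, whereas
comparable(with_tz, without_tz) is False.
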
>> -Alan
>>> Regards,
>>> 	Boris

Received on Wednesday, 9 July 2008 17:38:04 UTC