Re: A possible structure of the datatype system for OWL 2 (related to ISSUE-126)

On Jul 9, 2008, at 4:52 PM, Rob Shearer wrote:

> I agree with almost all the suggestions here---Boris and I  
> discussed all these points. The main argument for using the term  
> "datatype" and for using xsd names for the integer and string value  
> spaces was that there is already too much political momentum behind  
> the existing terminology and identifiers.

What I suggested was allowing the identifiers, but clearly explaining  
what they mean in specific OWL contexts. Doing so requires, I think,  
resorting to different language than is already used to describe  
those things that are different.

Don't know. Definitely worth discussing further.

-Alan

>
> On 9 Jul 2008, at 15:18, Alan Ruttenberg wrote:
>
>>
>>
>> On Jul 8, 2008, at 5:16 PM, Boris Motik wrote:
>>> Hello,
>>>
>>> 1. Datatype Map
>>> ----------------
>>
>> I wonder if we should still use the term "datatype", as there will  
>> likely be confusion with the xsd sense of datatype.
>>
>>> A datatype map consists of the following things:
>>>
>>> - a set of datatypes
>>>  - each datatype provides a set of allowed facets
>>> - a possibly infinite set of constants (likely to be renamed to  
>>> literals, but I'll stick to "constant" for the moment)
>>>  - each constant consists of a lexicalValue and a typeURI
>>>  - it is written as "lexicalValue"^^typeURI
>>>
>>> Each datatype DT is assigned a value space DT^D, which is just a  
>>> nonempty set.
>>
>> Is the implication that DT -> Value space DT^D, one to one?
>>
>> So we have type, DT, DT^D ?
>>
>>> Each constant c is assigned a value c^D, which is just an object  
>>> from the union of the value spaces of all datatypes.
>>>
>>>
>>> Thus, a datatype can be thought of as a class with a predefined  
>>> extension.
>>
>> I'm not sure explaining it this way is helpful - might confuse  
>> rather than illuminate.
>>
>>> Note that this definition does not assume any relationship  
>>> between the set of supported typeURIs (which determine the  
>>> allowed constants) and the set of datatypes (which determine the  
>>> allowed
>>> sets of values).
>>
>> I think we should consider calling "typeURI" "lexicalFormURI" to  
>> suggest the correct thinking, as people tend to equate "type" and  
>> "class". (as with rdf:type)
>>
>> Can we not simplify the above to: There are "Value spaces" and  
>> "lexicalFormURI"s. I'm not seeing how having "Datatypes" as an  
>> additional concept helps.
>>
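A minimal sketch of the datatype map from section 1 may make the separation between typeURIs and datatypes easier to see. All class and field names here are hypothetical, the facet names are only illustrative, and Fraction stands in for the reals (it covers only the rationals), so this is a reading aid under those assumptions rather than a proposed design.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Constant:
    """A constant, written "lexicalValue"^^typeURI."""
    lexical_value: str
    type_uri: str


@dataclass(frozen=True)
class Datatype:
    """A datatype with its allowed facets and a membership test for DT^D."""
    name: str
    facets: FrozenSet[str]
    contains: Callable[[object], bool]


@dataclass
class DatatypeMap:
    datatypes: Dict[str, Datatype]
    # typeURI -> function mapping a lexical value to its value c^D
    interpreters: Dict[str, Callable[[str], object]]

    def value_of(self, c: Constant) -> object:
        return self.interpreters[c.type_uri](c.lexical_value)

    def datatypes_containing(self, c: Constant) -> List[str]:
        v = self.value_of(c)
        return sorted(n for n, dt in self.datatypes.items() if dt.contains(v))


# Illustrative facet names only; the message does not list them.
NUMERIC_FACETS = frozenset({"minInclusive", "maxInclusive"})

dmap = DatatypeMap(
    datatypes={
        "owl:number": Datatype(
            "owl:number", NUMERIC_FACETS, lambda v: isinstance(v, Fraction)),
        "xsd:integer": Datatype(
            "xsd:integer", NUMERIC_FACETS,
            lambda v: isinstance(v, Fraction) and v.denominator == 1),
        "xsd:string": Datatype(
            "xsd:string", frozenset({"length", "pattern"}),
            lambda v: isinstance(v, str)),
    },
    interpreters={
        "xsd:int": lambda s: Fraction(int(s)),
        "xsd:string": lambda s: s,
    },
)
```

In this sketch xsd:int is a typeURI with no datatype of its own: the value of "42"^^xsd:int simply lands in the value spaces of owl:number and xsd:integer.
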
>>>
>>> 2. Allowed datatypes
>>> ---------------------
>>>
>>> Conformant OWL 2 implementations would be required to support the  
>>> following base datatypes, each of whose value spaces would be
>>> disjoint:
>>
>>> - owl:number - the value space is the set of all real numbers
>>> - xsd:string - the value space is the set of all Unicode strings  
>>> in normal form C
>>> - owl:internationalizedString - the value space is the set of  
>>> pairs of the form (string,langTag)
>>> - xsd:hexBinary - the value space is the set of all finite  
>>> sequences of octets
>>
>> I'm wondering whether we should simply say: OWL has the following  
>> (following your later mail).
>>
>> owl:Number
>> owl:CharacterString
>> owl:BitString
>> owl:Integer
>>
>> We confuse the issue by using the xsd uris to name a different  
>> sort of thing (an OWL value space, not an XSD:type)
>>
>>> The following datatype would also be supported in OWL 2:
>>>
>>> - xsd:integer - the value space is the subset of the value space  
>>> of owl:number containing all integers
>>
>> See above.
>>
>>> Finally, we might support the following "shortcut" datatypes,  
>>> whose value spaces can be defined from the value spaces of the
>>> above-mentioned datatypes using facets:
>>>
>>> - various xsd:integer derivatives, such as xsd:int and xsd:long
>>> - various xsd:string derivatives, such as xsd:Name
>>
>> In order to keep the design clean, I'd suggest that we define  
>> these in the owl namespace. We can connect the xsd types to the  
>> owl version.
>>
>> However: The use of e.g. xsd:string in restrictions is the common  
>> idiom. I think we should document that some xsd datatypes, when  
>> used in a restriction, are understood to mean certain owl value  
>> spaces.
>>
>>> 3. Allowed constants
>>> ---------------------
>>>
>>> Conformant OWL 2 implementations are required to support the  
>>> following constant types:
>>>
>>> - "nnn"^^xsd:int and all derivatives that fall within xsd:int -  
>>> all such constants are to be interpreted as elements of owl:number
>>> - "aaEbb"^^xsd:float - all such constants save for NaN and +-inf  
>>> are to be interpreted as elements of owl:number
>>
>> Consider extending owl:number with these constants. We need some  
>> interpretation of them if they are to remain intact when part of  
>> an OWL file. These are, effectively, "promotion" rules.
>>
>>> - "abc"^^xsd:string - interpreted as "abc"
>>
>> as you later suggest, ("abc", null) or ("abc", "") . The latter  
>> avoids the issue of what to do about the pattern facet on lang.
>>
>>> - "abc"@langTag - interpreted as a pair ("abc",langTag)
>>>
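The four rules in section 3 can be sketched as a single interpretation function. The function name and error handling are assumptions, Fraction again stands in for elements of owl:number, and Python's int() and float() accept some lexical forms that XSD does not, so this does not validate lexical spaces strictly.

```python
import math
import struct
from fractions import Fraction

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def interpret(lexical, type_uri=None, lang=None):
    """Map a constant to its value (hypothetical helper, not from the proposal)."""
    if lang is not None:
        # "abc"@langTag is interpreted as the pair ("abc", langTag)
        return (lexical, lang)
    if type_uri == "xsd:string":
        # "abc"^^xsd:string is interpreted as "abc"
        return lexical
    if type_uri == "xsd:int":
        n = int(lexical)
        if not INT32_MIN <= n <= INT32_MAX:
            raise ValueError("outside the 32-bit range of xsd:int")
        return Fraction(n)  # an element of owl:number
    if type_uri == "xsd:float":
        x = float(lexical)
        if math.isnan(x) or math.isinf(x):
            raise ValueError("NaN and +-inf are not elements of owl:number")
        # Round to single precision, then take the exact rational value.
        # struct.pack raises OverflowError if x is too large for a single.
        (single,) = struct.unpack("<f", struct.pack("<f", x))
        return Fraction(single)
    raise ValueError("unsupported constant type")
```

Because the float is rounded to single precision first, "0.1"^^xsd:float is interpreted as the nearest single-precision value, not as the rational 1/10.
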
>>>
>>> 4. Discussion
>>> --------------
>>>
>>> The set of constants is chosen such that implementations don't  
>>> need to support numbers with arbitrary precision, which might be  
>>> quite cumbersome. In fact, implementations are only required to  
>>> support 32 bit integers and single precision floating point numbers.
>>
>> On today's hardware, I would set this to be 64 bit integers or  
>> even 128 bit integers, and double precision float. Some machines  
>> don't really have single float hardware, instead rounding from  
>> double float.
>>
>>> There are efficient ways to represent these on virtually all  
>>> systems.
>>>
>>> The set of datatypes, however, allows one to refer to the sets of  
>>> all integers and real numbers. This allows one to specify the  
>>> ontology in a way that makes reasoning easy.
>>>
>>> Implementations are free to support other constants as well. Note  
>>> that these extensions do not necessarily mean that we need new  
>>> datatypes (i.e., new value spaces). For example, an  
>>> implementation might choose to support arbitrary precision  
>>> numbers via constants  of the form "123.03"^^xsd:decimal. Note  
>>> that the proposed list of datatypes already contains the  
>>> appropriate value space for such constants (i.e., owl:number).
>>
>> I think xsd:decimal should be considered a lexical form of  
>> owl:Number.
>>
>>> The open issues are what to do with NaN and +-inf and with  
>>> date-time datatypes.
>>
>> In the first case, I suggest above that owl:Number be  
>> real+"NaN"+"-INF"+"+INF"
>> I'd also suggest that "-0" and "+0" be considered lexical forms of  
>> the number 0.
>>
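One way to picture the extended owl:Number suggested here is a value space made of the reals plus three extra elements, with "-0" and "+0" as two lexical forms of the same number. This is a rough sketch: the function name is hypothetical, the special values are represented as plain strings purely for illustration, and Fraction covers only the rationals.

```python
from fractions import Fraction

# Lexical forms of the three non-real elements; "INF" and "+INF" are
# treated as the same value here, which is an assumption.
SPECIALS = {"NaN": "NaN", "INF": "+INF", "+INF": "+INF", "-INF": "-INF"}


def number_value(lexical):
    """Map a lexical form to an element of the extended number space."""
    if lexical in SPECIALS:
        return SPECIALS[lexical]
    # Fraction parses signed decimals and exponents exactly, so
    # "-0", "+0" and "0.0" all denote the single value 0.
    return Fraction(lexical)
```

Under this reading, number_value("-0") and number_value("+0") are equal, unlike IEEE floating point, where the two zeros are distinct bit patterns.
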
>> For the date-time datatypes, I wonder whether it would work to  
>> define:
>>
>> owl:Time (isomorphic to the reals)
>> owl:TimeZoneTime (also isomorphic to the reals)
>>
>> There is one value space for all the lexical date-times that have a  
>> time zone specified, and another value space for all the lexical  
>> date-times that do not. There would be no comparison possible  
>> between owl:Time and owl:TimeZoneTime.
>>
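A small sketch of the two-value-space idea, assuming that owl:TimeZoneTime holds the date-times with a time zone and owl:Time holds those without one (the message does not say which is which). Each value is a real number of seconds from an arbitrarily chosen origin; the function names and the choice of origin are hypothetical.

```python
from datetime import datetime, timezone

ORIGIN_NO_ZONE = datetime(1970, 1, 1)
ORIGIN_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_value(lexical):
    """Map an ISO 8601 date-time to a (value space, seconds) pair."""
    dt = datetime.fromisoformat(lexical)
    if dt.tzinfo is None:
        return ("owl:Time", (dt - ORIGIN_NO_ZONE).total_seconds())
    return ("owl:TimeZoneTime", (dt - ORIGIN_UTC).total_seconds())


def less_than(a, b):
    """Order two values; values from different spaces are not comparable."""
    if a[0] != b[0]:
        raise TypeError("no comparison between owl:Time and owl:TimeZoneTime")
    return a[1] < b[1]
```

Two lexical forms with different offsets that denote the same instant, such as "2008-07-09T12:11:07-04:00" and "2008-07-09T16:11:07+00:00", map to the same value in owl:TimeZoneTime, while comparing either of them with "2008-07-09T16:11:07" raises an error.
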
>> There would still be work necessary to determine whether the  
>> repeating interval types, like monday, are feasible to implement.
>>
>> -Alan
>>
>>>
>>> Regards,
>>>
>>> 	Boris
>>>
>>>
>>>
>>
>>
>

Received on Wednesday, 9 July 2008 16:11:07 UTC