Re: A possible structure of the datatype system for OWL 2 (related to ISSUE-126)

I agree with almost all the suggestions here---Boris and I discussed  
all these points. The main argument for using the term "datatype" and  
for using xsd names for the integer and string value spaces was that  
there is already too much political momentum behind the existing  
terminology and identifiers.

On 9 Jul 2008, at 15:18, Alan Ruttenberg wrote:

>
>
> On Jul 8, 2008, at 5:16 PM, Boris Motik wrote:
>> Hello,
>>
>> 1. Datatype Map
>> ----------------
>
> I wonder if we should still use the term "datatype", as there will  
> likely be confusion with the xsd sense of datatype.
>
>> A datatype map consists of the following things:
>>
>> - a set of datatypes
>>  - each datatype provides a set of allowed facets
>> - a possibly infinite set of constants (likely to be renamed to  
>> literals, but I'll stick to "constant" for the moment)
>>  - each constant consists of a lexicalValue and a typeURI
>>  - it is written as "lexicalValue"^^typeURI
>>
>> Each datatype DT is assigned a value space DT^D, which is just a  
>> nonempty set.
>
> Is the implication that DT -> Value space DT^D, one to one?
>
> So we have type, DT, DT^D ?
>
>> Each constant c is assigned a value c^D, which is just an object  
>> from the union of the value spaces of all datatypes.
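
As a rough illustration of the definition quoted above (not part of the proposal itself), the datatype map could be sketched in Python along these lines. The class and function names, the use of Fraction as a stand-in for real numbers (it only covers rationals), and the small lookup tables are all assumptions made for the example.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable


@dataclass(frozen=True)
class Constant:
    """A constant: a lexicalValue paired with a typeURI."""
    lexical_value: str
    type_uri: str

    def __str__(self) -> str:
        # Written as "lexicalValue"^^typeURI
        return f'"{self.lexical_value}"^^{self.type_uri}'


@dataclass(frozen=True)
class Datatype:
    """A datatype: a name, a value space DT^D, and its allowed facets."""
    uri: str
    contains: Callable[[object], bool]  # membership test for DT^D
    facets: frozenset = frozenset()     # facet names left empty in this sketch


def is_number(v: object) -> bool:
    # Assumption: rationals stand in for the reals.
    return isinstance(v, (int, Fraction)) and not isinstance(v, bool)


def is_integer(v: object) -> bool:
    return is_number(v) and Fraction(v).denominator == 1


# The set of datatypes (which determine the allowed sets of values).
DATATYPES = {
    "owl:number": Datatype("owl:number", is_number),
    "xsd:integer": Datatype("xsd:integer", is_integer),
}

# The set of supported typeURIs (which determine the allowed constants).
# Note that it is a separate table: no typeURI needs to be a datatype.
CONSTANT_PARSERS = {
    "xsd:int": lambda s: Fraction(int(s)),
    "xsd:decimal": Fraction,
}


def value_of(c: Constant) -> Fraction:
    """c^D: the value assigned to constant c."""
    return CONSTANT_PARSERS[c.type_uri](c.lexical_value)


def datatypes_containing(value: object) -> list:
    """URIs of all datatypes whose value space contains the value."""
    return sorted(u for u, dt in DATATYPES.items() if dt.contains(value))
```

Here "42"^^xsd:int lands in both owl:number and xsd:integer, while "123.03"^^xsd:decimal lands only in owl:number, even though neither xsd:int nor xsd:decimal appears in the table of datatypes.
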
>>
>>
>> Thus, a datatype can be thought of as a class with a predefined  
>> extension.
>
> I'm not sure explaining it this way is helpful - might confuse  
> rather than illuminate.
>
>> Note that this definition does not assume any relationship between  
>> the set of supported typeURIs (which determine the allowed  
>> constants) and the set of datatypes (which determine the allowed
>> sets of values).
>
> I think we should consider calling "typeURI" "lexicalFormURI" to  
> suggest the correct thinking, as people tend to equate "type" and  
> "class". (as with rdf:type)
>
> Can we not simplify the above to: There are "Value spaces" and  
> "lexicalFormURI"s. I'm not seeing how having "Datatypes" as an  
> additional concept helps.
>
>>
>> 2. Allowed datatypes
>> ---------------------
>>
>> Conformant OWL 2 implementations would be required to support the  
>> following base datatypes, each of whose value spaces would be
>> disjoint:
>
>> - owl:number - the value space is the set of all real numbers
>> - xsd:string - the value space is the set of all Unicode strings in  
>> normal form C
>> - owl:internationalizedString - the value space is the set of  
>> pairs of the form (string,langTag)
>> - xsd:hexBinary - the value space is the set of all finite  
>> sequences of octets
>
> I'm wondering whether we should simply say: OWL has the following  
> (following your later mail).
>
> owl:Number
> owl:CharacterString
> owl:BitString
> owl:Integer
>
> We confuse the issue by using the xsd uris to name a different sort  
> of thing (an OWL value space, not an XSD:type)
>
>> The following datatype would also be supported in OWL 2:
>>
>> - xsd:integer - the value space is the subset of the value space of  
>> owl:number containing all integers
>
> See above.
>
>> Finally, we might support the following "shortcut" datatypes, whose  
>> value spaces can be defined from the value spaces of the above
>> mentioned datatypes using facets
>>
>> - various xsd:integer derivatives, such as xsd:int and xsd:long
>> - various xsd:string derivatives, such as xsd:Name
>
> In order to keep the design clean, I'd suggest that we define these  
> in the owl namespace. We can connect the xsd types to the owl version.
>
> However: The use of e.g. xsd:string in restrictions is the common  
> idiom. I think we should document that some xsd datatypes, when used  
> in a restriction, are understood to mean certain owl value spaces.
>
>> 3. Allowed constants
>> ---------------------
>>
>> Conformant OWL 2 implementations are required to support the  
>> following constant types:
>>
>> - "nnn"^^xsd:int and all derivatives that fall within xsd:int - all  
>> such constants are to be interpreted as elements of owl:number
>> - "aaEbb"^^xsd:float - all such constants save for NaN and +-inf  
>> are to be interpreted as elements of owl:number
>
> Consider extending owl:number with these constants. We need some  
> interpretation of them if they are to remain intact when part of an  
> OWL file. These are effectively, "promotion" rules.
>
>> - "abc"^^xsd:string - interpreted as "abc"
>
> As you later suggest, ("abc", null) or ("abc", ""). The latter  
> avoids the issue of what to do about the pattern facet on lang.
>
>> - "abc"@langTag - interpreted as a pair ("abc",langTag)
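
The four rules quoted in this section could be sketched as a single Python function, purely as an illustration. The function name, the (value space, value) return shape, and the choice to raise an error for NaN and the infinities are assumptions; Python's float is double precision, so it only approximates xsd:float, and the xsd:int derivatives are not handled.

```python
import math
from fractions import Fraction

# Range of xsd:int (32-bit signed integers).
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def interpret(lexical, type_uri=None, lang=None):
    """Return (value space URI, value) for one constant."""
    if lang is not None:
        # "abc"@langTag is interpreted as the pair ("abc", langTag)
        return ("owl:internationalizedString", (lexical, lang))
    if type_uri == "xsd:string":
        # "abc"^^xsd:string is interpreted as "abc"
        return ("xsd:string", lexical)
    if type_uri == "xsd:int":
        n = int(lexical)
        if not INT_MIN <= n <= INT_MAX:
            raise ValueError(f"{lexical!r} is outside the range of xsd:int")
        return ("owl:number", Fraction(n))
    if type_uri == "xsd:float":
        x = float(lexical)
        if not math.isfinite(x):
            # NaN and +-inf are excluded; their treatment is an open issue.
            raise ValueError(f"{lexical!r} has no value in owl:number")
        # The exact real value denoted by the floating point number.
        return ("owl:number", Fraction(x))
    raise ValueError(f"unsupported constant type: {type_uri!r}")
```

Both numeric constant types end up in the single value space owl:number, which is the point of separating constants from datatypes.
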
>>
>>
>> 4. Discussion
>> --------------
>>
>> The set of constants is chosen such that implementations don't need  
>> to support numbers with arbitrary precision, which might be quite  
>> cumbersome. In fact, implementations are only required to support  
>> 32 bit integers and single precision floating point numbers.
>
> On today's hardware, I would set this to be 64 bit integers or even  
> 128 bit integers, and double precision float. Some machines don't  
> really have single float hardware, instead rounding from double float.
>
>> There are efficient ways to represent these on virtually all systems.
>>
>> The set of datatypes, however, allows one to refer to the sets of  
>> all integers and real numbers. This allows one to specify the  
>> ontology in a way that makes reasoning easy.
>>
>> Implementations are free to support other constants as well. Note  
>> that these extensions do not necessarily mean that we need new  
>> datatypes (i.e., new value spaces). For example, an implementation  
>> might choose to support arbitrary precision numbers via constants   
>> of the form "123.03"^^xsd:decimal. Note that the proposed list of  
>> datatypes already contains the appropriate value space for such  
>> constants (i.e., owl:number).
>
> I think xsd:decimal should be considered a lexical form of owl:Number.
>
>> The open issues are what to do with NaN and +-inf and with  
>> date-time datatypes.
>
> In the first case, I suggest above that owl:Number be  
> real+"NaN"+"-INF"+"+INF"
> I'd also suggest that "-0" and "+0" be considered lexical forms of  
> the number 0.
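
A minimal Python sketch of what this alternative might look like, assuming the three special values are kept as plain markers next to the ordinary numbers. The function name, the table of accepted spellings, and the use of Fraction for the numeric part are all assumptions for the example.

```python
from fractions import Fraction

# Assumed spellings of the three extra elements added to owl:Number.
SPECIALS = {"NaN": "NaN", "INF": "+INF", "+INF": "+INF", "-INF": "-INF"}


def parse_extended_number(lexical):
    """Map a lexical form to a number or to one of the three special values."""
    s = lexical.strip()
    if s in SPECIALS:
        return SPECIALS[s]
    # "-0" and "+0" both parse to the same value, the number 0.
    return Fraction(s)
```

With this reading, parse_extended_number("-0") and parse_extended_number("+0") compare equal, so the sign of zero is a property of the lexical form only.
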
>
> For the date-time datatypes, I wonder whether it would work to define:
>
> owl:Time (isomorphic to the reals)
> owl:TimeZoneTime (also isomorphic to the reals)
>
> There is one value space for all the lexical date-times that have a  
> time zone specified, and another value space for all the lexical  
> date-times without one. There would be no comparison possible  
> between owl:Time and owl:TimeZoneTime.
>
> There would still be work necessary to determine whether the  
> repeating interval types, like monday, are feasible to implement.
>
> -Alan
>
>>
>> Regards,
>>
>> 	Boris
>>
>>
>>
>
>

Received on Wednesday, 9 July 2008 15:53:00 UTC