Re: A possible structure of the datatype system for OWL 2 (related to ISSUE-126)

On Jul 8, 2008, at 5:16 PM, Boris Motik wrote:
> Hello,
>
> 1. Datatype Map
> ----------------

I wonder if we should still use the term "datatype", as there will  
likely be confusion with the xsd sense of datatype.

> A datatype map consists of the following things:
>
> - a set of datatypes
>   - each datatype provides a set of allowed facets
> - a possibly infinite set of constants (likely to be renamed to  
> literals, but I'll stick to "constant" for the moment)
>   - each constant consists of a lexicalValue and a typeURI
>   - it is written as "lexicalValue"^^typeURI
>
> Each datatype DT is assigned a value space DT^D, which is just a  
> nonempty set.

Is the implication that the mapping from DT to its value space DT^D is one-to-one?

So we have three things: the type, DT, and DT^D?

> Each constant c is assigned a value c^D, which is just an object  
> from the union of the value spaces of all datatypes.
>
>
> Thus, a datatype can be thought of as a class with a predefined
> extension.

I'm not sure explaining it this way is helpful - might confuse rather  
than illuminate.

> Note that this definition does not assume any relationship between  
> the set of supported typeURIs (which determine the allowed  
> constants) and the set of datatypes (which determine the allowed
> sets of values).

I think we should consider calling "typeURI" "lexicalFormURI" to
suggest the correct thinking, as people tend to equate "type" and
"class" (as with rdf:type).

Can we not simplify the above to: there are "Value spaces" and
"lexicalFormURI"s? I'm not seeing how having "Datatypes" as an
additional concept helps.
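To make the split between typeURIs (which determine the allowed constants) and datatypes (which determine the allowed sets of values) concrete, here is a minimal Python sketch of a datatype map. All names (Constant, Datatype, DatatypeMap, interpreters) are hypothetical. Value spaces are modelled as membership predicates because they may be infinite, and exact rationals (Fraction) stand in for the reals.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Constant:
    """A constant written as "lexicalValue"^^typeURI."""
    lexical_value: str
    type_uri: str


@dataclass
class Datatype:
    """A datatype: a name, its allowed facets, and a membership test that
    stands in for its (possibly infinite) value space DT^D."""
    name: str
    allowed_facets: FrozenSet[str]
    contains: Callable[[object], bool]


@dataclass
class DatatypeMap:
    # Datatypes keyed by name; they determine the allowed sets of values.
    datatypes: Dict[str, Datatype] = field(default_factory=dict)
    # One lexical-to-value function per supported typeURI; these determine
    # the allowed constants. Nothing ties these keys to the datatype names.
    interpreters: Dict[str, Callable[[str], object]] = field(default_factory=dict)

    def value(self, c: Constant) -> object:
        """Compute c^D via the interpreter registered for c's typeURI."""
        return self.interpreters[c.type_uri](c.lexical_value)

    def datatypes_containing(self, c: Constant) -> List[str]:
        """Names of all datatypes whose value space contains c^D."""
        v = self.value(c)
        return [dt.name for dt in self.datatypes.values() if dt.contains(v)]


# Hypothetical example: one datatype, two typeURIs that both map into it.
dm = DatatypeMap()
dm.datatypes["owl:number"] = Datatype(
    "owl:number",
    frozenset({"minInclusive", "maxInclusive"}),
    lambda v: isinstance(v, Fraction),
)
dm.interpreters["xsd:int"] = lambda s: Fraction(int(s))
dm.interpreters["xsd:long"] = lambda s: Fraction(int(s))
```

Here "42"^^xsd:int and "42"^^xsd:long get the same value in owl:number, so the set of typeURIs can grow without adding datatypes.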

>
> 2. Allowed datatypes
> ---------------------
>
> Conformant OWL 2 implementations would be required to support the
> following base datatypes, each of whose value spaces would be
> disjoint:

> - owl:number - the value space is the set of all real numbers
> - xsd:string - the value space is the set of all Unicode strings in  
> normal form C
> - owl:internationalizedString - the value space is the set of
> pairs of the form (string,langTag)
> - xsd:hexBinary - the value space is the set of all finite  
> sequences of octets

I'm wondering whether we should simply say that OWL has the following
value spaces (following your later mail):

owl:Number
owl:CharacterString
owl:BitString
owl:Integer

We confuse the issue by using the XSD URIs to name a different sort
of thing (an OWL value space, not an XSD type).

> The following datatype would also be supported in OWL 2:
>
> - xsd:integer - the value space is the subset of the value space of  
> owl:number containing all integers

See above.
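A sketch of the four base value spaces as membership predicates, assuming hypothetical function names. Fraction stands in for the reals (so only rationals are representable), str for Unicode strings, a (string, langTag) tuple for internationalized strings, and bytes for finite octet sequences. xsd:integer is modelled as a subset of owl:number rather than as a fifth disjoint space.

```python
import unicodedata
from fractions import Fraction


def in_owl_number(v) -> bool:
    # Exact rationals only: a stand-in for the real numbers.
    return isinstance(v, Fraction)


def in_xsd_string(v) -> bool:
    # Unicode strings in Normalization Form C (needs Python 3.8+).
    return isinstance(v, str) and unicodedata.is_normalized("NFC", v)


def in_owl_internationalized_string(v) -> bool:
    # Pairs of the form (string, langTag).
    return (isinstance(v, tuple) and len(v) == 2
            and all(isinstance(x, str) for x in v))


def in_xsd_hex_binary(v) -> bool:
    # Finite sequences of octets.
    return isinstance(v, bytes)


def in_xsd_integer(v) -> bool:
    # Subset of owl:number: rationals whose denominator is 1.
    return in_owl_number(v) and v.denominator == 1


BASE_SPACES = {
    "owl:number": in_owl_number,
    "xsd:string": in_xsd_string,
    "owl:internationalizedString": in_owl_internationalized_string,
    "xsd:hexBinary": in_xsd_hex_binary,
}


def base_spaces_of(v):
    """Names of the base value spaces containing v (at most one, since they are disjoint)."""
    return [name for name, test in BASE_SPACES.items() if test(v)]
```

A string that is not in NFC, such as "e" followed by a combining acute accent, falls into none of the base spaces under this sketch.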

> Finally, we might support the following "shortcut" datatypes, whose  
> value spaces can be defined from the value spaces of the above
> mentioned datatypes using facets:
>
> - various xsd:integer derivatives, such as xsd:int and xsd:long
> - various xsd:string derivatives, such as xsd:Name

In order to keep the design clean, I'd suggest that we define these  
in the owl namespace. We can connect the xsd types to the owl version.

However, using e.g. xsd:string in restrictions is the common
idiom. I think we should document that some xsd datatypes, when used
in a restriction, are understood to mean certain owl value spaces.
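The "shortcut" idea can be sketched as a table that maps each xsd name to a base datatype plus minInclusive/maxInclusive facet values, so no new value space is introduced. The table, the function names and the choice of derivatives are hypothetical; the numeric bounds are the standard XML Schema ranges. String derivatives such as xsd:Name would need a pattern facet and are left out here.

```python
from fractions import Fraction
from typing import Dict, Optional, Tuple

# Shortcut name -> (base datatype, minInclusive, maxInclusive); None = unbounded.
SHORTCUTS: Dict[str, Tuple[str, Optional[int], Optional[int]]] = {
    "xsd:long": ("xsd:integer", -2**63, 2**63 - 1),
    "xsd:int": ("xsd:integer", -2**31, 2**31 - 1),
    "xsd:short": ("xsd:integer", -2**15, 2**15 - 1),
    "xsd:byte": ("xsd:integer", -2**7, 2**7 - 1),
    "xsd:nonNegativeInteger": ("xsd:integer", 0, None),
}


def in_xsd_integer(v) -> bool:
    # Integers are the rationals with denominator 1 (a subset of owl:number).
    return isinstance(v, Fraction) and v.denominator == 1


def in_shortcut(name: str, v) -> bool:
    """Membership in a shortcut datatype = base membership plus the facet bounds."""
    base, lo, hi = SHORTCUTS[name]
    if base != "xsd:integer" or not in_xsd_integer(v):
        return False
    if lo is not None and v < lo:
        return False
    if hi is not None and v > hi:
        return False
    return True
```

Defining these under the owl namespace, as suggested, would only change the keys of the table, not the value spaces they refer to.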

> 3. Allowed constants
> ---------------------
>
> Conformant OWL 2 implementations are required to support the  
> following constant types:
>
> - "nnn"^^xsd:int and all derivatives that fall within xsd:int - all  
> such constants are to be interpreted as elements of owl:number
> - "aaEbb"^^xsd:float - all such constants save for NaN and +-inf  
> are to be interpreted as elements of owl:number

Consider extending owl:number with these special values (NaN and +-inf).
We need some interpretation of them if they are to remain intact when
part of an OWL file. These are effectively "promotion" rules.

> - "abc"^^xsd:string - interpreted as "abc"

As you later suggest, this could be ("abc", null) or ("abc", ""). The
latter avoids the issue of what to do about the pattern facet on lang.

> - "abc"@langTag - interpreted as a pair ("abc",langTag)
>
>
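The list of required constants can be sketched as a single interpretation function. This is a hypothetical Python sketch: the function name, the plain_as_pair switch (for the ("abc", "") alternative) and the particular set of xsd:int derivatives are assumptions. For brevity, every derivative is checked only against the xsd:int range, and xsd:float is parsed with Python's double-precision float rather than rounded to single precision.

```python
import math
from fractions import Fraction
from typing import Optional

INT_MIN, INT_MAX = -2**31, 2**31 - 1

# Hypothetical set of typeURIs assumed to fall within xsd:int.
INT_LIKE = {"xsd:int", "xsd:short", "xsd:byte",
            "xsd:unsignedShort", "xsd:unsignedByte"}


def interpret(lexical: str, type_uri: Optional[str] = None,
              lang: Optional[str] = None, plain_as_pair: bool = False):
    """Map a constant to its value following the list of required constants.

    plain_as_pair=True reads "abc"^^xsd:string as ("abc", "") instead of "abc".
    """
    if lang is not None:                      # "abc"@langTag
        return (lexical, lang)
    if type_uri in INT_LIKE:                  # "nnn"^^xsd:int and derivatives
        n = int(lexical)
        if not INT_MIN <= n <= INT_MAX:
            raise ValueError(f"{lexical!r} is outside the xsd:int range")
        return Fraction(n)
    if type_uri == "xsd:float":               # "aaEbb"^^xsd:float
        x = float(lexical)                    # simplification: double precision
        if math.isnan(x) or math.isinf(x):
            raise ValueError("NaN and +-inf are not in owl:number (open issue)")
        return Fraction(x)
    if type_uri == "xsd:string":              # "abc"^^xsd:string
        return (lexical, "") if plain_as_pair else lexical
    raise ValueError(f"unsupported constant type: {type_uri!r}")
```

Under this sketch "42"^^xsd:int and "4.2E1"^^xsd:float denote the same element of owl:number.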
> 4. Discussion
> --------------
>
> The set of constants is chosen such that implementations don't need  
> to support numbers with arbitrary precision, which might be quite  
> cumbersome. In fact, implementations are only required to support  
> 32 bit integers and single precision floating point numbers.

On today's hardware, I would set this to 64-bit or even 128-bit
integers, and double-precision floats. Some machines don't really
have single-precision float hardware, instead rounding from double
precision.

> There are efficient ways to represent these on virtually all systems.
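The trade-off between 32-bit/single precision and 64-bit/double precision can be checked directly. A small Python sketch with hypothetical helper names: fits_signed tests a two's-complement range, and to_single rounds a Python float (an IEEE 754 double) to single precision using the standard struct module.

```python
import struct


def fits_signed(n: int, bits: int) -> bool:
    """True if n fits in a two's-complement integer of the given width."""
    return -(1 << (bits - 1)) <= n <= (1 << (bits - 1)) - 1


def to_single(x: float) -> float:
    """Round a double to the nearest single-precision value.

    struct raises OverflowError if x is too large for single precision.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]
```

For instance, 2^24 + 1 = 16777217 is exact as a double but rounds to 16777216 in single precision, and 2^31 does not fit in 32 bits but does fit in 64.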
>
> The set of datatypes, however, allows one to refer to the sets of  
> all integers and real numbers. This allows one to specify the  
> ontology in a way that makes reasoning easy.
>
> Implementations are free to support other constants as well. Note  
> that these extensions do not necessarily mean that we need new  
> datatypes (i.e., new value spaces). For example, an implementation  
> might choose to support arbitrary precision numbers via constants   
> of the form "123.03"^^xsd:decimal. Note that the proposed list of  
> datatypes already contains the appropriate value space for such  
> constants (i.e., owl:number).

I think xsd:decimal should be considered a lexical form of owl:Number.
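Treating xsd:decimal as just another lexical form of owl:Number needs no new value space; the lexical value only has to be read exactly. A minimal Python sketch, assuming a hypothetical decimal_value helper and a simplified regular expression for the xsd:decimal lexical space (optional sign, digits, optional fractional part, no exponent):

```python
import re
from fractions import Fraction

# Simplified lexical space of xsd:decimal.
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def decimal_value(lexical: str) -> Fraction:
    """Exact owl:Number value of "lexical"^^xsd:decimal (arbitrary precision)."""
    if not _DECIMAL.fullmatch(lexical):
        raise ValueError(f"not an xsd:decimal lexical form: {lexical!r}")
    return Fraction(lexical)
```

With exact rationals, "1.0"^^xsd:decimal and "1"^^xsd:int denote the same number, while "0.1000000000000000000001" stays distinct from 0.1.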

> The open issues are what to do with NaN and +-inf and with date-time
> datatypes.

In the first case, I suggest (as above) that owl:Number be the reals
plus "NaN", "-INF", and "+INF".
I'd also suggest that "-0" and "+0" be considered lexical forms of  
the number 0.
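One way to read this proposal: owl:Number is the reals plus three extra elements, and both signed zeros collapse to 0. A hypothetical Python sketch in which the special values are plain strings and every other element is an exact rational; xsd:float lexical values are parsed with Python's double-precision float as a simplification.

```python
import math
from fractions import Fraction
from typing import Union

# Hypothetical representation of the three extra elements of owl:Number.
NAN, NEG_INF, POS_INF = "NaN", "-INF", "+INF"
OwlNumber = Union[Fraction, str]


def float_constant_value(lexical: str) -> OwlNumber:
    """Value of "lexical"^^xsd:float under the extended owl:Number proposal."""
    x = float(lexical)
    if math.isnan(x):
        return NAN
    if math.isinf(x):
        return POS_INF if x > 0 else NEG_INF
    # Fraction(-0.0) equals Fraction(0), so "-0" and "+0" both denote 0.
    return Fraction(x)
```

Because NaN is an ordinary element of the value space in this sketch, it equals itself, unlike under IEEE 754 comparison.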

For the date-time datatypes, I wonder whether it would work to define:

owl:Time (isomorphic to the reals)
owl:TimeZoneTime (also isomorphic to the reals)

There would be one value space for all lexical date-times that have a
time zone specified, and another value space for all lexical
date-times without one. No comparison would be possible between
owl:Time and owl:TimeZoneTime.
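A hypothetical Python sketch of the two-value-space idea, assuming date-times with a time zone go to owl:TimeZoneTime and those without go to owl:Time. Each value is a (space, seconds) pair with seconds as an exact rational (the "isomorphic to the reals" part), measured from 1970-01-01T00:00:00, in UTC for zoned values. Parsing relies on Python's datetime.fromisoformat, whose accepted syntax is close to, but not the same as, xsd:dateTime; a trailing "Z" is rewritten to "+00:00" first.

```python
from datetime import datetime, timedelta, timezone
from fractions import Fraction

EPOCH_NAIVE = datetime(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _seconds(td: timedelta) -> Fraction:
    # Exact number of seconds in a timedelta.
    return td.days * 86400 + td.seconds + Fraction(td.microseconds, 10**6)


def date_time_value(lexical: str):
    """Map a lexical date-time to ("owl:Time" | "owl:TimeZoneTime", seconds)."""
    s = lexical[:-1] + "+00:00" if lexical.endswith("Z") else lexical
    dt = datetime.fromisoformat(s)
    if dt.tzinfo is None:
        return ("owl:Time", _seconds(dt - EPOCH_NAIVE))
    return ("owl:TimeZoneTime", _seconds(dt - EPOCH_UTC))


def compare(a, b) -> int:
    """Return -1, 0 or 1; values from different spaces are incomparable."""
    (space_a, x), (space_b, y) = a, b
    if space_a != space_b:
        raise TypeError(f"cannot compare {space_a} with {space_b}")
    return (x > y) - (x < y)
```

Under this sketch, 14:18:51Z and 16:18:51+02:00 on the same day are the same owl:TimeZoneTime value, while comparing either with a zone-less date-time raises an error.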

There would still be work needed to determine whether the
repeating interval types, like Monday, are feasible to implement.

-Alan

>
> Regards,
>
> 	Boris

Received on Wednesday, 9 July 2008 14:18:51 UTC