RE: A possible structure of the datatype system for OWL 2 (related to ISSUE-126)

Hello,

> -----Original Message-----
> From: Alan Ruttenberg [mailto:alanruttenberg@gmail.com]
> Sent: 09 July 2008 15:18
> To: Boris Motik
> Cc: 'OWL Working Group WG'
> Subject: Re: A possible structure of the datatype system for OWL 2 (related to ISSUE-126)
> 
> 
> On Jul 8, 2008, at 5:16 PM, Boris Motik wrote:
> > Hello,
> >
> > 1. Datatype Map
> > ----------------
> 
> I wonder if we should still use the term "datatype", as there will
> likely be confusion with the xsd sense of datatype.
> 
> > A datatype map consists of the following things:
> >
> > - a set of datatypes
> >   - each datatype provides a set of allowed facets
> > - a possibly infinite set of constants (likely to be renamed to
> > literals, but I'll stick to "constant" for the moment)
> >   - each constant consists of a lexicalValue and a typeURI
> >   - it is written as "lexicalValue"^^typeURI
> >
> > Each datatype DT is assigned a value space DT^D, which is just a
> > nonempty set.
> 
> Is the implication that DT -> Value space DT^D, one to one?
> 

Yes. This is exactly the same as in the case of classes.

> So we have type, DT, DT^D ?
> 

I didn't really understand that.

> > Each constant c is assigned a value c^D, which is just an object
> > from the union of the value spaces of all datatypes.
> >
> >
> > Thus, a datatype can be thought of as a class with a predefined
> > extension.
> 
> I'm not sure explaining it this way is helpful - might confuse rather
> than illuminate.
> 

I actually think this is the proper way of thinking about datatypes. Take, for example, owl:integer: you can think of it as one
big, infinite nominal that contains all integers. Hence, owl:integer is a class in the sense that its interpretation contains things.
The main difference between datatypes and classes is that, in the case of datatypes, the interpretation is uniquely defined by the
datatype map.
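
To make the "class with a predefined extension" reading concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration only: the names Datatype, DATATYPE_MAP and in_range are hypothetical, and exact rationals stand in for the reals, since a program cannot enumerate the real value space.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable


@dataclass(frozen=True)
class Datatype:
    """Hypothetical model: a syntactic name plus a fixed value space DT^D."""
    uri: str                            # the name that appears in axioms
    contains: Callable[[object], bool]  # membership test for the value space


def is_real(value):
    # Assumption of this sketch: exact rationals stand in for the reals.
    return isinstance(value, (int, Fraction)) and not isinstance(value, bool)


def is_integer(value):
    # The integers are the subset of the numbers with denominator 1.
    return is_real(value) and Fraction(value).denominator == 1


# Unlike a class, a datatype gets its extension from the datatype map,
# not from the axioms of an ontology.
DATATYPE_MAP = {
    dt.uri: dt
    for dt in (Datatype("owl:number", is_real), Datatype("xsd:integer", is_integer))
}


def in_range(datatype_uri, value):
    """Check whether a value belongs to the value space named by the URI."""
    return DATATYPE_MAP[datatype_uri].contains(value)
```

With this, in_range("xsd:integer", 5) holds while in_range("xsd:integer", Fraction(1, 2)) does not, and no axiom can change either answer.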

> > Note that this definition does not assume any relationship between
> > the set of supported typeURIs (which determine the allowed
> > constants) and the set of datatypes (which determine the allowed
> > sets of values).
> 
> I think we should consider calling "typeURI" "lexicalFormURI" to
> suggest the correct thinking, as people tend to equate "type" and
> "class". (as with rdf:type)
> 

I agree.

> Can we not simplify the above to: There are "Value spaces" and
> "lexicalFormURI"s. I'm not seeing how having "Datatypes" as an
> additional concept helps.
> 

There is a distinction, albeit a subtle one. "Datatype" is a syntactic category; hence, you can put datatypes into property range
axioms. "Value space" is a semantic category. Hence, you can't work with value spaces at the level of syntax; that is, you don't
put *the set of all integers* into an ontology when you say "xsd:integer is a range of P"; rather, you put xsd:integer (a datatype),
which acts as a moniker for its value space.

> >
> > 2. Allowed datatypes
> > ---------------------
> >
> > Conformant OWL 2 implementations would be required to support the
> > following base datatypes, each of whose value spaces would be
> > disjoint:
> 
> > - owl:number - the value space is the set of all real numbers
> > - xsd:string - the value space is the set of all Unicode strings in
> > normal form C
> > - owl:internationalizedString - the value space is the set of
> > pairs of the form (string,langTag)
> > - xsd:hexBinary - the value space is the set of all finite
> > sequences of octets
> 
> I'm wondering whether we should simply say: OWL has the following
> (following your later mail).
> 
> owl:Number
> owl:CharacterString
> owl:BitString
> owl:Integer
> 
> We confuse the issue by using the xsd uris to name a different sort
> of thing (an OWL value space, not an XSD:type)
> 

I personally don't mind renaming all datatypes to owl:*. I can see, however, that people might object, partially because of a backwards
compatibility issue. After all, in OWL 1, you had xsd:string and xsd:integer.

> > The following datatype would also be supported in OWL 2:
> >
> > - xsd:integer - the value space is the subset of the value space of
> > owl:number containing all integers
> 
> See above.
> 
> > Finally, we might support the following "shortcut" datatypes, whose
> > value spaces can be defined from the value spaces of the above
> > mentioned datatypes using facets
> >
> > - various xsd:integer derivatives, such as xsd:int and xsd:long
> > - various xsd:string derivatives, such as xsd:Name
> 
> In order to keep the design clean, I'd suggest that we define these
> in the owl namespace. We can connect the xsd types to the owl version.
> 
> However: The use of e.g. xsd:string in restrictions is the common
> idiom. I think we should document that some xsd datatypes, when used
> in a restriction, are understood to mean certain owl value spaces.
> 
> > 3. Allowed constants
> > ---------------------
> >
> > Conformant OWL 2 implementations are required to support the
> > following constant types:
> >
> > - "nnn"^^xsd:int and all derivatives that fall within xsd:int - all
> > such constants are to be interpreted as elements of owl:number
> > - "aaEbb"^^xsd:float - all such constants save for NaN and +-inf
> > are to be interpreted as elements of owl:number
> 
> Consider extending owl:number with these constants. We need some
> interpretation of them if they are to remain intact when part of an
> OWL file. These are effectively, "promotion" rules.
> 

We can have owl:numberPlus (or owl:numberOnSteroids if you prefer :-) that contains these guys as well.
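
To illustrate how such "promotion" of numeric constants into owl:number might look, here is a rough Python sketch. It is only an assumption about one possible reading of the proposal: the function name interpret_numeric_constant is hypothetical, exact rationals stand in for the reals, and Python's double-precision float parser is used for xsd:float as a simplification.

```python
import math
from fractions import Fraction

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def interpret_numeric_constant(lexical_value, type_uri):
    """Map "lexicalValue"^^typeURI to an exact element of owl:number.

    Hypothetical helper: only xsd:int and xsd:float are handled here.
    """
    if type_uri == "xsd:int":
        n = int(lexical_value)
        if not INT32_MIN <= n <= INT32_MAX:
            raise ValueError(f"{lexical_value!r} is outside the xsd:int range")
        return Fraction(n)
    if type_uri == "xsd:float":
        # Simplification: parsed as a double, not rounded to single precision.
        x = float(lexical_value)
        if math.isnan(x) or math.isinf(x):
            # These would need an extended value space such as owl:numberPlus.
            raise ValueError(f"{lexical_value!r} is not an element of owl:number")
        return Fraction(x)  # every finite float is an exact rational
    raise ValueError(f"unsupported type URI: {type_uri}")
```

Under this reading "1"^^xsd:int and "1.0E0"^^xsd:float denote the same element of owl:number, even though their typeURIs differ.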

> > - "abc"^^xsd:string - interpreted as "abc"
> 
> As you later suggest, ("abc", null) or ("abc", ""). The latter
> avoids the issue of what to do about the pattern facet on lang.
> 
> > - "abc"@langTag - interpreted as a pair ("abc",langTag)
> >
> >
> > 4. Discussion
> > --------------
> >
> > The set of constants is chosen such that implementations don't need
> > to support numbers with arbitrary precision, which might be quite
> > cumbersome. In fact, implementations are only required to support
> > 32 bit integers and single precision floating point numbers.
> 
> On today's hardware, I would set this to be 64 bit integers or even
> 128 bit integers, and double precision float. Some machines don't
> really have single float hardware, instead rounding from double float.
> 

I don't mind going up to 64 bit. 128 might be a bit too much (at least in Java, a language in which many reasoners are implemented,
you don't have this).
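
For a feel of what the choice of width means for constants, here is a small Python sketch. The helper names fits_signed and round_to_single are hypothetical, and this is only an illustration of the ranges involved, not a proposal for how a reasoner should store values.

```python
import struct


def fits_signed(n, bits):
    """True if integer n is representable as a two's complement value of this width."""
    return -(1 << (bits - 1)) <= n <= (1 << (bits - 1)) - 1


def round_to_single(x):
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]
```

For example, 2**31 fails fits_signed(n, 32) but passes fits_signed(n, 64), and round_to_single(0.1) differs from 0.1, so the same lexical form can denote different values depending on the precision an implementation supports.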

> > There are efficient ways to represent these on virtually all systems.
> >
> > The set of datatypes, however, allows one to refer to the sets of
> > all integers and real numbers. This allows one to specify the
> > ontology in a way that makes reasoning easy.
> >
> > Implementations are free to support other constants as well. Note
> > that these extensions do not necessarily mean that we need new
> > datatypes (i.e., new value spaces). For example, an implementation
> > might choose to support arbitrary precision numbers via constants
> > of the form "123.03"^^xsd:decimal. Note that the proposed list of
> > datatypes already contains the appropriate value space for such
> > constants (i.e., owl:number).
> 
> I think xsd:decimal should be considered a lexical form of owl:Number.
> 
> > The open issues are what to do with NaN and +-inf and with date-
> > time datatypes.
> 

I think that, if we agree to the basic structure, we can easily accommodate the remaining "extra" constants and datatypes.

Regards,

	Boris

> In the first case, I suggest above that owl:Number be
> real+"NaN"+"-INF"+"+INF"
> I'd also suggest that "-0" and "+0" be considered lexical forms of
> the number 0.
> 
> For the date-time datatypes, I wonder whether it would work to define:
> 
> owl:Time (isomorphic to the reals)
> owl:TimeZoneTime (also isomorphic to the reals)
> 
> There is one value space for all the lexical date-times that have a
> time zone specified, and another value space for all the lexical
> date-times. There would be no comparison possible between owl:Time and
> owl:TimeZoneTime.
> 
> There would still be work necessary to determine whether the
> repeating interval types, like monday, are feasible to implement.
> 
> -Alan
> 
> >
> > Regards,
> >
> > 	Boris
> >
> >
> >

Received on Wednesday, 9 July 2008 16:59:38 UTC