datatyping

Let me summarize a proposal for exactly what we should say about datatypes.

1. A datatype is assumed to be identified by a uriref. The assertion

aaa rdf:type rdfs:Datatype .

is intended to be interpreted by a datatype-savvy RDF engine as an 
indication that aaa is the uriref of a datatype, and that it is 
appropriate to attempt to access the information associated with that 
datatype. The exact form in which this information is to be provided 
to an RDF engine should be specified as part of the API of any such 
engine.

Such an assertion does not constitute a definition of a datatype. 
There is no way to define a datatype in RDFS. Datatypes are defined 
externally to RDFS.
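
As a rough illustration of point 1, here is a minimal Python sketch of how an engine might notice such assertions and then go looking for the external information. Everything in it is an assumption made for illustration: triples are plain tuples of strings, and `declared_datatypes`, `available_info` and `registry` are hypothetical names standing in for whatever API a real engine specifies.

```python
RDF_TYPE = "rdf:type"
RDFS_DATATYPE = "rdfs:Datatype"


def declared_datatypes(triples):
    """Collect every subject asserted to be of type rdfs:Datatype."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == RDFS_DATATYPE}


def available_info(triples, registry):
    """Map each declared datatype to whatever the (hypothetical) registry
    holds for it; None means no information could be obtained."""
    return {d: registry.get(d) for d in declared_datatypes(triples)}


graph = [
    ("ex:ddd", RDF_TYPE, RDFS_DATATYPE),
    ("ex:aaa", "ex:ppp", "ex:bbb"),
]
registry = {}  # supplied externally; the assertion itself defines nothing
info = available_info(graph, registry)
```

With an empty registry the assertion is recognized but contributes nothing, which matches the point that it is an indication and not a definition.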

2. In order to be useful, some information about a datatype needs to 
be provided to a datatype-savvy RDF engine. The information is of 
various kinds, and some datatypes may provide only part of the 
information. Insofar as information about the datatype is 
unavailable, a datatype-savvy RDF engine will be able to draw only 
the same conclusions as a non-datatype-savvy RDF engine. Or, if you 
like, stated semantically, datatype entailment is defined relative to 
the information provided by the datatype information source. If you 
get more information, you can make more inferences; if you get none, 
then the datatype adds nothing and you are just doing RDFS. That way, 
RDFS entailment is like datatype entailment with an empty-information 
datatype.

3a. The minimal kind of information is a specification of which 
literals are syntactically correct, i.e. in the lexical space of the 
datatype, and which are not.
If this information cannot be obtained for a resource which is asserted 
to be in the class rdfs:Datatype, that may be considered an error condition.
3b. The second kind of information is a specification of which 
literals map to the same value in the datatype. This information can 
be conceptualized as a set of equations between typed literals with 
the same type:
"aaa"^^ddd = "bbb"^^ddd .
but it may also be provided, for example, by giving a mapping from 
lexical forms to canonical lexical forms.
3c. The third kind of information is like 3b, but specifies 
identities between forms under different datatypes:
"aaa"^^ddd = "bbb"^^eee .
This may be provided, for example, by giving schematic mappings 
between canonical lexical forms of the different datatypes under 
various boundary conditions.
3d. The fourth kind of information is subset relationships between 
value spaces of different datatypes. This can be specified directly 
by RDFS subclass assertions of the form
ddd rdfs:subClassOf eee .
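
One possible way to picture the partial information described in 2 and 3a-3c is a record whose fields may each be missing. This is only a sketch under my own assumptions: `DatatypeInfo`, its field names, `supported_kinds` and the toy integer datatype are all hypothetical. Information of kind 3d is left out of the record because it lives in the graph itself as rdfs:subClassOf assertions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class DatatypeInfo:
    # 3a: does this string belong to the lexical space?
    is_lexical: Optional[Callable[[str], bool]] = None
    # 3b: lexical form -> canonical lexical form in the same datatype
    canonical: Optional[Callable[[str], str]] = None
    # 3c: other datatype uriref -> translation of a canonical form
    cross_maps: Optional[Dict[str, Callable[[str], str]]] = None


def supported_kinds(info):
    """List which kinds of information this source actually provides."""
    kinds = []
    if info.is_lexical is not None:
        kinds.append("3a")
    if info.canonical is not None:
        kinds.append("3b")
    if info.cross_maps:
        kinds.append("3c")
    return kinds


# A toy datatype of unsigned decimal integers (illustrative only).
toy_int = DatatypeInfo(
    is_lexical=lambda s: re.fullmatch(r"[0-9]+", s) is not None,
    canonical=lambda s: s.lstrip("0") or "0",
)
empty = DatatypeInfo()  # no information: the engine is just doing RDFS
```

Here `supported_kinds(empty)` is an empty list, which is the "empty-information datatype" case from point 2.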

Information of type 3a enables inferences of the form

aaa ppp "xxx"^^ddd .
-->
aaa ppp _:x .
_:x rdf:type ddd .

and hence is often sufficient to detect datatype clashes.
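
The 3a inference could be sketched roughly as follows. The representation is an assumption of mine: a typed literal is a `(lexical_form, datatype_uriref)` tuple, and `lexical_checks` is a hypothetical table of lexical-space tests keyed by datatype uriref.

```python
import itertools
import re


def infer_3a(triples, lexical_checks):
    """For each well-formed typed literal, derive a blank node of that type."""
    derived = []
    fresh = itertools.count()
    for s, p, o in triples:
        if not isinstance(o, tuple):
            continue  # not a typed literal
        lex, dt = o
        check = lexical_checks.get(dt)
        if check is not None and check(lex):
            bnode = f"_:x{next(fresh)}"
            derived.append((s, p, bnode))
            derived.append((bnode, "rdf:type", dt))
    return derived


checks = {"ex:ddd": lambda s: re.fullmatch(r"[0-9]+", s) is not None}
graph = [
    ("ex:aaa", "ex:ppp", ("10", "ex:ddd")),   # in the lexical space
    ("ex:aaa", "ex:ppp", ("ten", "ex:ddd")),  # not in the lexical space
]
derived = infer_3a(graph, checks)
```

Only the first triple yields a conclusion; the ill-formed literal is simply skipped here, though an engine might equally choose to report it.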

Information of type 3b enables inferences of the form
aaa ppp "xxx"^^ddd .
-->
aaa ppp "yyy"^^ddd .
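
If the 3b information arrives as a mapping to canonical lexical forms, the inference might look like this sketch. As before, the tuple representation of typed literals, the `canonical` table and the toy integer datatype are assumptions for illustration; the canonicalizer is assumed to be applied only to well-formed lexical forms.

```python
def infer_3b(triples, canonical):
    """Derive the same statement with the literal in canonical form."""
    derived = []
    for s, p, o in triples:
        if not isinstance(o, tuple):
            continue  # not a typed literal
        lex, dt = o
        canon = canonical.get(dt)
        if canon is None:
            continue  # no 3b information for this datatype
        canonical_form = canon(lex)
        if canonical_form != lex:
            derived.append((s, p, (canonical_form, dt)))
    return derived


# Toy rule: unsigned integers are canonical without leading zeros.
canonical = {"ex:ddd": lambda s: s.lstrip("0") or "0"}
derived = infer_3b([("ex:aaa", "ex:ppp", ("010", "ex:ddd"))], canonical)
```

Two literals of the same datatype then denote the same value exactly when their canonical forms coincide, which is one way of encoding the equations "aaa"^^ddd = "bbb"^^ddd.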

Information of type 3c enables inferences of the form

aaa ppp "xxx"^^ddd .
-->
aaa ppp "yyy"^^eee .
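
For 3c, a comparable sketch translates canonical forms from one datatype into lexical forms of another. The `cross_maps` table keyed by `(source, target)` pairs is a hypothetical encoding of the "schematic mappings", and the integer-to-decimal rule is a toy example, not a statement about any real datatype.

```python
def infer_3c(triples, canonical, cross_maps):
    """Derive equivalent statements using literals of other datatypes."""
    derived = []
    for s, p, o in triples:
        if not isinstance(o, tuple):
            continue  # not a typed literal
        lex, dt = o
        canon = canonical.get(dt)
        if canon is None:
            continue  # mappings are defined on canonical forms only
        for (source, target), translate in cross_maps.items():
            if source == dt:
                derived.append((s, p, (translate(canon(lex)), target)))
    return derived


canonical = {"ex:ddd": lambda s: s.lstrip("0") or "0"}
# Toy mapping: integer "7" denotes the same value as decimal "7.0".
cross_maps = {("ex:ddd", "ex:eee"): lambda c: c + ".0"}
derived = infer_3c(
    [("ex:aaa", "ex:ppp", ("007", "ex:ddd"))], canonical, cross_maps
)
```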

Information of type 3d allows RDFS class reasoning to support 
inferences of the form

aaa ppp "xxx"^^ddd .
-->
aaa ppp _:z .
_:z rdf:type eee .
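
The 3d case can be sketched by combining the lexical check with ordinary subclass reasoning. I am assuming here that the literal must first pass the 3a check, and that rdfs:subClassOf is followed transitively; the helper names and the tuple representation of typed literals are hypothetical.

```python
import re


def superclasses(cls, triples):
    """All classes reachable from cls via rdfs:subClassOf, transitively."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def infer_3d(triples, lexical_checks):
    """Type the value of a well-formed literal by each superclass."""
    derived = []
    n = 0
    for s, p, o in triples:
        if not isinstance(o, tuple):
            continue  # not a typed literal
        lex, dt = o
        check = lexical_checks.get(dt)
        if check is None or not check(lex):
            continue
        for sup in sorted(superclasses(dt, triples)):
            bnode = f"_:z{n}"
            n += 1
            derived.append((s, p, bnode))
            derived.append((bnode, "rdf:type", sup))
    return derived


graph = [
    ("ex:ddd", "rdfs:subClassOf", "ex:eee"),
    ("ex:aaa", "ex:ppp", ("10", "ex:ddd")),
]
checks = {"ex:ddd": lambda s: re.fullmatch(r"[0-9]+", s) is not None}
derived = infer_3d(graph, checks)
```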

--------

Is that OK?

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola               			(850)202 4440   fax
FL 32501            				(850)291 0667    cell
phayes@ai.uwf.edu	          http://www.coginst.uwf.edu/~phayes
s.pam@ai.uwf.edu   for spam

Received on Tuesday, 3 December 2002 13:47:21 UTC