the S-B problem and what to do about it.

At yesterdays telecon, by majority vote, we took two decisions about 
datatyping which, if taken together, make RDF datatyping effectively 
unusable by anyone except librarians. I cannot produce a coherent 
model theory for datatyping which satisfies both of these conditions 
at the same time, and would suggest that one of them should be 
reconsidered. This message tries to briefly explain the problem, and 
suggest some alternative ways out of it. I recommend option C below.

The decisions were that (1) we should support the S-B idiom:

Jenny ex:age "10" .
ex:age rdfs:range xsd:integer .

and (2) that the literal in such examples should denote itself, ie 
the above asserts that Jenny's age is the string '10'.

(BTW, I know I voted for 1, but at the time I didn't think that 
anyone who voted for 1 could possibly vote for 2 as well. )

The problem with this combination is that it seems to have the 
inevitable consequence that the class named by a datatype name must 
be the set of *lexical forms* of that datatype, rather than the value 
space of the datatype. In this example, it requires RDF to treat 
<xsd:number> as a class of numerals - character strings which denote 
numbers - rather than a class of numbers. The problem now is that if 
that is what the datatype class name is used for, then there is no 
way to refer to the class of values of a datatype. This might be 
appropriate for an ontology dealing with documents, of course, such 
as Dublin Core or SS/BB, but it is fatal for any kind of applied 
ontology work or interfacing to most applications, and will be 
immediately (and correctly) rejected by DAML+OIL and Webont as 
unacceptable and unworkable.

Now, one option here would be to just smile at this - after all, we 
are the RDF WG, not Webont - but I would suggest that if we are at 
all interested in the possibility of RDF being acceptable as a 
foundation for any more ambitious efforts, rather than an dead end, 
we should pay attention to the likely needs of most people on the 
planet, who will want to use numerals to refer to numbers, rather 
than treating the entire XML datatyping scheme as a way of 
classifying character strings.

I have been trying to come up with some way out of this dilemma, and 
there are several options I can see. Right now I personally prefer B, 
but think that C is the best (simplest) compromise, and one that 
would be acceptable to the most people (both on and off the WG).

A.  The S-B idiom could be replaced by something more complicated 
which expresses the intended restriction of the literal to the space 
of lexical forms of the datatype. For example, it could be done by 
replacing the rdfs:range assertion by the linked pair of range 
assertions:

ex:age rdfs:range _:x .
xsd:integer rdfs:range _:x .

If that is acceptable, then that solves the problem, since it frees 
up the datatype name itself to have the value space as its extension 
when used as a class name. However, I have a nagging feeling that it 
will not be acceptable to some people, since anyone who thinks that 
S-B is meaningful as it stands is probably under the impression that 
datatype class names *do* refer to sets of lexical forms, and will 
therefore find this puzzling.

That would require us to reconsider decision (1).

B.  We could follow the approach outlined in 
http://www.coginst.uwf.edu/users/phayes/simpledatatype2.html
<<***Note, there are still some bugs in the MT equations in that 
document.***.>>
which supports the the S-B idiom, but assigns it a different meaning: 
according to this approach, Jenny's age would be ten rather than 
'10'. However, the intended restriction to the datatype range works, 
eg the literal "humpty"  rather than a numeral would still produce a 
datatyping error.

This proposal also the merit of not requiring the use of rdfs:drange; 
it works by 'overloading' rdf:range. It can also, with a little 
tweaking to the MT, be used with tidy literal nodes (and although, as 
noted in the document, that would need to be revised if literals were 
allowed to be subjects, I would predict that tidy literal nodes would 
need to be reconsidered in that case anyway.)

I believe this is the first proposal that combines tidy literals with 
the possibility of having datatype-sensitive denotations for 
literals, by the way.

That would require us to reconsider decision (2)

C. (New idea, see
http://www.coginst.uwf.edu/users/phayes/simpledatatype23-02-2002.html
  for a write-up.)
If we are willing to re-introduce rdfs:dtype (or some other special 
vocabulary) then it is possible to combine the two decisions 
coherently, with the S-B idiom suitably modified. That is, we can use 
rdfs:dtype to impose 'lexical' constraints on the in-line literal 
use, as in S-B, and also 'value' constraints on the bnode idiom:

Jimmy ex:age _:x .
_:x rdfs:dlex "10" .

at the same time, without getting the ranges mixed up. The first 
asserts that Jenny's age is the lexical form '10'; the second asserts 
that Jimmy's age is the number ten.  The idea here is that the 
rdfs:drange assertion is used PURELY to assign datatype restrictions 
to properties, and says NOTHING AT ALL about their ranges. So 
rdfs:drange and rdfs:range are completely independent; neither is 
assumed to be a subproperty of the other. This frees up the 
rdfs:drange from any commitment to the rdfs class interpretation of 
the datatype, and allows us to continue to utilize the class name to 
refer to the value class (and use the idiom suggested in A above to 
refer to the lexical class, if required) independently of the 
imposition of datatyping conditions.

Of course we cannot use rdfs:range in this case, so this would 
require us to reconsider decision (1); but the required change is 
minimal.

In many ways  C,  is the 'leanest' option so far, in the sense that 
it provides the most flexibility with the minimum of syntactic and MT 
complexity; it uses tidy literals which always denote themselves, and 
leaves the undatatyped MT completely unchanged.

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes

Received on Sunday, 24 February 2002 15:54:23 UTC