- From: Pat Hayes <phayes@ai.uwf.edu>
- Date: Sun, 24 Feb 2002 14:54:26 -0600
- To: w3c-rdfcore-wg@w3.org
- Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
At yesterdays telecon, by majority vote, we took two decisions about datatyping which, if taken together, make RDF datatyping effectively unusable by anyone except librarians. I cannot produce a coherent model theory for datatyping which satisfies both of these conditions at the same time, and would suggest that one of them should be reconsidered. This message tries to briefly explain the problem, and suggest some alternative ways out of it. I recommend option C below. The decisions were that (1) we should support the S-B idiom: Jenny ex:age "10" . ex:age rdfs:range xsd:integer . and (2) that the literal in such examples should denote itself, ie the above asserts that Jenny's age is the string '10'. (BTW, I know I voted for 1, but at the time I didn't think that anyone who voted for 1 could possibly vote for 2 as well. ) The problem with this combination is that it seems to have the inevitable consequence that the class named by a datatype name must be the set of *lexical forms* of that datatype, rather than the value space of the datatype. In this example, it requires RDF to treat <xsd:number> as a class of numerals - character strings which denote numbers - rather than a class of numbers. The problem now is that if that is what the datatype class name is used for, then there is no way to refer to the class of values of a datatype. This might be appropriate for an ontology dealing with documents, of course, such as Dublin Core or SS/BB, but it is fatal for any kind of applied ontology work or interfacing to most applications, and will be immediately (and correctly) rejected by DAML+OIL and Webont as unacceptable and unworkable. Now, one option here would be to just smile at this - after all, we are the RDF WG, not Webont - but I would suggest that if we are at all interested in the possibility of RDF being acceptable as a foundation for any more ambitious efforts, rather than an dead end, we should pay attention to the likely needs of most people on the planet, who will want to use numerals to refer to numbers, rather than treating the entire XML datatyping scheme as a way of classifying character strings. I have been trying to come up with some way out of this dilemma, and there are several options I can see. Right now I personally prefer B, but think that C is the best (simplest) compromise, and one that would be acceptable to the most people (both on and off the WG). A. The S-B idiom could be replaced by something more complicated which expresses the intended restriction of the literal to the space of lexical forms of the datatype. For example, it could be done by replacing the rdfs:range assertion by the linked pair of range assertions: ex:age rdfs:range _:x . xsd:integer rdfs:range _:x . If that is acceptable, then that solves the problem, since it frees up the datatype name itself to have the value space as its extension when used as a class name. However, I have a nagging feeling that it will not be acceptable to some people, since anyone who thinks that S-B is meaningful as it stands is probably under the impression that datatype class names *do* refer to sets of lexical forms, and will therefore find this puzzling. That would require us to reconsider decision (1). B. We could follow the approach outlined in http://www.coginst.uwf.edu/users/phayes/simpledatatype2.html <<***Note, there are still some bugs in the MT equations in that document.***.>> which supports the the S-B idiom, but assigns it a different meaning: according to this approach, Jenny's age would be ten rather than '10'. However, the intended restriction to the datatype range works, eg the literal "humpty" rather than a numeral would still produce a datatyping error. This proposal also the merit of not requiring the use of rdfs:drange; it works by 'overloading' rdf:range. It can also, with a little tweaking to the MT, be used with tidy literal nodes (and although, as noted in the document, that would need to be revised if literals were allowed to be subjects, I would predict that tidy literal nodes would need to be reconsidered in that case anyway.) I believe this is the first proposal that combines tidy literals with the possibility of having datatype-sensitive denotations for literals, by the way. That would require us to reconsider decision (2) C. (New idea, see http://www.coginst.uwf.edu/users/phayes/simpledatatype23-02-2002.html for a write-up.) If we are willing to re-introduce rdfs:dtype (or some other special vocabulary) then it is possible to combine the two decisions coherently, with the S-B idiom suitably modified. That is, we can use rdfs:dtype to impose 'lexical' constraints on the in-line literal use, as in S-B, and also 'value' constraints on the bnode idiom: Jimmy ex:age _:x . _:x rdfs:dlex "10" . at the same time, without getting the ranges mixed up. The first asserts that Jenny's age is the lexical form '10'; the second asserts that Jimmy's age is the number ten. The idea here is that the rdfs:drange assertion is used PURELY to assign datatype restrictions to properties, and says NOTHING AT ALL about their ranges. So rdfs:drange and rdfs:range are completely independent; neither is assumed to be a subproperty of the other. This frees up the rdfs:drange from any commitment to the rdfs class interpretation of the datatype, and allows us to continue to utilize the class name to refer to the value class (and use the idiom suggested in A above to refer to the lexical class, if required) independently of the imposition of datatyping conditions. Of course we cannot use rdfs:range in this case, so this would require us to reconsider decision (1); but the required change is minimal. In many ways C, is the 'leanest' option so far, in the sense that it provides the most flexibility with the minimum of syntactic and MT complexity; it uses tidy literals which always denote themselves, and leaves the undatatyped MT completely unchanged. Pat -- --------------------------------------------------------------------- IHMC (850)434 8903 home 40 South Alcaniz St. (850)202 4416 office Pensacola, FL 32501 (850)202 4440 fax phayes@ai.uwf.edu http://www.coginst.uwf.edu/~phayes
Received on Sunday, 24 February 2002 15:54:23 UTC