- From: Pat Hayes <phayes@ai.uwf.edu>
- Date: Mon, 22 Oct 2001 18:14:47 -0500
- To: w3c-rdfcore-wg@w3.org
Martyn Horner <martyn.horner@profium.com> wrote:
>Brian McBride wrote:
>.....
>> > (C4) are multiple type assignments allowed? (e.g. US dollar, decimal)
>>
>> As above, I don't see either of these as a 'type', so I'm not sure this
>> criterion is well formed. Nor is it a criterion, unless a
>> preference for one or the other is specified.
>
>Seriously, dollars and decimals are not types but encodings of data of a
>certain type... surely?
>The unit chosen maps the integer value into a sequence of numerals in
>the same way that the choice of radix does. Therefore `decimal' and
>`pounds' belong in the same syntactic position. The type which selects
>the semantic domain belongs elsewhere. Radix and unit have the same role
>as `lang' - they stipulate how the characters are to be mapped into a
>semantic sub-domain which itself has a particular type.

Yes, I would agree. I think we should not wear blinkers about what someone might want to consider a datatype, and should try to keep our treatment as general-purpose as possible. I think the picture that has emerged from my learning about datatypes from Peter provides a very general framework that can accommodate all the proposals made so far in a uniform way, with a single semantics. Here's a sketch of how it goes.

Literals are lexical items which can somehow be generically distinguished from urirefs. (And that is all we say about them in general.) The basic idea is that (unlike urirefs) they are understood to be assigned a common meaning by some 'global' conventions that are used independently of the particular interpretation; however, there may be several such 'global' conventions, so we need a general mechanism for indicating which convention is intended.

A datatype is a rule which embodies some such 'global' convention for determining the meaning of a literal, ie (mathematically speaking) it is a function from literals L to values LV. (And that is all we say about them.)
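To make the 'function from literals L to values LV' reading concrete, here is a minimal Python sketch (an illustration, not part of the proposal) in which each datatype is simply a function from a lexical string to a value. The names decimal_integer, hex_integer and plain_string are hypothetical; the hex case illustrates Martyn's point that the radix belongs to the mapping convention rather than to the type of the value.

```python
from typing import Callable, Dict

# Hypothetical sketch: a datatype is modelled as a plain function from a
# literal (a lexical string) to a value.
Datatype = Callable[[str], object]

def decimal_integer(literal: str) -> int:
    """Read the literal as a base-10 numeral."""
    return int(literal, 10)

def hex_integer(literal: str) -> int:
    """Read the literal as a base-16 numeral: the radix is part of the convention."""
    return int(literal, 16)

def plain_string(literal: str) -> str:
    """Strings denote themselves."""
    return literal

datatypes: Dict[str, Datatype] = {
    "decimal": decimal_integer,
    "hex": hex_integer,
    "string": plain_string,
}

# The same character sequence gets a different value under each convention.
for name, dt in datatypes.items():
    print(name, repr(dt("10")))
# prints:
# decimal 10
# hex 16
# string '10'
```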
Each literal has a *fixed* interpretation in a given datatype. (This is what it means to say that the interpretation of the literal is 'independent' of the particular interpretation, in the MT sense, of an RDF graph.) However, the choice of datatype to be used in interpreting any particular literal label may depend upon, or be influenced by, other information which is encoded in the graph, and therefore may depend on the particular interpretation. (This is what it means to say that the meaning of any particular literal label may depend on the interpretation.)

A datatyping scheme is a set of datatypes together with some method of assigning them to occurrences of literals. (And that is about all we say about them.) Datatyping schemes differ in the method they use to make that assignment. One way is to incorporate a syntactic label for the datatype into the literal itself, and require that it be used to interpret the literal string. Another way is to regard datatypes as objects in the domain and make assertions about their relationships to the literal strings. A third, closer to a conventional model theory, is to give no explicit method at all, but instead to talk about a 'datatyping interpretation' that assigns datatypes in some systematic way, and then to state interpretation conditions which restrict the possible assignments to those that would make the typed assertions true. This last provides the most flexibility and includes the others as special cases, so it is the most general solution; but (unlike the first) it provides no principled way to isolate datatype reasoning from general inference.

Formally, a datatype scheme consists of a set DT of things called types, together with two functions: DTS: DT -> (L -> LV), which maps each type to its datatype, and DTC: DT -> powerset(LV), which maps each type to the range of that datatype (integers, strings, etc.). A datatyping of a set is a function from that set into DT, ie an assignment of a datatype to everything in the set.
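A minimal sketch of this formal definition, assuming Python dicts for the functions DTS and DTC, and a membership predicate standing in for each (generally infinite) subset of LV. The class name DatatypeScheme, the type names and the helper is_datatyping are hypothetical illustrations, not names from the proposal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable

@dataclass
class DatatypeScheme:
    # DTS: each type -> function from literal strings to values
    dts: Dict[str, Callable[[str], object]]
    # DTC: each type -> membership test for the range of that datatype
    # (a predicate stands in for a subset of LV, since ranges are infinite)
    dtc: Dict[str, Callable[[object], bool]]

    @property
    def types(self) -> set:
        """DT: the set of types in the scheme."""
        return set(self.dts)

scheme = DatatypeScheme(
    dts={"integer": lambda s: int(s, 10), "string": lambda s: s},
    dtc={
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "string": lambda v: isinstance(v, str),
    },
)

def is_datatyping(assignment: Dict[Hashable, str],
                  items: Iterable[Hashable],
                  scheme: DatatypeScheme) -> bool:
    """A datatyping of a set: a total function from the set into DT."""
    return (set(assignment) == set(items)
            and all(t in scheme.types for t in assignment.values()))
```

For example, is_datatyping({"n1": "integer", "n2": "string"}, ["n1", "n2"], scheme) holds, while an assignment to a type outside DT does not.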
A typed interpretation <I,D> of a graph is an interpretation I of the vocabulary plus a datatyping D of the nodes which satisfies the following conditions. (The first condition isn't a mathematical condition on the structures involved, but it is required in order to make the datatype scheme usable in any web language.)

1. If nnn is any uri of a datatype, then I(nnn) is in DT.
2. ICEXT(d) is a subset of DTC(d) for any d in (DT intersect IR).
3. LV(n) = DTS(D(n))(label(n))

Notice that n is a node and label(n) is its label, ie the literal itself, and that D occurs only in equation 3. This provides just enough alignment between datatyping and interpretations to allow things like rdfs:range assertions to restrict the ICEXT mappings sufficiently to 'force' the node labelled with a literal to be properly typed. In effect, you can think of the datatyping D as a kind of variable which is restricted by the various assertions made by a graph in just the right way to 'select' the proper way to interpret the literals. If there isn't enough information to do that, then it's not completely clear what the RDF assertion is saying; but then it's not entirely clear what any RDF assertion is 'really' saying, and in this case the relevant options are at least relatively clear.

In general the relevant information can come from anywhere, but we can restrict where it comes from by adding further conditions. To get the first, 'explicit syntactic' kind of datatyping, you just add one more condition, which might be written as:

4. D(n) = I(type-label(n)) for every literal node n in the graph.
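The 'forcing' effect of conditions 2 and 3 can be illustrated with a toy Python sketch. It assumes a two-type scheme, takes ICEXT(d) to be all of DTC(d) (the largest extension condition 2 permits; a smaller ICEXT could only rule out more), and treats an rdfs:range assertion simply as a requirement that the literal's value lie in a given class. The functions literal_value and admissible_types are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical toy scheme: DTS gives each type's lexical-to-value map,
# DTC gives a membership test for each type's range.
DTS: Dict[str, Callable[[str], object]] = {
    "xsd:integer": lambda s: int(s, 10),
    "xsd:string": lambda s: s,
}
DTC: Dict[str, Callable[[object], bool]] = {
    "xsd:integer": lambda v: isinstance(v, int),
    "xsd:string": lambda v: isinstance(v, str),
}

def literal_value(datatyping: Dict[str, str], labels: Dict[str, str], node: str) -> object:
    """Condition 3: LV(n) = DTS(D(n))(label(n))."""
    return DTS[datatyping[node]](labels[node])

def admissible_types(node: str, labels: Dict[str, str], required_class: str) -> List[str]:
    """Types D(node) may take if the graph forces LV(node) into
    ICEXT(required_class), which by condition 2 lies inside DTC(required_class)."""
    survivors = []
    for t in DTS:
        try:
            value = literal_value({node: t}, labels, node)
        except ValueError:
            continue  # the label is not in the lexical space of t
        if DTC[required_class](value):
            survivors.append(t)
    return survivors

labels = {"n": "1234"}
# e.g. an rdfs:range assertion puts the object of some property in xsd:integer
print(admissible_types("n", labels, "xsd:integer"))  # ['xsd:integer']
print(admissible_types("n", labels, "xsd:string"))   # ['xsd:string']
```

If no type survives (say the label is "abc" and the required class is xsd:integer), the graph has no typed interpretation under this toy scheme.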
If you substitute this into 3 you get

LV(n) = DTS(I(type-label(n)))(label(n))

and the mapping D is completely eliminated from the equations. This shows that in this syntactically restricted case we don't really need to consider explicit datatypings at all; but their use does not forbid syntactic datatyping if one wants to use it, and indeed this shows that syntactic and interpreted datatyping techniques can be used together without interfering with one another.

The 'bnode' suggestions can also be handled in this framework, as a rather peculiar-seeming semantic condition on rdf:value:

5. <x, y> is in IEXT(I(rdf:value)) iff x = y

This isn't the 'intended' interpretation, I realize, but it does make everything work out right. What it does is to read, for example,

_:1 rdf:value 1234 .

not as meaning '_:1 is a thing which would be written as the unicode string "1234"' but rather as '_:1 is a thing which is obtained by interpreting the string "1234" (using the correct datatyping scheme)', which of course is just saying that _:1 is equal to 1234 (using the correct datatyping scheme). Moreover, conditions 1-3 above guarantee that if _:1 is known to be in a class identified by a datatype uri, then an appropriate datatyping scheme will be used, so there is no need to say that explicitly.

Notice that the 'intended' reading is semantically anomalous, since it requires us to take the literal, er, literally, rather than interpreting it in any way; it has a kind of use/mention glitch built into it. (Admittedly this is fairly harmless for strings, since they do denote themselves; that is why we are able to reinterpret rdf:value in the above way as meaning equality, and get away with it.) Notice also that it makes rdf:value seem kind of silly: if it means equality, and can only be used with literals, why not just substitute the literal for the blank node and get rid of the blank node?
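As a rough check of the claim that explicit type labels (condition 4) and the rdf:value-as-equality reading (condition 5) can coexist, the following hypothetical Python sketch computes the same value both ways. It assumes I maps each datatype URI to itself and, for the blank-node route, returns the value under the first datatyping that satisfies the graph; none of these function names come from the proposal.

```python
from typing import Callable, Dict, Optional

# Hypothetical toy scheme (same shape as DTS/DTC in the text).
DTS: Dict[str, Callable[[str], object]] = {
    "xsd:integer": lambda s: int(s, 10),
    "xsd:string": lambda s: s,
}
DTC: Dict[str, Callable[[object], bool]] = {
    "xsd:integer": lambda v: isinstance(v, int),
    "xsd:string": lambda v: isinstance(v, str),
}

def value_with_type_label(label: str, type_label: str) -> object:
    """Conditions 3 and 4 combined: LV(n) = DTS(I(type-label(n)))(label(n)).
    I is assumed to map each datatype URI to itself; no datatyping D appears."""
    return DTS[type_label](label)

def value_via_rdf_value(literal: str, asserted_type: str) -> Optional[object]:
    """Condition 5 reads '_:1 rdf:value <literal>' as '_:1 = LV(literal node)'.
    With '_:1 rdf:type <asserted_type>', condition 2 requires that value to be
    in DTC(asserted_type); return the value under the first datatyping that
    achieves this."""
    for read in DTS.values():
        try:
            v = read(literal)
        except ValueError:
            continue  # literal not in this datatype's lexical space
        if DTC[asserted_type](v):
            return v  # the blank node denotes this value
    return None  # no datatyping satisfies the graph

explicit = value_with_type_label("1234", "xsd:integer")
via_bnode = value_via_rdf_value("1234", "xsd:integer")
print(explicit, via_bnode, explicit == via_bnode)  # 1234 1234 True
```

With a class that no datatyping can satisfy, e.g. value_via_rdf_value("abc", "xsd:integer"), the sketch returns None, mirroring the case where the graph has no typed interpretation.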
(Current answer: because that would require us to allow literals as subjects if we want to write the equivalent of

_:1 rdf:type xsd:integer .

Response: so let's have literals as subjects; why not? Or at any rate, let us face up to the fact that this prohibition is purely an ad-hoc syntactic restriction imposed for no semantic reason.)

One way or another, this model theory extension seems to be able to handle any kind of datatyping that anyone has suggested so far. Since (I gather from Peter P-S) it can also handle all of XML datatyping, and all of DAML+OIL (in fact it will probably be built into the next DAML+OIL model theory), I would suggest that we adopt it as standard. I will work up a draft extension to the MT document which covers it and explains the alternatives, and then people can discuss it. How's that?

Pat
-- 
---------------------------------------------------------------------
IHMC                    (850)434 8903   home
40 South Alcaniz St.    (850)202 4416   office
Pensacola, FL 32501     (850)202 4440   fax
phayes@ai.uwf.edu
http://www.coginst.uwf.edu/~phayes
Received on Monday, 22 October 2001 19:15:01 UTC