- From: Pat Hayes <phayes@ai.uwf.edu>
- Date: Fri, 9 Nov 2001 12:49:48 -0600
- To: w3c-rdfcore-wg@w3.org
- Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
- Message-Id: <p05101027b811ba46a791@[65.212.118.147]>
After the recent email flurry I think I can distinguish five proposals and summarize their pros and cons. They can be distinguished on a primary axis of degree of localization of datatyping information, ie how 'far away' the datatyping information relevant to a literal can be from that literal itself.

X. (Patrick) Very local indeed; every literal is required to have its datatype included as part of the literal label itself. Example (I may have the URV syntax wrong):

    aaa eg:prop <xsd:integer:10> .

In fact, these literal-thingies can be regarded as a form of URI (URV), so that there are in fact no literals at all. (I will go on referring to these URVs as 'literal labels' in what follows, however, for consistency.) Datatype names play no role in the RDFS syntax.

S. (Sergey) Quite local, in that literals are required to be linked directly to bNodes by edges labelled with the datatype name. The bNode denotes the value of the literal; all literals denote strings. Example:

    aaa eg:prop _:x .
    _:x xsd:integer "10" .

Datatype names are names of properties.

DC. (Dan) Similar; all literals are strings, and similar use of a bNode, but with separate arcs for the literal and the datatype. Example:

    aaa eg:prop _:x .
    _:x rdf:label "10" .
    _:x rdf:type xsd:integer .

Datatype names are names of classes.

P. (Peter) Not local at all, in that literals are assigned a datatype indirectly, by declaring a datatype to be the range of the property used in the triple. The range information might be anywhere in the graph, and need not be 'close' to the triple including the literal. Example:

    aaa eg:prop "10" .
    ...
    eg:prop rdfs:range xsd:integer .

Notice that the literal label does *not* automatically denote a string in this case, in contrast to S and DC. In fact, this requires that different occurrences of the same literal may have different interpretations. Notice also that rdfs:range is the only way to specify a datatype constraint. Datatype names are names of classes.

P++. (Pat) Either local or not, in that *any* piece of RDF(S) that entails that a literal is in a datatype class is sufficient to fix the datatype, including range information but also including local rdf:type information applied to the literal directly. This is therefore an extension to P. In practice, it is only a real extension if literals are allowed to be subjects, so this proposal involves extending Ntriples notation to Ntriples++ and allowing literals as subjects. The P and S examples both work here, but so does the following (in Ntriples++):

    aaa eg:prop _:x:"10" .
    _:x rdf:type xsd:integer .

ie the three-node graph

    aaa---eg:prop--->"10"---rdf:type--->xsd:integer

(BTW, compare this to the S version, also a three-node graph:

    aaa---eg:prop--->[]---xsd:integer--->"10"  )

Datatype names can be names of classes or names of properties, or both.

-----------------

OK, now some of the issues that arise.

First, the P and P++ proposals both require a lot more semantic machinery. They require RDF graphs to be non-tidy on literal nodes, since literal meanings are contextual; they require extensions to the model theory to be able to handle the 'connection' between datatyping information and the literals to which that information is supposed to be applied. (We can do all that, but it does take some effort to be able to follow it all, and some of the issues that come up are subtle.) These two proposals also require any datatyping scheme to be 'proper' (a term I just invented) in the sense that Patrick identified, viz that the lexical-to-value mappings must be upward compatible in the datatype class hierarchy. XML schema is proper in this sense, but some of the artificial examples that have been used (especially the use of incompatible integer encodings) are not.

The P++ proposal, in addition, requires extending Ntriples syntax and allowing literals to be subjects, which breaks RDF/XML.
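To make the 'proper' condition concrete, here is a minimal Python sketch. It rests on one reading of "upward compatible": whenever a datatype is declared a subclass of another, every lexical form the subclass accepts must map to the same value under the superclass. The datatype names (ex:integer, ex:nonNegativeInteger, ex:octalInteger), the parsers, and the function improper_pairs are all hypothetical, invented for illustration; none of them come from XML Schema or from any of the proposals.

```python
import re

# Hypothetical lexical-to-value mappings. Each returns None when the
# string is outside the datatype's lexical space.
def parse_integer(lexical):
    return int(lexical, 10) if re.fullmatch(r"[+-]?[0-9]+", lexical) else None

def parse_non_negative(lexical):
    return int(lexical, 10) if re.fullmatch(r"[0-9]+", lexical) else None

def parse_octal(lexical):
    # An 'incompatible integer encoding': same digits, different values.
    return int(lexical, 8) if re.fullmatch(r"[0-7]+", lexical) else None

DATATYPES = {
    "ex:integer": parse_integer,
    "ex:nonNegativeInteger": parse_non_negative,
    "ex:octalInteger": parse_octal,
}

# Assumed subclass declarations, written as (subclass, superclass).
SUBCLASS_OF = [
    ("ex:nonNegativeInteger", "ex:integer"),
    ("ex:octalInteger", "ex:integer"),
]

def improper_pairs(datatypes, subclass_of, samples):
    """Return (sub, sup, lexical) triples where the two mappings disagree."""
    clashes = []
    for sub, sup in subclass_of:
        for lexical in samples:
            value_in_sub = datatypes[sub](lexical)
            if value_in_sub is None:
                continue  # not in the subclass's lexical space
            if datatypes[sup](lexical) != value_in_sub:
                clashes.append((sub, sup, lexical))
    return clashes

CLASHES = improper_pairs(DATATYPES, SUBCLASS_OF, ["7", "10", "-3"])
print(CLASHES)
```

On these samples only the octal encoding clashes with its declared superclass, on the lexical form "10"; that is the kind of scheme P and P++ would have to rule out. A check over a finite sample can only find counterexamples, of course, not prove a scheme proper.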
None of the first three proposals require all this elaboration (although they are not incompatible with it), since they all assume that literal meanings are completely specified by the literal label (to be a single literal value in X, or to be a string in S and DC), and the datatype class hierarchy, if it exists, is invisible to RDFS. They can all be straightforwardly handled in RDF/XML.

The S and DC proposals require that users conform to a given 'idiom', and are often incompatible with current common usage in which literals are used to refer to things other than strings; in contrast, such usage is handled by P and P++. Also, such idioms may be incompatible with extensions to RDFS, in particular with DAML. (This needs to be checked more carefully.)

The X proposal is incompatible with all current usage as it requires all literals to be replaced with URVs. However, the translation from current usage into the new form is straightforward and mechanical, and does not require any change to the triples structure (eg does not introduce any new bNodes).

The DC proposal uses more triples than S, and has been criticized on the grounds that a merge with several different labels would be ambiguous, eg:

    aaa eg:prop _:x .
    _:x rdf:label "10" .
    _:x rdf:type xxd:octal .
    _:x rdf:label "1000" .
    _:x rdf:type xxd:binary .

Contrast with how this would be done in S:

    aaa eg:prop _:x .
    _:x xxd:octal "10" .
    _:x xxd:binary "1000" .

On the other hand, DC shares with P and P++ the ability to express a value being a literal without saying what its datatype is:

    aaa eg:prop _:x .
    _:x rdf:label "10" .

or, in P(++), simply:

    aaa eg:prop "10" .

Such 'uncommitted' use of a literal label is syntactically impossible in X or S. (It is not clear whether this counts as a pro or a con; it seems to depend on whether or not one wishes to be able to check RDF for semantic integrity or conformity to an external schema.)
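The merge ambiguity in DC can be seen by enumerating the possible pairings. The following Python sketch makes several assumptions: triples are plain 3-tuples, literals are bare strings, and xxd:octal and xxd:binary are given the obvious base-8 and base-2 readings. The helpers dc_candidate_values and s_values are hypothetical names, not part of any proposal.

```python
from itertools import product

# Assumed lexical-to-value mappings for the two made-up datatypes.
L2V = {
    "xxd:octal": lambda s: int(s, 8),
    "xxd:binary": lambda s: int(s, 2),
}

# The merged DC graph: labels and types hang off _:x on separate arcs.
DC_GRAPH = [
    ("aaa", "eg:prop", "_:x"),
    ("_:x", "rdf:label", "10"),
    ("_:x", "rdf:type", "xxd:octal"),
    ("_:x", "rdf:label", "1000"),
    ("_:x", "rdf:type", "xxd:binary"),
]

# The same information in S: each arc pairs a datatype with its literal.
S_GRAPH = [
    ("aaa", "eg:prop", "_:x"),
    ("_:x", "xxd:octal", "10"),
    ("_:x", "xxd:binary", "1000"),
]

def dc_candidate_values(graph, node):
    """Every value reachable by pairing any label with any datatype."""
    labels = [o for s, p, o in graph if s == node and p == "rdf:label"]
    types = [o for s, p, o in graph
             if s == node and p == "rdf:type" and o in L2V]
    return {L2V[t](label) for label, t in product(labels, types)}

def s_values(graph, node):
    """Values read off the datatype-named arcs, one pairing per arc."""
    return {L2V[p](o) for s, p, o in graph if s == node and p in L2V}

print(sorted(dc_candidate_values(DC_GRAPH, "_:x")))  # [2, 8, 512]
print(sorted(s_values(S_GRAPH, "_:x")))              # [8]
```

In the S graph both arcs agree that _:x is 8. In the DC graph nothing records which label goes with which type, so the readings 2 and 512 cannot be excluded from the triples alone.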
-------------------

Here's a table summing up all this as various cons and pros (view in Courier). The brackets indicate a qualified answer, eg X doesn't strictly conform to current usage, but the change is minimal. There may be other issues not listed here, of course. In particular, I have not gone into the issues that arise if we want to be able to *describe* datatyping schemes in RDF(S) itself, rather than simply refer to them.

X S DC P P++

CONS
requires literals as subjects  x
requires change to MT  x x
requires DTs to be 'proper'  x x
requires user conform to idiom  (x) x x
(requires literals to be typed)  x x  (pro or con?)
cannot express 'clashing' types  x x (x) (x)

PROS
fully general  x
conforms to current usage  (x) x x
allows free type merging  x
compatible with DAML (?)  x ?

-----------

Hope this helps; anyway, I've done a dump of *my* mental state, thank goodness.

I have to say, on balance, S looks like the best option; simple, compact, requires no changes to the MT or to RDF/XML, doesn't require commitment to a new URI scheme, doesn't go beyond the charter, and is able to handle even 'improper' datatyping schemes. It also makes sense within the proposed MT extension, so would be "upward compatible" if ever anyone wanted to extend RDF in this more elaborate way in the future. The fact that it can't handle untyped literals may not be serious, and in any case could be hacked around in practice. My only serious worry is whether it might break current DAML+OIL usage of literals. Peter??

Pat

--
---------------------------------------------------------------------
IHMC                    (850)434 8903   home
40 South Alcaniz St.    (850)202 4416   office
Pensacola, FL 32501     (850)202 4440   fax
phayes@ai.uwf.edu
http://www.coginst.uwf.edu/~phayes
Received on Friday, 9 November 2001 13:49:56 UTC