- From: <Patrick.Stickler@nokia.com>
- Date: Mon, 12 Nov 2001 22:28:48 +0200
- To: phayes@ai.uwf.edu, w3c-rdfcore-wg@w3.org
- Cc: pfps@research.bell-labs.com
Please have a look at my X proposal summary, which includes a lot more than the use of URVs (which in fact are a minimal component).

It suggests (in the graph notation of the proposal, sorry don't know exactly how to do this in Ntriples):

    [1,S]
      |
      --- subject ----> [2,U,#aaa]
      |
      --- predicate --> [3,U,{eg:prop}]
      |
      --- object -----> [4,U,xsd:integer:10]
                                 ^
    [5,S]                        |
      |                          |
      --- subject ----------------
      |
      --- predicate --> [6,U,{rdf:type}]
      |
      --- object -----> [7,U,{xsd:integer}]

This latter statement [5,S] may be implicitly assumed by a system having knowledge about the xsd:integer URV scheme.

Note that this is a graph notation and thus shouldn't be compared with NTriples in terms of conciseness.

This is very similar in principle to the P++ proposal in that it treats literals as subjects and adopts the concept of literal and uriref labels for nodes, but the graph model (which is the key) is statement-centric rather than resource-centric and all statements are reified, and the present RDF graph model is a "view" or interpretation of the graph model used in the X proposal.

Please have a look at my recent summary of the X proposal for all the gory details.

Note especially that the X proposal, as defined in my summary today, does not require that literals be encoded as URVs. There are practical benefits to URV encoding, which are outlined in the summary, but one can use an X proposal approach and never use URVs.

Cheers,

Patrick

-----Original Message-----
From: ext Pat Hayes [mailto:phayes@ai.uwf.edu]
Sent: 09 November, 2001 20:50
To: w3c-rdfcore-wg@w3.org
Cc: Peter F. Patel-Schneider
Subject: DATATYPES: mental dump.

After the recent email flurry I think I can distinguish five proposals and summarize their pros and cons. They can be distinguished on a primary axis of degree of localization of datatyping information, ie how 'far away' the datatyping information relevant to a literal can be from that literal itself.

X.
(Patrick) Very local indeed; every literal is required to have its datatype included as part of the literal label itself. Example (I may have the URV syntax wrong):

    aaa eg:prop <xsd:integer:10> .

In fact, these literal-thingies can be regarded as a form of URI (URV), so that there are in fact no literals at all. (I will go on referring to these URVs as 'literal labels' in what follows, however, for consistency.) Datatype names play no role in the RDFS syntax.

S. (Sergey) Quite local, in that literals are required to be linked directly to bNodes by edges labelled with the datatype name. The bNode denotes the value of the literal; all literals denote strings. Example:

    aaa eg:prop _:x .
    _:x xsd:integer "10" .

Datatype names are names of properties.

DC. (Dan) Similar; all literals are strings, and similar use of a bNode, but with separate arcs for the literal and the datatype. Example:

    aaa eg:prop _:x .
    _:x rdf:label "10" .
    _:x rdf:type xsd:integer .

Datatype names are names of classes.

P. (Peter) Not local at all, in that literals are assigned a datatype indirectly, by declaring a datatype to be the range of the property used in the triple. The range information might be anywhere in the graph, and need not be 'close' to the triple including the literal. Example:

    aaa eg:prop "10" .
    ...
    eg:prop rdfs:range xsd:integer .

Notice that the literal label does *not* automatically denote a string in this case, in contrast to S and DC. In fact, this requires that different occurrences of the same literal may have different interpretations. Notice also that rdfs:range is the only way to specify a datatype constraint. Datatype names are names of classes.

P++. (Pat) Either local or not, in that *any* piece of RDF(S) that entails that a literal is in a datatype class is sufficient to fix the datatype, including range information but also including local rdf:type information applied to the literal directly. This is therefore an extension to P.
In practice, it is only a real extension if literals are allowed to be subjects, so this proposal involves extending Ntriples notation to Ntriples++ and allowing literals as subjects. The P and S examples both work here, but so does the following (in Ntriples++):

    aaa eg:prop _:x:"10" .
    _:x rdf:type xsd:integer .

ie the three-node graph

    aaa---eg:prop--->"10"---rdf:type--->xsd:integer

(BTW, compare this to the S version, also a three-node graph:

    aaa---eg:prop--->[]---xsd:integer--->"10"

)

Datatype names can be names of classes or names of properties, or both.

-----------------

OK, now some of the issues that arise.

First, the P and P++ proposals both require a lot more semantic machinery. They require RDF graphs to be non-tidy on literal nodes, since literal meanings are contextual; they require extensions to the model theory to be able to handle the 'connection' between datatyping information and the literals to which that information is supposed to be applied. (We can do all that, but it does take some effort to be able to follow it all, and some of the issues that come up are subtle.) These two proposals also require any datatyping scheme to be 'proper' (a term I just invented) in the sense that Patrick identified, viz that the lexical-to-value mappings must be upward compatible in the datatype class hierarchy. XML schema is proper in this sense, but some of the artificial examples that have been used (especially the use of incompatible integer encodings) are not. The P++ proposal, in addition, requires extending Ntriples syntax and allowing literals to be subjects, which breaks RDF/XML.

None of the first three proposals require all this elaboration (although they are not incompatible with it), since they all assume that literal meanings are completely specified by the literal label (to be a single literal value in X, or to be a string in S and DC), and the datatype class hierarchy, if it exists, is invisible to RDFS.
They can all be straightforwardly handled in RDF/XML.

The S and DC proposals require that users conform to a given 'idiom', and are often incompatible with current common usage in which literals are used to refer to things other than strings; in contrast, such usage is handled by P and P++. Also, such idioms may be incompatible with extensions to RDFS, in particular with DAML. (This needs to be checked more carefully.)

The X proposal is incompatible with all current usage as it requires all literals to be replaced with URVs. However, the translation from current usage into the new form is straightforward and mechanical, and does not require any change to the triples structure (eg does not introduce any new bNodes).

The DC proposal uses more triples than S, and has been criticized on the grounds that a merge with several different labels would be ambiguous, eg:

    aaa eg:prop _:x .
    _:x rdf:label "10" .
    _:x rdf:type xxd:octal .
    _:x rdf:label "1000" .
    _:x rdf:type xxd:binary .

Contrast with how this would be done in S:

    aaa eg:prop _:x .
    _:x xxd:octal "10" .
    _:x xxd:binary "1000" .

On the other hand, DC shares with P and P++ the ability to express a value being a literal without saying what its datatype is:

    aaa eg:prop _:x .
    _:x rdf:label "10" .

or, in P(++), simply:

    aaa eg:prop "10" .

Such 'uncommitted' use of a literal label is syntactically impossible in X or S. (It is not clear whether this counts as a pro or a con; it seems to depend on whether or not one wishes to be able to check RDF for semantic integrity or conformity to an external schema.)

-------------------

Here's a table summing up all this as various cons and pros (view in Courier). The brackets indicate a qualified answer, eg X doesn't strictly conform to current usage, but the change is minimal. There may be other issues not listed here, of course.
In particular, I have not gone into the issues that arise if we want to be able to *describe* datatyping schemes in RDF(S) itself, rather than simply refer to them.

                                  X     S     DC    P     P++
CONS
requires literals as subjects                             x
requires change to MT                               x     x
requires DTs to be 'proper'                         x     x
requires user conform to idiom   (x)    x     x
(requires literals to be typed)   x     x                        (pro or con?)
cannot express 'clashing' types   x           x    (x)   (x)
PROS
fully general                                             x
conforms to current usage        (x)                x     x
allows free type merging                x
compatible with DAML (?)                            x     ?

-----------

Hope this helps; anyway, I've done a dump of *my* mental state, thank goodness.

I have to say, on balance, S looks like the best option; simple, compact, requires no changes to the MT or to RDF/XML, doesn't require commitment to a new URI scheme, doesn't go beyond the charter, and is able to handle even 'improper' datatyping schemes. It also makes sense within the proposed MT extension, so would be "upward compatible" if ever anyone wanted to extend RDF in this more elaborate way in the future. The fact that it can't handle untyped literals may not be serious, and in any case could be hacked around in practice. My only serious worry is whether it might break current DAML+OIL usage of literals. Peter??

Pat

--
---------------------------------------------------------------------
IHMC                                    (850)434 8903   home
40 South Alcaniz St.                    (850)202 4416   office
Pensacola, FL 32501                     (850)202 4440   fax
phayes@ai.uwf.edu
http://www.coginst.uwf.edu/~phayes
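
For readers comparing the five encodings side by side, here is a minimal Python sketch that emits the triples each proposal in the quoted message would use for the same statement (aaa, eg:prop, integer 10). It is only an illustration: the function names `encode` and `to_ntriples`, the `PROPOSALS` tuple, and the fixed bNode name `_:x` are assumptions made for this example and are not part of any proposal. The triple shapes are copied from the examples in the message, including the tentative URV syntax for X and the Ntriples++ form for P++.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Proposal labels as used in the quoted message.
PROPOSALS = ("X", "S", "DC", "P", "P++")


def encode(proposal: str, subject: str, prop: str,
           lexical: str, datatype: str) -> List[Triple]:
    """Return the triples one proposal would use for a typed value.

    Hypothetical helper: it only mirrors the example triples in the message.
    """
    literal = f'"{lexical}"'
    if proposal == "X":
        # The datatype is folded into the label itself (a URV).
        return [(subject, prop, f"<{datatype}:{lexical}>")]
    if proposal == "S":
        # The datatype name is the property linking the bNode to the string.
        return [(subject, prop, "_:x"), ("_:x", datatype, literal)]
    if proposal == "DC":
        # Separate arcs for the string label and for the datatype class.
        return [(subject, prop, "_:x"),
                ("_:x", "rdf:label", literal),
                ("_:x", "rdf:type", datatype)]
    if proposal == "P":
        # The datatype comes only from the range of the property.
        return [(subject, prop, literal), (prop, "rdfs:range", datatype)]
    if proposal == "P++":
        # Ntriples++ form: the literal node is named so it can be a subject.
        return [(subject, prop, f"_:x:{literal}"),
                ("_:x", "rdf:type", datatype)]
    raise ValueError(f"unknown proposal: {proposal}")


def to_ntriples(triples: List[Triple]) -> str:
    """Render triples one per line in the informal notation of the message."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    for name in PROPOSALS:
        print(f"{name}:")
        print(to_ntriples(encode(name, "aaa", "eg:prop", "10", "xsd:integer")))
```

Counting the triples per proposal makes the "DC uses more triples than S" remark concrete, and the X branch shows why the translation from current usage needs no new bNodes.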
Received on Monday, 12 November 2001 15:29:03 UTC