- From: Pat Hayes <phayes@ai.uwf.edu>
- Date: Fri, 9 Nov 2001 12:49:48 -0600
- To: w3c-rdfcore-wg@w3.org
- Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
- Message-Id: <p05101027b811ba46a791@[65.212.118.147]>
After the recent email flurry I think I can distinguish five
proposals and summarize their pros and cons. They can be
ordered along a primary axis: the degree of localization of
datatyping information, i.e. how 'far away' the datatyping information
relevant to a literal can be from that literal itself.
X. (Patrick)
Very local indeed; every literal is required to have its datatype
included as part of the literal label itself. Example (I may have
the URV syntax wrong):
aaa eg:prop <xsd:integer:10> .
In fact, these literal-thingies can be regarded as a form of URI
(URV), so that there are in fact no literals at all. (I will go on
referring to these URVs as 'literal labels' in what follows, however,
for consistency.)
Datatype names play no role in the RDFS syntax.
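To show how self-contained an X label is, here is a minimal Python sketch that decodes a label of the form used in the example above. It assumes, hypothetically (the URV syntax itself was uncertain), that a label is `<prefix:local:lexical>`, with the datatype given by the first two colon-separated parts. The `LEXICAL_TO_VALUE` table is also an illustrative assumption.

```python
# Hypothetical decoder for X-style literal labels such as "<xsd:integer:10>".
# ASSUMPTION: a label is "<prefix:local:lexical>"; the real URV syntax was not settled.

# Illustrative lexical-to-value mappings for a few datatypes (not a spec).
LEXICAL_TO_VALUE = {
    "xsd:integer": int,
    "xsd:string": str,
}


def parse_label(label: str) -> tuple[str, str]:
    """Split a label into (datatype name, lexical form)."""
    if not (label.startswith("<") and label.endswith(">")):
        raise ValueError(f"not an X-style literal label: {label!r}")
    parts = label[1:-1].split(":", 2)  # lexical form may itself contain ':'
    if len(parts) != 3:
        raise ValueError(f"missing datatype or lexical form: {label!r}")
    prefix, local, lexical = parts
    return f"{prefix}:{local}", lexical


def label_value(label: str):
    """The value is fixed by the label alone; no other triple is consulted."""
    datatype, lexical = parse_label(label)
    return LEXICAL_TO_VALUE[datatype](lexical)


print(label_value("<xsd:integer:10>"))  # 10
print(parse_label("<xsd:string:a:b>"))  # ('xsd:string', 'a:b')
```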
S. (Sergey)
Quite local, in that literals are required to be linked directly to
bNodes by edges labelled with the datatype name. The bNode denotes
the value of the literal; all literals denote strings. Example:
aaa eg:prop _:x .
_:x xsd:integer "10" .
Datatype names are names of properties.
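To make the S idiom concrete, here is a minimal Python sketch in which a graph is just a list of (subject, property, object) tuples and literals are wrapped in a small `Lit` class. The value of the bNode is read off the arcs whose property is a datatype name. `DATATYPE_PROPERTIES` and the tuple representation are illustrative assumptions, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """A literal node; under S every literal simply denotes its own string."""
    lexical: str


# ASSUMPTION: illustrative datatype names used as properties,
# each paired with its lexical-to-value mapping.
DATATYPE_PROPERTIES = {
    "xsd:integer": int,
}


def s_values(graph, node):
    """Collect the values given to `node` by S-style arcs node --datatype--> "lexical"."""
    return {
        DATATYPE_PROPERTIES[p](o.lexical)
        for (s, p, o) in graph
        if s == node and p in DATATYPE_PROPERTIES and isinstance(o, Lit)
    }


graph = [
    ("aaa", "eg:prop", "_:x"),
    ("_:x", "xsd:integer", Lit("10")),
]
print(s_values(graph, "_:x"))  # {10}
```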
DC. (Dan)
Similar: all literals are strings, and a bNode is used in the same way,
but with separate arcs for the literal and the datatype. Example:
aaa eg:prop _:x .
_:x rdf:label "10" .
_:x rdf:type xsd:integer .
Datatype names are names of classes.
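The same toy representation can sketch DC, where the value comes from combining the bNode's rdf:label literal with its datatype class. `DATATYPE_CLASSES` is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """A literal node; under DC every literal denotes its own string."""
    lexical: str


# ASSUMPTION: illustrative datatype classes with their lexical-to-value mappings.
DATATYPE_CLASSES = {
    "xsd:integer": int,
}


def dc_values(graph, node):
    """Combine the node's rdf:label literal(s) with its datatype class(es)."""
    labels = {o.lexical for (s, p, o) in graph
              if s == node and p == "rdf:label" and isinstance(o, Lit)}
    types = {o for (s, p, o) in graph
             if s == node and p == "rdf:type" and o in DATATYPE_CLASSES}
    return {DATATYPE_CLASSES[t](lex) for t in types for lex in labels}


typed = [
    ("aaa", "eg:prop", "_:x"),
    ("_:x", "rdf:label", Lit("10")),
    ("_:x", "rdf:type", "xsd:integer"),
]
untyped = typed[:2]  # a label, but no datatype class

print(dc_values(typed, "_:x"))    # {10}
print(dc_values(untyped, "_:x"))  # set()
```

The second call illustrates a DC graph that gives a value a literal label without committing to any datatype: the string is present, but no typed value can be derived.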
P. (Peter)
Not local at all, in that literals are assigned a datatype
indirectly, by declaring a datatype to be the range of the property
used in the triple. The range information might be anywhere in the
graph, and need not be 'close' to the triple including the literal.
Example:
aaa eg:prop "10" .
...
eg:prop rdfs:range xsd:integer .
Notice that the literal label does *not* automatically denote a
string in this case, in contrast to S and DC. In fact, this requires
that different occurrences of the same literal be allowed to have
different interpretations. Notice also that rdfs:range is the only way
to specify a datatype constraint.
Datatype names are names of classes.
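The non-locality of P can be sketched in Python using the same toy triple representation. `DATATYPE_CLASSES` is an illustrative assumption. The datatype of a literal is found by searching the whole graph for rdfs:range declarations on the property of the triple in which the literal occurs, so two occurrences of the same label "10" can receive different interpretations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """A literal occurrence; under P it has no fixed meaning of its own."""
    lexical: str


# ASSUMPTION: illustrative datatype classes with their lexical-to-value mappings.
DATATYPE_CLASSES = {
    "xsd:integer": int,
}


def range_values(graph, prop, lit):
    """Interpret `lit` via every rdfs:range declared for `prop`, wherever it occurs."""
    ranges = {o for (s, p, o) in graph
              if s == prop and p == "rdfs:range" and o in DATATYPE_CLASSES}
    return {DATATYPE_CLASSES[r](lit.lexical) for r in ranges}


def interpret_literals(graph):
    """One interpretation per literal *occurrence*, since meaning depends on the property."""
    return [(s, p, o.lexical, range_values(graph, p, o))
            for (s, p, o) in graph if isinstance(o, Lit)]


graph = [
    ("aaa", "eg:prop", Lit("10")),
    ("bbb", "eg:name", Lit("10")),
    # ... possibly many unrelated triples ...
    ("eg:prop", "rdfs:range", "xsd:integer"),
]
for row in interpret_literals(graph):
    print(row)
# ('aaa', 'eg:prop', '10', {10})
# ('bbb', 'eg:name', '10', set())
```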
P++. (Pat)
Either local or not, in that *any* piece of RDF(S) that entails that
a literal is in a datatype class is sufficient to fix the datatype,
including range information but also including local rdf:type
information applied to the literal directly. This is therefore an
extension to P. In practice, it is only a real extension if literals
are allowed to be subjects, so this proposal involves extending
Ntriples notation to Ntriples++ and allowing literals as subjects.
The P and S examples both work here, but so does the following (in
Ntriples++):
aaa eg:prop _:x:"10" .
_:x rdf:type xsd:integer .
ie the three-node graph
aaa---eg:prop--->"10"---rdf:type--->xsd:integer
(BTW, compare this to the S version, also a three-node graph:
aaa---eg:prop--->[]---xsd:integer--->"10" )
Datatype names can be names of classes or names of properties, or both.
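A rough Python sketch of P++ follows, assuming literal nodes carry an identifier (as in the Ntriples++ form `_:x:"10"`) so that they can appear as subjects. For simplicity it recognizes only two sources of datatyping, rdfs:range on the property and rdf:type on the literal node itself. Full RDFS entailment (e.g. through rdfs:subClassOf) is deliberately left out. `LitNode` and `DATATYPE_CLASSES` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LitNode:
    """A literal node that may also be a subject, as in Ntriples++ _:x:"10"."""
    node_id: str
    lexical: str


# ASSUMPTION: illustrative datatype classes with their lexical-to-value mappings.
DATATYPE_CLASSES = {
    "xsd:integer": int,
}


def datatype_classes(graph, prop, lit):
    """Datatype classes for `lit` from rdfs:range on `prop` or rdf:type on `lit`.

    Only these two sources are checked; no subclass or subproperty reasoning.
    """
    from_range = {o for (s, p, o) in graph if s == prop and p == "rdfs:range"}
    from_type = {o for (s, p, o) in graph if s == lit and p == "rdf:type"}
    return {c for c in from_range | from_type if c in DATATYPE_CLASSES}


def typed_values(graph, prop, lit):
    return {DATATYPE_CLASSES[c](lit.lexical) for c in datatype_classes(graph, prop, lit)}


# Local typing: aaa---eg:prop--->"10"---rdf:type--->xsd:integer
ten = LitNode("_:x", "10")
local = [("aaa", "eg:prop", ten), (ten, "rdf:type", "xsd:integer")]

# Non-local typing, exactly as in P.
ten_p = LitNode("_:y", "10")
ranged = [("aaa", "eg:prop", ten_p), ("eg:prop", "rdfs:range", "xsd:integer")]

print(typed_values(local, "eg:prop", ten))     # {10}
print(typed_values(ranged, "eg:prop", ten_p))  # {10}
```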
-----------------
OK, now some of the issues that arise. First, the P and P++ proposals
both require a lot more semantic machinery. They require RDF graphs
to be non-tidy on literal nodes, since literal meanings are
contextual; they require extensions to the model theory to be able to
handle the 'connection' between datatyping information and the
literals to which that information is supposed to be applied. (We
can do all that, but it does take some effort to be able to follow it
all, and some of the issues that come up are subtle.) These two
proposals also require any datatyping scheme to be 'proper' (a term I
just invented) in the sense that Patrick identified, viz. that the
lexical-to-value mappings must be upward compatible in the datatype
class hierarchy. XML Schema is proper in this sense, but some of the
artificial examples that have been used (especially the use of
incompatible integer encodings) are not.
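On one reading of 'upward compatible', a scheme is proper when every lexical form that a subclass maps to a value is mapped to the same value by each of its superclasses. Under that reading, range information stated for a superclass never contradicts the subclass. The Python sketch below checks this condition on a few sample lexical forms for a toy hierarchy. The class names, the hypothetical octal-integer subclass, and the sampling approach are all illustrative assumptions. A sample-based check can find violations but cannot prove a scheme proper.

```python
# ASSUMPTION: toy datatypes; each maps a lexical form to a value, or to None
# when the form is outside that datatype's lexical space.
def decimal_int(lex):
    return int(lex, 10) if lex.isascii() and lex.isdigit() else None


def small_decimal_int(lex):
    v = decimal_int(lex)
    return v if v is not None and v < 32768 else None


def octal_int(lex):
    return int(lex, 8) if lex and set(lex) <= set("01234567") else None


MAPPINGS = {
    "eg:integer": decimal_int,
    "eg:short": small_decimal_int,  # same encoding, smaller value space
    "eg:octalInt": octal_int,       # hypothetical, incompatible encoding
}
SUBCLASS_OF = [("eg:short", "eg:integer"), ("eg:octalInt", "eg:integer")]


def properness_violations(samples):
    """Lexical forms that a subclass and its superclass map to different values."""
    bad = []
    for sub, sup in SUBCLASS_OF:
        for lex in samples:
            v = MAPPINGS[sub](lex)
            if v is not None and MAPPINGS[sup](lex) != v:
                bad.append((sub, sup, lex))
    return bad


print(properness_violations(["7", "10", "40000"]))
# [('eg:octalInt', 'eg:integer', '10'), ('eg:octalInt', 'eg:integer', '40000')]
```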
The P++ proposal, in addition, requires extending Ntriples syntax
and allowing literals to be subjects, which breaks RDF/XML.
None of the first three proposals require all this elaboration
(although they are not incompatible with it), since they all assume
that literal meanings are completely specified by the literal label
(to be a single literal value in X, or to be a string in S and DC),
and the datatype class hierarchy, if it exists, is invisible to RDFS.
They can all be straightforwardly handled in RDF/XML.
The S and DC proposals require that users conform to a given 'idiom',
and are often incompatible with current common usage in which
literals are used to refer to things other than strings; in contrast,
such usage is handled by P and P++. Also, such idioms may be
incompatible with extensions to RDFS, in particular with DAML. (This
needs to be checked more carefully.)
The X proposal is incompatible with all current usage as it requires
all literals to be replaced with URVs. However, the translation from
current usage into the new form is straightforward and mechanical,
and does not require any change to the triples structure (eg does not
introduce any new bNodes).
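The mechanical character of that translation can be sketched in Python. Each literal object is rewritten in place as an X-style label, so the number and shape of the triples are unchanged and no bNodes are introduced. The sketch has to obtain each literal's datatype from somewhere. Here it assumes a hypothetical per-property table `datatype_of`, and it reuses the `<prefix:local:lexical>` label form, whose exact syntax was itself uncertain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """A plain literal in current (untyped) usage."""
    lexical: str


def to_x(graph, datatype_of):
    """Rewrite each literal object as an X-style label; one triple in, one triple out.

    ASSUMPTION: `datatype_of` is a hypothetical mapping from property to datatype name.
    """
    out = []
    for s, p, o in graph:
        if isinstance(o, Lit):
            o = f"<{datatype_of[p]}:{o.lexical}>"
        out.append((s, p, o))
    return out


graph = [("aaa", "eg:prop", Lit("10")), ("aaa", "eg:knows", "bbb")]
x_graph = to_x(graph, {"eg:prop": "xsd:integer"})
print(x_graph)
# [('aaa', 'eg:prop', '<xsd:integer:10>'), ('aaa', 'eg:knows', 'bbb')]
```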
The DC proposal uses more triples than S, and has been criticized on
the grounds that a merge with several different labels would be
ambiguous, eg:
aaa eg:prop _:x .
_:x rdf:label "10" .
_:x rdf:type xxd:octal .
_:x rdf:label "1000" .
_:x rdf:type xxd:binary .
Contrast with how this would be done in S:
aaa eg:prop _:x .
_:x xxd:octal "10" .
_:x xxd:binary "1000" .
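The ambiguity can be made concrete with a small Python sketch. It assumes, for illustration, that the hypothetical xxd:octal and xxd:binary datatypes read their lexical forms in base 8 and base 2, so both labels are meant to denote the number eight. Under DC nothing records which label goes with which type, so a reader must consider every pairing. Under S each arc carries its own datatype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    lexical: str


# ASSUMPTION: xxd:octal and xxd:binary read lexical forms in base 8 and base 2.
BASES = {"xxd:octal": 8, "xxd:binary": 2}


def dc_candidate_values(graph, node):
    """DC: labels and types are separate arcs, so every label pairs with every type."""
    labels = [o.lexical for (s, p, o) in graph if s == node and p == "rdf:label"]
    types = [o for (s, p, o) in graph if s == node and p == "rdf:type"]
    return {int(lex, BASES[t]) for t in types for lex in labels}


def s_values(graph, node):
    """S: each arc carries its own datatype, so label and type stay paired."""
    return {int(o.lexical, BASES[p]) for (s, p, o) in graph if s == node and p in BASES}


dc_merged = [
    ("aaa", "eg:prop", "_:x"),
    ("_:x", "rdf:label", Lit("10")),
    ("_:x", "rdf:type", "xxd:octal"),
    ("_:x", "rdf:label", Lit("1000")),
    ("_:x", "rdf:type", "xxd:binary"),
]
s_merged = [
    ("aaa", "eg:prop", "_:x"),
    ("_:x", "xxd:octal", Lit("10")),
    ("_:x", "xxd:binary", Lit("1000")),
]

print(sorted(dc_candidate_values(dc_merged, "_:x")))  # [2, 8, 512]
print(sorted(s_values(s_merged, "_:x")))              # [8]
```

Under DC the merged graph admits the unintended readings 2 and 512 alongside 8. Under S only 8 survives.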
On the other hand, DC shares with P and P++ the ability to express a
value being a literal without saying what its datatype is:
aaa eg:prop _:x .
_:x rdf:label "10" .
or, in P(++), simply:
aaa eg:prop "10" .
Such 'uncommitted' use of a literal label is syntactically impossible
in X or S. (It is not clear whether this counts as a pro or a con; it
seems to depend on whether or not one wishes to be able to check RDF
for semantic integrity or conformity to an external schema.)
-------------------
Here's a table summing up all this as various cons and pros (view in
Courier). The brackets indicate a qualified answer, eg X doesn't
strictly conform to current usage, but the change is minimal. There
may be other issues not listed here, of course. In particular, I have
not gone into the issues that arise if we want to be able to
*describe* datatyping schemes in RDF(S) itself, rather than simply
refer to them.
X S DC P P++
CONS
requires literals as subjects x
requires change to MT x x
requires DTs to be 'proper' x x
requires user conform to idiom (x) x x
(requires literals to be typed) x x (pro or con?)
cannot express 'clashing' types x x (x) (x)
PROS
fully general x
conforms to current usage (x) x x
allows free type merging x
compatible with DAML (?) x ?
-----------
Hope this helps; anyway, I've done a dump of *my* mental state, thank goodness.
I have to say, on balance, S looks like the best option; simple,
compact, requires no changes to the MT or to RDF/XML, doesn't require
commitment to a new URI scheme, doesn't go beyond the charter, and is
able to handle even 'improper' datatyping schemes. It also makes
sense within the proposed MT extension, so would be "upward
compatible" if ever anyone wanted to extend RDF in this more
elaborate way in the future. The fact that it can't handle untyped
literals may not be serious, and in any case could be hacked around
in practice. My only serious worry is whether it might break current
DAML+OIL usage of literals. Peter??
Pat
--
---------------------------------------------------------------------
IHMC (850)434 8903 home
40 South Alcaniz St. (850)202 4416 office
Pensacola, FL 32501 (850)202 4440 fax
phayes@ai.uwf.edu
http://www.coginst.uwf.edu/~phayes
Received on Friday, 9 November 2001 13:49:56 UTC