- From: Pat Hayes <phayes@ai.uwf.edu>
- Date: Tue, 6 Nov 2001 18:35:54 -0600
- To: Sergey Melnik <melnik@db.stanford.edu>
- Cc: w3c-rdfcore-wg@w3.org
>Pat Hayes wrote:
>.....
> > No, what I mean by 'neutral' is writing, say, that my shoe size is 10
>> without giving the datatyping of the literal. That is what is
>> impossible in this scheme: to use a literal before, or independently
>> of, giving that literal a datatype (because in this scheme, as I
>> understand it, the ONLY places where a literal label is allowed are
>> the object ends of triples whose predicate is a datatype name). That
>> is why I say it is only a notational variation on the simple idea of
>> incorporating datatyping information into the literal label itself.
>> (BTW, I agree that this simple idea has its merits; but I think that
>> if we are going to insist that literals *must* be explicitly
>> datatyped, then we should impose this as an explicit syntactic
>> constraint in the very syntax of the language.)
>
>In principle, I agree. However, if we stick a single type to each
>literal we won't be able to deal with the cases where multiple literals
>are required to determine the data value unambiguously:
>
>_x rdf:type ComplexNumber
>_x realDecimal "1.0"
>_x imaginaryDecimal "2.0"
>
>as indicated in
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Nov/0103.html
OK, though this kind of example seems to have only just surfaced and
I am not sure I like it. Would one call realDecimal a datatype?? I
don't think I would. It seems to be a kind of product of a datatype
mapping and a selector. (Are there any examples like this in XML
Schema?)
If we are going to use bnodes, I would rather write this as:
_x rdf:type ComplexNumber
_x realPart _:y1
_x imaginaryPart _:y2
_:y1 xsd:number "1.0"
_:y2 xsd:number "2.0"
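For concreteness, here is a minimal Python sketch of the bnode encoding above, with the graph held as a list of triples. It assumes a hypothetical `Literal` class for literal labels and a hypothetical table that maps `xsd:number` lexical forms to values with Python's `float`; `realPart`, `imaginaryPart` and `ComplexNumber` are the names from the example. What it illustrates is that each literal is interpreted on its own, through its own datatype arc, and the complex value is assembled from ordinary non-literal arcs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A literal label such as "1.0"; carries no datatype of its own."""
    label: str


# The bnode encoding: each literal hangs off its own blank node,
# on an arc labelled with a datatype name.
graph = [
    ("_x", "rdf:type", "ComplexNumber"),
    ("_x", "realPart", "_:y1"),
    ("_x", "imaginaryPart", "_:y2"),
    ("_:y1", "xsd:number", Literal("1.0")),
    ("_:y2", "xsd:number", Literal("2.0")),
]

# Hypothetical datatype table: datatype name -> lexical-to-value mapping.
DATATYPES = {"xsd:number": float}


def objects(g, subj, pred):
    """All objects of triples (subj pred ?o)."""
    return [o for (s, p, o) in g if s == subj and p == pred]


def value_of(g, node):
    """Interpret a blank node via the datatype arc that carries its literal."""
    for (s, p, o) in g:
        if s == node and p in DATATYPES and isinstance(o, Literal):
            return DATATYPES[p](o.label)
    raise LookupError(f"no datatyped literal attached to {node}")


def complex_value(g, node):
    """Assemble a complex number from the realPart/imaginaryPart arcs."""
    (re_node,) = objects(g, node, "realPart")
    (im_node,) = objects(g, node, "imaginaryPart")
    return complex(value_of(g, re_node), value_of(g, im_node))


print(complex_value(graph, "_x"))  # (1+2j)
```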
>.....
>> Suppose I know that some property is represented by a literal "Y2F0",
>> but have (as yet) no information about the appropriate datatyping to
>> be used to interpret that literal. How would you represent that state
>> of information?
>
>Oh, I think this leads us into inferencing, which seems to work just
>fine in both approaches. In the above case, you might assert:
>
>_n propertyIDontKnowAnythingAbout "Y2F0"
??But now you seem to have broken your own rules. The proposal, as I
understood it, was that this would be syntactically illegal: the only
place a literal label can occur is at the sharp end of an arc
labelled with a datatype name. Look, suppose you DID know the
base64 information; then this would be written as:
_n propertyIDoKnowSomethingAbout _:1
_:1 unSticklishDatatype:base64 "Y2F0"
Right? Now, how would you 'remove' the base-64 part of that graph and
still write the information you want? There is no place to put the
literal except at the sharp end of an arc that you now don't have a
label for.
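The syntactic constraint at issue (a literal may appear only at the object end of an arc labelled with a datatype name) can be sketched as a simple check over triples. This is a minimal Python illustration; the set `DATATYPE_NAMES` and the `Literal` class are hypothetical assumptions, not part of any proposal text. Under the check, the graph with the datatype arc passes, while the bare `propertyIDontKnowAnythingAbout "Y2F0"` triple is rejected, and there is no legal way to keep the literal while dropping the datatype arc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A literal label with no datatype attached."""
    label: str


# Hypothetical set of predicates that count as datatype names.
DATATYPE_NAMES = {"unSticklishDatatype:base64", "xsd:integer"}


def violations(graph):
    """Triples with a literal object whose predicate is not a datatype name."""
    return [t for t in graph
            if isinstance(t[2], Literal) and t[1] not in DATATYPE_NAMES]


typed = [
    ("_n", "propertyIDoKnowSomethingAbout", "_:1"),
    ("_:1", "unSticklishDatatype:base64", Literal("Y2F0")),
]
untyped = [("_n", "propertyIDontKnowAnythingAbout", Literal("Y2F0"))]

print(violations(typed))    # []
print(violations(untyped))  # [('_n', 'propertyIDontKnowAnythingAbout', Literal(label='Y2F0'))]
```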
> > And now suppose that I discover, perhaps from a
>> different source, that the property in question is stated in terms of
>> a base-64 integer encoding: how would you encode that information?
>
>(X propertyIDontKnowAnythingAbout Y)
> -> (X base64 Y)
You have to write it in RDF. (If we could write Prolog then all our
problems would be solved. :-)
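A rule like `(X propertyIDontKnowAnythingAbout Y) -> (X base64 Y)` can only be run by machinery outside RDF. A minimal Python sketch of such machinery follows; it assumes literals are plain strings, and it assumes that "base-64 integer encoding" means the decoded bytes read as a big-endian unsigned integer. That reading is an illustrative assumption; the thread does not specify it.

```python
import base64


def apply_rule(graph):
    """Forward-chain (X propertyIDontKnowAnythingAbout Y) -> (X base64 Y).

    This runs in Python, not in RDF: RDF has no way to state the rule.
    """
    derived = {(s, "base64", o) for (s, p, o) in graph
               if p == "propertyIDontKnowAnythingAbout"}
    return set(graph) | derived


graph = {("_n", "propertyIDontKnowAnythingAbout", "Y2F0")}
closed = apply_rule(graph)

for (s, p, o) in sorted(closed):
    if p == "base64":
        raw = base64.b64decode(o)                  # b'cat'
        print(s, raw, int.from_bytes(raw, "big"))  # _n b'cat' 6513012
```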
>.....
>> > > Like that proposal, it forces datatype information to be given
>> >> explicitly and locally,
>> >
>> >Yes, and I believe that's a strength rather than a weakness. I think it
>> >is essential for the snippets of information dispersed on the Semantic
>> >Web to be as precise as possible and as self-contained as possible.
>>
>> I think that is completely wrong-headed. The whole point of using an
>> assertional language is to be able to put together pieces of
>> information from various sources and draw reasonable conclusions from
>> them.
>
>I guess we can argue about it, but this discussion is more
>philosophical in nature. There are many reasons to reduce the impact of
>schemas on instance data. Schema evolution that breaks instance
>data, and archival purposes, are probably the most obvious ones. Another
>crucial issue is that the developers often do not properly understand
>the semantics of the (graph) encodings that they design. Part of the
>problem is, of course, that they do not even bother to develop a schema
>for their data, let alone to capture it in a machine-readable form...
>
>> If we start declaring that certain kinds of information
>> *must* be linked to others, or *can only* be inferred in a certain
>> sort of way, we might as well use Java.
>
>I don't think that's a pro argument, Pat. (_x size "10") *can only* be
>processed correctly if the tool got the schema that defines the property
>"size" and understands the schema language, right?
Yes, of course, but that's not the issue.
There are two pieces of information needed to interpret a literal:
the literal label itself, and the datatype mapping that is supposed
to be used to interpret it. All this discussion has been about how
these two different pieces of information can be encoded in some form
that allows them to be brought together reliably. One way is to just
insist that they always occur together in some sense, either by
making a kind of composite label out of them, or by providing a
bnode to which the two kinds of information must be attached in a
certain way. The MT extension, in contrast, allows them to be
separated and treated as separate assertions, and uses RDFS inference
machinery to make the connection between them. It seems to me that
this flexibility is more desirable, and more in the RDF spirit, than
any scheme which imposes strict syntactic constraints on the form of
RDF graphs, and which breaks if those constraints are violated. That
is what I meant by the above remark.
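A rough Python illustration of the separation described above: the literal triple and the datatype information are separate assertions, possibly from different sources, and are brought together only when both are present. It uses an `rdfs:range` declaration on the property as a simplified stand-in for the RDFS inference step; that choice, the `Literal` class and the datatype table are assumptions for illustration, not the actual construction in the MT extension. Without the datatype information the triple still has a meaning, just an ambiguous one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A literal label with no datatype attached."""
    label: str


# Hypothetical lexical-to-value mappings for two datatypes.
DATATYPE_MAPS = {"xsd:integer": int, "xsd:string": str}


def interpret(graph, subj, pred):
    """Return the candidate (datatype, value) readings of (subj pred "literal").

    The datatype is looked up through a separate rdfs:range assertion. With
    no such assertion every known mapping is a candidate, so the triple is
    still meaningful, only ambiguous.
    """
    lits = [o for (s, p, o) in graph
            if s == subj and p == pred and isinstance(o, Literal)]
    ranges = [o for (s, p, o) in graph
              if s == pred and p == "rdfs:range" and o in DATATYPE_MAPS]
    candidates = ranges or list(DATATYPE_MAPS)
    return {(dt, DATATYPE_MAPS[dt](lit.label)) for lit in lits for dt in candidates}


facts = [("_x", "size", Literal("10"))]
print(interpret(facts, "_x", "size"))  # ambiguous: two candidate readings

schema = [("size", "rdfs:range", "xsd:integer")]  # may come from another source
print(interpret(facts + schema, "_x", "size"))  # {('xsd:integer', 10)}
```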
>
>> I would agree that IF the information is available locally then of
>> course it should be possible to use it locally. But the reasoners
>> should also be able to function even when all the information is not
>> available locally; they should not barf just because the information
>> provided is incomplete.
>
>It looks to me as if our discussion is probably even more metaphysical
>than I originally thought. Let me try to put a different spin on your
>approach (of course, this is not the way you see it, I understand),
>which I believe allows all inferencing to work just the way you'd
>expect. Assume that
>
>_x size "10"
>
>asserts there is a certain relationship between I(_x) and the literal
>value I("10"). (This is the `straightforward' interpretation). In other
>words, the property `size' with a literal "10" hanging off it restricts
>the number of valid interpretations of _x.
It does if the interpretation is a datatype interpretation, right.
>Now, imagine that there is a
>rule somewhere, in some schema, that "breathes life" into the above
>statement:
>
>(X size L)
> -> exists N: (X shoeSize N),
> (N rdf:type Integer),
> (N xsd:int L) .
? What are you talking about? Such rules aren't expressible in RDF.
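Read as a forward-chaining transformation applied outside RDF (which is exactly the objection here), the `exists N` rule might be sketched in Python as follows. Minting a fresh blank node for each match is one assumed way to realise the existential; literals are plain strings for brevity, and `apply_size_rule` is a hypothetical name.

```python
import itertools

# Counter used to mint fresh blank-node names: _:n1, _:n2, ...
_fresh = itertools.count(1)


def apply_size_rule(graph):
    """(X size L) -> exists N: (X shoeSize N), (N rdf:type Integer), (N xsd:int L).

    Each match gets a newly minted blank node standing in for N.
    The rule lives in this Python code; RDF itself cannot state it.
    """
    out = set(graph)
    for (x, p, lit) in graph:
        if p == "size":
            n = f"_:n{next(_fresh)}"
            out |= {(x, "shoeSize", n),
                    (n, "rdf:type", "Integer"),
                    (n, "xsd:int", lit)}
    return out


result = apply_size_rule({("_x", "size", "10")})
for triple in sorted(result):
    print(triple)
```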
>In this light, the original statement (_x size "10") can be viewed as a
>syntactic construction,
It IS a syntactic construction. In fact it is an RDF graph.
>which is interpreted using an inference rule into
>something that has an adequate semantic interpretation.
But it is an adequate semantic representation already. It is
perfectly meaningful (if ambiguous), and when suitably conjoined
with, or extended by, appropriate information about the datatype, and
when interpreted in a way that conforms to the datatype semantics,
then it says exactly what one would expect it to say, viz. that x's
shoe size is ten.
Even if you don't like the semantic account, that first triple is
perfectly well-formed RDF; so we should, in all conscience, either
undertake to say what it means, or make it illegal. I am still not
quite clear what your attitude to this point is; do you intend that
'rule' to be a kind of syntactic transformation or constraint on RDF
graphs? What if I write a graph and refuse to apply the rule: have I
made an error of some kind? What kind?
>In particular,
>the property `shoeSize' would connect a shoe with an integer, rather
>than with a literal value.
If the datatype is chosen appropriately (e.g. xsd:integer), the literal
value of the literal "10" IS an integer. In fact it is ten.
>
>I think that both approaches on the table are equivalent in the sense
>that they can ultimately provide a very similar high-level
>interpretation to any given piece of RDF instance data, although using
>quite different schemas and a different perspective. My feeling is,
>however, that by giving a reasonable (not straightforward)
>model-theoretic interpretation to (_x size "10") you finesse the fact
>that this statement is "syntactic matter" that needs further
>explanation, i.e., by means of rules.
I don't finesse it, I deny it outright. It is not syntactic in any
way: it says that _x's shoe size is a literal value. Until we know
what datatyping scheme to use, we do not know which literal value, so
it is of course ambiguous taken in isolation; but then so is much of
RDF, by its very nature as an assertional language. But I don't see
that triple as being more "syntactic" than a triple consisting
entirely of urirefs.
>
>The two reasons I am hesitant to buy your suggestion are that a) it
>reminds me of taking a random piece of XML and "interpreting" it as RDF
>using rule-based transformations,
I fail to follow this point.
>and b) it transfigures the model
>theory in such a way that I (and maybe others with similarly limited
>mental abilities) have a hard time understanding it, contrary to
>my belief that the MT is there to help clarify things.
Well, I confess that I have not done a very good job of explaining
the idea intuitively, but it really is not very complicated once you
get used to it. It doesn't change any of the rest of the model
theory, by the way: the extra machinery only comes into play when
literals are around. And I would say that while the MT is there to
help clarify things, it does that by being precise, rather than by
being simple.
In any case, this debate isn't really about rival model theories, but
about rival treatments of literals. The real point of the MT
extension is that it shows that the 'straightforward' interpretation
of a triple with a literal in it can indeed be made to work; we don't
need to transform or rewrite such triples; we can just leave them as
they are, and they work just fine, and have their 'straightforward'
meanings.
>
>BTW, the above rule-based approach addresses your concern that local
>typing information needs to be provided, doesn't it?
I'm really not sure what this rule-based approach means. There aren't
any such rules in RDF, right?
Pat
--
---------------------------------------------------------------------
IHMC (850)434 8903 home
40 South Alcaniz St. (850)202 4416 office
Pensacola, FL 32501 (850)202 4440 fax
phayes@ai.uwf.edu
http://www.coginst.uwf.edu/~phayes
Received on Tuesday, 6 November 2001 19:36:07 UTC