Re: incomplete datatyping (was: Re: datatypes and MT) from Pat Hayes on 2001-11-07 (w3c-rdfcore-wg@w3.org from November 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Tue, 6 Nov 2001 18:35:54 -0600
To: Sergey Melnik <melnik@db.stanford.edu>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101043b80e267e6934@[65.212.118.166]>
>Pat Hayes wrote:
>.....
>  > No, what I mean by 'neutral' is writing, say, that my shoe size is 10
>>  without giving the datatyping of the literal. That is what is
>>  impossible in this scheme: to use a literal before, or independently
>>  of, giving that literal a datatype (because in this scheme, as I
>>  understand it, the ONLY places that a literal label is allowed are
>>  the object ends of triples whose predicate is a datatype name). That
>>  is why I say it is only a notational variation on the simple idea of
>>  incorporating datatyping information into the literal label itself.
>>  (BTW, I agree that this simple idea has its merits; but I think that
>>  if we are going to insist that literals *must* be explicitly
>>  datatyped, then we should impose this as an explicit syntactic
>>  constraint in the very syntax of the language.)
>
>In principle, I agree. However, if we stick a single type to each
>literal we won't be able to deal with the cases where multiple literals
>are required to determine the data value unambiguously
>
>_x rdf:type ComplexNumber
>_x realDecimal "1.0"
>_x imaginaryDecimal "2.0"
>
>as indicated in
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Nov/0103.html

OK, though this kind of example seems to have only just surfaced and 
I am not sure I like it. Would one call realDecimal a datatype?? I 
don't think I would. It seems to be a kind of product of a datatype 
mapping and a selector. (Are there any examples like this in XML 
schema?)

If we are going to use bnodes, I would rather write this as:

_x rdf:type ComplexNumber
_x realPart _:y1
_x imaginaryPart _:y2
_:y1 xsd:number "1.0"
_:y2 xsd:number "2.0"
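
To make the shape of that graph concrete, here is a minimal Python 
sketch. It holds the graph as a set of (subject, predicate, object) 
tuples and reads the complex value back by following realPart and 
imaginaryPart to the bnodes and then the xsd:number arc to the 
literal. The tuple representation, the Literal wrapper, the helper 
names, and the use of float() as a stand-in for the xsd:number 
mapping are all assumptions made for illustration; none of them is 
defined by RDF or by the proposals under discussion.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical wrapper marking an object as a literal label."""
    label: str


# The five triples written above, as (subject, predicate, object).
GRAPH = {
    ("_x", "rdf:type", "ComplexNumber"),
    ("_x", "realPart", "_:y1"),
    ("_x", "imaginaryPart", "_:y2"),
    ("_:y1", "xsd:number", Literal("1.0")),
    ("_:y2", "xsd:number", Literal("2.0")),
}

# Assumed stand-in for the datatype mapping named by the arc label.
DATATYPE_MAPPINGS = {"xsd:number": float}


def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def typed_value(graph, node):
    """Value of a node that has a datatype arc pointing at a literal."""
    for (s, p, o) in graph:
        if s == node and p in DATATYPE_MAPPINGS and isinstance(o, Literal):
            return DATATYPE_MAPPINGS[p](o.label)
    raise LookupError(f"no datatyped literal attached to {node}")


def complex_value(graph, node):
    """Combine the real and imaginary parts hanging off the node."""
    (re_node,) = objects(graph, node, "realPart")
    (im_node,) = objects(graph, node, "imaginaryPart")
    return complex(typed_value(graph, re_node), typed_value(graph, im_node))
```

In this sketch the selector (realPart, imaginaryPart) and the 
datatype mapping (xsd:number) are separate arcs, which is the 
distinction the rewritten graph is meant to bring out.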

>....>
>>  Suppose I know that some property is represented by a literal "Y2F0",
>>  but have (as yet) no information about the appropriate datatyping to
>>  be used to interpret that literal. How would you represent that state
>>  of information?
>
>Oh, I think this leads us into inferencing, which seems to work just
>fine in both approaches. In the above case, you might assert:
>
>_n propertyIDontKnowAnythingAbout "Y2F0"

??But now you seem to have broken your own rules. The proposal, as I 
understood it, was that this would be syntactically illegal: the only 
place a literal label can occur is at the sharp end of an arc 
labelled with a datatype name. Look, suppose you DID know the 
base64 information; then this would be written as:

_n propertyIDoKnowSomethingAbout _:1
_:1 unSticklishDatatype:base64 "Y2F0"

Right? Now, how would you 'remove' the base-64 part of that graph and 
still write the information you want? There is no place to put the 
literal except at the sharp end of an arc that you now don't have a 
label for.
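
The constraint, as I understand it, can be sketched as a simple 
check over a graph. The following Python is only an illustration: 
the tuple representation, the Literal wrapper, the function name 
violations, and the set DATATYPE_NAMES are assumptions of mine, not 
part of the proposal.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical wrapper marking an object as a literal label."""
    label: str


# Assumed set of predicates that count as datatype names.
DATATYPE_NAMES = {"unSticklishDatatype:base64"}


def violations(graph, datatype_names):
    """Triples whose object is a literal but whose predicate is not
    a datatype name, i.e. triples the constraint would rule out."""
    return [t for t in graph
            if isinstance(t[2], Literal) and t[1] not in datatype_names]


# The graph where the base64 information is known.
typed = [
    ("_n", "propertyIDoKnowSomethingAbout", "_:1"),
    ("_:1", "unSticklishDatatype:base64", Literal("Y2F0")),
]

# The graph with no datatype information at all.
untyped = [
    ("_n", "propertyIDontKnowAnythingAbout", Literal("Y2F0")),
]
```

Under that check the first graph passes and the second does not, 
which is the difficulty: the state of partial information has no 
legal way to be written down.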

>  > And now suppose that I discover, perhaps from a
>>  different source, that the property in question is stated in terms of
>>  a base-64 integer encoding: how would you encode that information?
>
>(X propertyIDontKnowAnythingAbout Y)
>   -> (X base64 Y)

You have to write it in RDF. (If we could write Prolog then all our 
problems would be solved. :-)

>.....
>>  >  > Like that proposal, it forces datatype information to be given
>>  >>  explicitly and locally,
>>  >
>>  >Yes, and I believe that's a strength rather than a weakness. I think it
>>  >is essential for the snippets of information dispersed on the Semantic
>>  >Web to be as precise as possible and as self-contained as possible.
>>
>>  I think that is completely wrong-headed. The whole point of using an
>>  assertional language is to be able to put together pieces of
>>  information from various sources and draw reasonable conclusions from
>>  them.
>
>I guess we can argue about it, but this discourse is more of a
>philosophical nature. There are many reasons to reduce the impact of
>schemas on instance data. Evolution of schemas (that break instance
>data) and archival purposes are probably the most obvious ones. Another
>crucial issue is that the developers often do not properly understand
>the semantics of the (graph) encodings that they design. Part of the
>problem is, of course, that they do not even bother to develop a schema
>for their data, let alone to capture it in a machine-readable form...
>
>>  If we start declaring that certain kinds of information
>>  *must* be linked to  others, or *can only* be inferred in a certain
>>  sort of way, we might as well use Java.
>
>I don't think that's a pro argument, Pat. (_x size "10") *can only* be
>processed correctly if the tool got the schema that defines the property
>"size" and understands the schema language, right?

Yes, of course, but that's not the issue.

There are two pieces of information needed to interpret a literal: 
the literal label itself, and the datatype mapping that is supposed 
to be used to interpret it. All this discussion has been about how 
these two different pieces of information can be encoded in some form 
that allows them to be brought together reliably. One way is to just 
insist that they always occur together in some sense, either by 
making a kind of composite label out of them, or by providing  a 
bnode to which the two kinds of information must be attached in a 
certain way. The MT extension, in contrast, allows them to be 
separated and treated as separate assertions, and uses RDFS inference 
machinery to make the connection between them. It seems to me that 
this flexibility is more desirable, and more in the RDF spirit, than 
any scheme which imposes strict syntactic constraints on the form of 
RDF graphs, and which breaks if those constraints are violated. That 
is what I meant by the above remark.
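
As a rough illustration of the difference, here is a small Python 
sketch in which the literal and the datatype information are 
separate assertions that may or may not both be present. It assumes, 
purely for the sake of the example, that the datatype is attached to 
the property by an rdfs:range triple and that int() stands in for 
the xsd:integer mapping; the tuple representation and the name 
interpret_literals are likewise hypothetical.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical wrapper marking an object as a literal label."""
    label: str


# Assumed stand-in for the lexical-to-value mapping of each datatype.
DATATYPE_MAPPINGS = {"xsd:integer": int}


def interpret_literals(graph):
    """Map each (subject, predicate) with a literal object to a value.

    If the graph says which datatype the predicate uses, apply that
    mapping; otherwise leave the literal uninterpreted rather than
    treating the graph as an error.
    """
    ranges = {s: o for (s, p, o) in graph
              if p == "rdfs:range" and o in DATATYPE_MAPPINGS}
    values = {}
    for (s, p, o) in graph:
        if isinstance(o, Literal):
            mapping = DATATYPE_MAPPINGS.get(ranges.get(p))
            values[(s, p)] = mapping(o.label) if mapping else o
    return values


# Instance data and datatype information, possibly from different sources.
instance = {("_x", "size", Literal("10"))}
schema = {("size", "rdfs:range", "xsd:integer")}
```

With only the instance triple the literal stays undetermined; once 
the schema triple is merged in, the same unchanged triple yields the 
integer ten. Nothing has to be rewritten when the extra information 
arrives.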

>
>>  I would agree that IF the information is available locally then of
>>  course it should be possible to use it locally. But the reasoners
>>  should also be able to function even when all the information is not
>>  available locally; they should not barf just because the information
>>  provided is incomplete.
>
>It looks to me that our discussion is probably even more metaphysical
>than I originally thought. Let me try to put a different spin on your
>approach (of course, this is not the way you see it, I understand),
>which I believe allows all inferencing to work just the way you'd expect
>it. Assume that
>
>_x size "10"
>
>asserts there is a certain relationship between I(_x) and the literal
>value I("10"). (This is the `straightforward' interpretation). In other
>words, the property `size' with a literal "10" hanging off it restricts
>the number of valid interpretations of _x.

It does if the interpretation is a datatype interpretation, right.

>Now, imagine that there is a
>rule somewhere, in some schema that "breathes in life" into the above
>statement:
>
>(X size L)
>   ->  exists N: (X shoeSize N),
>                 (N rdf:type Integer),
>                 (N xsd:int  L) .

? What are you talking about? Such rules aren't expressible in RDF.

>In this light, the original statement (_x size "10") can be viewed as a
>syntactic construction,

It IS a syntactic construction. In fact it is an RDF graph.

>which is interpreted using an inference rule into
>something that has an adequate semantic interpretation.

But it is an adequate semantic representation already. It is 
perfectly meaningful (if ambiguous), and when suitably conjoined 
with, or extended by, appropriate information about the datatype, and 
when interpreted in a way that conforms to the datatype semantics, 
then it says exactly what one would expect it to say, viz. that x's 
shoe size is ten.

Even if you don't like the semantic account, that first triple is 
perfectly well-formed RDF; so we should, in all conscience, either 
undertake to say what it means, or make it illegal. I am still not 
quite clear what your attitude to this point is; do you intend that 
'rule' to be a kind of syntactic transformation or constraint on RDF 
graphs? What if I write a graph and refuse to apply the rule: have I 
made an error of some kind? What kind?

>In particular,
>the property `shoeSize' would connect a shoe with an integer, rather
>than with a literal value.

If the datatype is chosen appropriately (eg xsd:integer), the literal 
value of the literal "10" IS an integer. In fact it is ten.

>
>I think that both approaches on the table are equivalent in the sense
>that they can ultimately provide a very similar high-level
>interpretation to any given piece of RDF instance data, although using
>quite different schemas and a different perspective. My feeling is,
>however, that by giving a reasonable (not straightforward)
>model-theoretic interpretation to (_x size "10") you finesse the fact
>that this statement is "syntactic matter" that needs further
>explanation, i.e., by means of rules.

I don't finesse it, I deny it outright. It is not syntactic in any 
way: it says that _x's shoe size is a literal value. Until we know 
what datatyping scheme to use, we do not know which literal value, so 
it is of course ambiguous taken in isolation; but then so is much of 
RDF, by its very nature as an assertional language. But I don't see 
that triple as being more "syntactic" than a triple consisting 
entirely of urirefs.

>
>The two reasons I am hesitant to buy your suggestion are that a) it
>reminds me of taking a random piece of XML and "interpreting" it as RDF
>using rule-based transformations,

I fail to follow this point.

>and b) it transfigures the model
>theory in such a way that I (and maybe others with similarly limited
>mental abilities) have a hard time understanding it - contrary to
>my belief that MT is there to help clarify things.

Well, I confess that I have not done a very good job of explaining 
the idea intuitively, but it really is not very complicated once you 
get used to it. It doesn't change any of the rest of the model 
theory, by the way: the extra machinery only comes into play when 
literals are around. And I would say that while the MT is there to 
help clarify things, it does that by being precise, rather than by 
being simple.

In any case, this debate isn't really about rival model theories, but 
about rival treatments of literals. The real point of the MT 
extension is that it shows that the 'straightforward' interpretation 
of a triple with a literal in it can indeed be made to work; we don't 
need to transform  or rewrite such triples; we can just leave them as 
they are, and they work just fine, and have their 'straightforward' 
meanings.

>
>BTW, the above rule-based approach addresses your concern that local
>typing information needs to be provided, doesn't it?

I'm really not sure what this rule-based approach means. There aren't 
any such rules in RDF, right?

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Tuesday, 6 November 2001 19:36:07 UTC