Re: incomplete datatyping (was: Re: datatypes and MT) from Sergey Melnik on 2001-11-06 (w3c-rdfcore-wg@w3.org from November 2001)

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Mon, 05 Nov 2001 16:05:10 -0800
To: Pat Hayes <phayes@ai.uwf.edu>
CC: w3c-rdfcore-wg@w3.org
Message-ID: <3BE72936.C6529027@db.stanford.edu>
Pat Hayes wrote:
> 
> >Pat Hayes wrote:
> >>  ...
> >>  >>Now, this seems to me to have a fatal flaw, which arises from the
> >>  >>fact that the value spaces of two different datatypes might
> >>  >>overlap. For example, suppose that there are datatypes xxd:octal
> >>  >>and xxd:decimal, then the following would seem to be perfectly true:
> >>  >>
> >>  >>_:1 rdf:type xxd:octal
> >>  >>_:1 rdf:type xxd:decimal
> >>  >>_:1 rdf:value "32"
> >>  >>_:1 rdf:value "26"
> >>  >
> >>  >
> >>  >But that is not how Sergey would write it.  He is proposing:
> >>  >
> >>  >   _:1 xxd:octal   "32" .
> >>  >   _:1 xxd:decimal "26" .
> >>
> >>  Oh, I see. That does indeed avoid this problem, but it also throws
> >>  away the advantages of the bnode way of doing things, since now it is
> >>  impossible to be neutral about datatypes.
> >
> >Why? If you want to be neutral, use a bNode w/o any arcs attached to it
> >(if I understand what you mean by neutral).
> 
> No, what I mean by 'neutral' is writing, say, that my shoe size is 10
> without giving the datatyping of the literal. That is what is
> impossible in this scheme: to use a literal before, or independently
> of, giving that literal a datatype (because in this scheme, as I
> understand it, the ONLY places that a literal label are allowed are
> the object ends of triples whose predicate is a datatype name). That
> is why I say it is only a notational variation on the simple idea of
> incorporating datatyping information in to the literal label itself.
> (BTW, I agree that this simple idea has its merits; but I think that
> if we are going to insist that literals *must* be explicitly
> datatyped, then we should impose this as an explicit syntactic
> constraint in the very syntax of the language.)

In principle, I agree. However, if we stick a single type to each
literal we won't be able to deal with the cases where multiple literals
are required to determine the data value unambiguously:

_x rdf:type ComplexNumber
_x realDecimal "1.0"
_x imaginaryDecimal "2.0"

as indicated in
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Nov/0103.html
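
A minimal sketch of how a tool might combine the two literals above into
one data value. It assumes triples are plain Python tuples and that both
properties carry decimal strings; the helper name complex_value is
hypothetical and not part of any RDF toolkit.

```python
# Triples as (subject, predicate, object) tuples, mirroring the example above.
triples = [
    ("_x", "rdf:type", "ComplexNumber"),
    ("_x", "realDecimal", "1.0"),
    ("_x", "imaginaryDecimal", "2.0"),
]


def complex_value(node, graph):
    """Build the value of `node` from its two decimal-typed literals.

    Neither literal determines the value alone; both arcs are needed.
    Returns None if the node is not typed as ComplexNumber.
    """
    props = {p: o for s, p, o in graph if s == node}
    if props.get("rdf:type") != "ComplexNumber":
        return None
    return complex(float(props["realDecimal"]), float(props["imaginaryDecimal"]))


value = complex_value("_x", triples)
```

The point of the sketch is only that the typing lives on the arcs, so a
single node can carry several typed literals at once.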

> 
> >  > This forces the datatyping
> >>  information to be attached directly to the literal;
> >
> >Right.
> >
> >>  the only place
> >>  literals can occur in an RDF graph is at the object end of links
> >>  labelled with a datatype.
> >
> >True, although I personally do not see any problem with allowing
> >
> >"cat"  base64  "Y2F0"
> >
> >either.
> >
> >>  This seems to me to be simply a variation
> >>  on the idea of incorporating the datatype label into the literal
> >>  itself, eg by having literals be pairs of a datatype and a string.
> >
> >Not quite. Notice that we can refer both to the literal "typed" in such
> >a way, and its type by means of arcs in RDF graphs (and, for example,
> >provide additional information about the datatype or describe how the
> >literal/string is represented using the base64 encoding, if that's all
> >we know).
> 
> Suppose I know that some property is represented by a literal "Y2F0",
> but have (as yet) no information about the appropriate datatyping to
> be used to interpret that literal. How would you represent that state
> of information?

Oh, I think this leads us into inferencing, which seems to work just
fine in both approaches. In the above case, you might assert:

_n propertyIDontKnowAnythingAbout "Y2F0"

> And now suppose that I discover, perhaps from a
> different source, that the property in question is stated in terms of
> a base-64 integer encoding: how would you encode that information?

(X propertyIDontKnowAnythingAbout Y)
  -> (X base64 Y)

> And then how would I be able to put these two pieces of information
> together into a single graph, and be able to draw the obvious
> conclusion? Remember that it is not valid to merge bnodes from two
> different RDF graphs.

I think using the above rule we'd be able to derive

_n base64 "Y2F0"

right?
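
As a rough illustration, the rule can be applied by a one-step forward
chainer. This is a sketch under assumptions: the graph is a set of
tuples, apply_rule is a hypothetical helper, and "base64" is read as the
standard Base64 encoding so that Python's base64 module can decode the
literal.

```python
import base64


def apply_rule(graph, from_pred, to_pred):
    """Apply (X from_pred Y) -> (X to_pred Y) once and return the enlarged graph."""
    derived = {(s, to_pred, o) for s, p, o in graph if p == from_pred}
    return set(graph) | derived


# The statement asserted before any datatyping information was known.
graph = {("_n", "propertyIDontKnowAnythingAbout", "Y2F0")}

# The rule learned later, possibly from a different source.
closed = apply_rule(graph, "propertyIDontKnowAnythingAbout", "base64")

# With the derived arc in place, the literal can be interpreted.
decoded = [base64.b64decode(o) for s, p, o in closed if p == "base64"]
```

The original triple stays in the graph; the rule only adds the typed arc
next to it.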

> >  > Like that proposal, it forces datatype information to be given
> >>  explicitly and locally,
> >
> >Yes, and I believe that's a strength rather than a weakness. I think it
> >is essential for the snippets of information dispersed on the Semantic
> >Web to be as precise as possible and as self-contained as possible.
> 
> I think that is completely wrong-headed. The whole point of using an
> assertional language is to be able to put together pieces of
> information from various sources and draw reasonable conclusions from
> them.

I guess we can argue about it, but this discourse is more of a
philosophical nature. There are many reasons to reduce the impact of
schemas on instance data. Evolution of schemas (that break instance
data) and archival purposes are probably the most obvious ones. Another
crucial issue is that the developers often do not properly understand
the semantics of the (graph) encodings that they design. Part of the
problem is, of course, that they do not even bother to develop a schema
for their data, let alone to capture it in a machine-readable form...

> If we start declaring that certain kinds of information
> *must* be linked to  others, or *can only* be inferred in a certain
> sort of way, we might as well use Java.

I don't think that's a pro argument, Pat. (_x size "10") *can only* be
processed correctly if the tool got the schema that defines the property
"size" and understands the schema language, right?

> I would agree that IF the information is available locally then of
> course it should be possible to use it locally. But the reasoners
> should also be able to function even when all the information is not
> available locally; they should not barf just because the information
> provided is incomplete.

It looks to me that our discussion is probably even more metaphysical
than I originally thought. Let me try to put a different spin on your
approach (of course, this is not the way you see it, I understand),
which I believe allows all inferencing to work just the way you'd expect
it. Assume that

_x size "10"

asserts there is a certain relationship between I(_x) and the literal
value I("10"). (This is the `straightforward' interpretation). In other
words, the property `size' with a literal "10" hanging off it restricts
the number of valid interpretations of _x. Now, imagine that there is a
rule somewhere, in some schema that "breathes in life" into the above
statement:

(X size L)
  ->  exists N: (X shoeSize N),
                (N rdf:type Integer),
                (N xsd:int  L) .

In this light, the original statement (_x size "10") can be viewed as a
syntactic construction, which is interpreted using an inference rule into
something that has an adequate semantic interpretation. In particular,
the property `shoeSize' would connect a shoe with an integer, rather
than with a literal value.
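
The existential in the rule can be sketched by minting a fresh blank
node for each match. This is only an illustration: expand_size and the
"_:n" naming scheme are hypothetical, and the graph is again assumed to
be a set of tuples.

```python
import itertools

_fresh = itertools.count(1)  # source of fresh blank-node labels


def expand_size(graph):
    """Apply (X size L) -> exists N: (X shoeSize N), (N rdf:type Integer), (N xsd:int L)."""
    out = set(graph)
    for s, p, o in sorted(graph):
        if p == "size":
            n = f"_:n{next(_fresh)}"  # the existentially quantified N
            out |= {
                (s, "shoeSize", n),
                (n, "rdf:type", "Integer"),
                (n, "xsd:int", o),
            }
    return out


result = expand_size({("_x", "size", "10")})
```

After expansion, shoeSize points at the new node, and only the xsd:int
arc leaving that node touches the literal "10".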

I think that both approaches on the table are equivalent in the sense
that they can ultimately provide a very similar high-level
interpretation to any given piece of RDF instance data, although using
quite different schemas and a different perspective. My feeling is,
however, that by giving a reasonable (not straightforward)
model-theoretic interpretation to (_x size "10") you finesse the fact
that this statement is "syntactic matter" that needs further
explanation, i.e., by means of rules.

The two reasons I am hesitant to buy your suggestion are that a) it
reminds me of taking a random piece of XML and "interpreting" it as RDF
using rule-based transformations, and b) it transfigures the model
theory in such a way that I (and maybe others with similarly limited
mental abilities) have a hard time understanding it, contrary to my
belief that MT is there to help clarify things.

BTW, the above rule-based approach addresses your concern that local
typing information needs to be provided, doesn't it?

Sergey
Received on Monday, 5 November 2001 18:38:21 UTC