Re: incomplete datatyping (was: Re: datatypes and MT) from Sergey Melnik on 2001-11-07 (w3c-rdfcore-wg@w3.org from November 2001)

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Tue, 06 Nov 2001 19:13:48 -0800
To: Pat Hayes <phayes@ai.uwf.edu>
CC: w3c-rdfcore-wg@w3.org
Message-ID: <3BE8A6EC.AC061EBF@db.stanford.edu>
Pat Hayes wrote:
> 
> >Pat Hayes wrote:
> >.....
> >  > No, what I mean by 'neutral' is writing, say, that my shoe size is 10
> >>  without giving the datatyping of the literal. That is what is
> >>  impossible in this scheme: to use a literal before, or independently
> >>  of, giving that literal a datatype (because in this scheme, as I
> >>  understand it, the ONLY places that a literal label are allowed are
> >>  the object ends of triples whose predicate is a datatype name). That
> >>  is why I say it is only a notational variation on the simple idea of
> >>  incorporating datatyping information in to the literal label itself.
> >>  (BTW, I agree that this simple idea has its merits; but I think that
> >>  if we are going to insist that literals *must* be explicitly
> >>  datatyped, then we should impose this as an explicit syntactic
> >>  constraint in the very syntax of the language.)
> >
> >In principle, I agree. However, if we stick a single type to each
> >literal we won't be able to deal with the cases where multiple literals
> >are required to determine the data value unambiguously
> >
> >_x rdf:type ComplexNumber
> >_x realDecimal "1.0"
> >_x imaginaryDecimal "2.0"
> >
> >as indicated in
> >http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Nov/0103.html
> 
> OK, though this kind of example seems to have only just surfaced and
> I am not sure I like it. Would one call realDecimal a datatype?? I
> don't think I would. It seems to be a kind of product of a datatype
> mapping and a selector. (Are there any examples like this in XML
> schema?)

I don't like the example very much either (as applied to complex
numbers) and I wrote a disclaimer about it in the above-cited posting.
The point I'm making is that there are cases in which the interpretation
of a node like _x above is determined by several literals and not just
using a single literal. For example, consider

_y firstName "Sergey"
_y lastName "Melnik"
_y monthOfBirth "July"
_y dayOfBirth "Tuesday"

Above, the interpretation of _y is (with a high probability) uniquely
determined by the four statements. Would one call any of the four
properties datatyping mappings? Partial mappings, at most. However,
their conjunction fulfils a purpose quite similar to that of a
datatyping mapping: associating a lexical token (here, a set of lexical
tokens) with a data value (here, with a person). Of course, there is
nothing in the above example that makes it specific to datatyping.

> If we are going to use bnodes, I would rather write this as:
> 
> _x rdf:type ComplexNumber
> _x realPart _:y1
> _x imaginaryPart _:y2
> _:y1 xsd:number "1.0"
> _:y2 xsd:number "2.0"

I'd prefer this representation too, not doubt.

> >....>
> >>  Suppose I know that some property is represented by a literal "Y2F0",
> >>  but have (as yet) no information about the appropriate datatyping to
> >>  be used to interpret that literal. How would you represent that state
> >>  of information?
> >
> >Oh, I think this leads us into inferencing, which seems to work just
> >fine in both approaches. In the above case, you might assert:
> >
> >_n propertyIDontKnowAnythingAbout "Y2F0"
> 
> ??But now you seem to have broken your own rules. The proposal, as I
> understood it, was that this would be syntactically illegal: the only
> place a literal label can occur is at the sharp end of an arc
> labelled with a datatype name.

Oh, that's a misunderstanding. From your problem description I thought
_n would stand for a literal like "cat". So the starting point is

_tom label _:1
_:1  propertyIDontKnowAnythingAbout "Y2F0"

where _:1 would be "cat", right?

Then, still the same rule (see comment about rules below) applies:

(X propertyIDontKnowAnythingAbout Y)
  -> (X base64 Y)

when you "discover" that propertyIDontKnowAnythingAbout is in fact
base64.
 
[...]

> You have to write it in RDF. (If we could write Prolog then all our
> problems would be solved. :-)

That's what we do when we define model-theoretic interpretations anyway,
don't we? What about Sec. 6 "RDFS-entailment and RDFS closures" in the
MT draft? To quote: "Apply the following rules recursively to generate
all legal RDF triples ..." I agree, you'd need to represent rules in RDF
somehow, and to make sense of them (using model-theoretic "meta-rules").

How else would you "discover" something if not by finding RDF-encoded
rules? Aren't subClassOf and subPropertyOf something else than
RDF-encoded macro-rules? I think a rule language is nothing but a fairly
general-purpose schema language. Sure, providing model-theoretic
semantics for even a very limited rule language is a much bigger pain
that doing so for RDF Schema's vocabulary. But, in principle, it's a
similar task.

(BTW, would not it be worth the pain to have an RDF-based rule language
with model-theoretic semantics? Then, no special model-theoretic
definitions for subClassOf and subPropertyOf would be required, you'd
just use rules to describe them. Don't get me wrong, I'm not advocating
my any means that RDFCore WG should get into the business of defining a
rule language. But another WG probably will.)

[...]
> >I guess we can argue about it, but this discourse is more of a
> >philosophical nature. There are many reasons to reduce the impact of
> >schemas on instance data. Evolution of schemas (that break instance
> >data) and archival purposes are probably the most obvious ones. Another
> >crucial issue is that the developers often do not properly understand
> >the semantics of the (graph) encodings that they design. Part of the
> >problem is, of course, that they do not even bother to develop a schema
> >for their data, let alone to capture it in a machine-readable form...
> >
> >>  If we start declaring that certain kinds of information
> >>  *must* be linked to  others, or *can only* be inferred in a certain
> >>  sort of way, we might as well use Java.
> >
> >I don't think that's a pro argument, Pat. (_x size "10") *can only* be
> >processed correctly if the tool got the schema that defines the property
> >"size" and understands the schema language, right?
> 
> Yes, of course, but that's not the issue.
> 
> There are two pieces of information needed to interpret a literal:
> the literal label itself, and the datatype mapping that is supposed
> to be used to interpret it. All this discussion has been about how
> these two different pieces of information can be encoded in some form
> that allows them to be brought together reliably. One way is to just
> insist that they always occur together in some sense, either by
> making a kind of composite label out of them, or by providing  a
> bnode to which the two kinds of information must be attached in a
> certain way. The MT extension, in contrast, allows them to be
> separated and treated as separate assertions, and uses RDFS inference
> machinery to make the connection between them.

Objection, your honor! ;) Let's wait until we've seen a couple of
examples of defining datatypes in the MT extension. My feeling is more
that RDFS inference machinery would be required, i.e., you wouldn't be
done by defining some denotational semantics for a property or a class,
but a more substantial tweaking of the MT would be needed each time you
introduce a new built-in datatype.

> It seems to me that
> this flexibility is more desirable, and more in the RDF spirit, than
> any scheme which imposes strict syntactic constraints on the form of
> RDF graphs, and which breaks if those constraints are violated. That
> is what I meant by the above remark.
> 
> >
> >>  I would agree that IF the information is available locally then of
> >>  course it should be possible to use it locally. But the reasoners
> >>  should also be able to function even when all the information is not
> >>  available locally; they should not barf just because the information
> >>  provided is incomplete.
> >
> >I looks to me that our discussion is probably even more metaphysical
> >than I originally thought. Let me try to put a different spin on your
> >approach (of course, this is not the way you see it, I understand),
> >which I believe allows all intefencing to work just the way you'd expect
> >it. Assume that
> >
> >_x size "10"
> >
> >asserts there is a certain relationship between I(_x) and the literal
> >value I("10"). (This is the `straightforward' interpretation). In other
> >words, the property `size' with a literal "10" hanging off it restricts
> >the number of valid interpretations of _x.
> 
> It does if the interpretation is a datatype interpretation, right.
> 
> >Now, imagine that there is a
> >rule somewhere, in some schema that "breathes in life" into the above
> >statement:
> >
> >(X size L)
> >   ->  exists N: (X shoeSize N),
> >                 (N rdf:type Integer),
> >                 (N xsd:int  L) .
> 
> ? What are you talking about? Such rules aren't expressible in RDF.

Not yet, of course. But the above rule is nothing more "dangerous" (in
the sense that it breaks MT) than giving rdfs:subClassOf transitive
semantics.

> >In this light, the original statement (_x size "10") can be viewed as a
> >syntactic construction,
> 
> It IS a syntactic construction. In fact it is an RDF graph.
> 
> >which is interpreted using a inference rule into
> >something that has an adequate semantic interpretation.
> 
> But it is an adequate semantic representation already.

In P scheme it is. In S scheme it also has an interpretation, although a
less meaningful one.

> It is
> perfectly meaningful (if ambiguous), and when suitably conjoined
> with, or extended by, appropriate information about the datatype, and
> when interpreted in way that conforms to the datatype semantics ,
> then it says exactly what one would expect it to say, viz. that x's
> shoe size is ten.
> 
> Even if you don't like the semantic account, that first triple is
> perfectly well-formed RDF; so we should, in all conscience, either
> undertake to say what it means, or make it illegal.

I don't think there is a way and need to make a valid piece of RDF
illegal. It's just that it is bad modeling style in S scheme. It is also
a bad style not to provide machine-readable definitions for
vocabularies...

> I am still not
> quite clear what your attitude to this point is; do you intend that
> 'rule' to be a kind of syntactic transformation or constraint on RDF
> graphs?

To be (maybe unnecessarily too) frank, I don't think rules are going to
work on the scale of the SW, at least in short term. On the theoretical
account, I think a rule language is just a fancy schema language, and
can be given an appropriate model-theoretic interpretation just as
rdfs:subClassOf.

> What if I write a graph and refuse to apply the rule: have I
> made an error of some kind? What kind?

What if I write a graph and refuse to make a transitive closure on
subClassOf: have I made an error? The answer to this question is the
answer to your question ;)

> >In particular,
> >the property `shoeSize' would connect a shoe with an integer, rather
> >than with a literal value.
> 
> If the datatype is chosen appropriately (eg xsd:integer), the literal
> value of the literal "10" IS an integer. In fact it is ten.
> 
> >
> >I think that both approaches on the table are equivalent in the sense
> >that they can ultimately provide a very similar high-level
> >interpretation to any given piece of RDF instance data, although using
> >quite different schemas and a different perspective. My feeling is,
> >however, that by giving a reasonable (not straightforward)
> >model-theoretic interpretation to (_x size "10") you finesse the fact
> >that this statement is "syntactic matter" that needs further
> >explanation, i.e., my means of rules.
> 
> I don't finesse it, I deny it outright. It is not syntactic in any
> way: it says that _x's shoe size is a literal value. Until we know
> what datatyping scheme to use, we do not know which literal value,

You are saying that as long as there is no schema it's a literal value.
Once a schema is there, it becomes a data value. To satisfy
monotonicity, all data values must then necessarily be literal values,
right?

> so
> it is of course ambiguous taken in isolation; but then so is much of
> RDF, by its very nature as an assertional language. But I don't see
> that triple as being more "syntactic" than a triple consisting
> entirely of urirefs.
> 
> >
> >The two reasons I am hesitant to buy your suggestion are that a) it
> >reminds me of taking a random piece of XML and "interpreting" it as RDF
> >using rule-based transformations,
> 
> I fail to follow this point.

In other words, given an arbitrarily confusing RDF representation, you
can make sense of it, leveraging the fact that literals can potentially
denote anything. Thus, for the example below

_b dc:Creator "Melnik"
_x name       "Sergey"

you could have an interpretation in which literal "Melnik" and bNode _x
denote the same thing, i.e. myself, but literal "Sergey" denotes my
first name, or a string. It just strikes me as too much of a stretch.

> >and b) it transfigures the model
> >theory in such a way that I (and maybe others with similarly limited
> >mental abilities) have hard times understanding it - to the contrary of
> >my belief that MT is there to help clarify things.
> 
> Well, I confess that I have not done a very good job of explaining
> the idea intuitively, but it really is not very complicated once you
> get used to it.

It's our job to make this idea more clear by working through some test
cases of defining a couple of built-in datatypes using the MT extension.

> It doesn't change any of the rest of the model
> theory, by the way: the extra machinery only comes into play when
> literals are around.

Don't forget the change in the abstract syntax - the MT extension relies
on untidy graphs, right?

> And I would say that while the MT is there to
> help clarify things, it does that by being precise, rather than by
> being simple.

I agree. Unfortunately, by being just precise (and not simple) is helps
only a few intellectually gifted people.

Sergey
Received on Tuesday, 6 November 2001 21:47:03 UTC