Re: RDF Semantics, non-lean RDF graphs, and redundancy of content from pat hayes on 2003-12-05 (www-rdf-comments@w3.org from October to December 2003)

From: pat hayes <phayes@ihmc.us>
Date: Fri, 5 Dec 2003 12:14:06 -0600
To: Ossi Nykänen <onykane@butler.cc.tut.fi>
Cc: www-rdf-comments@w3.org
Message-Id: <p06001f0abbf6623da09a@[10.1.31.1]>
>  > ...
>>  Hope the above helps.
>>
>>  Pat Hayes
>
>Thank you for the clarification.
>
>It seems thus fair to say that the notion of redundancy is used only in
>the sense of the formal entailment, which is of course the intent of the
>RDF Semantics.

Yes, quite.

>
>However, considering the modelling perspective (in the RDF sense, not in
>the sense of an interpretation theory), asserting
>
>G1: {
>ex:pat rdf:type ex:human .
>_:x rdf:type ex:
>}
>
>might seem intuitively stronger than asserting
>
>G2: {
>ex:pat rdf:type ex:human .
>}
>
>even if G2 formally allows inferring G1. This is not due to entailment but
>to "truth" but a potential world which models G1.

I guess I do not follow you here. The point of entailment is exactly 
that A entails B whenever B is true in any world where A is true. You 
cannot make a world which models A without B also being modelled by 
it. That is what 'true' means in this context: B is true in I *means* 
that I models B.

>Of course, RDF Semantics
>is a vehicle for analysing valid inference, but still.
>
>The reason is that asserting facts might be interpreted from an
>"economical perspective" (as few assertions as possible).

Ah, but that would be a different semantics. Under that assumption, 
if something is not asserted then you are entitled to conclude that 
it is false: negation-as-failure, as Prologgers call it.

>An agent
>asserting G1 might assert redundant information due to a simple bookkeeping
>mistake, or (in the absence of negation) it might try to say that "Pat is
>a human, but there exists also something else (which I can't or am
>unwilling to identify) that is/are human."

The key word in this example is "else". There is no way to say this 
in RDFS (in OWL there is). If you can refer to two things being 
different, then the assertion that "Pat is human and some 
*other* thing, different from Pat, is also human" indeed is not 
entailed by "Pat is human", and the second part of the longer 
assertion would not be redundant (and if you were to encode this 
'other' as part of the  RDF graph using OWL terminology, the graph 
would be lean).  But that is not what G1 asserts: the idea that two 
distinct nodes denote distinct things is not part of the RDF 
semantics.  G1 says that Pat is human and something (not something 
*else*) is human:  here, the second statement is clearly redundant, 
since Pat already provides evidence that something is human.
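
To make the redundancy concrete, here is a minimal sketch in Python. 
It is only an illustration under stated assumptions: triples are plain 
tuples, blank nodes are strings starting with "_:", the helper names 
(is_blank, instances, is_lean) are hypothetical and belong to no RDF 
library, and the second triple of G1 is read as 
"_:x rdf:type ex:human", which is how F2 renders it. A graph is lean 
when no instance of it is a proper subgraph of it.

```python
from itertools import product

def is_blank(term):
    # Assumption of this sketch: blank nodes are written "_:name".
    return term.startswith("_:")

def instances(graph):
    """Yield every instance obtained by mapping blank nodes to terms of the graph."""
    terms = sorted({t for s, _, o in graph for t in (s, o)})
    blanks = [t for t in terms if is_blank(t)]
    for images in product(terms, repeat=len(blanks)):
        m = dict(zip(blanks, images))
        yield {(m.get(s, s), p, m.get(o, o)) for s, p, o in graph}

def is_lean(graph):
    # Lean: no instance of the graph is a proper subgraph of the graph.
    return not any(inst < graph for inst in instances(graph))

G1 = {("ex:pat", "rdf:type", "ex:human"), ("_:x", "rdf:type", "ex:human")}
G2 = {("ex:pat", "rdf:type", "ex:human")}

print(is_lean(G1))  # False: mapping _:x to ex:pat collapses G1 onto G2
print(is_lean(G2))  # True
```

Mapping the blank node onto ex:pat yields G2, a proper subgraph of G1, 
so the second triple adds nothing.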

>In FOL (with identity), this could be expressed

with identity and negation, neither of which is available in RDFS.

>as (excuse the syntax):
>
>F1 = { type(pat,human), exists X: ( type(X,human) and not(X=pat))  }
>
>Obviously, F1 can not be formulated in RDF (without ontologies); all you
>can do is F2 (G1):
>
>F2 = { type(pat,human), exists X: type(X,human) }
>
>The "closest one can get" in RDF (without ontologies) is noticing that,
>assuming M is a model of G2, one can (for some M) invent a "bad" mapping A
>for the blank node _:x in G1 so that G1 is not true.

No, you cannot do that. The blank node is not available for being 
assigned a mapping: it is not a free variable in G1, but 
existentially quantified there. It's like a pronoun, not like an 
unknown noun.  G1 says "... and SOMEthing is human", not "...and _:x 
is human", where _:x is some unknown name whose meaning can be 
specified by some external agency.  That is why one should not think 
of a blank node as an unknown URI or an unknown *name*.

For an excellent introduction to the difference between a pronoun and 
an unknown name, see http://www.baseball-almanac.com/humor4.shtml. 
Or, imagine finding a piece of paper that says "Arthur Errikson is 
human" and not knowing who Arthur Errikson is. This is not the same 
as finding a piece of paper that says "Someone is human", because in 
the first case, but not the second, it could be false: maybe, for all 
you know, 'Arthur Errikson' is the name of a duck.
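
The same contrast can be sketched in a few lines of Python. This is a 
toy model checker, not the RDF model theory itself: the domain, the 
name ex:arthur, and the function true_in are hypothetical, and 
properties are looked up directly by their URI for brevity. A graph is 
true when SOME assignment of its blank nodes makes every triple hold, 
so one "bad" assignment cannot make it false.

```python
from itertools import product

def is_blank(term):
    # Assumption of this sketch: blank nodes are written "_:name".
    return term.startswith("_:")

def true_in(graph, domain, denotes, extension):
    """True iff some assignment of blank nodes to domain elements satisfies all triples."""
    blanks = sorted({t for s, _, o in graph for t in (s, o) if is_blank(t)})
    for values in product(sorted(domain), repeat=len(blanks)):
        a = dict(zip(blanks, values))
        val = lambda t: a[t] if is_blank(t) else denotes[t]
        if all((val(s), val(o)) in extension[p] for s, p, o in graph):
            return True
    return False

# A world in which Pat is the only human and 'Arthur Errikson' names a duck.
domain = {"Pat", "Duck", "Human"}
denotes = {"ex:pat": "Pat", "ex:arthur": "Duck", "ex:human": "Human"}
extension = {"rdf:type": {("Pat", "Human")}}

G1 = {("ex:pat", "rdf:type", "ex:human"), ("_:x", "rdf:type", "ex:human")}
named = {("ex:arthur", "rdf:type", "ex:human")}
someone = {("_:y", "rdf:type", "ex:human")}

print(true_in(G1, domain, denotes, extension))       # True
print(true_in(named, domain, denotes, extension))    # False
print(true_in(someone, domain, denotes, extension))  # True
```

Assigning the duck to _:x does not falsify G1, because the checker is 
free to pick Pat instead. The named triple has no such freedom.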

>Obviously, from the entailments' point of view, this is not a big deal
>since there exists also a mapping (leading to the lean graph in our case)
>which is sufficient for the entailment.
>
>On the other hand, assuming for a while that an agent would like to
>analyse the "truth" (possible models/worlds) of a given RDF graph (e.g. to
>match a procedural schema, script, or whatever, to initiate a behaviour),
>it's tempting to say (from the "economical perspective") that G1 encodes
>more information than G2 (to base the behaviour on).

Tempting; but dangerous, because false. Conforming RDFS engines 
should not use RDF in this way (i.e. assuming that blank nodes mean 
'something else' or can be treated as unknown names which you might 
find out more about later) and should not draw conclusions based on this 
kind of reasoning. Well, that is, they can of course do what they 
like, but if they do so then they go beyond the RDF specs, and are on 
their own. No guarantees are provided by RDF that any conclusions so 
generated will be correct. In many cases they will be sharply 
incorrect, and may have potentially dangerous entailments.

>The point is that in practice, one can not assume that an agent with
>limited resources would be able to perform all possible entailments before
>such a matching (consider a more complex example).

We are not making that assumption. In fact, this point is exactly the 
primary reason why RDFS entailment has been designed to be monotonic, 
since only a monotonic system is secure against the addition of 
knowledge after inferences have been made from partial knowledge. The 
kind of semantics you are contemplating here is non-monotonic.
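
A small Python sketch of the difference, under loud assumptions: the 
functions holds and naf_false are hypothetical, ex:ossi is a made-up 
name, and "entailment" is reduced to membership of a ground triple, 
which is far weaker than RDFS entailment but enough to show the shape 
of the problem.

```python
def holds(graph, triple):
    # Monotonic reading: a ground triple is concluded only if it is asserted.
    return triple in graph

def naf_false(graph, triple):
    # Negation-as-failure reading: whatever is not asserted is taken to be false.
    return triple not in graph

partial = {("ex:pat", "rdf:type", "ex:human")}
later = partial | {("ex:ossi", "rdf:type", "ex:human")}  # knowledge added afterwards

known = ("ex:pat", "rdf:type", "ex:human")
query = ("ex:ossi", "rdf:type", "ex:human")

# The monotonic conclusion survives the addition of knowledge.
print(holds(partial, known), holds(later, known))      # True True
# The negation-as-failure conclusion has to be retracted.
print(naf_false(partial, query), naf_false(later, query))  # True False
```

Anything concluded from the partial graph by the first function is 
still concluded from the larger graph; the second function gives no 
such guarantee.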

>In other words, G1
>encodes more information indirectly, not because it has more formal
>entailment potential in it, but because it selects a useful entailment

Oh, I see. You are now assuming only that G1 is more useful because 
it contains some entailments pre-made, as it were (like mathematical 
lemmas)? Not that it says anything more about the world than G2 
says.  Then yes, of course, as a pragmatic point I will agree that 
that may be true; though in this particular case I doubt it is of much 
utility since any unification or matching process that would match 
the second triple would also match the first one, so the second 
triple is even redundant in a practical sense for most reasoners.
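
As a rough illustration in Python (the matcher below is a hypothetical 
toy, with "?"-prefixed strings as variables, and G1's second triple is 
again assumed to read "_:x rdf:type ex:human"): a pattern cannot name 
the blank node, so whatever matches the second triple also matches the 
first.

```python
def match(pattern, triple):
    """Return variable bindings if the pattern matches the triple, else None."""
    bindings = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A repeated variable must bind to the same term each time.
            if bindings.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return bindings

first = ("ex:pat", "rdf:type", "ex:human")
second = ("_:x", "rdf:type", "ex:human")

patterns = [
    ("?s", "rdf:type", "ex:human"),
    ("?s", "?p", "?o"),
    ("?s", "?p", "ex:human"),
    ("ex:pat", "rdf:type", "?c"),  # matches only the first triple
]

for pat in patterns:
    hits_first = match(pat, first) is not None
    hits_second = match(pat, second) is not None
    # Matching the blank-node triple implies matching the ex:pat triple.
    assert (not hits_second) or hits_first
    print(pat, hits_first, hits_second)
```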

>(obviously, the syntax doesn't capture this intention): Enlisting theorems
>in mathematics is useful even if, in principle, they are implicitly present
>in the definitions (well, at least in the ideal case). And it is
>particularly useful in Prolog or rule-based systems, to speed up
>inference.

Actually it will simply add redundancy to the backtracking in a 
Prolog-style search, so will likely slow things down.


>From the logical perspective, this line of thinking is a bit unorthodox,
>of course, but I do wonder if people writing RDF assertions really are
>logically-oriented.
>
>However, I do believe that in order to prevent SW from semantic
>fragmentation, W3C ought to publish an easy-to-read recommendation for
>simple RDF modelling (the RDF Primer doesn't really touch this issue [for
>scope, of course]).  This could include "safety levels" such as simple
>assertions (ok to do even if not quite familiar with RDF Semantics),
>assertions with blank nodes ("neutral structure slots" versus
>"denotations")

I don't follow that parenthesis.  Blank nodes are like the word 
'something' in English: is this really a hard idea to grasp?

>, assertions using the RDF vocabulary (e.g. interpretation
>of bags), assertions of terminology (feasibility considerations), etc. A
>sort of "know-what-you-are-asserting" guide (see my arms swinging? ;) I
>would be among the first people wanting to read it (obviously, since I
>have all these questions...).

It seems to me to be about as simple as it can get.  Each subject or 
object indicates the existence of something, sometimes by name, 
sometimes not: each triple asserts that a property holds between two 
things. Literals have some special rules, as literals often do. That 
is really all there is to it: no tricks, nothing hidden up the 
sleeves.

>But, as said in the beginning, your answer is sufficient. The above line
>of thinking is not the problem of RDF Semantics.

Hope the above may be of some further use, in any case.

Pat Hayes



-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 5 December 2003 13:14:15 UTC