Re: worries about useMentionOp and how queries relate to rules and proofs from Pat Hayes on 2005-02-05 (www-archive@w3.org from February 2005)

From: Pat Hayes <phayes@ihmc.us>
Date: Sat, 5 Feb 2005 10:29:19 -0600
To: Eric Prud'hommeaux <eric@w3.org>
Cc: Dan Connolly <connolly@w3.org>, www-archive@w3.org, Jos De Roo <jos.deroo@agfa.com>
Message-Id: <p0620072dbe2a97d412a2@[10.100.0.8]>
>On Fri, Feb 04, 2005 at 01:28:12PM -0600, Pat Hayes wrote:
>>  >
>>  >
>>  >Let's add some data to query (I think just querying the schema info
>>  >slightly obscures the problem):
>>  >
>>  >	_:somebody	  foaf:homePage <dansHomePage#topic>.
>>  >> >	<dansHomePage#topic> owl:sameAs _:somebody .
>>  >
>>  >and ask a question:
>>  >	CONSTRUCT *
>>  >	    WHERE { (?who foaf:homePage <dansHomePage#topic>) }
>>  >
>>  >A complete OWL reasoner must know that
>>  >	<dansHomePage#topic> foaf:homePage <dansHomePage#topic>.
>>  >Just for giggles, it could add
>>  >	_:x foaf:homePage <dansHomePage#topic>.
>>  >	_:y foaf:homePage <dansHomePage#topic>.
>>  >	_:y foaf:homePage <dansHomePage#topic>.
>>  >	_:z foaf:homePage <dansHomePage#topic>.
>>
>>  Well, it could, yes. So could an RDF reasoner, for that matter, since
>>  an inference to change a blanknodeID is valid in any version of RDF.
>>  But this particular example is artificial, since according to the
>>  SPARQL scoping rules these answers are all the same answer, so this
>>  engine is just repeating itself.
>>
>>  >Does anything but a query like
>>  >	CONSTRUCT *
>>  >	    WHERE { (?who foaf:homePage <dansHomePage#topic>) }
>>  >	      AND isURI(?who)
>>  >
>>  >keep the reasoner from reporting an endless series of equivalent bNode
>>  >solutions?
>>
>>  I think the SPARQL rules already stop that, if you say that you don't
>>  want repeated answers.  But look, what's to stop the answering engine
>>  from inventing a string of URIs like mine:uri1 mine:uri2... and adding
>>
>>  mine:uri1 owl:sameAs <dansHomePage#topic> .
>>  mine:uri2 owl:sameAs mine:uri1 .
>>  etc.
>>
>>  to the graph? There is no way to be totally secure against getting
>>  silly stuff back from a truly brain-damaged, or maybe malicious,
>>  answering engine.
>>
>>  >Is it logically equivalent to substitute a bNode for any
>>  >URI in the graph?  It seems that OWL would not worry about this
>>  >limitless enumeration.
>>
>>  Yes, and indeed it should not, and almost any real OWL or RDF
>>  answering engine would not do this (though it might accidentally
>>  repeat itself when using a large graph, if the graph contains
>>  redundancies.)
>>
>>  > > >and by definition
>>  >> >	isURI(<dansHomePage#topic>)
>>  >> >but not
>>  >> >	isURI(_:somebody)
>>  >
>>  >If isURI is a constraint on a set of bindings of nodes/literals to
>>  >variables, and each node/literal is only one of URI, bNode, Literal,
>>  >then it seems like we're fine. If owl:sameAs makes some node both a
>>  >URI and a bNode, then I don't understand owl:sameAs (a definite
>>  >possibility).
>>
>>  Well, not sure what you mean by 'makes some node both'. One can
>>  assert a sameAs between a bnode and a URI, as you did above. This
>>  isn't a problem, surely.
>
>Since I think this is the crux of this issue, I will expound...
>
>SPARQL queries over RDF data, so any data that isn't expressed in
>triples is not our problem.

Agreed.

>
>(1)	_:somebody        foaf:homePage <dansHomePage>
>results in a single triple with a bNode subject.
>
>(2)	<dansHomePage#topic> owl:sameAs _:somebody .
>results in the obvious triple. I'm trying to see what comes out of
>OWL closure over this data. I believe it is only
>
>(3)	<dansHomePage#topic> foaf:homePage <dansHomePage>
>which makes (1) redundant and forgettable.

Right, well put.

>If, however, I'm confused,
>and OWL inference changes the _somebody node

No, inference engines never change anything: they only add things. So 
the OWL engine might add 3 to {1,2}, but it won't change (or indeed, 
if it is strictly an inference engine, delete) either 1 or 2.

>to be both a bNode and
>the resource <dansHomePage#topic>, then SPARQL is oversimplifying
>the data model over which it queries.

But it isn't. A node can't be both a bnode and a URIref. Those are 
exclusive syntactic categories in RDF.

>
>Expressing this SPARQL query in RDF may bring up use/mention issues,
>but I don't see how asking the question about what node types are in
>the RDF graph runs the risk of confusing a reference to a node with
>the node itself.

That question doesn't do it, but what does (arguably) do it is asking 
a query about the node type rather than expressing the query as an 
RDF pattern.

>  > >Also, I'm not sure why this is a use/mention problem rather than a
>>  >potential over-simplification of the RDF model.
>>
>>  The 'predicate' in isURI(x) refers to the syntax of the expression
>>  substituted for the variable, not to whatever that expression
>>  denotes. To this extent it is semantically different from an RDF
>>  pattern with a variable in it. It's meta-RDF rather than RDF.
>
>Agreed. Just as SQL steps outside of relational algebra (UNIQUE, GROUP
>BY), I'm happy doing that in SPARQL.

OK, but I think that is the nub of the issue for Dan C.

>For fun, let's imagine the predicates log:isURI and log:isBnode.

Well, hang on, I have trouble imagining that. If this really is a 
PREDICATE then its truth-value is determined by the denotation of its 
arguments, not the form of its arguments. To illustrate, here's a 
logically valid inference pattern, expressed in RDF:

aaa bbb ccc .
|=
_:x bbb ccc .

In other words, if A is B'd to C, then something is B'd to C. Now, 
however, try this with log:isURI:

aaa rdf:type log:isURI
|=?=
_:x rdf:type log:isURI

or maybe

aaa log:isURI 1
|=?=
_:x log:isURI 1

Seems to me that if log:isURI really is a predicate, this ought to be 
true: after all, it's the inference from 'A is a URI' to 'something 
is a URI', which seems hard to argue with. But I bet that with what 
you intended log:isURI to mean, the conclusion here would be false, 
right? Because you don't mean it's a logical predicate over things 
denoted, you mean it's a meta-predicate over things displayed in the 
triple itself, i.e. a predicate on the syntax rather than a predicate 
on the meaning. Mention instead of use.
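
The failed inference can be shown in a minimal Python sketch. Everything named here (the URI and BNode classes, is_uri, generalize) is a hypothetical illustration, not an existing RDF library or anything from cwm. It only shows that a test on the form of a term does not survive replacing that term with a bnode, which is exactly the step a valid existential generalization performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    """A URI reference, identified by its string (hypothetical model)."""
    value: str


@dataclass(frozen=True)
class BNode:
    """A blank node, identified by a graph-local label (hypothetical model)."""
    label: str


def is_uri(term):
    """Syntactic test: inspects the form of the term, not what it denotes."""
    return isinstance(term, URI)


def generalize(triple, target, bnode):
    """Existential generalization: replace each occurrence of target by bnode."""
    return tuple(bnode if t == target else t for t in triple)


aaa = URI("aaa")
triple = (aaa, URI("bbb"), URI("ccc"))

# 'aaa bbb ccc .' entails '_:x bbb ccc .', so this step is logically valid.
weakened = generalize(triple, aaa, BNode("x"))

# The syntactic test holds before the valid inference step and fails after it.
subject_before = is_uri(triple[0])
subject_after = is_uri(weakened[0])
```

Here subject_before is True and subject_after is False, even though the second triple follows from the first. A genuine predicate over denotations could not behave that way.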

I don't mean to imply that meta-predicates like this are incoherent, 
but they certainly do behave differently in reasoning than normal 
predicates: they don't even obey the same logical rules (unless you 
make the quotation explicit). So mixing them freely with normal 
predicates with no, er, protection, quickly gets things very confused 
(see: http://fisher.osu.edu/~tomassini_1/whotext.html.)

>(1)	_:somebody        foaf:homePage <dansHomePage> .
>+	_:somebody        log:isBnode   1 .
>+	foaf:homePage     log:isURI     1 .
>+	<dansHomePage>    log:isURI     1 .
>
>(2)	<dansHomePage#topic> owl:sameAs _:somebody .
>+	<dansHomePage#topic> log:isURI     1 .
>+	owl:sameAs           log:isURI     1 .
>
>OWL inference:
>(3)	<dansHomePage#topic> foaf:homePage <dansHomePage>

Ah, but now there are also some other things in the closure, 
including all the existential generalizations on URIrefs:

_:y owl:sameAs _:somebody .
_:y log:isURI 1 .
_:y foaf:homePage <dansHomePage> .

(subgraph derived from (2) and (3) by RDF rule SE2 with _:y allocated 
to <dansHomePage#topic>, cf. http://www.w3.org/TR/rdf-mt/#simpleRules)

>
>{ ?who foaf:homePage ?page.
>   ?who log:isURI 1 } => { (?who ?page) a answer }
>would give you
>   ( <dansHomePage#topic> <dansHomePage> ) a answer .

Sure, but it will also give you what you don't want, if you apply it 
to the actual closure, because you will get a bnode binding for ?who.
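
A rough Python sketch of that effect, under simplifying assumptions: terms are plain strings, the se2 function is a hypothetical stand-in that applies the rule to every triple with the given subject, and the query is hard-coded rather than parsed from N3. None of this is cwm's actual behaviour.

```python
TOPIC = "<dansHomePage#topic>"
PAGE = "<dansHomePage>"

# Triples (1), (2) and (3) plus the one log:isURI assertion the query needs.
graph = {
    ("_:somebody", "foaf:homePage", PAGE),  # (1)
    (TOPIC, "owl:sameAs", "_:somebody"),    # (2)
    (TOPIC, "log:isURI", "1"),
    (TOPIC, "foaf:homePage", PAGE),         # (3)
}


def se2(g, subject, bnode):
    """Add a copy of every triple about subject with bnode in subject position.

    Simplified stand-in for rule se2, with bnode allocated to subject.
    """
    return g | {(bnode, p, o) for (s, p, o) in g if s == subject}


def who_is_uri_with_homepage(g):
    """Subjects ?who with '?who foaf:homePage ?page' and '?who log:isURI 1'."""
    return sorted(
        s for (s, p, o) in g
        if p == "foaf:homePage" and (s, "log:isURI", "1") in g
    )


closure = se2(graph, TOPIC, "_:y")

answers_before = who_is_uri_with_homepage(graph)
answers_after = who_is_uri_with_homepage(closure)
```

Over the original graph the only answer is <dansHomePage#topic>. Over the closure, _:y is an answer as well, because '_:y log:isURI 1' has been validly inferred.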

Now, I know I'm being deliberately awkward here, since you could 
insist on a kind of limited closure which does not introduce bnodes. 
That might work, for a while. But some inference engines need to 
generate bnodes (applying rules SE1 and 2, in effect) to get to 
perfectly legitimate conclusions; and it seems kind of tacky to say 
that they shouldn't do this when it is perfectly valid, and they would 
be conformant in doing it. (Note, I'm not here talking about silly 
repeats of this kind of inference, as in your first example. Just 
doing it once screws up log:isURI.)

However, if you were to go to the other extreme, and say that you 
wanted to target the query against the actual raw RDF graph, with NO 
inferences or alterations or additions done to it, then I think this 
kind of query could make sense; it's just a graph-matching query, and 
things like isBnode and isURI can be checked in any particular graph 
as syntactic conditions on variable bindings. But you just have to 
keep in mind that things all fall to pieces if you try to think of 
these as genuine RDF predicates, or when you mix these kinds of query 
with inference. Same applies to UNSAID, which makes perfect sense as 
a direct graph query, but totally screws up inference (and is screwed 
up by it).
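
As a sketch of the graph-matching reading, assuming terms are plain strings and that a leading "_:" marks a bnode (the helper names below are made up for illustration), the checks become filters applied to bindings after the pattern has been matched against the raw graph:

```python
def is_bnode(term):
    """Syntactic check on a binding: bnode labels start with '_:'."""
    return term.startswith("_:")


def is_uri(term):
    """Syntactic check on a binding: URI references are written <...>."""
    return term.startswith("<") and term.endswith(">")


# The raw graph: triples (1) and (2) only, with no inference applied.
raw_graph = [
    ("_:somebody", "foaf:homePage", "<dansHomePage>"),
    ("<dansHomePage#topic>", "owl:sameAs", "_:somebody"),
]


def match_subjects(graph, predicate, obj):
    """Bindings for ?who in the pattern '?who predicate obj'."""
    return [s for (s, p, o) in graph if p == predicate and o == obj]


bindings = match_subjects(raw_graph, "foaf:homePage", "<dansHomePage>")

# The filters run over bindings; they add no triples and license no inference.
uri_answers = [who for who in bindings if is_uri(who)]
bnode_answers = [who for who in bindings if is_bnode(who)]
```

Against this raw graph the isURI filter returns nothing and the isBnode filter returns _:somebody. Those answers describe this particular graph only. They say nothing about what the graph entails.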

>
>For that monotonic fuzzy feeling,
>{ ?who foaf:homePage ?page.
>   ?who log:isBnode 1 } => { (?who ?page) a answer }
>*could* give you
>   ( _:1 <dansHomePage> ) a answer }
>
>because <dansHomePage#topic> foaf:homePage <dansHomePage>
>implies _:1                  foaf:homePage <dansHomePage>
>
>Eliminating logically redundant bNodes could have a parallel operation
>that looks for things that could be bNodes on the matching side. The
>use cases aren't well-served by this extra inference. The one I see all
>the time is the variance in the object of dc:creator.
>
>	<annot1>   dc:creator    <dansHomePage#topic>.
>	<annot2>   dc:creator    _:creator2.
>	_:creator2 rdf:type      foaf:Person.
>	_:creator2 foaf:homePage <dansHomePage#topic>.
>
>If I were looking for the pages where the creator had put in some 
>sort of structure to describe themselves, I would be tempted to ask 
>cwm:
>
>{ ?page dc:creator ?creator.
>   ?creator log:isBnode 1. } => { ( ?page ) a answer. }
>
>at which point the query engine dutifully infers that both could be
>considered bNodes:
>
>	( <annot1> ) a answer.
>	( <annot2> ) a answer.
>
>Fat load of good that did me.

Well, true, but I'm tempted to ask in reply, whose fault is that? The 
reasoner isn't doing anything illogical or wrong; you just aren't 
talking to it in its language.

>In the end, I think I either want to
>pull isBnode out of SPARQL because it imposes a peculiar inference
>burden and gives answers that people won't expect, or, put in some
>text that says that the constraints clause is a filter on results

Nice phrasing

>,
>does not imply inference, is non-monotonic, is fattening, leads to
>heart disease, etc.

Right, that's what I think we should do. It's like the difference 
between banning poisons, or insisting on putting them in clearly 
labelled bottles. I'm for the latter. What we definitely shouldn't 
do, however, is pretend they are food.

Pat
-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Saturday, 5 February 2005 16:28:52 UTC