RE: Objective 4.6 -- additional semantic information from Jim Hendler on 2004-06-10 (public-rdf-dawg@w3.org from April to June 2004)

From: Jim Hendler <hendler@cs.umd.edu>
Date: Wed, 9 Jun 2004 20:27:12 -0400
To: "Rob Shearer" <Rob.Shearer@networkinference.com>
Cc: "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
Message-Id: <p06110490bced57454506@[10.0.1.2]>
At 16:59 -0700 6/9/04, Rob Shearer wrote:
>>  Rob - I've been suspecting you and I have been misunderstanding one
>>  another on this, now I'm sure of it.  Not quite sure where we're off,
>>  so let me ask two questions
>>     1 -  in your opinion, what does objective 4.6 add -- that is,
>>  if we accomplished this objective what would be in the language that
>>  isn't there in the design that doesn't include it?
>
>I think that a very valid objective is to try to make this query
>language the basis of querying for the entire semantic stack, which uses
>RDF as its data model. All these other languages will do nothing but
>describe RDF data models (without necessarily explicitly instantiating
>those data models), so it makes sense for a query language which asks
>questions about RDF data models to be relevant to all those other
>languages.
>
>>  Are you implying below
>>  that if we don't reach the objective it will be "wrong" for a query
>>  of an RDFS store to return the ?x = c answer below?
>
>There is a very real sense in which 'c' is *not* an answer to your
>query. If you're just querying the RDF (and not employing RDFS) then
>returning 'c' would be incorrect.
>In your simple example, there is another "virtual" RDF graph in which
>'c' is an answer. To be honest, I don't think the standard needs to
>address this concern; constructing a virtual graph, or writing an
>implementation which can query such a virtual graph without fully
>constructing it, is beyond the group's scope (in my opinion, of course,
>but I think this was the intent of the charter).
>
>However, there are cases (one of which I have presented here) in which
>there is knowledge about an RDF graph which can't simply be represented
>in a new virtual graph. Knowing that our language can be used as a basis
>for querying this knowledge would certainly go a long way toward
>justifying "yet another semantic web standard", as some have summarized
>our goals.

I think you missed my point - I was asking what you think objective 
4.6 adds - from the above I would assume you advocate removing this 
objective, is that right?  If not, what would it mean to have what 
4.6 says but not to have the ability to do RDFS or OWL inferencing?

>
>>    2 -  The UC&R doesn't have any use case that seems to really
>>  motivate disjunctive queries and the charter says that the
>>  expressivity (using disjunct as an example) will be determined by use
>>  cases.
>
>I must admit that upon re-reading the UC&R doc I am a bit surprised that
>disjunction has fallen off the radar. I certainly think users like being
>able to form arbitrary boolean constructions.
>
>In truth, we had a few use cases which did demonstrate this
>functionality, but they don't seem to have made it into the doc. I
>certainly don't think that document should be viewed as a comprehensive
>list, but rather as a starting point and a milestone in both our
>internal discussion and our interaction with outside groups.
>But as Kendall points out, I can only speak for myself.
>

The current document is certainly just a discussion point, but the 
charter dictates that some later decisions about tradeoffs will be 
based on the final version of the UC&R document (section 1.7), so I 
take it pretty seriously.

>>  In my experience most query languages don't actually handle
>>  disjunctive queries very well (SQL, for example, only allows the OR
>>  in the conditional of a return, which is a very limited kind of OR).
>
>I don't entirely understand this, and I'm curious. I had always rather
>assumed that WHERE clauses could contain arbitrary boolean combinations
>of predicates. Am I misunderstanding you, or am I misunderstanding SQL?

By "conditional of a return" I mean the WHERE clause. 
But my understanding of SQL is that you cannot make arbitrary query 
clauses -- you get the binding list and then you process the results 
against the WHERE clause -- i.e. you can say
  Fetch (complex expression, but no disjunction, including A, B, C, etc.)
    WHERE
      A = "foo" OR B = "bar" ...
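
For concreteness, here is a minimal sketch of that kind of query, run 
through Python's sqlite3 module. The table name t, the columns A, B, C 
and the rows are all made up for illustration; it only shows an OR 
sitting in the WHERE clause and filtering candidate rows, and makes no 
claim about what other SQL dialects allow.

```python
import sqlite3

# Hypothetical table and column names (t, A, B, C), chosen only to
# mirror the A = "foo" OR B = "bar" fragment above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A TEXT, B TEXT, C INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("foo", "x", 1), ("y", "bar", 2), ("foo", "bar", 3), ("y", "x", 4)],
)

# The OR lives in the WHERE clause, which filters the candidate rows.
rows = conn.execute(
    "SELECT C FROM t WHERE A = 'foo' OR B = 'bar' ORDER BY C"
).fetchall()
matched = [c for (c,) in rows]
print(matched)
conn.close()
```

Rows 1, 2 and 3 satisfy at least one side of the OR; row 4 satisfies 
neither and is filtered out.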



>
>>  I think the reason for this is that a lot of the complex border cases
>>  seem to arise around disjunctive queries (as in the case below) - so
>>  my question is really whether you see this as a trade-off, or if you
>>  take it as a given that a good DA rec would include "(a R b) OR (c S
>>  d)" and thus worry about a lot of these complex cases?
>
>I think the complex cases are actually not that difficult in a
>closed-world system. For plain RDF triples, I don't see performing
>queries with 'or' as a problem. In cases where more sophisticated
>languages complicate the issue, the complications only arise when you're
>actually getting the benefit of the new system. If features like
>disjunction are so rarely useful, then I fear a lot of us have wasted a
>lot of time defining whole new languages like OWL and SWRL for
>expressing things that are even more esoteric!

Well, being the person with the claim to the most time spent in the 
development of OWL of anyone in the world, I would certainly not want 
that!  However, I think you are confusing modeling and querying.  I 
can state all sorts of complex things and have a logic engine that 
generates the right answers when queried.  So, for example,  in OWL 
you can often say X and Y are disjoint and everything must be in 
either class X or Class Y.  Thus if I know John is not in Class X 
(perhaps because it is disjoint with some other class John is in), 
and I query for  ClassOf(John, ?x) the system will only return Y -- 
but the query isn't disjunctive just because the knowledge base is. 
As I understand it, many query languages don't have disjunction in 
their statements  because processing complex join trees with 
arbitrary disjunction can be a real bear (esp. with any sort of 
negation).  So in many languages if you want "(x R a) OR (y S b)" you 
would query for "(x R a)" and then have a second query of  "(y S b)" 
-- note that in most relational and OO models this is identical, 
although in deductive databases this is more of a restriction (but 
deductive DBs also have specialized handling of disjunction in the 
presence of negation as I understand it)
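
A minimal sketch of the "two queries instead of one OR" idea over a toy 
triple store. The store contents and the match helper are hypothetical, 
and for simplicity both patterns bind the same variable, which is 
narrower than the "(x R a) OR (y S b)" case above. It only covers plain 
stored triples, with no inference and no negation.

```python
# Toy triple store; subjects, relations and objects are hypothetical.
triples = {
    ("x1", "R", "a"),
    ("x2", "S", "b"),
    ("x3", "R", "a"),
    ("x3", "S", "b"),
    ("x4", "R", "c"),
}


def match(store, relation, obj):
    """Return the subjects s such that (s, relation, obj) is in the store."""
    return {s for (s, r, o) in store if r == relation and o == obj}


# Disjunctive query evaluated in one pass: (?s R a) OR (?s S b)
one_pass = {
    s for (s, r, o) in triples
    if (r, o) == ("R", "a") or (r, o) == ("S", "b")
}

# The same answer from two separate queries, merged by the client.
two_queries = match(triples, "R", "a") | match(triples, "S", "b")

print(sorted(one_pass), sorted(two_queries))
```

Over stored triples the two routes give the same set of bindings, 
which is the sense of "identical" intended above.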

Note that I must admit to not being a DB expert, and only having 
taught it a couple of times many years ago, so I may be sadly 
mistaken in the above, and would welcome being corrected by someone 
who knows more about this.

>
>>    thanks
>>    JH
>>  p.s. if either of the above sound hostile or argumentative, please
>>  don't take them that way -- I haven't been privy to the discussions
>>  of the group to date, and am trying to catch up - I'm trying to read
>>  the message log, but there's too much to simply read end to end, so
>>  I'm trying to catch up on the most relevant threads, and thus may
>>  have missed things where the above were discussed.
>
>Shame. Hostile and argumentative work environments are rather my milieu.

:->


>
>>  At 8:47 -0700 6/9/04, Rob Shearer wrote:
>>  >>  Maybe I didn't understand what 4.6 was about, I assumed it was
>>  >>  something like this:
>>  >>
>>  >>  if I have
>>  >>     a rdfs:subClassOf b
>>  >>     b rdfs:subClassOf c
>>  >>
>>  >>  then I was assuming that the requirements in UC&R would imply
>>  >>  that if I query for
>>  >>     a rdfs:subClassOf ?x
>>  >>  then ?x would be bound to b
>>  >>  however, if we include 4.6, then we would be expected to return
>>  >>     ?x = b and ?x = c
>>  >
>>  >I think it would be wrong to require implementations to return both
>>  >'b' and 'c', because that would be *requiring* implementations to
>>  >process RDFS.
>>  >But I think it would also be wrong to forbid returning 'b' and 'c'
>>  >from queries against some larger knowledgebase. In this case the
>>  >problem is hard to produce, since most (every?) extra piece of RDFS
>>  >can simply be encoded as a whole bunch of extra triples in an RDF
>>  >graph, but that's not the case for all extra sources of knowledge.
>>  >
>>  >For example, if the standard were to define disjunction such that the
>>  >predicate "(a R b) OR (c S d)" is defined to return 'true' if and
>>  >only if "a R b" returns 'true' or "c S d" returns 'true', then it's
>>  >impossible for a standards-compliant processor to return sensible
>>  >results in a situation where the disjunction is a logical consequence
>>  >of the knowledgebase, yet neither of the individual triples is.
>>  >
>>  >If, however, the standard were written in terms of the semantics of
>>  >the graphs which match such a disjunctive predicate, then more
>>  >advanced processors could legally return results that actually
>>  >represented all the knowledge at their disposal.
>>  >
>>  >This is the difference between specifying the semantics of the query
>>  >language and specifying its implementation.
>>
>>  --
>>  Professor James Hendler
>>  http://www.cs.umd.edu/users/hendler
>>  Director, Semantic Web and Agent Technologies	  301-405-2696
>>  Maryland Information and Network Dynamics Lab.
>>  301-405-6707 (Fax)
>>  Univ of Maryland, College Park, MD 20742	  240-277-3388 (Cell)
>>
>>
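
The rdfs:subClassOf example quoted above can be sketched as follows. 
This is only an illustration under stated assumptions: the graph is a 
plain Python set of triples, the two function names are hypothetical, 
and the "RDFS" version models nothing beyond transitivity of 
rdfs:subClassOf (no reflexivity, no other entailment rules).

```python
# The two triples from the example; "a", "b", "c" are the class names
# used there.
graph = {
    ("a", "rdfs:subClassOf", "b"),
    ("b", "rdfs:subClassOf", "c"),
}


def superclasses_plain(graph, cls):
    """Plain RDF matching: only triples literally present in the graph."""
    return {o for (s, p, o) in graph if s == cls and p == "rdfs:subClassOf"}


def superclasses_rdfs(graph, cls):
    """Follow rdfs:subClassOf transitively (the only rule modeled here)."""
    found, frontier = set(), {cls}
    while frontier:
        reached = set()
        for c in frontier:
            reached |= superclasses_plain(graph, c)
        # Only expand classes not seen before, so cycles terminate.
        frontier = reached - found
        found |= frontier
    return found


plain = superclasses_plain(graph, "a")
entailed = superclasses_rdfs(graph, "a")
print(sorted(plain), sorted(entailed))
```

Querying "a rdfs:subClassOf ?x" against the stored triples binds ?x to 
b only; following the subclass chain also yields c, which is the 
"virtual graph" answer discussed in the thread.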

-- 
Professor James Hendler			  http://www.cs.umd.edu/users/hendler
Director, Semantic Web and Agent Technologies	  301-405-2696
Maryland Information and Network Dynamics Lab.	  301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742	  240-277-3388 (Cell)
Received on Wednesday, 9 June 2004 20:27:35 UTC