Re: Various result forms from Pat Hayes on 2004-05-05 (public-rdf-dawg@w3.org from April to June 2004)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 5 May 2004 13:30:04 -0500
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06001f02bcbeb5f96f50@[10.0.100.76]>
>To try to put requirements 3.2 (Variable Binding Results) and 3.5 (Subgraph
>Results) in some kind of context ...

Comments added below from DQL perspective.

>There are a number of result forms that people have used or suggested:  I
>know of:
>
>1/ Variable bindings
>    Common for the case of get data out of RDF

Obtained by answer pattern in form of a vector of variables. Could be 
done by using XML with <qvar /> tags.

>2/ Result set in RDF
>    As 1/ - encoded in RDF

Obtained by answer pattern in form of some lexicalization of RDF with 
variables inserted. Could be done with RDF/XML with extra tags for 
variables.

>3/ RDF=>XHTML/XML
>
>4/ Subgraph extraction:
>    Actually, two forms:
>4a/ The query pattern with variables substituted for each
>     solutions and result merged.

answer pattern = query pattern (default in DQL)

>  Reexecuting the query
>     gives the same results.

Well, not necessarily. Even assuming the source graph is stable 
between queries, this would now be a yes/no query (without variables) 
, so cannot strictly produce the same result.

>4b/ The triples from the graph that were matched.
>     Would include, for example, the subclass resource
>     when asked (?x rdf:type x:superclass).

Strictly speaking, subclass inheritance reasoning is part of RDFS, 
not RDF. This raises an issue: are we talking here about RDF or 
RDF+RDFS ? We need to get quite clear on this, seems to me, since if 
we expect this to be 'open-ended' in what counts as suitable backup 
inference for an RDF query, then RDF querying is indistinguishable 
from OWL querying.

One of the issues we punted on when designing DQL was whether it made 
sense to return the 'justification' of an answer. In general, this 
can get *extremely* complicated: the only general form has to be 
something like a proof or derivation of the answer from the Kbase, 
which can get arbitrarily long and hairy, and for some inference 
engines (eg tableaux reasoners) may not even be well-defined. Anyway, 
this is clearly a research issue. I'd strongly suggest that we back 
off from this as a requirement, if we expect to finish the WG work 
this century.  It seems easy for RDF only because RDF itself is so 
simple, particularly semantically: but it starts getting hairy almost 
immediately. Even allowing typed literals makes things complicated, 
and when you get things like subPropertyOf in the mix, the conclusion 
chains get tricky to follow, eg see the examples in the text at
http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#RDFSRules.

>4a == 4b for an RDF graph with no inference processing.

But what does that mean? The graph is just what it is, the inference 
comes from the semantics you assume when you look at it through RDFS 
or OWL (or whatever) colored glasses.

>5/ RDF => RDF
>    Templating - a generalisation of 4/ where a template (RDF graph with
>variables in it) is used to create new RDF at the server.  At the F2F this
>was voted against as a requirement.

I wish I had been at the F2F, as this sounds like the DQL idea of an 
answer pattern. (Though we didnt require it to be RDF, and I see no 
reason why it should be in particular: it should just be a string 
with some variables in it, or maybe some XML with <qvar> tags in it.)

Why do we need to specify the answer format? If we allow users to 
specify it (and maybe provide a handy default for lazy users or 
beginners) then we avoid quarrels and add to interoperability.

>Getting information out of RDF directly is 1,2,3.  Part of a larger
>processing system (distributed) is 4 & 5.
>
>
>1/ is about the problem of getting information (node and arc labels) out of
>RDF; 2/ is A way of recording 1/.  I have used 2/ to give access to query
>languages from (other language) toolkits that have no query capability.  I
>execute the query remotely (all it takes is to pass a string from
>application to server - the client toolkit does not need to understand the
>QL

Yes, quite.

>) and use the result set format [1] as the transfer syntax.  That's
>convenient because the client toolkit can parse and work with the returned
>RDF.  Having the client requirements simple can, for small devices, take
>many forms - this is one of them.
>
>3/ is important for the display of information directly from RDF sources.
>Using XQuery/XSLT/etc looks to be useful (practical, utilizes programmer
>skills, builds on existing work, what people expect, ...).

Right. Seems to me that we could reasonably require that answer 
patterns are legal XML , and must use our specified markup for our 
variables. Then users can easily test for unbound variables in an 
answer, for example, but this allows answers to be formatted as 
RDF/XML or as something close to plain text strings, or anything in 
between. It would also allow answers to be things like XHTML as well 
as RSS.

>
>4 & 5 are about getting some RDF out of another (larger, remote) RDF
>dataset.  The results would be further manipulated before going to the user,
>and that includes passing in on to other machines where the final
>destination of user/application is not the one making the query; instead the
>extracted subgraph is sent on to other places. This is RDF=>RDF, for
>example, passing around RSS entries.  The general requirement is that part
>of a large, remote target graph is extracted and deliver for further, local
>processing.
>
>For 4/, examples include the "tell me about" queries and the use of the
>pattern of query to define the subgraph.  In fact, 4a gives an alternative
>way of approaching the example above if the client toolkit does have the QL.
>Re-execution is much, much cheaper, essentially as there are so few negative
>search branches to follow.

But why would one ever need to re-execute, if you already have the 
answer to the query? (what am I missing here?)

Pat


>
>	Andy
>
>[1] http://www.w3.org/2003/03/rdfqr-tests/recording-query-results.html


-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 5 May 2004 14:30:40 UTC