RE: REX evaluation from Rob Shearer on 2004-06-12 (public-rdf-dawg@w3.org from April to June 2004)

From: Rob Shearer <Rob.Shearer@networkinference.com>
Date: Sat, 12 Jun 2004 16:43:38 -0700
To: "Seaborne, Andy" <andy.seaborne@hp.com>, "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
Message-ID: <CFE388CECDDB1E43AB1F60136BEB49730280E1@rome.ad.networkinference.com>
> > The language really does nothing but add a few new functions
> > to XQuery:
> > 
> > individuals() returns a sequence of all the URIs used as
> > either subjects or objects in triples,
> > properties() returns a sequence of all the URIs used as
> > predicates in triples
> > dataValues() returns a sequence of all possible datavalues
> > (for now, assume they are just the values used as objects in
> > triples)
> > related(x, y, R) is a boolean predicate representing
> > the existence of the triple 'x R y'
> > 
> > (This is actually a somewhat simplified version of the
> > syntax, but it will do for now.)
> 
> What do XQuery path expressions mean in this language? There have been
> various different uses for xpaths elsewhere - what does REX do?

This system doesn't extend the XPath stuff at all. All the functions
above return atoms, not XML structures, so there's no sense of structure
to traverse with XPath.
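To illustrate that point, here is a minimal Python sketch modelling the four built-ins over an in-memory list of triples. It is not Network Inference's implementation: the sample URIs, the `Literal` class and the list-of-tuples store are hypothetical. Each function returns a flat list of atoms, so there is no tree for a path expression to walk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF data value (lexical form only)."""
    lexical: str


# Hypothetical sample graph of (subject, predicate, object) triples.
TRIPLES = [
    ("#rob", "#memberOf", "#DAWG"),
    ("#rob", "#hasName", Literal("Rob")),
    ("#andy", "#memberOf", "#DAWG"),
    ("#andy", "#hasName", Literal("Andy")),
]


def _unique(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def individuals(triples=TRIPLES):
    """URIs used as subjects or objects."""
    nodes = [n for s, _, o in triples for n in (s, o)]
    return _unique(n for n in nodes if not isinstance(n, Literal))


def properties(triples=TRIPLES):
    """URIs used as predicates."""
    return _unique(p for _, p, _ in triples)


def data_values(triples=TRIPLES):
    """For now, just the literals used as objects."""
    return _unique(o for _, _, o in triples if isinstance(o, Literal))


def related(x, y, r, triples=TRIPLES):
    """True iff the triple 'x r y' is in the graph (property last)."""
    return (x, r, y) in triples
```

Note that `related` takes the property as its third argument, matching the `related(x, y, R)` signature above.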

We've actually been through previous XQuery extensions for dealing with
RDF here at Network Inference, and we've tried the obvious
XPath-as-graph-path approach, where Rob's mother's email address might
be accessed via "Rob/hasMother/hasEmailAddress" or something. We worked
pretty hard to make it work, but users kept finding it very confusing.
For one thing, it's a lot easier to get your head around it when what
you're traversing is a real tree and not an arbitrary graph. But more
importantly, when you start treating RDF like XML then users start
thinking of it as XML, meaning they assume you're working with an
RDF/XML format. That format happens to have syntactic shortcuts which
make some of these graph-path expressions equivalent to RDF/XML XPath
expressions, but the two just aren't the same.
We found that as soon as you start using XPath, users start thinking
RDF/XML, and all their assumptions are wrong in all but the most trivial
cases. It seems that the only way to get users to think in terms of the
graph and not the syntax is to make the difference quite clear, so we
moved away from XPath completely.
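To make the graph-versus-tree point concrete, here is a hedged Python sketch (the data and the `mothers_emails` helper are hypothetical, not REX syntax). In a graph, "Rob/hasMother/hasEmailAddress" is really a join over two triple patterns: one node can be reached from several subjects, and each step can yield several values, which is where tree-shaped XPath intuitions break down.

```python
# Hypothetical graph: "#jane" is reachable from two subjects and has two
# email addresses, so this is a graph, not a tree.
TRIPLES = {
    ("#rob", "#hasMother", "#jane"),
    ("#ann", "#hasMother", "#jane"),
    ("#jane", "#hasEmailAddress", "jane@example.org"),
    ("#jane", "#hasEmailAddress", "jane@example.net"),
}


def related(x, y, r):
    """True iff the triple 'x r y' is present."""
    return (x, r, y) in TRIPLES


def nodes():
    """Every subject or object in the graph."""
    return {n for s, _, o in TRIPLES for n in (s, o)}


def mothers_emails(person):
    """Join equivalent of the path person/hasMother/hasEmailAddress:
    bind $m and $e such that related(person, $m, #hasMother)
    and related($m, $e, #hasEmailAddress)."""
    return sorted(
        e
        for m in nodes()
        for e in nodes()
        if related(person, m, "#hasMother") and related(m, e, "#hasEmailAddress")
    )
```

Here `mothers_emails("#rob")` and `mothers_emails("#ann")` return the same two addresses, which a tree-based reading of the path would not lead a user to expect.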

> > 
> > A query to find all working group members and their names would be:
> > 
> > for $i in individuals()
> > for $d in dataValues()
> > where related($i, #DAWG, #memberOf)
> >   and related($i, $d, #hasName)
> > return <Member><uri>{$i}</uri><name>{$d}</name></Member>
> > 
> > 
> > 3.1: RDF Graph Pattern Matching
> > FULLY SUPPORTED
> > Each 'related' clause is nothing but a triple which might
> > include variables, and you can string them together with
> > 'and', so I think a 'where' clause qualifies as a 'graph pattern'.
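A hedged Python sketch of why a `where` clause built from `related` calls amounts to a graph pattern. It assumes a toy triple list and a hypothetical convention that terms starting with "?" are variables. Each clause becomes a triple pattern, and the conjunction is solved by a backtracking join. This yields the same bindings as the nested `for ... where` query above, without first enumerating every individual and data value.

```python
# Hypothetical sketch: related(x, y, R) is treated as the triple pattern
# (x, R, y); terms beginning with "?" are variables.
TRIPLES = [
    ("#rob", "#memberOf", "#DAWG"),
    ("#rob", "#hasName", "Rob"),
    ("#andy", "#memberOf", "#DAWG"),
    ("#andy", "#hasName", "Andy"),
    ("#eve", "#hasName", "Eve"),
]


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind an unbound variable; reject a conflicting binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def solve(patterns, binding=None):
    """Yield every binding that satisfies all patterns (a conjunction)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in TRIPLES:
        extended = match(first, triple, binding)
        if extended is not None:
            yield from solve(rest, extended)


# where related($i, #DAWG, #memberOf) and related($i, $d, #hasName)
members = [
    (b["?i"], b["?d"])
    for b in solve([("?i", "#memberOf", "#DAWG"), ("?i", "#hasName", "?d")])
]
```

Here `#eve` has a name but is not a member, so the shared variable `?i` filters her out, just as the second `related` clause does in the REX query.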
> 
> How does REX interact with XQuery typing? A case I don't understand is:
> 
>     :x :p :z .
>     :x :p "1" .
> 
> and related($x, $z, :p)
>
> How do I write the "for $z in ...." and match both triples?

That's a good question. Our implementation is based on OWL-DL, and in
that language datatype and object properties are completely disjoint,
which avoids the issue.

A very sensible way to perform that query would be to use the standard
XQuery "union" functionality:

for $i in individuals() union dataValues()
where related(#x, $i, #p)
return $i
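A minimal Python sketch of what that query computes on Andy's two-triple example. The `Literal` class and the in-memory store are illustrative assumptions, and list concatenation stands in for the combined sequence of individuals and data values. Iterating over both together is what lets one variable bind to `:z` and to `"1"`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a data value."""
    lexical: str


# Andy's example: one object-valued triple and one data-valued triple.
TRIPLES = [(":x", ":p", ":z"), (":x", ":p", Literal("1"))]


def individuals():
    return list(dict.fromkeys(
        n for s, _, o in TRIPLES for n in (s, o) if not isinstance(n, Literal)))


def data_values():
    return list(dict.fromkeys(o for _, _, o in TRIPLES if isinstance(o, Literal)))


def related(x, y, r):
    """True iff the triple 'x r y' exists (property last)."""
    return (x, r, y) in TRIPLES


# for $i in individuals() union dataValues()
# where related(:x, $i, :p)
# return $i
results = [i for i in individuals() + data_values() if related(":x", i, ":p")]
```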

> Is there a way
> of getting all the nodes in the graph?

Does one of

for $i in individuals() return $i
for $i in individuals() union dataValues() return $i

mean what you're asking for? I'm not sure I entirely understand the
desired intent.

> Does it even matter that $z can be either a datavalue or a URI handled?

I think there is a general problem for any RDF query language that there
is a plethora of different datatypes. Most existing query languages have
the luxury of querying data inexpressive enough that the user can be
expected to know what types they will get back, thus they can write
their query to marshal them how they want (e.g. when they get back "1",
it's their problem to figure out whether that is a string or a number
which their query turned into a string). In RDF it's quite sensible for
both "1" and 1.0 to fill the same variable (although I'd argue that is
generally poor RDF design).

We've toyed a little here with the things returned by "dataValues()"
being slightly more sophisticated structures than simple atoms. They
could serialize in the traditional way, but they might have attributes
that you can access using path expressions. Thus "$i/type" or "$i/@type"
or something could return the datatype name for that value. You could
also provide different types of serializations via different attributes,
the idea being that a canonical serialization is often good enough, but
sometimes you want to re-encode in a "'1.0'^^xsd:float" style, so being
able to extract those two bits of data (one a string and the other a
type URI) could be useful.
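A hedged Python sketch of that idea: a data value that prints as its plain lexical form but also exposes its datatype and a typed re-encoding. The class name, the `datatype` and `typed` members, and treating an untyped value as xsd:string are all hypothetical choices, standing in for whatever "$i/type" would return.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class DataValue:
    """Hypothetical richer data value: lexical form plus datatype URI.
    Defaulting to xsd:string is an assumption made for this sketch."""
    lexical: str
    datatype: str = XSD + "string"

    def __str__(self):
        # Canonical serialization: just the lexical form.
        return self.lexical

    def typed(self):
        # Re-encode in "'1.0'^^xsd:float" style, abbreviating XSD URIs.
        dt = self.datatype
        if dt.startswith(XSD):
            dt = "xsd:" + dt[len(XSD):]
        return f"'{self.lexical}'^^{dt}"


one_string = DataValue("1")
one_float = DataValue("1.0", XSD + "float")
```

Both values can fill the same variable, yet `str()` gives the "good enough" canonical text while `typed()` recovers the string and the type URI separately.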

> > 3.4: Subgraph Results
> > FULLY SUPPORTED
> > Results can easily be formatted as a subgraph of the
> > original graph:
> > <rdf:RDF xmlns:rdf="..."> {
> >   for $i in individuals()
> >   for $d in dataValues()
> >   where related($i, #DAWG, #memberOf)
> >     and related($i, $d, #hasName)
> >     and isCommonlyMisspelled($d)
> >   return <rdf:Description rdf:id={$i}>
> >            <memberOf rdf:resource="#DAWG"/>
> >            <hasName>{$d}</hasName>
> >          </rdf:Description>
> > }</rdf:RDF>
> 
> It is only one case of subgraph results - it is a client-defined RDF. The
> server isn't able to contribute as in a "describe" query ("tell me about") -
> this does not cover the description queries (see the motorcycle parts UC).

This puzzles me a bit, and possibly comes back to the
information-theoretic problems I had with an earlier phrasing of this
requirement.

The language I've outlined allows you to use variables for properties
(predicates) as well as nodes, so it's quite trivial to echo every
triple with

for $subject in individuals()
for $object in individuals() union dataValues()
for $p in properties()
where related($subject, $object, $p)
return ($subject, $p, $object)

(the three can be formatted in XML in some reified form, or you can use
"computed constructors" to actually build something like
"<rdf:Description
rdf:id={$subject}><{$p}>{$object}</{$p}></rdf:Description>")

Tossing in a 'where' clause gets you fewer triples--a subgraph.

It's even quite straightforward to encode the iterate-test-and-reencode
crap as a user-defined function.
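A rough Python analogue of packaging the iterate-test-and-reencode loop as a reusable function. The `subgraph` name, the predicate-callback design and the sample triples are hypothetical. With a test that always succeeds it echoes every triple; any narrower test yields a subgraph.

```python
from typing import Callable, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample graph.
TRIPLES = [
    ("#rob", "#memberOf", "#DAWG"),
    ("#rob", "#hasName", "Rob"),
    ("#rob", "#worksFor", "#NetworkInference"),
]


def subgraph(triples: Iterable[Triple],
             keep: Callable[[str, str, str], bool]) -> Iterator[Triple]:
    """Iterate every (subject, property, object), test it, and re-emit
    the ones that pass. With keep always True this echoes the graph."""
    for s, p, o in triples:
        if keep(s, p, o):
            yield (s, p, o)


everything = list(subgraph(TRIPLES, lambda s, p, o: True))
membership = list(subgraph(TRIPLES, lambda s, p, o: p == "#memberOf"))
```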

> ----
> 
> General questions about REX's client-defined return syntax:
> (These may be XQuery questions, rather than REX ones)
> 
> 1/ How does the XML output a qname from the property URI?
> 
> Eg. How do I write:
> 
>   related($s, $o, $p)
>   return <rdf:Description rdf:id={$s} xmlns:???=????>
>             <$p>{$o}<$p>     <--- what should be here for "$p"?
>          </rdf:Description>

Okay, you caught me hand-waving. For reasons I'm sure Howard understands
better than I do, XQuery doesn't let you use a "direct" constructor in
such a convenient way, so you've got to use a "computed" constructor
(http://www.w3.org/TR/xquery/#id-computedConstructors), and the syntax
becomes:

<Desc id="{$s}">{element {$p} {$o}}</Desc>

(I think; Howard please correct me if my syntax is wrong.)
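For comparison, here is how the same problem looks when the element is built as a tree with Python's standard `xml.etree.ElementTree` (not XQuery; the URIs, the `ex` prefix and the `split_uri` heuristic are hypothetical). The property URI must be split into a namespace and a local name before it can become an element name, and the serializer then picks the prefix and emits the `xmlns` declarations. The sketch uses `rdf:about`, which RDF/XML uses for a full subject URI.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def split_uri(uri):
    """Split a property URI into (namespace, local name) at the last '#'
    or '/'. A heuristic: not every URI ends in a valid XML local name."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def describe(subject, prop, obj):
    """Build <rdf:Description rdf:about=...><prop>obj</prop></...> as a
    tree, so the element name is computed from data, not written literally."""
    desc = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject})
    ns, local = split_uri(prop)
    child = ET.SubElement(desc, f"{{{ns}}}{local}")
    child.text = obj
    return desc


ET.register_namespace("rdf", RDF)
ET.register_namespace("ex", "http://example.org/terms#")
xml = ET.tostring(
    describe("http://example.org/rob", "http://example.org/terms#hasName", "Rob"),
    encoding="unicode",
)
```

Because the output is built structurally, it is well-formed by construction, which bears on the question of how result syntax is assured.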

> 2/ How is correct syntax of the results assured? Or is it an execution time
> error for a syntax error in the XML form of the results?

I assume this is why you're not allowed to use direct constructors.
XQuery seems to enforce legal XML, and anything that isn't obviously
legal XML needs to be programmatically built by structure.

> > 4.4 User-Specifiable Serialization
> > FULLY SUPPORTED
> > One of the strengths of the language.
> 
> My reading of 4.4 "the serialization format of query results" would be
> closer to Kendall's example of user-specifiable serialization, which is
> about going beyond defined MIME types. I don't read "serialization format"
> (protocol issue) as "formatting" (shaping the results).

Then clearly there's some misunderstanding of this objective (probably
by me). Another thing to chat about.

In truth, I don't think XQuery proper really addresses how results are
output. A true "result" in XQuery is just a sequence of things. In
practice the end results are just formatted as text, so most queries are
wrapped in an outermost formatting construct to make that text look
nice.

> However, arbitrary XML expressions, in the style of CONSTRUCT, would be good
> to see in "DAWL-QL" so as to generate XML fragments at the server (but as a
> non-normative feature - this would not be a requirement nor design
> objective).
> 
>
Received on Saturday, 12 June 2004 19:45:28 UTC