- From: Rob Shearer <Rob.Shearer@networkinference.com>
- Date: Sat, 12 Jun 2004 16:43:38 -0700
- To: "Seaborne, Andy" <andy.seaborne@hp.com>, "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
> > The language really does nothing but add a few new functions
> > to XQuery:
> >
> > individuals() returns a sequence of all the URIs used as
> > either subjects or objects in triples,
> > properties() returns a sequence of all the URIs used as
> > predicates in triples
> > dataValues() returns a sequence of all possible datavalues
> > (for now, assume they are just the values used as objects in
> > triples)
> > related(x, y, R) is a boolean predicate representing
> > the existence of the triple 'x R y'
> >
> > (This is actually a somewhat simplified version of the
> > syntax, but it will do for now.)
>
> What do XQuery path expressions mean in this language? There have
> been various different uses for xpaths elsewhere - what does REX do?

This system doesn't extend the XPath stuff at all. All the functions
above return atoms, not XML structures, so there's no sense of
structure to traverse with XPath.

We've actually been through previous XQuery extensions for dealing
with RDF here at Network Inference, and we've tried the obvious
XPath-as-graph-path approach, where Rob's mother's email address might
be accessed via "Rob/hasMother/hasEmailAddress" or something. We
worked pretty hard to make it work, but users kept finding it very
confusing. For one thing, it's a lot easier to get your head around it
when what you're traversing is a real tree and not an arbitrary graph.
But more importantly, when you start treating RDF like XML then users
start thinking of it as XML, meaning they assume you're working with
an RDF/XML format. That format happens to have syntactic shortcuts
which make some of these graph-path expressions equivalent to RDF/XML
XPath expressions, but the two just aren't the same. We found that as
soon as you start using XPath, users start thinking RDF/XML, and all
their assumptions are wrong in all but the most trivial cases.
It seems that the only way to get users to think in terms of the graph
and not the syntax is to make the difference quite clear, so we moved
away from XPath completely.

> >
> > A query to find all working group members and their names would be:
> >
> > for $i in individuals()
> > for $d in dataValues()
> > where related($i, #DAWG, #memberOf)
> > and related($i, $d, #hasName)
> > return <Member><uri>{$i}</uri><name>{$d}</name></Member>
> >
> >
> > 3.1: RDF Graph Pattern Matching
> > FULLY SUPPORTED
> > Each 'related' clause is nothing but a triple which might
> > include variables, and you can string them together with
> > 'and', so I think a 'where' clause qualifies as a 'graph pattern'.
>
> How does REX interact with XQuery typing? A case I don't
> understand is:
>
> :x :p :z .
> :x :p "1" .
>
> and related($x, $z, :p)
>
> How do I write the "for $z in ...." and match both triples?

That's a good question. Our implementation is based on OWL-DL, and in
that language datatype and object properties are completely disjoint,
which avoids the issue. A very sensible way to perform that query
would be to use the standard XQuery "union" functionality:

    for $i in individuals() union dataValues()
    where related(#x, $i, #p)
    return {$i}

> Is there a way
> of getting all the nodes in the graph?

Does one of

    for $i in individuals() return $i

    for $i in individuals() union dataValues() return $i

mean what you're asking for? I'm not sure I entirely understand the
desired intent.

> Does it even matter that $z can be
> either a datavalue or a URI handled?

I think there is a general problem for any RDF query language that
there is a plethora of different datatypes. Most existing query
languages have the luxury of querying data inexpressive enough that
the user can be expected to know what types they will get back, thus
they can write their query to marshal them how they want (i.e.
when they get back "1", it's their problem to figure out whether that
is a string or a number which their query turned into a string). In
RDF it's quite sensible for both "1" and 1.0 to fill the same variable
(although I'd argue generally poor RDF design).

We've toyed a little here with the things returned by "dataValues()"
being slightly more sophisticated structures than simple atoms. They
could serialize in the traditional way, but they might have attributes
that you can access using path expressions. Thus "$i/type" or
"$i@type" or something could return the datatype name for that value.
You could also provide different types of serializations via different
attributes, the idea being that a canonical serialization is often
good enough, but sometimes you want to re-encode in a
"'1.0'^^xsd:float" style, so being able to extract those two bits of
data (one a string and the other a type URI) could be useful.

> > 3.4: Subgraph Results
> > FULLY SUPPORTED
> > Results can easily be formatted as a subgraph of the
> > original graph:
> > <rdf:RDF xmlns:rdf="..."> {
> > for $i in individuals()
> > for $d in dataValues()
> > where related($i, #DAWG, #memberOf)
> > and related($i, $d, #hasName)
> > and isCommonlyMisspelled($d)
> > return <rdf:Description rdf:id={$i}>
> > <memberOf rdf:resource="#DAWG"/>
> > <hasName>{$d}</hasName>
> > </rdf:Description>
> > }</rdf:RDF>
>
> It is only one case of subgraph results - it is a client-defined
> RDF. The server isn't able to contribute as in a "describe" query
> ("tell me about") - this does not cover the description queries
> (see the motorcycle parts UC).

This puzzles me a bit, and possibly comes back to the
information-theoretic problems I had with an earlier phrasing of this
requirement.
The language I've outlined allows you to use variables for properties
(predicates) as well as nodes, so it's quite trivial to echo every
triple with

    for $subject in individuals()
    for $object in individuals() union dataValues()
    for $p in properties()
    where related($subject, $object, $p)
    return $subject $p $object

(the three can be formatted in XML in some reified form, or you can
use "computed constructors" to actually build something like
"<rdf:Description rdf:id={$subject}><{$p}>{$object}</{$p}></rdf:Description>")

Tossing in a 'where' clause gets you fewer triples--a subgraph. It's
even quite straightforward to encode the iterate-test-and-reencode
crap as a user-defined function.

> ----
>
> General questions about REX's client-defined return syntax:
> (These may be XQuery questions, rather than REX ones)
>
> 1/ How does the XML output a qname from the property URI?
>
> Eg. How do I write:
>
> related($s, $o, $p)
> return <rdf:Description rdf:id={$s} xmlns:???=????>
> <$p>{$o}<$p> <--- what should be here for "$p"?
> </rdf:Description>

Okay, you caught me hand-waving. For reasons I'm sure Howard
understands better than I do, XQuery doesn't let you use a "direct"
constructor in such a convenient way, so you've got to use a
"computed" constructor
(http://www.w3.org/TR/xquery/#id-computedConstructors), and the syntax
becomes:

    <Desc id={$s}>{element {$p}{$o}}</Desc>

(I think; Howard please correct me if my syntax is wrong.)

> 2/ How is correct syntax of the results assured? Or is it a
> execution time error for a syntax error in the XML form of the
> results?

I assume this is why you're not allowed to use direct constructors.
XQuery seems to enforce legal XML, and anything that isn't obviously
legal XML needs to be programmatically built by structure.

> > 4.4 User-Specifiable Serialization
> > FULLY SUPPORTED
> > One of the strengths of the language.
>
> My reading of 4.4 "the serialization format of query results" would
> be closer to Kendall's example of users-specifiable serialization
> which is about going beyond defined MIME types. I don't read
> "serialization format" (protocol issue) as "formatting" (shaping
> the results).

Then clearly there's some misunderstanding of this objective (probably
by me). Another something to chat about.

In truth, I don't think XQuery proper really addresses how results are
output. A true "result" in XQuery is just a sequence of things. In
practice the end results are just formatted as text, so most queries
are wrapped in an outermost formatting construct to make that text
look nice.

> However, arbitrary XML expressions, in the style of CONSTRUCT, would
> be good to see in "DAWL-QL" so as to generate XML fragments at the
> server (but as non-normative feature - this would not be a
> requirement nor design objective).
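
For readers who want to experiment with the semantics discussed in this message, here is a minimal sketch in Python (not XQuery) of how the four functions might behave over a small in-memory triple set. Everything in it is an assumption made for illustration: the sample triples, the `Literal` wrapper, the Python function names, and the three helper queries are hypothetical and are not taken from REX or from any Network Inference implementation. The only thing carried over from the message is the argument order of `related(x, y, R)`, which tests for the triple 'x R y'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper that distinguishes data values from URIs."""
    value: str


# Invented sample data, stored as (subject, predicate, object).
TRIPLES = {
    ("#rob", "#memberOf", "#DAWG"),
    ("#rob", "#hasName", Literal("Rob")),
    ("#andy", "#memberOf", "#DAWG"),
    ("#andy", "#hasName", Literal("Andy")),
    ("#x", "#p", "#z"),
    ("#x", "#p", Literal("1")),
}


def individuals():
    """URIs used as subjects or as non-literal objects."""
    found = set()
    for s, _, o in TRIPLES:
        found.add(s)
        if not isinstance(o, Literal):
            found.add(o)
    return sorted(found)


def properties():
    """URIs used as predicates."""
    return sorted({p for _, p, _ in TRIPLES})


def data_values():
    """Literals used as objects (the simplified reading in the message)."""
    literals = {o for _, _, o in TRIPLES if isinstance(o, Literal)}
    return sorted(literals, key=lambda lit: lit.value)


def related(x, y, r):
    """True when the triple 'x r y' exists."""
    return (x, r, y) in TRIPLES


def members():
    """Analogue of the working-group member query."""
    return [
        (i, d.value)
        for i in individuals()
        for d in data_values()
        if related(i, "#DAWG", "#memberOf") and related(i, d, "#hasName")
    ]


def objects_of(subject, prop):
    """Analogue of iterating over individuals() union dataValues()."""
    return [o for o in individuals() + data_values() if related(subject, o, prop)]


def all_triples():
    """Analogue of the echo-every-triple query, with a variable predicate."""
    return [
        (s, p, o)
        for s in individuals()
        for o in individuals() + data_values()
        for p in properties()
        if related(s, o, p)
    ]
```

In this sketch, `objects_of("#x", "#p")` matches both the URI object and the literal object of the `:x :p ...` case raised above, because the candidate sequence is the concatenation of both kinds of node. A real implementation would presumably not enumerate every candidate binding the way these comprehensions do.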
Received on Saturday, 12 June 2004 19:45:28 UTC