- From: Howard Katz <howardk@fatdog.com>
- Date: Tue, 4 May 2004 16:16:45 -0700
- To: "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
The question of external functions as an extensibility mechanism in XQuery came up during this morning's telecon, along with the topic of boolean filtering. As a personal action item, I started out with the intention of providing examples of both mechanisms in XQuery. However, since I've been devoting large amounts of personal play time to devising an RDF path notation that's patterned very tightly on XQuery and is now at least three-quarters baked :-), I thought I'd take a big leap here with your forbearance and illustrate these mechanisms in my own provisional attempt at a dawg-ql. Whether you like what I've come up with or not, I hope at a minimum that it provides a useful basis for further discussion.

First, a few quick "dawg-path" examples. I'm using an "@" notation for predicates in a striped (subject/@predicate/object) syntactic style. The "@" helps disambiguate short paths and provides helpful visual cues for readability (imho). I'm playing with a BNF at the moment in which the above three-item subject/@predicate/object sequence is the longest possible path through the graph.
Here are more:

============================================

*
    (all nodes; subject and object both)

@foaf:*
    (a listing of all (possibly distinct) foaf properties in the graph; TBD -- in XQuery you'd need to explicitly call distinct-values() on this)

*[ @foaf:* ]
    (any subject in any vocabulary having a foaf: property)

ex:subject107/@*
    (all properties belonging to subject ex:subject107)

ex:*/@*/*
    (all objects owned by ex: subjects)

ex:*/@*/literal()
    (literals only owned by ex: subjects)

ex:*/foaf:*[ literal() ]
    (foaf: properties of ex: subjects having literal values -- as opposed to the values themselves)

ex:*/foaf:*[ literal() = "1992" ]
    (foaf: properties of ex: subjects having a literal string value of "1992")

ex:*/foaf:*[ literal() = ^^xsd:string ]
    (and if you really want to have fun with your indices, any strings whatsoever)

Note: if we were restricted to using only this xpath-style notation, we'd only be providing the equivalent of a single-variable-binding capability in the result set, which would be a major restriction. See further however ...

===========================================

Here's the main query I want to illustrate. Building on Andy's example:

    Find all subjects having a foaf:name of "Fernando Cosmopolitan" at an asemantics mailing address.

We could state this XQuery-like in several ways:

(1) Somewhat verbosely
---------------------------

    declare function contains-string( dawg-ql:Literal+ $sourceStr, xsd:string $containsStr ) as xsd:boolean external;

    *[ @foaf:name[ literal() = "Fernando Cosmopolitan"^^xsd:string ]]
        intersect
    *[ @foaf:mbox[ contains-string( literal(), "asemantics.com" ) ]]

returns all subject nodes meeting both conditions. The empty line is whitespace for readability (allowed in XQuery).

literal() is patterned after XQuery/XPath's "kindTest" mechanism [eg., .../node() and .../text()] and returns matching literals.
The Literal+ in the function declaration OTOH is part of a type specification for the first argument to the function (see next paragraph).

intersect is an operator that takes two arguments, both of either type dawg-ql:Node* or dawg-ql:Predicate* (0 or more of each), and returns the intersection of the two sets: all nodes belonging to both. We short-circuit on a null sequence result from either side. (What we do in the case of dissimilar types is fun to contemplate.)

The externally supplied boolean function contains-string() shows how to provide extended string-handling capability (for example) that we won't be providing in our native language (because of complex i18n collation issues or whatever). The single-line prolog declares the function to be external -- defined on the client side of the fence; we only specify the signature.

The arguments to the function and the intersect operator above provide XQuery-style type-checking capability [1]: dawg-ql:Literal+ assumes a sequence of one or more Literal nodes via the first parameter; xsd:string assumes a single string for the other. The function returns a single boolean. [2] The names of the arguments in the declaration ($sourceStr, $containsStr) are optional and provided in this case for documentation purposes.

(2) Somewhat more terse
-----------------------------

    declare function contains-string( dawg-ql:Literal+, xsd:string ) as xsd:boolean external;

    *[ @foaf:name[ "Fernando Cosmopolitan" ]]
        intersect
    *[ @foaf:mbox[ contains-string( "asemantics.com" ) ]]

The BNF automatically provides a string type for "Fernando" (ie, StringLiteral, and could easily do so for ints and floats as well). The style of function invocation in the second statement (the query "body") assumes that all (literal) node values for foaf:mbox are passed to the function as an implicit argument, and that we're also not bothering to specify a namespace for our own function (see below).
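For anyone who'd rather see the intended semantics of queries (1) and (2) than read about them, here's a minimal Python sketch. It assumes a toy in-memory list of (subject, predicate, object) string triples; TRIPLES, subjects_where and intersect are hypothetical names invented for this illustration, the data is made up, and a plain substring test stands in for the external contains-string(). It is not a proposed implementation, only one way the filtering and intersection might behave.

```python
# Toy model only: triples are (subject, predicate, object) tuples of strings.
# All names and data here are hypothetical, invented for illustration.
TRIPLES = [
    ("ex:p1", "foaf:name", "Fernando Cosmopolitan"),
    ("ex:p1", "foaf:mbox", "fernando@asemantics.com"),
    ("ex:p2", "foaf:name", "Fernando Cosmopolitan"),
    ("ex:p2", "foaf:mbox", "fernando@example.org"),
    ("ex:p3", "foaf:name", "Somebody Else"),
    ("ex:p3", "foaf:mbox", "else@asemantics.com"),
]


def contains_string(source, contains_str):
    """Stand-in for the external boolean function: a plain substring test."""
    return contains_str in source


def subjects_where(triples, predicate, test):
    """Rough analogue of *[ @predicate[ test ] ]: subjects with a matching literal."""
    return {s for (s, p, o) in triples if p == predicate and test(o)}


def intersect(left, right):
    """Set intersection; an empty operand yields an empty result immediately."""
    if not left or not right:
        return set()
    return left & right


people = subjects_where(TRIPLES, "foaf:name", lambda o: o == "Fernando Cosmopolitan")
mailboxes = subjects_where(TRIPLES, "foaf:mbox", lambda o: contains_string(o, "asemantics.com"))
matches = intersect(people, mailboxes)
```

Note that both operands are fully evaluated here before intersect() is called, so the early return only mirrors the result of short-circuiting; a real engine would presumably skip evaluating the second operand altogether.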
(3) Expanded for readability (both input and output)
------------------------------------------------

    declare prefix "externalLib" as "http://definedOutsideTheDAWGSpecification.com";
    declare function externalLib:contains-string( dawg-ql:Literal+ $sourceStr, xsd:string $containsStr ) as xsd:boolean external;

    let $people := *[ @foaf:name[ "Fernando Cosmopolitan" ]]
    let $mailBoxes := *[ @foaf:mbox[ externalLib:contains-string( literal(), "asemantics.com" ) ]]
    return
        (: not sure if parens required for precedence; never a bad idea if unsure :)
        for $match in ( $people intersect $mailBoxes )
        return
            (: we construct an output sequence of multiple 3-item subsequences :)
            ( "subject = ", $match, chr(10) )    (: last item is a string-function-provided linefeed :)

This example demonstrates an XQuery-like variable-binding style of output annotation and assumes that the dawg-ql data model, similar to XQuery, allows heterogeneous sequences of items, including in this case items of type xsd:string and dawg-ql:Node (in the let variables and the return sequence) and dawg-ql:Literal (in the function call). I'm also adding a namespace declaration for the external function in the prolog to disambiguate it from our own built-ins (all function and variable names in XQuery are QNames, which is kind of cool).

There's more, such as mechanisms for returning triples in the result sequence and the like, but I think that's sufficient to get the pot bubbling ... :-)

Comments?

Howard

[1] Don't freak at the mention of XQuery type-checking capability. The bulk of the complexity in XQuery (I contend) comes from all the complications arising from the need to be able to type XML nodes using XML Schema; we ain't got nowhere near that degree of difficulty (unless you want to be able to specify XPath-like descents into XMLLiterals; I don't want to go there myself, particularly given timeframes).
[2] On a technical note, I'm assuming that under the hood this boolean function is called repeatedly and implicitly and presented with each candidate Literal argument in turn, that a boolean result is returned for each test, and that subject nodes on paths failing the test are then dropped. I can also visualize a "bulk"-type argument-passing mechanism (probably more efficient), in which all candidate literals are passed to the function once en masse; what gets returned (this function has a different signature from the one above) is the sequence of 0 or more literal nodes that satisfy the query; only paths containing those nodes are retained.
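The two calling conventions in [2] can be put side by side in a short Python sketch. Everything here is an assumption made for illustration: contains_string_bulk is a hypothetical name for the "bulk" variant, the candidate data is made up, and literals are modelled as plain strings paired with their subjects.

```python
def contains_string(source, contains_str):
    """Per-item style: called once per candidate literal, returns a boolean."""
    return contains_str in source


def contains_string_bulk(sources, contains_str):
    """Bulk style (hypothetical): called once, returns the literals that pass."""
    return [s for s in sources if contains_str in s]


# (subject, literal) pairs for foaf:mbox; made-up data.
CANDIDATES = [
    ("ex:p1", "fernando@asemantics.com"),
    ("ex:p2", "fernando@example.org"),
    ("ex:p3", "else@asemantics.com"),
]

# Per-item: subjects on paths whose literal fails the test are dropped.
per_item = [s for (s, lit) in CANDIDATES if contains_string(lit, "asemantics.com")]

# Bulk: a single call, then only paths containing a returned literal are kept.
kept = set(contains_string_bulk([lit for (_, lit) in CANDIDATES], "asemantics.com"))
bulk = [s for (s, lit) in CANDIDATES if lit in kept]
```

Both routes keep the same subjects; the difference is one function call per literal versus one call in total, which is where the bulk form would be expected to save work when the function lives on the far side of a process or network boundary.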
Received on Tuesday, 4 May 2004 19:15:11 UTC