RE: XQuery-style extensibility and filtering

Howard,

Interesting reading.  It would be good to at least have a path-like syntax to
make queries more natural (not necessarily a syntax that is exclusively
paths).  Any thoughts on incorporating named variables into paths, or would
this only come via FLWOR-like structures?

See also Treehugger [1] (as Damian has just today given a talk about it!).

> let $people := *[ @foaf:name[ "Fernando Cosmopolitan" ]]
> let $mailBoxes := *[ @foaf:mbox[ externalLib:contains-string( literal(),
>       "asemantics.com" ) ]]
> return
>       (: not sure if parens required for precedence; never a bad idea
>          if unsure :)
>       for $match in ( $people intersect $mailBoxes )
>       return
>           (: we construct an output sequence of multiple 3-item
>              subsequences :)
>           ( "subject = ", $match, chr(10) )
>           (: last item is a string-function-provided linefeed :)

One thing I would like to understand, in the history of XQuery presumably,
is how it ended up in a procedural style rather than a declarative style.  Or
did the history predate XQuery? This example gives the algorithm for
execution (a loop).  [I realise it's just the abstraction in the algorithm -
it can be implemented in various ways, such as applying standard optimizing
compiler techniques or some of the relational database technology where
nested loops are analysed for indexes should the data be persistent.]
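
To make the distinction concrete, here is a minimal Python sketch (not
dawg-ql, and nothing from Howard's proposal: the triples, the
subjects_where helper and both function names are invented purely for
illustration).  The first function spells out the loop, as the FLWOR
example does; the second only states the condition and leaves the
evaluation strategy open.  On this toy data both give the same answer.

```python
# Hypothetical toy graph: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:p1", "foaf:name", "Fernando Cosmopolitan"),
    ("ex:p1", "foaf:mbox", "fernando@asemantics.com"),
    ("ex:p2", "foaf:name", "Fernando Cosmopolitan"),
    ("ex:p2", "foaf:mbox", "fernando@example.org"),
    ("ex:p3", "foaf:name", "Someone Else"),
    ("ex:p3", "foaf:mbox", "someone@asemantics.com"),
]


def subjects_where(predicate, test):
    """Return the set of subjects having `predicate` with an object passing `test`."""
    return {s for (s, p, o) in TRIPLES if p == predicate and test(o)}


def matches_loop():
    """Procedural reading: iterate over candidates and test each one in turn."""
    people = subjects_where("foaf:name", lambda o: o == "Fernando Cosmopolitan")
    mailboxes = subjects_where("foaf:mbox", lambda o: "asemantics.com" in o)
    result = []
    for match in sorted(people):      # the explicit loop
        if match in mailboxes:
            result.append(match)
    return result


def matches_declarative():
    """Declarative reading: state the condition as a set intersection."""
    return sorted(
        subjects_where("foaf:name", lambda o: o == "Fernando Cosmopolitan")
        & subjects_where("foaf:mbox", lambda o: "asemantics.com" in o)
    )


print(matches_loop())
print(matches_declarative())
```

The second form is the one an engine is free to reorder or answer from
an index, which is really the point of my question.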

	Andy

[1] http://rdfweb.org/people/damian/treehugger/

-------- Original Message --------
> From: public-rdf-dawg-request@w3.org <>
> Date: 5 May 2004 00:15
> 
> The question of external functions as an extensibility mechanism in XQuery
> came up during this morning's telecon, along with the topic of boolean
> filtering. As a personal action item, I started out with the intention
> of providing examples of both mechanisms in XQuery. Since I've been
> devoting large amounts of personal play time, however, to devising an RDF
> path notation that's patterned very tightly on XQuery and is now at least
> three-quarters baked :-), I thought I'd take a big leap here with your
> forbearance and illustrate these mechanisms in my own provisional attempt
> at a dawg-ql. Whether you like what I've come up with or not, I hope at a
> minimum that it provides a useful basis for further discussion.
> 
> First, a few quick "dawg-path" examples. I'm using an "@" notation for
> predicates in a striped (subject/@predicate/object) syntactic style. The
> "@" helps disambiguate short paths and provides helpful visual cues for
> readability (imho). I'm playing with a BNF at the moment in which the
> above three-item subject/@predicate/object sequence is the longest
> possible path through the graph. Here are more:
> 
> ============================================
> 
>       *
> (all nodes; subject and object both)
>
>       @foaf:*
> (a listing of all (possibly distinct) foaf properties in the graph; TBD --
> in XQuery you'd need to explicitly call distinct-values() on this)
>
>       *[ @foaf:* ]
> (any subject in any vocabulary having a foaf: property)
>
>       ex:subject107/@*
> (all properties belonging to subject ex:subject107)
>
>       ex:*/@*/*
> (all objects owned by ex: subjects)
>
>       ex:*/@*/literal()
> (literals only owned by ex: subjects)
> 
>       ex:*/@foaf:*[ literal() ]
> (foaf: properties of ex: subjects having literal values -- as opposed to
> the values themselves)
>
>       ex:*/@foaf:*[ literal() = "1992" ]
> (foaf: properties of ex: subjects having a literal string value of "1992")
>
>       ex:*/@foaf:*[ literal() = ^^xsd:string ]
> (and if you really want to have fun with your indices, any strings
> whatsoever)
> 
> Note: if we were restricted to using only this xpath-style notation, we'd
> only be providing the equivalent of a single-variable-binding capability
> in the result set, which would be a major restriction. See further,
> however ...
> 
> ===========================================
> 
> Here's the main query I want to illustrate. Building on Andy's example:
> Find all subjects having a foaf:name of "Fernando Cosmopolitan" at an
> asemantics mailing address.
> 
> We could state this XQuery-like in several ways:
> 
> (1) Somewhat verbosely
> ---------------------------
> 
> declare function contains-string( dawg-ql:Literal+ $sourceStr, xsd:string
> $containsStr ) as xsd:boolean external;
> 
> *[ @foaf:name[ literal() = "Fernando Cosmopolitan"^^xsd:string ]]
> intersect
> *[ @foaf:mbox[ contains-string( literal(), "asemantics.com" ) ]]
> 
> returns all subject nodes meeting both conditions. The empty line is
> whitespace for readability (allowed in XQuery). literal() is patterned
> after XQuery/XPath's "kindTest" mechanism [e.g., .../node() and
> .../text()] and returns matching literals. The Literal+ in the
> function declaration OTOH is part of a type specification for the first
> argument to the function (see next paragraph). intersect is an operator
> that takes two arguments, both of either type dawg-ql:Node* or
> dawg-ql:Predicate* (0 or more of each), and returns the intersection of
> the two sets: all nodes belonging to both. We short-circuit on a null
> sequence result from either side. (What we do in the case of dissimilar
> types is fun to contemplate.)
> 
> The externally supplied boolean function contains-string() shows how to
> provide extended string-handling capability (for example) that we won't be
> providing in our native language (because of complex i18n collation issues
> or whatever). The single-line prolog declares the function to be external --
> defined on the client side of the fence; we only specify the signature.
> The arguments to the function and the intersect operator above provide
> XQuery-style type-checking capability [1]: dawg-ql:Literal+ assumes a
> sequence of one or more Literal nodes via the first parameter; xsd:string
> assumes a single string for the other. The function returns a single
> boolean. [2] The names of the arguments in the declaration ($sourceStr,
> $containsStr) are optional and provided in this case for documentation
> purposes.
> 
> (2) Somewhat more terse
> -----------------------------
> 
> declare function contains-string( dawg-ql:Literal+, xsd:string ) as
> xsd:boolean external;
>
> *[ @foaf:name[ "Fernando Cosmopolitan" ]]
> intersect
> *[ @foaf:mbox[ contains-string( "asemantics.com" ) ]]
> 
> The BNF automatically provides a string type for "Fernando" (ie,
> StringLiteral, and could easily do so for ints and floats as well). The
> style of function invocation in the second statement (the query "body")
> assumes that all (literal) node values for foaf:mbox are passed to the
> function as an implicit argument, and that we're also not bothering to
> specify a namespace for our own function (see below).
> 
> (3) Expanded for readability (both input and output)
> ------------------------------------------------
> 
> declare prefix "externalLib" as
> "http://definedOutsideTheDAWGSpecification.com";
> declare function externalLib:contains-string( dawg-ql:Literal+ $sourceStr,
> xsd:string $containsStr ) as xsd:boolean external;
> 
> let $people := *[ @foaf:name[ "Fernando Cosmopolitan" ]]
> let $mailBoxes := *[ @foaf:mbox[ externalLib:contains-string( literal(),
>       "asemantics.com" ) ]]
> return
>       (: not sure if parens required for precedence; never a bad idea
>          if unsure :)
>       for $match in ( $people intersect $mailBoxes )
>       return
>           (: we construct an output sequence of multiple 3-item
>              subsequences :)
>           ( "subject = ", $match, chr(10) )
>           (: last item is a string-function-provided linefeed :)
> 
> 
> This example demonstrates an XQuery-like variable-binding style of output
> annotation and assumes that the dawg-ql data model, similar to XQuery,
> allows heterogeneous sequences of items, including in this case items of
> type xsd:string and dawg-ql:Node (in the let variables and the return
> sequence) and dawg-ql:Literal (in the function call).  I'm also adding a
> namespace declaration for the external function in the prolog to
> disambiguate it from our own built-ins (all function and variable names in
> XQuery are QNames, which is kind of cool).
> 
> There's more, such as mechanisms for returning triples in the result
> sequence and the like, but I think that's sufficient to get the pot
> bubbling ... :-)
> 
> Comments?
> Howard
> 
> [1] Don't freak at the mention of XQuery type-checking capability. The
> bulk of the complexity in XQuery (I contend) comes from all the
> complications arising from the need to be able to type XML nodes using
> XML Schema; we ain't got nowhere near that degree of difficulty (unless
> you want to be able to specify XPath-like descents into XMLLiterals; I
> don't want to go there myself, particularly given timeframes).
> 
> [2] On a technical note, I'm assuming that under the hood this boolean
> function is called repeatedly and implicitly and presented with each
> candidate Literal argument in turn, that a boolean result is returned for
> each test, and that subject nodes on paths failing the test are then
> dropped. I can also visualize a "bulk"-type argument-passing mechanism
> (probably more efficient), in which all candidate literals are passed to
> the function once en masse; what gets returned (this function has a
> different signature from the one above) is the sequence of 0 or more
> literal nodes that satisfy the query; only paths containing those nodes
> are retained.

Received on Wednesday, 5 May 2004 11:15:00 UTC