RE: XQuery-style extensibility and filtering from Howard Katz on 2004-05-06 (public-rdf-dawg@w3.org from April to June 2004)

From: Howard Katz <howardk@fatdog.com>
Date: Thu, 6 May 2004 08:31:13 -0700
To: "Seaborne, Andy" <andy.seaborne@hp.com>, <public-rdf-dawg@w3.org>
Message-ID: <IKEOLCDFPBBPPAHGNKKOEEDAELAA.howardk@fatdog.com>
Hi Andy,

Let me first forgive your misspelling of my name ("Howeard"), which
reawakens unhappy memories of the dreaded childhood nickname, "How weird!" I
think I know how Kendall feels!!

(I'm just teasing you here. I often misspell the name myself. :-)

Before I respond to your questions, I realized after the fact that I could
have provided a simpler solution to my own query the other day than the one
I proposed. While using "intersect" might have shown off the ability of
XQuery to do set intersections (the operator really is spelled "intersect"
in XQuery, btw), it was needlessly complex. The query as posed could have been
solved more straightforwardly and efficiently by applying two filters in
sequence :

    *[ @foaf:name[ literal() = ... ] ]  [ @foaf:mbox[ contains-string ... ] ]

The first filter, [ @foaf:name ... ], would be applied to *, returning all
subjects having the specified foaf:name. The second filter, [ @foaf:mbox
... ], would then be applied to that initial result sequence, further
pruning the sequence and resulting in a more efficient solution. We're able
to do sequential operations in this fashion because of XQuery's functional
nature: expressions return results, which in turn act as the inputs to other
expressions. Which serves as a very nice segue into your email ... :-) I've
interspersed my responses to your questions below.
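
For anyone who wants to see the two formulations side by side, here is a
minimal Python sketch, not an implementation of the path language. The triple
data, the subject names, and the helpers all_subjects() and having() are all
assumptions made up for illustration, and "*" is narrowed to subjects only.
Both routes should yield the same subjects; the sequential one simply hands
the second filter a smaller input.

```python
# Hypothetical in-memory graph: a set of (subject, predicate, object) triples.
TRIPLES = {
    ("ex:p1", "foaf:name", "Fernando Cosmopolitan"),
    ("ex:p1", "foaf:mbox", "mailto:fernando@asemantics.com"),
    ("ex:p2", "foaf:name", "Fernando Cosmopolitan"),
    ("ex:p2", "foaf:mbox", "mailto:fc@example.org"),
    ("ex:p3", "foaf:name", "Roger Smith"),
    ("ex:p3", "foaf:mbox", "mailto:roger@asemantics.com"),
}


def all_subjects(triples):
    """Stand-in for the '*' step, narrowed here to subjects only."""
    return {s for s, _, _ in triples}


def having(subjects, triples, predicate, test):
    """Keep the subjects with at least one `predicate` value that passes `test`."""
    return {
        s for s in subjects
        if any(p == predicate and test(o) for s2, p, o in triples if s2 == s)
    }


def is_fernando(value):
    return value == "Fernando Cosmopolitan"


def at_asemantics(value):
    return "asemantics.com" in value


# Two filters in sequence: the second filter only sees survivors of the first.
named = having(all_subjects(TRIPLES), TRIPLES, "foaf:name", is_fernando)
sequential = having(named, TRIPLES, "foaf:mbox", at_asemantics)

# The earlier formulation: two independent passes, then a set intersection.
by_name = having(all_subjects(TRIPLES), TRIPLES, "foaf:name", is_fernando)
by_mbox = having(all_subjects(TRIPLES), TRIPLES, "foaf:mbox", at_asemantics)
intersected = by_name & by_mbox
```

Here the mbox test runs against two candidate subjects in the sequential
version and against all three in the intersect version.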

>
> Howeard,
>
> Interesting reading.  It would be good to at least have path-like syntax to
> make queries more natural (not necessarily a syntax that is exclusively
> paths).  Any thoughts on incorporating named variables into paths or would
> this only come via FLWOR-like structures?

You could do the former. While variables in XQuery are only bound using
for/let's in FLWORs and some/every's in quantified expressions, as in:

     let $namedCandidates := *[ @foaf:name[ literal() ... ]]
     return
           $namedCandidates[ @foaf:mbox[ ... ]]
     (: solves the same query as the above :)

I don't see anything particularly odd about recasting this as :

     $namedCandidates := *[ @foaf:name[ literal() ... ]]
     $namedCandidates[ @foaf:mbox[ ... ]]

I'm not sure this buys you a whole lot however. You still have to
bind/assign to the variable in a separate step prior to presenting the path
that uses it; the FLWOR just provides a bit of extra syntactic sugar to
sweeten the structure.

Just to finish proselytizing, FLWOR power ( :-) really becomes evident when
you use FOR clauses to bind individual nodes in a sequence. Ie :

     for $foafPerson in foaf:Person[ @foaf:gender = "male" ]
     return
           ( "a FOAF person = ", $foafPerson/@foaf:name/literal(), chr(10) )

resulting in something like :

     a FOAF person = Roger Smith
     a FOAF person = Peter Johanssen
     a FOAF person = Fernando Cosmopolitan
 ...
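
A rough Python analogue of that FOR clause, assuming a made-up list of person
records (PEOPLE, the "Ann Jones" entry, and report() are all hypothetical and
stand in for foaf:Person nodes). Each iteration binds one item and contributes
three items to a single flat result, mirroring the way XQuery sequences do not
nest.

```python
# Hypothetical records standing in for foaf:Person nodes.
PEOPLE = [
    {"name": "Roger Smith", "gender": "male"},
    {"name": "Ann Jones", "gender": "female"},  # invented, to exercise the filter
    {"name": "Peter Johanssen", "gender": "male"},
]


def report(people):
    """Bind each matching person in turn and emit a 3-item subsequence for it."""
    items = []
    for person in people:                   # for $foafPerson in foaf:Person
        if person["gender"] == "male":      # [ @foaf:gender = "male" ]
            # return ( label, name, linefeed ), appended to one flat sequence
            items.extend(["a FOAF person = ", person["name"], "\n"])
    return items


output = "".join(report(PEOPLE))
```

Joining the flat sequence gives one "a FOAF person = ..." line per match.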

> See also Treehugger [1] (as Damian has just today given a talk about it!).
>
> > let $people := *[ @foaf:name[ "Fernando Cosmopolitan" ]]
> > let $mailBoxes := *[ @foaf:mbox[ externalLib:contains-string( literal(),
> >       "asemantics.com" ) ]]
> > return
> >       (: not sure if parens required for precedence; never a bad idea if
> >          unsure :)
> >       for $match in ( $people intersect $mailBoxes )
> >       return
> >              (: we construct an output sequence of multiple 3-item
> >                 subsequences :)
> >              ( "subject = ", $match, chr(10) )
> >              (: last item is a string-function-provided linefeed :)
>
> One thing I would like to understand, in the history of XQuery presumably,
> is how it ended up as a procedural style, rather than declarative style.  Or
> did the history predate XQuery? This example gives the algorithm for
> execution (a loop).  [I realise it's just the abstraction in the algorithm -
> it can be implemented in various ways such as applying standard optimizing
> compiler techniques or some of the relational database technology when
> nested loops are analysed for indexes should the data be persistent.]

I'm probably not the best person to answer this one, since I've often been
confused by the terminology myself. The experts generally characterize
XQuery as functional and declarative, not procedural. I usually describe it
as a functional language that has a procedural look and feel. I've only
recently come to understand and appreciate exactly what "functional" means
in an XQuery context, since it makes it (from one perspective) particularly
easy and elegant to implement (believe it or not).

Interestingly, Michael Kay in his chapter on XQuery and XSLT in the Experts
book says:

"FLWOR expressions are generally understood in very declarative terms, being
based on the operations of the relational calculus such as Cartesian
products, selection, and projection. By contrast, the equivalent XSLT
constructs are often understood in procedural terms: People think of
xsl:for-each as the analog of a loop in a procedural programming language.
But beneath the surface, after cutting through the formal language used in
explaining the semantics, there is very little real difference in
functionality."

Maybe a working group member lurks who would be willing to talk about the
style of the language ?

Howard

> 	Andy
>
> [1] http://rdfweb.org/people/damian/treehugger/
>
> -------- Original Message --------
> > From: public-rdf-dawg-request@w3.org <>
> > Date: 5 May 2004 00:15
> >
> > The question of external functions as an extensibility mechanism in XQuery
> > came up during this morning's telecon, along with the topic of boolean
> > filtering. As a personal action item, I started out with the intention
> > of providing examples of both mechanisms in XQuery. Since I've been
> > devoting large amounts of personal play time however to devising an RDF
> > path notation that's patterned very tightly on XQuery and is now at least
> > three-quarters baked :-), I thought I'd take a big leap here with your
> > forbearance and illustrate these mechanisms in my own provisional attempt
> > at a dawg-ql. Whether you like what I've come up with or not, I hope at a
> > minimum that it provides a useful basis for further discussion.
> >
> > First, a few quick "dawg-path" examples. I'm using an "@" notation for
> > predicates in a striped (subject/@predicate/object) syntactic style. The
> > "@" helps disambiguate short paths and provides helpful visual cues for
> > readability (imho). I'm playing with a BNF at the moment in which the above
> > three-item subject/@predicate/object sequence is the longest possible path
> > through the graph. Here are more :
> >
> > ============================================
> >
> >       *
> > (all nodes; subject and object both)
> >
> >       @foaf:*
> > (a listing of all (possibly distinct) foaf properties in the graph (TBD --
> > in XQuery you'd need to explicitly call distinct-values() on this))
> >
> >      *[ @foaf:* ]
> > (any subject in any vocabulary having a foaf: property)
> >
> >      ex:subject107/@*
> > (all properties belonging to subject ex:subject107)
> >
> >    ex:*/@*/*
> > (all objects owned by ex: subjects)
> >
> >     ex:*/@*/literal()
> > (literals only owned by ex: subjects)
> >
> >     ex:*/foaf:*[ literal() ]
> > (foaf: properties of ex: subjects having literal values -- as opposed to
> > the values themselves)
> >
> >     ex:*/foaf:*[ literal() = "1992" ]
> > (foaf: properties of ex: subjects having a literal string value of "1992")
> >
> >    ex:*/foaf:*[ literal() = ^^xsd:string ]
> >
> > (and if you really want to have fun with your indices, any strings
> > whatsoever)
> >
> > Note: if we were restricted to using only this xpath-style notation, we'd
> > only be providing the equivalent of a single-variable-binding capability
> > in the result set, which would be a major restriction. See further
> > however ...
> >
> > ===========================================
> >
> > Here's the main query I want to illustrate. Building on Andy's example:
> > Find all subjects having a foaf:name of "Fernando Cosmopolitan" at an
> > asemantics mailing address.
> >
> > We could state this XQuery-like in several ways:
> >
> > (1) Somewhat verbosely
> > ---------------------------
> >
> > declare function contains-string( dawg-ql:Literal+ $sourceStr, xsd:string
> > $containsStr ) as xsd:boolean external;
> >
> > *[ @foaf:name[ literal() = "Fernando Cosmopolitan"^^xsd:string ]] intersect
> > *[ @foaf:mbox[ contains-string( literal(), "asemantics.com" ) ]]
> >
> > returns all subject nodes meeting both conditions. The empty line is
> > whitespace for readability (allowed in XQuery). literal() is patterned
> > after XQuery/XPath's "kindTest" mechanism [eg., .../node() and
> > .../text()] and returns matching literals. The Literal+ in the
> > function declaration OTOH is part of a type specification for the first
> > argument to the function (see next paragraph). intersect is an operator
> > that takes two arguments, both of either type dawg-ql:Node* or
> > dawg-ql:Predicate* (0 or more of each), and returns the intersection of
> > the two sets: all nodes belonging to both. We short-circuit on a null
> > sequence result from either side. (What we do in the case of dissimilar
> > types is fun to contemplate.)
> >
> > The externally supplied boolean function contains-string() shows how to
> > provide extended string-handling capability (for example) that we won't be
> > providing in our native language (because of complex i18n collation issues
> > or whatever). The single-line prolog declares the function to be external
> > -- defined on the client side of the fence; we only specify the signature.
> > The arguments to the function and the intersect operator above provide
> > XQuery-style type-checking capability [1]: dawg-ql:Literal+ assumes a
> > sequence of one or more Literal nodes via the first parameter; xsd:string
> > assumes a single string for the other. The function returns a single
> > boolean. [2] The names of the arguments in the declaration ($sourceStr,
> > $containsStr) are optional and provided in this case for documentation
> > purposes.
> >
> > (2) Somewhat more terse
> > -----------------------------
> >
> > declare function contains-string( dawg-ql:Literal+, xsd:string ) as
> > xsd:boolean external;
> >
> > *[ @foaf:name[ "Fernando Cosmopolitan" ]] intersect
> > *[ @foaf:mbox[ contains-string( "asemantics.com" ) ]]
> >
> > The BNF automatically provides a string type for "Fernando" (ie,
> > StringLiteral, and could easily do so for ints and floats as well). The
> > style of function invocation in the second statement (the query "body")
> > assumes that all (literal) node values for foaf:mbox are passed to the
> > function as an implicit argument, and that we're also not bothering to
> > specify a namespace for our own function (see below).
> >
> > (3) Expanded for readability (both input and output)
> > ------------------------------------------------
> >
> > declare prefix "externalLib" as
> > "http://definedOutsideTheDAWGSpecification.com";
> > declare function externalLib:contains-string( dawg-ql:Literal+ $sourceStr,
> > xsd:string $containsStr ) as xsd:boolean external;
> >
> > let $people := *[ @foaf:name[ "Fernando Cosmopolitan" ]]
> > let $mailBoxes := *[ @foaf:mbox[ externalLib:contains-string( literal(),
> >       "asemantics.com" ) ]]
> > return
> >       (: not sure if parens required for precedence; never a bad idea if
> >          unsure :)
> >       for $match in ( $people intersect $mailBoxes )
> >       return
> >              (: we construct an output sequence of multiple 3-item
> >                 subsequences :)
> >              ( "subject = ", $match, chr(10) )
> >              (: last item is a string-function-provided linefeed :)
> >
> >
> > This example demonstrates an XQuery-like variable-binding style of output
> > annotation and assumes that the dawg-ql data model, similar to XQuery,
> > allows heterogeneous sequences of items, including in this case items of
> > type xsd:string and dawg-ql:Node (in the let variables and the return
> > sequence) and dawg-ql:Literal (in the function call).  I'm also adding a
> > namespace declaration for the external function in the prolog to
> > disambiguate it from our own built-ins (all function and variable names in
> > XQuery are QNames, which is kind of cool).
> >
> > There's more, such as mechanisms for returning triples in the result
> > sequence and the like, but I think that's sufficient to get the pot
> > bubbling ... :-)
> >
> > Comments?
> > Howard
> >
> > [1] Don't freak at the mention of XQuery type-checking capability. The
> > bulk of the complexity in XQuery (I contend) comes from all the
> > complications arising from the need to be able to type XML nodes using
> > XML Schema; we ain't got nowhere near that degree of difficulty (unless
> > you want to be able to specify XPath-like descents into XMLLiterals; I
> > don't want to go there myself, particularly given timeframes).
> >
> > [2] On a technical note, I'm assuming that under the hood this boolean
> > function is called repeatedly and implicitly and presented with each
> > candidate Literal argument in turn, that a boolean result is returned for
> > each test, and that subject nodes on paths failing the test are then
> > dropped. I can also visualize a "bulk"-type argument-passing mechanism
> > (probably more efficient), in which all candidate literals are passed to
> > the function once en masse; what gets returned (this function has a
> > different signature from the one above) is the sequence of 0 or more
> > literal nodes that satisfy the query; only paths containing those nodes
> > are retained.
>
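
The two calling conventions described in footnote [2] of the quoted message
can be contrasted in a small Python sketch. Everything here is an assumption
for illustration: the CANDIDATES mapping, and the names contains_string() and
contains_string_bulk(), which only mimic the two signatures the footnote
describes (one literal in and a boolean out, versus all literals in and the
matching literals out).

```python
# Hypothetical candidates: subject -> the literal values of its foaf:mbox.
CANDIDATES = {
    "ex:p1": ["mailto:fernando@asemantics.com"],
    "ex:p2": ["mailto:fc@example.org"],
}


def contains_string(literal, needle):
    """Per-item form: called once per candidate literal, returns a boolean."""
    return needle in literal


def contains_string_bulk(literals, needle):
    """Bulk form: called once with every literal, returns the ones that match."""
    return [lit for lit in literals if needle in lit]


# Per-item: the engine calls the function repeatedly and drops failing subjects.
kept_per_item = {
    subject
    for subject, literals in CANDIDATES.items()
    if any(contains_string(lit, "asemantics.com") for lit in literals)
}

# Bulk: one call, then keep only the subjects whose paths reach a returned literal.
all_literals = [lit for literals in CANDIDATES.values() for lit in literals]
matching = set(contains_string_bulk(all_literals, "asemantics.com"))
kept_bulk = {
    subject
    for subject, literals in CANDIDATES.items()
    if matching.intersection(literals)
}
```

Both conventions keep the same subjects; the difference is only in how many
times the external function is crossed into.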
Received on Thursday, 6 May 2004 11:29:42 UTC