Re: [expath] MongoDB Module: Working Draft from Liam R. E. Quin on 2015-03-15 (public-expath@w3.org from March 2015)

From: Liam R. E. Quin <liam@w3.org>
Date: Sun, 15 Mar 2015 17:42:37 -0400
To: Hans-Juergen Rennau <hrennau@yahoo.de>
Cc: "expath@googlegroups.com" <expath@googlegroups.com>, "christian.gruen@gmail.com" <christian.gruen@gmail.com>, "public-expath@w3.org" <public-expath@w3.org>, "dannes@exist-db.org" <dannes@exist-db.org>
Message-ID: <1426455757.15169.11.camel@w3.org>
On Sun, 2015-03-15 at 16:51 +0000, Hans-Juergen Rennau wrote:
> [...] the original view that XQuery is not concerned with side 
> effects would block really important developments of XQuery.

We spent several years on this in the XQuery WG, with several 
proposals.

The only satisfactory answer I've seen is that XQuery was designed to 
be embedded in other languages, rather like SQL, and not as a complete 
system. When you try and introduce side-effects you have problems.

We agreed to add a conformance level that lets an implementation 
support a restricted fn:put() without supporting XQuery Update, 
which lets you write multiple files.
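
As a rough sketch of what that allows, assuming a processor that 
supports fn:put() and resolves relative URIs somewhere writable (the 
file names and the <report> element here are made up for 
illustration):

```xquery
(: Sketch only: the file names and the <report> element are made up. :)
(: Each call writes an independent file, so the result does not      :)
(: depend on the order in which the processor evaluates the calls.   :)
for $i in 1 to 3
return
  fn:put(
    <report n="{$i}"/>,
    concat("report-", $i, ".xml")
  )
```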

However, there is no way to arrange a sequence of XQuery expressions 
to be evaluated in any particular order.

The XQuery scripting extensions tried to add a ";" operator that would 
commit all transactions up to that point. This doesn't work too well, 
partly because users get terribly confused between ; and , and partly 
because of effects on optimization.

A better approach in many cases is to lift the state-changing part of 
an application out and use something like XProc.  I'd wanted us to add 
sequences and rendezvous expressions to XQuery, but I also felt (still 
feel) XProc is a better place, at least if the XProc people can 
improve their (our) documentation and examples.

At any rate I don't see the XQuery Working Group making much more 
progress in the area of managing side-effects.

> For the time being, a very practical approach might deal with bogus 
> tokens which may enforce the sequence, like this:
>   let $fooToken := local:foo(...)
>   let $barToken := local:bar($fooToken, ...)
>   return mongodb:find($barToken, ...)

If you use a function annotation to say foo() is side-effecting this 
may work even if it returns an empty sequence.
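
Roughly like this, as a sketch only: the ann namespace and the 
%ann:side-effecting annotation are hypothetical, no processor is 
known to recognise them, and the function bodies are placeholders. 
A processor that ignores the annotation remains free to reorder or 
skip the calls.

```xquery
xquery version "3.0";

(: Hypothetical annotation namespace, used for illustration only. :)
declare namespace ann = "http://example.org/annotations";

(: Placeholder for a side-effecting function that returns the empty sequence. :)
declare %ann:side-effecting function local:foo() as empty-sequence() {
  ()
};

(: Placeholder for a second step that takes the token from the first. :)
declare %ann:side-effecting function local:bar($token as item()*) as xs:string {
  "done"
};

(: The token bindings express the intended order foo, then bar. :)
let $fooToken := local:foo()
let $barToken := local:bar($fooToken)
return $barToken
```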


> Therefore, in general, the most problematic case is a function 
> returning the empty sequence, because this removes any chance to 
> enforce anything.
> Cheers, Hans-Jürgen
>  
> 
> Christian Grün <christian.gruen@gmail.com> wrote at 13:40 on 
> Sunday, 15 March 2015:
>    
> 
>  Hi Hans-Jürgen,
> 
> Finally some feedback:
> 
> > Could you give an example where a non-empty value of the function 
> > call might
> > disturb the client? If he is not interested, he either does not 
> > assign the
> > value to a variable, or, if the call is within a let or for 
> > clause, simply
> > does not use the bound variable. I just have not yet understood 
> > the issue.
> 
> One general problem with all non-deterministic and side-effecting 
> expressions in XQuery is that their behavior is not formalized in 
> the specification of XQuery (and I doubt that anyone will tackle 
> this in the foreseeable future..).
> 
> Let's take an example. In the following query...
> 
>   let $a := local:a()
>   let $b := local:b()
>   return ()
> 
> ...a query processor may decide to first evaluate local:b() and then 
> local:a() without violating the rules of the language. It may also 
> skip the evaluation of all the expressions, because the result will 
> be an empty sequence anyway.
> 
> In the MongoDB spec, we added the simple sentence, saying that "A 
> query processor must ensure that non-deterministic functions are not 
> relocated or rewritten in the query, and that its results are not 
> cached at runtime.". This rule is by no means complete, but it is 
> supposed to indicate that the order in which a user has written 
> down function calls in the original query should be preserved. 
> Currently, it does not say whether an expression must be 
> evaluated, or that it must not be optimized 
> away. It could be added, but in the end I believe that a general 
> EXPath document may be a better place for that.
> 
> Back to the original problem statement: We observed that a client may 
> not be interested in any result output if a query is only updating. 
> If a user inserts some data and runs a MongoDB query, the expression 
> could look as follows:
> 
>   let $id := mongo:connect(...)
>   return (mongo:insert(...), mongo:find(...))
> 
> We could "swallow" the result of an updating function by e.g. adding 
> a false() predicate to the insert expression..
> 
>   mongo:insert(...)[false()]
> 
> …but we then need to be sure that it is not optimized away by the 
> query processor. The same applies if we bind the result of the 
> updating function to a dummy variable:
> 
>   let $id := mongo:connect(...)
>   let $_ := mongo:insert(...)
>   return mongo:find(...)
> 
> I need to add, though, that it could even apply in my first example 
> (a query processor could check if the result of a function call will 
> be an empty sequence, and skip the call), but it may not be as 
> obvious.
> 
> In a nutshell: The general challenge here is much broader. My
> practical approach would be to define functions in a way that 
> results will only be returned if functions are non-updating. This is 
> the way it has been done in the other EXPath modules so far.
> 
> Feedback is welcome as usual, Christian
>
Received on Sunday, 15 March 2015 21:42:42 UTC