Re: Left field from James Fuller on 2016-04-22 (public-xml-processing-model-wg@w3.org from April 2016)

From: James Fuller <jim@webcomposite.com>
Date: Fri, 22 Apr 2016 13:07:47 +0200
To: Norman Walsh <ndw@nwalsh.com>
Cc: XProc WG <public-xml-processing-model-wg@w3.org>
Message-ID: <CAEaz5mucroEjsv7yfztncqmoQSm3qCmtWPBSUw83SK78516xUw@mail.gmail.com>
Apart from the ever-present (minor) questions of syntax, this is nice...

A few random comments:

I think you had some typos in some of the examples.

We could use a variable to set the expression language, like

   $p:expression-language = "'com.marklogic:xquery:1.0-ml'"

The logic branching reminds me that Boolean logic is just a
constrained case of fuzzy logic ... there is some attraction to having
a more rules-based approach, and I wonder if we could achieve that
here.

Lastly, your examples reinforce (for me) the need to delineate between
pipes and variables.

J



On 22 April 2016 at 04:04, Norman Walsh <ndw@nwalsh.com> wrote:
> This message describes the results of several hours’ thought (walking
> beside a swollen river can be relaxing). It’s not connected directly
> to the current draft; in some ways it is a radical departure.
> Submitted for your amusement.
>
> Alex’s uplifting comments at the end of the last telcon left me
> wondering what I’d implement if it was just me. Chances are, even if
> the WG folds up its tent and I withdraw from standards work generally,
> I’ll still want to build something. I’ve got itches, after all.
>
> At a high level, we have a graph description language that connects
> together components. The graph is useful independent of the specific
> implementation details for each component.
>
> So I’m going to draw a box around the components. The interface is:
>
>   1. Component has some mechanism to read bindings from the
>      input binding context.
>   2. Component has some mechanism for contributing results
>      to the output binding context.
>
> There’s no difference between an input binding context and an output
> binding context. A binding context is a mapping from names to objects
> or sequences of objects. Ports are handled in the implementation. If
> you refer to one in the surface syntax, the pipeline waits until the
> thread writing to that pipe closes it and then delivers the sequence
> of objects it produced as the value of the reference. If you don’t
> reference it, the underlying implementation of the step can (maybe) stream
> it. Quality of implementation issue.
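To make the binding-context model concrete, here is a minimal Python sketch. It treats a binding context as an ordinary dict and backs a port with a queue that a reader drains only after the writing thread closes it. The `Port` class, its methods, and the `_CLOSED` sentinel are hypothetical names chosen for illustration, not part of any proposed API.

```python
import queue
import threading
from typing import Any

_CLOSED = object()  # sentinel: the writer has closed the port


class Port:
    """Hypothetical pipe between two steps."""

    def __init__(self) -> None:
        self._q: "queue.Queue[Any]" = queue.Queue()

    def write(self, obj: Any) -> None:
        self._q.put(obj)

    def close(self) -> None:
        self._q.put(_CLOSED)

    def read_all(self) -> tuple:
        """Block until the writer closes the port, then return everything it wrote."""
        items = []
        while True:
            obj = self._q.get()
            if obj is _CLOSED:
                return tuple(items)
            items.append(obj)


def producer(port: Port) -> None:
    # A step writing three documents (strings here) and then closing its port.
    for n in range(3):
        port.write(f"doc{n}")
    port.close()


port = Port()
writer = threading.Thread(target=producer, args=(port,))
writer.start()
# Referencing the port by name waits for the close, then binds the whole sequence.
context = {"result": port.read_all()}
writer.join()
print(context)  # {'result': ('doc0', 'doc1', 'doc2')}
```

A step that never references the port could instead take items off the queue as they arrive, which is the streaming case left to quality of implementation.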
>
> My itches are mostly XML related, but there’s nothing about pipeline
> processing that’s XML specific. I’m happy with XQuery and XML. You
> want JSON and JavaScript, fine: as long as you can implement the
> interface contract above, you’re good.
>
> We also want inline expressions, for conditionals at least. Fine.
> Those are in the current expression language. (There’s nothing that
> says you can’t mix multiple expression languages in the same
> pipeline.)
>
> How you map objects provided in a binding context into objects in your
> implementation language is your business. I don’t care. How you map
> objects constructed in your implementation back to things in the
> binding context is your business. I don’t care.
>
> The pipeline processor constructs an initial binding context, which we
> allow the author to name. The author can also enumerate bindings that
> must or may be in the context in order for execution to begin.
>
> The implementation can put arbitrary additional bindings in the
> context.
>
> When the pipeline finishes, the binding context produced by the
> syntactically last flow is the result of the pipeline.
>
> The author can enumerate bindings that must or may be in the pipeline
> result context in order for the pipeline to be considered successful.
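The must/may declarations can be checked mechanically. The sketch below assumes a hypothetical declaration format that maps each binding name to a "required" flag and a type predicate; `check_bindings` is an illustrative name, and documents are modelled as strings purely for the example.

```python
from decimal import Decimal
from typing import Any, Callable, Mapping

# Hypothetical declaration format: binding name -> (required?, type predicate).
Declaration = Mapping[str, tuple[bool, Callable[[Any], bool]]]


def check_bindings(context: Mapping[str, Any], decl: Declaration) -> list[str]:
    """Return a list of problems; an empty list means the context is acceptable."""
    problems = []
    for name, (required, type_ok) in decl.items():
        if name not in context:
            if required:
                problems.append(f"missing required binding ${name}")
            continue  # an optional binding may simply be absent
        if not type_ok(context[name]):
            problems.append(f"binding ${name} has the wrong type")
    return problems


inputs = {
    "source": (True, lambda v: isinstance(v, str)),
    "version": (False, lambda v: isinstance(v, Decimal)),
}
print(check_bindings({"source": "<doc/>"}, inputs))  # []
print(check_bindings({"version": "2.0"}, inputs))
# ['missing required binding $source', 'binding $version has the wrong type']
```

Running the same check against the final binding context gives the success test for the pipeline result.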
>
> Let’s look at some examples. (In each case, I’ll put the example first
> then walk through it.)
>
>     ============================================================
>     xproc version 2.0;
>     expressions xquery;
>     input source [ $source as document-node(),
>                    $style as document-node(),
>                    $schema as document-node()?,
>                    $version? as xs:decimal ];
>
>     output [ $output as document-node() ];
>
>     $source -> [ $stylesheet=ref("style"), $source=ref("source") ]
>             -> xslt()
>             -> [ $output=ref("result") ]
>     ============================================================
>
>     xproc version 2.0;
>
> Maybe it won’t be XProc, but for now…
>
>     expressions xquery;
>
> The expression language is XQuery.
>
>     input source [ $source as document-node(),
>                    $style as document-node(),
>                    $schema as document-node()?,
>                    $version? as xs:decimal ];
>
> The input context must have source and style documents. It may have a
> schema document; if it doesn’t, the schema binding will be the empty
> sequence. It may have a version; if it does, it must be an xs:decimal.
> If it doesn’t, then it just isn’t in the context.
>
>     output [ $output as document-node() ];
>
> The output binding context must contain an “output” document. If it
> doesn’t, the pipeline will report failure.
>
>     $source -> [ $stylesheet=ref("style"), $source=ref("source") ]
>             -> xslt()
>             -> [ $output=ref("result") ]
>
> The xslt step expects documents on ports named stylesheet and source, so
> we extract them from the input binding context and pass them to the XSLT
> step. The ref (pseudo)function reads the named binding from the input
> binding context.
>
> When two binding contexts are connected by “->”, the following semantics
> apply.
>
>   Every named binding in the context on the right-hand side of -> is
>   added to the resulting context. Then any *unreferenced* binding from
>   the context on the left-hand side of -> is copied over.
>
>   Steps *should* copy through any bindings that they do not choose
>   to consume. They *may* pass through additional bindings.
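A rough Python model of the “->” rule follows. It assumes the right-hand side consists only of `$new=ref("old")` bindings (literal values such as "schema.xsd" are not modelled), and it assumes that a copied left-hand binding never overwrites a right-hand binding of the same name, a case the text leaves open. The function name `arrow` is hypothetical.

```python
from typing import Any, Mapping


def arrow(left: Mapping[str, Any], refs: Mapping[str, str]) -> dict[str, Any]:
    """Sketch of 'left -> [ $new=ref("old"), ... ]'.

    refs maps each new binding name to the left-hand name it reads via ref().
    """
    referenced = set(refs.values())
    # 1. Every named binding on the right-hand side is added.
    result = {new: left[old] for new, old in refs.items()}
    # 2. Unreferenced left-hand bindings are copied over. Assumption: they do
    #    not overwrite a right-hand binding that has the same name.
    for name, value in left.items():
        if name not in referenced and name not in result:
            result[name] = value
    return result


inputs = {"source": "S", "style": "X", "schema": "K", "version": "2.0"}
print(arrow(inputs, {"stylesheet": "style", "source": "source"}))
# {'stylesheet': 'X', 'source': 'S', 'schema': 'K', 'version': '2.0'}
```

Because "style" was referenced it is not copied through under its old name, while "schema" and "version" ride along untouched, which is what lets the xslt step see them.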
>
> This means that the xslt step can read the source and stylesheet from
> the binding context. It can also read the schema and version bindings,
> as well as any random bindings passed in to the implementation.
>
> After the XSLT step runs, we construct the pipeline result binding. If
> we’d named the output “result”, we could have left it all alone and
> just returned the xslt() result.
>
> Let’s look at another XQuery example:
>
>     ============================================================
>     xproc version 2.0;
>     expressions xquery;
>     input source [ $source as document-node(),
>                    $style as document-node(),
>                    $schema as document-node()?,
>                    $version as xs:decimal ];
>
>     $source -> [ $stylesheet=ref("style"), $source=ref("source") ]
>             -> ( $source/*/@version > 2.0 )
>                { [ref("source"), "schema2.xsd"] -> validate-with-xsd() }
>             -> ( $source/*/@version <= 2.0 )
>                { [ref("source"), "schema1.xsd"] -> validate-with-xsd() }
>             -> [ $result=ref("result") ]
>     ============================================================
>
>     xproc version 2.0;
>     expressions xquery;
>     input source [ $source as document-node(),
>                    $style as document-node(),
>                    $schema as document-node()?,
>                    $version as xs:decimal ];
>
> Here we’ve said nothing about the output, so it is whatever it is. The
> only reason to mention it is to make assertions about what must be in
> it.
>
>     $source -> [ $stylesheet=ref("style"), $source=ref("source") ]
>             -> ( $source/*/@version > 2.0 )
>                { [ref("source"), "schema2.xsd"] -> validate-with-xsd() }
>
> The construction “() { … }” is the gating operator that Henry referred to.
> The expression can read from the input binding context. If it evaluates
> to true, then the flow in the following { } is evaluated and that’s
> the result. If the expression evaluates to false, it is as if the
> gated expression had not been in the pipeline. The input binding context
> is passed unchanged on to the next step in the flow.
>
>             -> ( $source/*/@version <= 2.0 )
>                { [ref("source"), "schema1.xsd"] -> validate-with-xsd() }
>
> You can get the effect of a “switch” statement by putting them one after
> another with carefully constructed conditions, but they are entirely
> independent.
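The gating operator and the switch-like chaining can be modelled with plain functions. In this sketch `gate`, `chain`, and `validate_with` are hypothetical names; `validate_with` stands in for validate-with-xsd() and, following the copy-through rule for steps, passes the other bindings along. Because each gate tests whatever context reaches it, the second condition sees the first gate's output, which is why the conditions have to be constructed carefully.

```python
from typing import Any, Callable

Context = dict[str, Any]
Flow = Callable[[Context], Context]


def gate(cond: Callable[[Context], bool], flow: Flow) -> Flow:
    """Sketch of '( cond ) { flow }'."""
    def run(ctx: Context) -> Context:
        # If cond is false, behave as if the gate were not in the pipeline.
        return flow(ctx) if cond(ctx) else ctx
    return run


def chain(*flows: Flow) -> Flow:
    """Sketch of 'a -> b -> c' as left-to-right composition."""
    def run(ctx: Context) -> Context:
        for flow in flows:
            ctx = flow(ctx)
        return ctx
    return run


def validate_with(schema: str) -> Flow:
    """Hypothetical stand-in for validate-with-xsd(): records the schema used
    and copies every other binding through."""
    return lambda ctx: {**ctx, "validated-with": schema}


pipeline = chain(
    gate(lambda c: c["version"] > 2.0, validate_with("schema2.xsd")),
    gate(lambda c: c["version"] <= 2.0, validate_with("schema1.xsd")),
)
print(pipeline({"version": 2.5})["validated-with"])  # schema2.xsd
print(pipeline({"version": 1.0})["validated-with"])  # schema1.xsd
```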
>
>             -> [ $result=ref("result") ]
>
> This is a vacuous binding construction but it serves to make the
> result explicit.
>
> Maybe you’d rather use JavaScript?
>
>     ============================================================
>     xproc version 2.0;
>     expressions javascript;
>     input source [ $geodata instanceof Array,
>                    $loc instanceof Object ];
>
>     output [ $result ];
>
>     $source -> [ $geo=ref("geodata"), $loc=ref("loc") ]
>             -> {{ var found=undefined;
>                   for (var i=0; i<geo.length; i++) {
>                     if (loc.id === geo[i].id) {
>                        found = loc;
>                     }
>                   }
>                   if (typeof found !== "undefined") {
>                     xproc.result("result", found);
>                   }
>                }}
>     ============================================================
>
>     xproc version 2.0;
>     expressions javascript;
>
> Expressions are in JavaScript, not XQuery.
>
>     input source [ $geodata instanceof Array,
>                    $loc instanceof Object ];
>
> I’m not really sure what’s right for typing things in JavaScript.
>
>     output [ $result ];
>
> The output binding must contain something (anything) called “result”.
>
>     $source -> [ $geo=ref("geodata"), $loc=ref("loc") ]
>
> Here we remap the bindings.
>
>             -> {{ var found=undefined;
>
> The {{ introduces a native language expression. From here to }} it’s
> all JavaScript. The implementation is responsible for making geo and
> loc available as reasonable types.
>
>                   for (var i=0; i<geo.length; i++) {
>                     if (loc.id === geo[i].id) {
>                        found = loc;
>                     }
>                   }
>                   if (typeof found !== "undefined") {
>                     xproc.result("result", found);
>
> The implementation is responsible for providing some mechanism for putting
> things in the output binding context.
>
>                   }
>                }}
>
> In fact, if you look closely, this pipeline will fail if the loc isn’t
> found in the array. (Because the binding context will not contain a
> binding for “result”.)
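The host-side mechanism behind `xproc.result` is left to the implementation. A minimal Python sketch of one possibility, using a hypothetical `OutputCollector` class and a Python rendering of the inline lookup (`find_loc`, also hypothetical), shows why a missing “result” binding makes the pipeline fail:

```python
from typing import Any


class OutputCollector:
    """Hypothetical host object playing the role of 'xproc' in the JavaScript example."""

    def __init__(self) -> None:
        self.bindings: dict[str, list[Any]] = {}

    def result(self, name: str, value: Any) -> None:
        # Assumption: repeated writes to one name build up a sequence.
        self.bindings.setdefault(name, []).append(value)


def find_loc(geo: list[dict], loc: dict, xproc: OutputCollector) -> None:
    """Python rendering of the inline JavaScript: emit loc only if its id is in geo."""
    if any(item["id"] == loc["id"] for item in geo):
        xproc.result("result", loc)


geo = [{"id": 1}, {"id": 2}]
for loc in ({"id": 2}, {"id": 9}):
    out = OutputCollector()
    find_loc(geo, loc, out)
    ok = "result" in out.bindings  # the 'output [ $result ]' requirement
    print(loc["id"], "succeeds" if ok else "fails")
# 2 succeeds
# 9 fails
```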
>
> One more XQuery example.
>
>     ============================================================
>     xproc version 2.0;
>     expressions xquery;
>     input source [ $source as document-node(),
>                    $style as document-node(),
>                    $schema as document-node()?,
>                    $version as xs:decimal ];
>
>     output [ $result as document-node() ];
>
>     $source -> xslt() >> $xsltout
>
>     $xsltout -> [ $source=ref("result"), "schema.xsd" ] -> validate() >> $validprimary
>
>     $xsltout -> [ $source=ref("secondary") ]
>              -> iterate { [ref("result"), "schema.xsd"] -> validate() } >> $validsecondary
>
>     [ $validprimary, $validsecondary ] -> join() -> [ $result=ref("result") ]
>     ============================================================
>
>     xproc version 2.0;
>     expressions xquery;
>     input source [ $source as document-node(),
>                    $style as document-node(),
>                    $schema as document-node()?,
>                    $version as xs:decimal ];
>
>     output [ $result as document-node() ];
>
>     $source -> xslt() >> $xsltout
>
> $xsltout is a named reference to a binding context.
>
>     $xsltout -> [ $source=ref("result"), "schema.xsd" ] -> validate() >> $validprimary
>
> This flow starts with that binding context and maps from it. The resulting binding
> context is also given a name.
>
>     $xsltout -> [ $source=ref("secondary") ]
>              -> iterate { [ref("result"), "schema.xsd"] -> validate() } >> $validsecondary
>
> The iterate operator takes each object that appears on a port named
> source and applies the following flow to it. The resulting output
> binding contexts are merged together, with matching names forming
> sequences.
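A sketch of the iterate merge in Python follows. The per-item input context (just the current object under the same port name) is an assumption, as is the sample `measure` step; `iterate` itself is a hypothetical function name.

```python
from typing import Any, Callable, Mapping

Flow = Callable[[dict[str, Any]], Mapping[str, Any]]


def iterate(ctx: Mapping[str, Any], flow: Flow, port: str = "source") -> dict[str, tuple]:
    """Sketch of 'iterate { flow }': run flow once per object on `port`, then
    merge the output contexts so that matching names form sequences."""
    merged: dict[str, list] = {}
    for item in ctx.get(port, ()):
        # Assumption: each run sees a context holding just the current item
        # under the same port name.
        for name, value in flow({port: item}).items():
            merged.setdefault(name, []).append(value)
    return {name: tuple(values) for name, values in merged.items()}


def measure(c: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical step producing two outputs per input object.
    return {"result": c["source"].upper(), "length": len(c["source"])}


print(iterate({"source": ("a", "bb", "ccc")}, measure))
# {'result': ('A', 'BB', 'CCC'), 'length': (1, 2, 3)}
```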
>
> We also name the output binding context that it produces.
>
>     [ $validprimary, $validsecondary ] -> join() -> [ $result=ref("result") ]
>
> Finally, join reads from the first two bindings in its input binding context
> and places the things it reads into a single sequence on its result output.
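A minimal model of join(), assuming the binding context keeps its bindings in order (Python dicts preserve insertion order) and that each binding holds either a single object or a sequence. The function is a sketch, not a definition of the step:

```python
from typing import Any, Mapping


def join(ctx: Mapping[str, Any]) -> dict[str, tuple]:
    """Sketch of join(): read the first two bindings (in context order) and
    concatenate what they hold into one sequence on 'result'."""
    out: list[Any] = []
    for value in list(ctx.values())[:2]:
        # A binding may hold a single object or a sequence of objects.
        out.extend(value if isinstance(value, (list, tuple)) else [value])
    return {"result": tuple(out)}


print(join({"validprimary": "P", "validsecondary": ("S1", "S2"), "extra": "ignored"}))
# {'result': ('P', 'S1', 'S2')}
```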
>
> A closing thought. There’s a potential quoting issue with expressions
> in () and expressions in {{ }}. I propose that if the first character
> after “(” or “{{” is a Unicode “punctuation open” character (general
> category Ps), then the end of the expression is delimited by the
> corresponding Unicode “punctuation close” character (Pe) followed by
> either “)” or “}}”. No space is allowed between the bracket and the
> Unicode character.
>
> So:
>
> Ok : ( 3 + 4 > 6 )
> Bad: ( (3 + 4) < 4 )
> Fix: ([ (3 + 4) < 4 ])
>
> We could imagine more complex quoting rules about balanced delimiters,
> but screw it. If you need a delimiter in the expression, change the
> damned delimiter.
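The delimiter rule is simple enough to sketch. This Python version assumes “punctuation open” means Unicode general category Ps. Because the standard `unicodedata` module reports a character's category but not its paired closing bracket, the partner characters come from a small hypothetical table (`CLOSERS`), and `extract` is an illustrative name:

```python
import unicodedata

# Hypothetical table pairing a few Ps ("punctuation, open") characters with
# their Pe ("punctuation, close") partners.
CLOSERS = {"(": ")", "[": "]", "{": "}", "\u27e8": "\u27e9"}


def extract(text: str, opener: str = "(", closer: str = ")") -> tuple[str, str]:
    """Split text that starts with `opener` into (expression, rest)."""
    if not text.startswith(opener):
        raise ValueError(f"expected {opener!r}")
    body = text[len(opener):]
    first = body[:1]
    if first and unicodedata.category(first) == "Ps":
        if first not in CLOSERS:
            raise ValueError(f"no closing partner known for {first!r}")
        # The expression ends at the matching close character plus the closer.
        terminator = CLOSERS[first] + closer
        start = 1
    else:
        terminator = closer
        start = 0
    end = body.find(terminator, start)
    if end < 0:
        raise ValueError(f"unterminated expression, expected {terminator!r}")
    return body[start:end], body[end + len(terminator):]


print(extract("( 3 + 4 > 6 ) -> next"))         # (' 3 + 4 > 6 ', ' -> next')
print(extract("( (3 + 4) < 4 )"))               # (' (3 + 4', ' < 4 )')  cut short
print(extract("([ (3 + 4) < 4 ])"))             # (' (3 + 4) < 4 ', '')
print(extract("{{[ a }} b ]}}", "{{", "}}"))    # (' a }} b ', '')
```

Note the deliberate lack of nesting support: the scan simply looks for the first occurrence of the terminator, which matches the "change the delimiter" stance rather than balanced-delimiter rules.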
>
>                                         Be seeing you,
>                                           norm
>
> --
> Norman Walsh
> Lead Engineer
> MarkLogic Corporation
> Phone: +1 512 761 6676
> www.marklogic.com
Received on Friday, 22 April 2016 11:08:15 UTC