- From: James Fuller <jim@webcomposite.com>
- Date: Fri, 22 Apr 2016 13:07:47 +0200
- To: Norman Walsh <ndw@nwalsh.com>
- Cc: XProc WG <public-xml-processing-model-wg@w3.org>
Apart from the ever-present (minor) questions of syntax, this is nice. A few random comments:

I think you had some typos in some of the examples.

We could use a variable to set the expression language, like

$p:expression-language = "'com.marklogic:xquery:1.0-ml'"

The logic branching reminds me that Boolean logic is just a constrained case of fuzzy logic ... there is some attraction to having a more rules-based approach, and I wonder if we could not achieve that here.

Lastly, your examples reinforce (for me) the need to delineate between pipes and variables.

J

On 22 April 2016 at 04:04, Norman Walsh <ndw@nwalsh.com> wrote:
> This message describes the results of several hours’ thought (walking
> beside a swollen river can be relaxing). It’s not connected directly
> to the current draft; it is in some ways a radical departure.
> Submitted for your amusement.
>
> Alex’s uplifting comments at the end of the last telcon left me
> wondering what I’d implement if it was just me. Chances are, even if
> the WG folds up its tent and I withdraw from standards work generally,
> I’ll still want to build something. I’ve got itches, after all.
>
> At a high level, we have a graph description language that connects
> together components. The graph is useful independent of the specific
> implementation details for each component.
>
> So I’m going to draw a box around the components. The interface is:
>
> 1. Component has some mechanism to read bindings from the
>    input binding context.
> 2. Component has some mechanism for contributing results
>    to the output binding context.
>
> There’s no difference between an input binding context and an output
> binding context. A binding context is a mapping from names to objects
> or sequences of objects. Ports are handled in the implementation. If
> you refer to one in the surface syntax, the pipeline waits until the
> thread writing to that pipe closes it and then delivers the sequence
> of objects it produced as the value of the reference.
> If you don’t reference it, the underlying implementation of the step
> can (maybe) stream it. Quality of implementation issue.
>
> My itches are mostly XML-related, but there’s nothing about pipeline
> processing that’s XML-specific. I’m happy with XQuery and XML. You
> want JSON and JavaScript, fine: as long as you can implement the
> interface contract above, you’re good.
>
> We also want inline expressions, for conditionals at least. Fine.
> Those are in the current expression language. (There’s nothing that
> says you can’t mix multiple expression languages in the same
> pipeline.)
>
> How you map objects provided in a binding context into objects in your
> implementation language is your business. I don’t care. How you map
> objects constructed in your implementation back to things in the
> binding context is your business. I don’t care.
>
> The pipeline processor constructs an initial binding context, which we
> allow the author to name. The author can also enumerate bindings that
> must or may be in the context in order for execution to begin.
>
> The implementation can put arbitrary additional bindings in the
> context.
>
> When the pipeline finishes, the binding context produced by the
> syntactically last flow is the result of the pipeline.
>
> The author can enumerate bindings that must or may be in the pipeline
> result context in order for the pipeline to be considered successful.
>
> Let’s look at some examples. (In each case, I’ll put the example first,
> then walk through it.)
>
> ============================================================
> xproc version 2.0;
> expressions xquery;
> input source [ $source as document-node(),
>                $style as document-node(),
>                $schema as document-node()?,
>                $version as xs:decimal ];
>
> output [ $result as document-node() ];
>
> $source -> [ $stylesheet=ref("style"), $source=ref("schema") ]
>         -> xslt()
>         -> [ $result=ref("result") ]
> ============================================================
>
> xproc version 2.0;
>
> Maybe it won’t be XProc, but for now…
>
> expressions xquery;
>
> The expression language is XQuery.
>
> input source [ $source as document-node(),
>                $style as document-node(),
>                $schema as document-node()?,
>                $version? as xs:decimal ];
>
> The input context must have source and style documents. It may have a
> schema document; if it doesn’t, the schema document will be the empty
> sequence. It may have a version; if it does, it must be an xs:decimal.
> If it doesn’t, then it just isn’t in the context.
>
> output [ $output as document-node() ];
>
> The output binding context must contain an “output” document. If it
> doesn’t, the pipeline will report failure.
>
> $source -> [ $stylesheet=ref("style"), $source=ref("schema") ]
>         -> xslt()
>         -> [ $output=ref("result") ]
>
> The xslt step expects documents on ports named stylesheet and source, so
> we extract them from the input binding context. We pass that to the XSLT
> step. The ref (pseudo)function reads the named binding from the input
> binding context.
>
> When two binding contexts are connected by “->”, the following semantics
> are applied.
>
> Every named binding in the context on the right-hand side of -> is
> added to the context. Then any *unreferenced* binding from the context
> on the left-hand side of -> is copied over.
>
> Steps *should* copy through any bindings that they do not choose
> to consume. They *may* pass through additional bindings.
>
> This means that the xslt step can read the source and stylesheet from
> the binding context. It can also read the schema and version bindings,
> as well as any random bindings passed in to the implementation.
>
> After XSLT runs, construct the pipeline result binding. If we’d just
> named the output result, we could have left it all alone and just
> returned the xslt() result.
>
> Let’s look at another XQuery example:
>
> ============================================================
> xproc version 2.0;
> expressions xquery;
> input source [ $source as document-node(),
>                $style as document-node(),
>                $schema as document-node()?,
>                $version as xs:decimal ];
>
> $source -> [ $stylesheet=ref("style"), $source=ref("source") ]
>         -> ( $source/*/@version gt 2.0 )
>            { [ref("source"), "schema2.xsd"] -> validate-with-xsd() }
>         -> ( $source/*/@version le 2.0 )
>            { [ref("source"), "schema1.xsd"] -> validate-with-xsd() }
>         -> [ $result=ref("result") ]
> ============================================================
>
> xproc version 2.0;
> expressions xquery;
> input source [ $source as document-node(),
>                $style as document-node(),
>                $schema as document-node()?,
>                $version as xs:decimal ];
>
> Here we’ve said nothing about the output, so it is whatever it is. The
> only reason to mention it is to make assertions about what must be in
> it.
>
> $source -> [ $stylesheet=ref("style"), $source=ref("source") ]
>         -> ( $source/*/@version gt 2.0 )
>            { [ref("source"), "schema2.xsd"] -> validate-with-xsd() }
>
> The construction “() { … }” is the gating operator that Henry referred to.
> The expression can read from the input binding context. If it evaluates
> to true, then the flow in the following { } is evaluated and that’s
> the result. If the expression evaluates to false, it is as if the
> gated expression had not been in the pipeline. The input binding context
> is passed unchanged on to the next step in the flow.
>
>         -> ( $source/*/@version le 2.0 )
>            { [ref("source"), "schema1.xsd"] -> validate-with-xsd() }
>
> You can get the effect of a “switch” statement by putting them one after
> another with carefully constructed conditions, but they are entirely
> independent.
>
>         -> [ $result=ref("result") ]
>
> This is a vacuous binding construction, but it serves to make the
> result explicit.
>
> Maybe you’d rather use JavaScript?
>
> ============================================================
> xproc version 2.0;
> expressions javascript;
> input source [ $geodata instanceof Array,
>                $loc instanceof Object ];
>
> output [ $result ];
>
> $source -> [ $geo=ref("geodata"), $loc=ref("loc") ]
>         -> {{ var found=undefined;
>               for (var i=0; i<geo.length; i++) {
>                 if (loc.id === geo[i].id) {
>                   found = loc
>                 }
>               }
>               if (typeof found !== undefined) {
>                 xproc.result("result", found)
>               }
>            }}
> ============================================================
>
> xproc version 2.0;
> expressions javascript;
>
> Expressions are in JavaScript, not XQuery.
>
> input source [ $geodata instanceof Array,
>                $loc instanceof Object ];
>
> I’m not really sure what’s right for typing things in JavaScript.
>
> output [ $result ];
>
> The output binding must contain something (anything) called “result”.
>
> $source -> [ $geo=ref("geodata"), $loc=ref("loc") ]
>
> Here we remap the bindings.
>
>         -> {{ var found=undefined;
>
> The {{ introduces a native language expression. From here to }} it’s
> all JavaScript. The implementation is responsible for making geo and
> loc available as reasonable types.
>
>               for (var i=0; i<geo.length; i++) {
>                 if (loc.id === geo[i].id) {
>                   found = loc
>                 }
>               }
>               if (typeof found !== undefined) {
>                 xproc.result("result", found)
>
> The implementation is responsible for providing some mechanism for putting
> things in the output binding context.
>
>               }
>            }}
>
> In fact, if you look closely, this pipeline will fail if the loc isn’t
> found in the array.
> (Because the binding context will not contain a binding for “result”.)
>
> One more XQuery example.
>
> ============================================================
> xproc version 2.0;
> expressions xquery;
> input source [ $source as document-node(),
>                $style as document-node(),
>                $schema as document-node()?,
>                $version as xs:decimal ];
>
> output [ $result as document-node() ];
>
> $source -> xslt() >> $xsltout
>
> $xsltout -> [ $source=ref("result"), "schema.xsd" ] -> validate() >> $validprimary
>
> $xsltout -> [ source=ref("secondary") ]
>          -> iterate { [ref("result"), "schema.xsd"] -> validate() } >> $validsecondary
>
> [ $validprimary, $validsecondary ] -> join() -> [ $result=ref("result") ]
> ============================================================
>
> xproc version 2.0;
> expressions xquery;
> input source [ $source as document-node(),
>                $style as document-node(),
>                $schema as document-node()?,
>                $version as xs:decimal ];
>
> output [ $result as document-node() ];
>
> $source -> xslt() >> $xsltout
>
> $xsltout is a named reference to a binding context.
>
> $xsltout -> [ $source=ref("result"), "schema.xsd" ] -> validate() >> $validprimary
>
> This flow starts with that binding context and maps from it. The resulting
> binding context is also given a name.
>
> $xsltout -> [ source=ref("secondary") ]
>          -> iterate { [ref("result"), "schema.xsd"] -> validate() } >> $validsecondary
>
> The iterate operator takes each object that appears on a port named
> source and applies the following flow to it. The resulting output
> binding contexts are merged together, with matching names forming
> sequences.
>
> We also name the output binding context that it produces.
>
> [ $validprimary, $validsecondary ] -> join() -> [ $result=ref("result") ]
>
> Finally, join reads from the first two bindings in its input binding context
> and places the things it reads into a single sequence on its result output.
>
> A closing thought.
> There’s a potential quoting issue with expressions
> in () and expressions in {{ }}. I propose that if the first character
> after “(” or “{{” is a Unicode “punctuation open” character then the
> end of the expression is delimited by the corresponding Unicode
> “punctuation close” character followed by either “)” or “}}”. No space
> is allowed between the bracket and the Unicode character.
>
> So:
>
>   Ok : ( 3 + 4 > 6 )
>   Bad: ( (3 + 4) < 4 )
>   Fix: (< (3 + 4) < 4 >)
>
> We could imagine more complex quoting rules about balanced delimiters,
> but screw it. If you need a delimiter in the expression, change the
> damned delimiter.
>
> Be seeing you,
>   norm
>
> --
> Norman Walsh
> Lead Engineer
> MarkLogic Corporation
> Phone: +1 512 761 6676
> www.marklogic.com
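The “->” rule in the quoted message (named bindings on the right-hand side are added, then unreferenced left-hand bindings are copied over) can be sketched in a few lines. This sketch makes two modeling assumptions: a binding context is a Python `dict`, and a construction such as `[ $stylesheet=ref("style"), $source=ref("source") ]` is a list of `(new_name, referenced_name)` pairs. `connect` is a hypothetical name. The message does not say what happens when an unreferenced left-hand name collides with a right-hand name; this sketch assumes the right-hand binding wins.

```python
from typing import Any, Dict, List, Tuple

# A binding context: a mapping from names to objects or sequences of objects.
Context = Dict[str, Any]


def connect(left: Context, construction: List[Tuple[str, str]]) -> Context:
    """Evaluate `left -> [ $new=ref("old"), ... ]` (hypothetical model).

    Each (new, old) pair stands for one `$new=ref("old")` entry.
    """
    # 1. Every named binding on the right-hand side is added.
    result: Context = {new: left[old] for new, old in construction}
    referenced = {old for _, old in construction}
    # 2. Any unreferenced binding from the left-hand side is copied over.
    #    Assumption: a name already bound on the right is not overwritten.
    for name, value in left.items():
        if name not in referenced and name not in result:
            result[name] = value
    return result


# The remapping from the second example, plus an extra "version" binding.
ctx = {"source": "<doc/>", "style": "<xsl/>", "version": 2.1}
print(connect(ctx, [("stylesheet", "style"), ("source", "source")]))
# prints {'stylesheet': '<xsl/>', 'source': '<doc/>', 'version': 2.1}
```

Here `style` disappears from the result. It was referenced, so it is consumed rather than copied through, while the unreferenced `version` survives.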
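The `input` and `output` declarations act as a contract checked against a binding context. The sketch below is hypothetical throughout: `Decl`, `check_context` and `is_doc` are illustrative names, and documents are modeled as XML strings. It keeps apart the two kinds of optionality in the first example. `$schema as document-node()?` defaults to the empty sequence (here `()`), while `$version? as xs:decimal` may simply be absent.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]


@dataclass
class Decl:
    """One declared binding, e.g. `$schema as document-node()?` (hypothetical)."""
    name: str
    is_type: Callable[[Any], bool]  # stand-in for an XQuery sequence type
    empty_ok: bool = False          # `type?`: if absent, bind the empty sequence
    may_be_absent: bool = False     # `$name?`: if absent, leave it unbound


def check_context(ctx: Context, decls: List[Decl]) -> Context:
    """Check a binding context against declarations; raise if it fails."""
    result = dict(ctx)  # undeclared extra bindings are allowed and kept
    for d in decls:
        if d.name not in result:
            if d.empty_ok:
                result[d.name] = ()  # the empty sequence
            elif not d.may_be_absent:
                raise LookupError(f"required binding ${d.name} is missing")
            continue
        value = result[d.name]
        if d.empty_ok and value == ():
            continue
        if not d.is_type(value):
            raise TypeError(f"binding ${d.name} has the wrong type")
    return result


def is_doc(value: Any) -> bool:
    # Assumption: a document node is modeled as a string of XML.
    return isinstance(value, str) and value.startswith("<")


# The input declarations of the first example in the message.
SOURCE_DECLS = [
    Decl("source", is_doc),
    Decl("style", is_doc),
    Decl("schema", is_doc, empty_ok=True),
    Decl("version", lambda v: isinstance(v, Decimal), may_be_absent=True),
]

# check_context({"source": "<doc/>", "style": "<xsl/>"}, SOURCE_DECLS)
#   returns {"source": "<doc/>", "style": "<xsl/>", "schema": ()}
```

In this sketch, a missing required output binding raises `LookupError`, which corresponds to “the pipeline will report failure”. The JavaScript example relies on the same check: it is meant to fail by never binding `result` when there is no match. For that, its guard would need to be `typeof found !== "undefined"`. Because `typeof` returns a string, comparing it with the value `undefined` is always true, so the binding would always be made.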
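The gating operator `( expr ) { flow }` reduces to a small higher-order function. This is a hypothetical sketch with three simplifications. `gate`, `chain` and `validate_with` are illustrative names. `chain` just feeds each output into the next flow and ignores the `->` copy-through rule. The version is read from a `version` binding rather than from `$source/*/@version`.

```python
from typing import Any, Callable, Dict

Context = Dict[str, Any]
Flow = Callable[[Context], Context]


def gate(condition: Callable[[Context], bool], flow: Flow) -> Flow:
    """Model `( condition ) { flow }`."""
    def gated(ctx: Context) -> Context:
        # True: the gated flow runs and its output is the result.
        # False: behave as if the gate were absent; pass the context through.
        return flow(ctx) if condition(ctx) else ctx
    return gated


def chain(*flows: Flow) -> Flow:
    """Run flows left to right, each reading the previous flow's output."""
    def run(ctx: Context) -> Context:
        for flow in flows:
            ctx = flow(ctx)
        return ctx
    return run


def validate_with(schema: str) -> Flow:
    # Hypothetical stand-in for validate-with-xsd(): records the schema used.
    return lambda ctx: {**ctx, "result": (ctx["source"], schema)}


# Two independent gates give the effect of a switch on the version.
version_switch = chain(
    gate(lambda c: c["version"] > 2.0, validate_with("schema2.xsd")),
    gate(lambda c: c["version"] <= 2.0, validate_with("schema1.xsd")),
)
```

The gates are independent, so nothing stops both from firing if the first gated flow changes the value the second condition tests. The switch effect holds only while the conditions are constructed carefully, as the message says.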
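The `iterate` rule can also be sketched: run a flow once per object on the `source` port, then merge the resulting binding contexts so that matching names form sequences. Sequences are modeled as Python tuples, and `iterate`, `merge` and `fake_validate` are hypothetical names. One assumption here is not settled by the message: each iteration receives a context containing only the current object.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Flow = Callable[[Context], Context]


def as_sequence(value: Any) -> Tuple[Any, ...]:
    """Treat a tuple as a sequence and any other value as a one-item sequence."""
    return value if isinstance(value, tuple) else (value,)


def merge(contexts: List[Context]) -> Context:
    """Merge binding contexts; values under matching names form one sequence."""
    merged: Dict[str, Tuple[Any, ...]] = {}
    for ctx in contexts:
        for name, value in ctx.items():
            merged[name] = merged.get(name, ()) + as_sequence(value)
    return merged


def iterate(flow: Flow, port: str = "source") -> Flow:
    """Apply `flow` to each object bound to `port`, then merge the outputs."""
    def run(ctx: Context) -> Context:
        # Assumption: each iteration sees only the current object, under `port`.
        return merge([flow({port: item}) for item in as_sequence(ctx[port])])
    return run


def fake_validate(ctx: Context) -> Context:
    # Hypothetical stand-in for validate(): tags the document it received.
    return {"result": "valid:" + ctx["source"]}


print(iterate(fake_validate)({"source": ("<a/>", "<b/>")}))
# prints {'result': ('valid:<a/>', 'valid:<b/>')}
```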
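The proposed delimiter rule can be prototyped as a function that splits the text following “(” or “{{” into the expression and the remaining input. `split_expression` is a hypothetical name. Instead of deriving bracket pairs from Unicode data (a fuller version might use the paired brackets listed in the Unicode Character Database), the sketch uses a small hand-picked table. ASCII `<` has general category Sm, not Ps (“Open Punctuation”), so the table adds it by hand to make the message's `(< ... >)` example work.

```python
import unicodedata
from typing import Tuple

# Hand-picked open/close pairs. Every opener except "<" has Unicode general
# category Ps; "<" is category Sm and is included only for the `(< ... >)` example.
PAIRS = {"(": ")", "[": "]", "{": "}", "\u27e8": "\u27e9", "\u300c": "\u300d", "<": ">"}
assert all(unicodedata.category(o) == "Ps" for o in PAIRS if o != "<")


def split_expression(text: str, closer: str) -> Tuple[str, str]:
    """Split the text after "(" or "{{" into (expression, rest).

    `closer` is ")" or "}}", matching the bracket that preceded `text`.
    """
    first = text[:1]
    if first in PAIRS:
        # Quoted form: the opener immediately follows the bracket, and the
        # expression ends at its partner followed directly by `closer`.
        end_marker = PAIRS[first] + closer
        end = text.find(end_marker, 1)
        if end < 0:
            raise ValueError("unterminated quoted expression")
        return text[1:end], text[end + len(end_marker):]
    # Plain form: the expression ends at the first `closer`.
    end = text.find(closer)
    if end < 0:
        raise ValueError("unterminated expression")
    return text[:end], text[end + len(closer):]


# split_expression(" 3 + 4 > 6 )", ")")      returns (" 3 + 4 > 6 ", "")
# split_expression(" (3 + 4) < 4 )", ")")    returns (" (3 + 4", " < 4 )")   the "Bad" case
# split_expression("< (3 + 4) < 4 >)", ")")  returns (" (3 + 4) < 4 ", "")   the "Fix" case
```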
Received on Friday, 22 April 2016 11:08:15 UTC