- From: Norman Walsh <ndw@nwalsh.com>
- Date: Thu, 21 Apr 2016 21:04:24 -0500
- To: public-xml-processing-model-wg@w3.org
- Message-ID: <87mvomz9ef.fsf@nwalsh.com>
This message describes the results of several hours’ thought (walking beside a swollen river can be relaxing). It’s not connected directly to the current draft; in some ways it is a radical departure. Submitted for your amusement.

Alex’s uplifting comments at the end of the last telcon left me wondering what I’d implement if it was just me. Chances are, even if the WG folds up its tent and I withdraw from standards work generally, I’ll still want to build something. I’ve got itches, after all.

At a high level, we have a graph description language that connects components together. The graph is useful independent of the specific implementation details of each component. So I’m going to draw a box around the components. The interface is:

1. Component has some mechanism to read bindings from the input binding context.
2. Component has some mechanism for contributing results to the output binding context.

There’s no difference between an input binding context and an output binding context. A binding context is a mapping from names to objects or sequences of objects.

Ports are handled in the implementation. If you refer to one in the surface syntax, the pipeline waits until the thread writing to that pipe closes it and then delivers the sequence of objects it produced as the value of the reference. If you don’t reference it, the underlying implementation of the step can (maybe) stream it. Quality of implementation issue.

My itches are mostly XML related, but there’s nothing about pipeline processing that’s XML specific. I’m happy with XQuery and XML. You want JSON and JavaScript, fine: as long as you can implement the interface contract above, you’re good.

We also want inline expressions, for conditionals at least. Fine. Those are in the current expression language. (There’s nothing that says you can’t mix multiple expression languages in the same pipeline.)

How you map objects provided in a binding context into objects in your implementation language is your business.
I don’t care.

How you map objects constructed in your implementation back to things in the binding context is your business. I don’t care.

The pipeline processor constructs an initial binding context, which we allow the author to name. The author can also enumerate bindings that must or may be in the context in order for execution to begin. The implementation can put arbitrary additional bindings in the context.

When the pipeline finishes, the binding context produced by the syntactically last flow is the result of the pipeline. The author can enumerate bindings that must or may be in the pipeline result context in order for the pipeline to be considered successful.

Let’s look at some examples. (In each case, I’ll put the example first, then walk through it.)

============================================================
xproc version 2.0;
expressions xquery;

input source [ $source as document-node(),
               $style as document-node(),
               $schema as document-node()?,
               $version? as xs:decimal ];

output [ $output as document-node() ];

$source -> [ $stylesheet=ref("style"), $source=ref("schema") ]
        -> xslt()
        -> [ $output=ref("result") ]
============================================================

xproc version 2.0;

Maybe it won’t be XProc, but for now…

expressions xquery;

The expression language is XQuery.

input source [ $source as document-node(),
               $style as document-node(),
               $schema as document-node()?,
               $version? as xs:decimal ];

The input context must have source and style documents. It may have a schema document; if it doesn’t, the schema document will be the empty sequence. It may have a version; if it does, it must be an xs:decimal. If it doesn’t, then it just isn’t in the context.

output [ $output as document-node() ];

The output binding context must contain an “output” document. If it doesn’t, the pipeline will report failure.
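The “must or may” assertions on the input and output contexts can be sketched as a check over a dictionary. This is only an illustration under assumptions: `Doc`, `Decl`, `check_context`, and `BindingError` are hypothetical names, a Python tuple stands in for a sequence (so `()` is the empty sequence), and the type checks are plain predicates rather than real XQuery sequence types.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Callable


class Doc:
    """Stand-in for an XDM document node (hypothetical; no real XML here)."""

    def __init__(self, text: str) -> None:
        self.text = text


class BindingError(Exception):
    """Raised when a binding context fails its declared assertions."""


@dataclass
class Decl:
    """One entry of a hypothetical parsed input/output declaration."""

    name: str                     # binding name without the leading "$"
    check: Callable[[Any], bool]  # stands in for "as document-node()" etc.
    # "required"      -> $x as T    (must be present)
    # "default-empty" -> $x as T?   (absent means the empty sequence)
    # "may-be-absent" -> $x? as T   (absent means simply not in the context)
    mode: str = "required"


def check_context(decls: list, context: dict) -> dict:
    """Check a context against its declarations; return the checked context."""
    result = dict(context)  # additional bindings from the implementation are kept
    for d in decls:
        if d.name not in context:
            if d.mode == "required":
                raise BindingError(f"missing required binding ${d.name}")
            if d.mode == "default-empty":
                result[d.name] = ()  # a tuple stands in for a sequence
            continue
        value = context[d.name]
        if d.mode == "default-empty" and isinstance(value, tuple) and not value:
            continue  # an explicitly empty sequence is fine for "T?"
        if not d.check(value):
            raise BindingError(f"${d.name} does not match its declared type")
    return result


def is_doc(value: Any) -> bool:
    return isinstance(value, Doc)


def is_decimal(value: Any) -> bool:
    return isinstance(value, Decimal)


# input source [ $source as document-node(), $style as document-node(),
#                $schema as document-node()?, $version? as xs:decimal ];
INPUT = [
    Decl("source", is_doc),
    Decl("style", is_doc),
    Decl("schema", is_doc, "default-empty"),
    Decl("version", is_decimal, "may-be-absent"),
]

# output [ $output as document-node() ];
OUTPUT = [Decl("output", is_doc)]

# No schema and no version supplied: schema becomes (), version stays absent.
ctx = check_context(INPUT, {"source": Doc("<doc/>"), "style": Doc("<xsl/>")})
```

A result context that lacks an “output” binding, such as `check_context(OUTPUT, {"result": Doc("<r/>")})`, raises `BindingError`, which plays the role of the pipeline reporting failure.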
$source -> [ $stylesheet=ref("style"), $source=ref("schema") ]
        -> xslt()
        -> [ $output=ref("result") ]

The xslt step expects documents on ports named stylesheet and source, so we extract them from the input binding context and pass that to the XSLT step. The ref (pseudo)function reads the named binding from the input binding context.

When two binding contexts are connected by “->”, the following semantics are applied. Every named binding in the context on the right-hand side of -> is added to the context. Then any *unreferenced* binding from the context on the left-hand side of -> is copied over.

Steps *should* copy through any bindings that they do not choose to consume. They *may* pass through additional bindings.

This means that the xslt step can read the source and stylesheet from the binding context. It can also read the schema and version bindings, as well as any random bindings passed in to the implementation.

After XSLT runs, we construct the pipeline result binding. If we’d just named the output result, we could have left it all alone and just returned the xslt() result.

Let’s look at another XQuery example:

============================================================
xproc version 2.0;
expressions xquery;

input source [ $source as document-node(),
               $style as document-node(),
               $schema as document-node()?,
               $version as xs:decimal ];

$source -> [ $stylesheet=ref("style"), $source=ref("source") ]
        -> ( $source/*/@version gt 2.0 ) {
             [ref("source"), "schema2.xsd"] -> validate-with-xsd()
           }
        -> ( $source/*/@version le 2.0 ) {
             [ref("source"), "schema1.xsd"] -> validate-with-xsd()
           }
        -> [ $result=ref("result") ]
============================================================

xproc version 2.0;
expressions xquery;

input source [ $source as document-node(),
               $style as document-node(),
               $schema as document-node()?,
               $version as xs:decimal ];

Here we’ve said nothing about the output, so it is whatever it is. The only reason to mention it is to make assertions about what must be in it.
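The “->” rule (right-hand bindings first, then unreferenced left-hand bindings) can be sketched as a function over two dictionaries. The names `Ref` and `connect` are hypothetical; only named entries like `$name=...` are handled, not positional ones such as `[ref("source"), "schema2.xsd"]`; and the sketch assumes that a copied binding never replaces a name already bound on the right-hand side, which the rule above does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Models ref("name"): read the named binding from the left-hand context."""

    name: str


def connect(left: dict, construction: dict) -> dict:
    """Apply  left -> [ $a=ref("x"), $b="literal" ]  (named entries only)."""
    result = {}
    referenced = set()
    # 1. Every named binding on the right-hand side is added to the context.
    for name, expr in construction.items():
        if isinstance(expr, Ref):
            referenced.add(expr.name)
            result[name] = left[expr.name]  # assumption: a missing ref raises KeyError
        else:
            result[name] = expr  # a literal value such as "schema.xsd"
    # 2. Then any *unreferenced* binding from the left-hand side is copied over.
    #    Assumption: a copied binding never replaces a name bound in step 1.
    for name, value in left.items():
        if name not in referenced and name not in result:
            result[name] = value
    return result


# $source -> [ $stylesheet=ref("style"), $source=ref("source") ]
source_ctx = {"source": "<doc/>", "style": "<xsl/>", "schema": (), "version": "2.0"}
xslt_input = connect(source_ctx, {"stylesheet": Ref("style"), "source": Ref("source")})
```

Because style is referenced, it is consumed rather than copied, while schema and version flow through untouched; that is what lets the next step see them.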
$source -> [ $stylesheet=ref("style"), $source=ref("source") ]
        -> ( $source/*/@version gt 2.0 ) {
             [ref("source"), "schema2.xsd"] -> validate-with-xsd()
           }

The construction “() { … }” is the gating operator that Henry referred to. The expression can read from the input binding context. If it evaluates to true, then the flow in the following { } is evaluated and that’s the result. If the expression evaluates to false, it is as if the gated expression had not been in the pipeline: the input binding context is passed unchanged on to the next step in the flow.

        -> ( $source/*/@version le 2.0 ) {
             [ref("source"), "schema1.xsd"] -> validate-with-xsd()
           }

You can get the effect of a “switch” statement by putting them one after another with carefully constructed conditions, but they are entirely independent.

        -> [ $result=ref("result") ]

This is a vacuous binding construction, but it serves to make the result explicit.

Maybe you’d rather use JavaScript?

============================================================
xproc version 2.0;
expressions javascript;

input source [ $geodata instanceof Array,
               $loc instanceof Object ];

output [ $result ];

$source -> [ $geo=ref("geodata"), $loc=ref("loc") ]
        -> {{
             var found=undefined;
             for (var i=0; i<geo.length; i++) {
               if (loc.id === geo[i].id) {
                 found = loc
               }
             }
             if (typeof found !== "undefined") {
               xproc.result("result", found)
             }
           }}
============================================================

xproc version 2.0;
expressions javascript;

Expressions are in JavaScript, not XQuery.

input source [ $geodata instanceof Array,
               $loc instanceof Object ];

I’m not really sure what’s right for typing things in JavaScript.

output [ $result ];

The output binding context must contain something (anything) called “result”.

$source -> [ $geo=ref("geodata"), $loc=ref("loc") ]

Here we remap the bindings.

        -> {{
             var found=undefined;

The {{ introduces a native language expression. From here to }} it’s all JavaScript.
The implementation was responsible for making geo and loc available as reasonable types.

             for (var i=0; i<geo.length; i++) {
               if (loc.id === geo[i].id) {
                 found = loc
               }
             }
             if (typeof found !== "undefined") {
               xproc.result("result", found)

The implementation is responsible for providing some mechanism for putting things in the output binding context.

             }
           }}

In fact, if you look closely, this pipeline will fail if the loc isn’t found in the array (because the binding context will not contain a binding for “result”).

One more XQuery example.

============================================================
xproc version 2.0;
expressions xquery;

input source [ $source as document-node(),
               $style as document-node(),
               $schema as document-node()?,
               $version as xs:decimal ];

output [ $result as document-node() ];

$source -> xslt() >> $xsltout

$xsltout -> [ $source=ref("result"), "schema.xsd" ]
         -> validate() >> $validprimary

$xsltout -> [ $source=ref("secondary") ]
         -> iterate { [ref("result"), "schema.xsd"] -> validate() }
         >> $validsecondary

[ $validprimary, $validsecondary ] -> join() -> [ $result=ref("result") ]
============================================================

xproc version 2.0;
expressions xquery;

input source [ $source as document-node(),
               $style as document-node(),
               $schema as document-node()?,
               $version as xs:decimal ];

output [ $result as document-node() ];

$source -> xslt() >> $xsltout

$xsltout is a named reference to a binding context.

$xsltout -> [ $source=ref("result"), "schema.xsd" ]
         -> validate() >> $validprimary

This flow starts with that binding context and maps from it. The resulting binding context is also given a name.

$xsltout -> [ $source=ref("secondary") ]
         -> iterate { [ref("result"), "schema.xsd"] -> validate() }
         >> $validsecondary

The iterate operator takes each object that appears on a port named source and applies the following flow to it. The resulting output binding contexts are merged together, with matching names forming sequences.
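The merge that iterate performs, with matching names forming sequences, can be sketched as below. It is a sketch under assumptions, not a definition: `merge_contexts`, `iterate`, and `validate_one` are hypothetical names, Python tuples stand in for sequences, and each object is handed to the per-item flow as a binding named “source”.

```python
def merge_contexts(contexts: list) -> dict:
    """Merge per-item output contexts; matching names form sequences."""
    merged = {}
    for ctx in contexts:
        for name, value in ctx.items():
            seq = merged.setdefault(name, [])
            # Sequences stay flat, as in XDM: a tuple contributes its members.
            if isinstance(value, tuple):
                seq.extend(value)
            else:
                seq.append(value)
    return {name: tuple(seq) for name, seq in merged.items()}


def iterate(items, flow) -> dict:
    """Run `flow` once per object that appeared on the "source" port."""
    # Assumption: each object is handed to the flow as a binding named "source".
    return merge_contexts([flow({"source": item}) for item in items])


def validate_one(ctx: dict) -> dict:
    """Hypothetical per-item flow standing in for validate()."""
    return {"result": ctx["source"].upper(), "report": ("ok",)}


validsecondary = iterate(["a", "b"], validate_one)
```

Each per-item run contributes one “result” and one “report”, so the merged context holds a two-item sequence under each name.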
We also give the output binding context that iterate produces a name, $validsecondary.

[ $validprimary, $validsecondary ] -> join() -> [ $result=ref("result") ]

Finally, join reads from the first two bindings in its input binding context and places the things it reads into a single sequence on its result output.

A closing thought. There’s a potential quoting issue with expressions in () and expressions in {{ }}. I propose that if the first character after “(” or “{{” is a Unicode “punctuation open” character, then the end of the expression is delimited by the corresponding Unicode “punctuation close” character followed by either “)” or “}}”. No space is allowed between the bracket and the Unicode character. So:

Ok : ( 3 + 4 > 6 )
Bad: ( (3 + 4) < 4 )
Fix: (< (3 + 4) < 4 >)

We could imagine more complex quoting rules about balanced delimiters, but screw it. If you need a delimiter in the expression, change the damned delimiter.

Be seeing you,
norm

--
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 512 761 6676
www.marklogic.com
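The proposed delimiter rule for expressions in ( ) and {{ }} is simple enough to sketch as a scanner. `PAIRS` and `scan_expression` are hypothetical names. Python’s `unicodedata` module reports a character’s general category (Ps/Pe) but does not expose its matching closer, so the sketch uses a small hand-written table; it includes “<” and “>” to match the “Fix:” example, even though Unicode classifies those as math symbols (Sm) rather than punctuation.

```python
# Hypothetical opener -> closer table, listed by hand because unicodedata
# does not expose bracket pairs. "<" and ">" are Sm in Unicode; they are
# included only to match the "Fix:" example.
PAIRS = {"(": ")", "[": "]", "<": ">", "\u27e8": "\u27e9"}  # last pair: ⟨ ⟩


def scan_expression(src: str, start: int, closer: str) -> tuple:
    """Find the end of an expression under the proposed quoting rule.

    src[start] is the character right after "(" or "{{", and closer is
    ")" or "}}". Returns (expression text, index just past the closer).
    """
    first = src[start:start + 1]  # "" if nothing follows the opener
    if first in PAIRS:
        # No space allowed: the bracket character must come first, and the
        # expression ends at its partner immediately followed by the closer.
        end_marker = PAIRS[first] + closer
        body_start = start + 1
    else:
        end_marker = closer  # plain case: the first closer ends the expression
        body_start = start
    end = src.find(end_marker, body_start)
    if end == -1:
        raise ValueError("unterminated expression")
    return src[body_start:end], end + len(end_marker)


print(scan_expression("( 3 + 4 > 6 )", 1, ")"))      # Ok
print(scan_expression("( (3 + 4) < 4 )", 1, ")"))    # Bad: stops after "(3 + 4"
print(scan_expression("(< (3 + 4) < 4 >)", 1, ")"))  # Fix
```

The “Bad” case shows why the rule is needed: without a leading bracket character, the first “)” ends the expression, leaving “(3 + 4” as its text.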
Received on Friday, 22 April 2016 02:04:57 UTC