Left field

This message describes the results of several hours’ thought (walking
beside a swollen river can be relaxing). It’s not connected directly
to the current draft; it is in some ways a radical departure.
Submitted for your amusement.

Alex’s uplifting comments at the end of the last telcon left me
wondering what I’d implement if it was just me. Chances are, even if
the WG folds up its tent and I withdraw from standards work generally,
I’ll still want to build something. I’ve got itches, after all.

At a high level, we have a graph description language that connects
together components. The graph is useful independent of the specific
implementation details for each component.

So I’m going to draw a box around the components. The interface is:

  1. Component has some mechanism to read bindings from the
     input binding context.
  2. Component has some mechanism for contributing results
     to the output binding context.
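
To make the contract concrete, here’s a minimal Python sketch. It’s an
illustration, not proposed syntax: the names BindingContext, Component,
run, UpperCase, and execute are all made up for the example.

```python
from typing import Any, Dict, Protocol

# Hypothetical alias: a mapping from names to objects or sequences.
BindingContext = Dict[str, Any]


class Component(Protocol):
    def run(self, context: BindingContext) -> BindingContext:
        """Read bindings from the input context, return the output context."""
        ...


class UpperCase:
    """Toy component: reads "source" and contributes "result"."""

    def run(self, context: BindingContext) -> BindingContext:
        return {"result": str(context["source"]).upper()}


def execute(step: Component, context: BindingContext) -> BindingContext:
    # The pipeline only ever sees the box: a context in, a context out.
    return step.run(context)
```

Anything with a matching run method satisfies the box; the pipeline
never needs to know what happens inside it.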

There’s no difference between an input binding context and an output
binding context. A binding context is a mapping from names to objects
or sequences of objects. Ports are handled in the implementation. If
you refer to one in the surface syntax, the pipeline waits until the
thread writing to that pipe closes it and then delivers the sequence
of objects it produced as the value of the reference. If you don’t
reference it, the underlying implementation of the step can (maybe) stream
it. Quality of implementation issue.
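
A rough Python sketch of the “wait until the writer closes the pipe”
behaviour, assuming a port is nothing more than a list guarded by an
event. Port, write, close, and value are hypothetical names, and a
streaming implementation would look quite different.

```python
import threading


class Port:
    """Hypothetical port: a writer appends objects, then closes it."""

    def __init__(self):
        self._items = []
        self._closed = threading.Event()

    def write(self, obj):
        self._items.append(obj)

    def close(self):
        self._closed.set()

    def value(self):
        # Referencing the port blocks until the writer has closed it,
        # then delivers everything written as one sequence.
        self._closed.wait()
        return list(self._items)


def producer(port):
    for n in range(3):
        port.write(n)
    port.close()


port = Port()
writer = threading.Thread(target=producer, args=(port,))
writer.start()
context = {"numbers": port.value()}  # waits for close()
writer.join()
```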

My itches are mostly XML related, but there’s nothing about pipeline
processing that’s XML specific. I’m happy with XQuery and XML. You
want JSON and JavaScript, fine: as long as you can implement the
interface contract above, you’re good.

We also want inline expressions, for conditionals at least. Fine.
Those are in the current expression language. (There’s nothing that
says you can’t mix multiple expression languages in the same
pipeline.)

How you map objects provided in a binding context into objects in your
implementation language is your business. I don’t care. How you map
objects constructed in your implementation back to things in the
binding context is your business. I don’t care.

The pipeline processor constructs an initial binding context which we
allow the author to name. The author can also enumerate bindings that
must or may be in the context in order for execution to begin.

The implementation can put arbitrary additional bindings in the
context.

When the pipeline finishes, the binding context produced by the
syntactically last flow is the result of the pipeline.

The author can enumerate bindings that must or may be in the pipeline
result context in order for the pipeline to be considered successful.
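
Checking a context against such declarations could be as simple as the
following Python sketch. The declaration format (a name mapped to an
expected type and a required flag) and the function name check_context
are assumptions made for the example. Undeclared bindings are ignored,
since the implementation may add arbitrary ones.

```python
def check_context(context, declarations):
    """Return a list of problems; an empty list means the context is fine.

    declarations maps a binding name to (expected_type, required).
    """
    problems = []
    for name, (expected_type, required) in declarations.items():
        if name not in context:
            if required:
                problems.append(f"missing required binding: {name}")
            continue  # optional and absent: nothing to check
        if not isinstance(context[name], expected_type):
            problems.append(f"wrong type for binding: {name}")
    return problems
```

The same check works at both ends: before execution begins for the
input context, and after the last flow for the result context.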

Let’s look at some examples. (In each case, I’ll put the example first
then walk through it.)

    ============================================================
    xproc version 2.0;
    expressions xquery;
    input source [ $source as document-node(),
                   $style as document-node(),
                   $schema as document-node()?,
                   $version? as xs:decimal ];

    output [ $output as document-node() ];

    $source -> [ $stylesheet=ref("style"), $source=ref("source") ]
            -> xslt()
            -> [ $output=ref("result") ]
    ============================================================

    xproc version 2.0;

Maybe it won’t be XProc, but for now…

    expressions xquery;

The expression language is XQuery.

    input source [ $source as document-node(),
                   $style as document-node(),
                   $schema as document-node()?,
                   $version? as xs:decimal ];

The input context must have source and style documents. It may have a
schema document; if it doesn’t, the schema document will be the empty
sequence. It may have a version; if it does, it must be an xs:decimal.
If it doesn’t, then it just isn’t in the context.

    output [ $output as document-node() ];

The output binding context must contain an “output” document. If it
doesn’t, the pipeline will report failure.

    $source -> [ $stylesheet=ref("style"), $source=ref("source") ]
            -> xslt()
            -> [ $output=ref("result") ]

The xslt step expects documents on ports named stylesheet and source, so
we extract them from the input binding context. We pass that to the XSLT
step. The ref (pseudo)function reads the named binding from the input
binding context.

When two binding contexts are connected by “->”, the following semantics
are applied.

  Every named binding in the context on the right hand side of -> is
  added to the context. Then any *unreferenced* binding from the context
  on the left hand side of -> is copied over.

  Steps *should* copy through any bindings that they do not choose
  to consume. They *may* pass through additional bindings.

This means that the xslt step can read the source and stylesheet from
the binding context. It can also read the schema and version bindings,
as well as any random bindings passed in to the implementation.
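
Here’s how I’d sketch the “->” rule in Python, for a binding
construction on the right hand side. The function name connect and the
representation of the construction (new name mapped to the left hand
name it references) are hypothetical. One thing the rule doesn’t say is
what happens on a name clash; the sketch assumes the explicitly
constructed binding wins.

```python
def connect(left, construction):
    """Apply `left -> [ $new=ref("old"), ... ]` and return the new context.

    construction maps each new name to the left-hand name it references.
    """
    # Every named binding on the right hand side is added first.
    result = {new: left[old] for new, old in construction.items()}
    referenced = set(construction.values())
    # Then any unreferenced binding from the left is copied over.
    for name, value in left.items():
        # Assumption: a constructed binding wins over a copied one.
        if name not in referenced and name not in result:
            result[name] = value
    return result
```

With the first example’s input, style is consumed (it reappears only as
stylesheet), while schema and version ride along untouched.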

After XSLT runs, we construct the pipeline result binding. If we’d just
named the output result, we could have left it all alone and just
returned the xslt() result.

Let’s look at another XQuery example:

    ============================================================
    xproc version 2.0;
    expressions xquery;
    input source [ $source as document-node(),
                   $style as document-node(),
                   $schema as document-node()?,
                   $version as xs:decimal ];

    $source -> [ $stylesheet=ref("style"), $source=ref("source") ]
            -> ( $source/*/@version gt 2.0 )
               { [ref("source"), "schema2.xsd"] -> validate-with-xsd() }
            -> ( $source/*/@version le 2.0 )
               { [ref("source"), "schema1.xsd"] -> validate-with-xsd() }
            -> [ $result=ref("result") ]
    ============================================================

    xproc version 2.0;
    expressions xquery;
    input source [ $source as document-node(),
                   $style as document-node(),
                   $schema as document-node()?,
                   $version as xs:decimal ];

Here we’ve said nothing about the output, so it is whatever it is. The
only reason to mention it is to make assertions about what must be in
it.

    $source -> [ $stylesheet=ref("style"), $source=ref("source") ]
            -> ( $source/*/@version gt 2.0 )
               { [ref("source"), "schema2.xsd"] -> validate-with-xsd() }

The construction “() { … }” is the gating operator that Henry referred to.
The expression can read from the input binding context. If it evaluates
to true, then the flow in the following { } is evaluated and that’s
the result. If the expression evaluates to false, it is as if the
gated expression had not been in the pipeline. The input binding context
is passed unchanged on to the next step in the flow.

            -> ( $source/*/@version le 2.0 )
               { [ref("source"), "schema1.xsd"] -> validate-with-xsd() }

You can get the effect of a “switch” statement by putting them one after
another with carefully constructed conditions, but they are entirely
independent.
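
A small Python sketch of the gate, treating a flow as a function from
one binding context to another. The names gate, run, and validate_with
are invented for the example, and validate_with is only a stand-in for
validate-with-xsd() that records which schema would have been used.

```python
def gate(condition, flow):
    """Build a step that runs `flow` only when `condition(context)` holds."""
    def step(context):
        if condition(context):
            return flow(context)
        return context  # as if the gate had not been in the pipeline
    return step


def run(context, steps):
    for step in steps:
        context = step(context)
    return context


def validate_with(schema):
    # Stand-in for validate-with-xsd(): records the schema, validates nothing.
    def flow(context):
        return {**context, "validated-with": schema}
    return flow


pipeline = [
    gate(lambda c: c["version"] > 2.0, validate_with("schema2.xsd")),
    gate(lambda c: c["version"] <= 2.0, validate_with("schema1.xsd")),
]
```

Both conditions are always evaluated, each against whatever context
reaches it. That is why the conditions have to be constructed carefully
if you want switch-like behaviour.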

            -> [ $result=ref("result") ]

This is a vacuous binding construction but it serves to make the
result explicit.

Maybe you’d rather use JavaScript?

    ============================================================
    xproc version 2.0;
    expressions javascript;
    input source [ $geodata instanceof Array,
                   $loc instanceof Object ];

    output [ $result ];

    $source -> [ $geo=ref("geodata"), $loc=ref("loc") ]
            -> {{ var found=undefined;
                  for (var i=0; i<geo.length; i++) {
                    if (loc.id === geo[i].id) {
                       found = loc
                    }
                  }
                  if (typeof found !== "undefined") {
                    xproc.result("result", found)
                  }
               }}
    ============================================================

    xproc version 2.0;
    expressions javascript;

Expressions are in JavaScript, not XQuery.

    input source [ $geodata instanceof Array,
                   $loc instanceof Object ];

I’m not really sure what’s right for typing things in JavaScript.

    output [ $result ];

The output binding must contain something (anything) called “result”.

    $source -> [ $geo=ref("geodata"), $loc=ref("loc") ]

Here we remap the bindings.

            -> {{ var found=undefined;

The {{ introduces a native language expression. From here to }} it’s
all JavaScript. The implementation was responsible for making geo and
loc available as reasonable types.

                  for (var i=0; i<geo.length; i++) {
                    if (loc.id === geo[i].id) {
                       found = loc
                    }
                  }
                  if (typeof found !== "undefined") {
                    xproc.result("result", found)

The implementation is responsible for providing some mechanism for putting
things in the output binding context.

                  }
               }}

In fact, if you look closely, this pipeline will fail if the loc isn’t
found in the array. (Because the binding context will not contain a
binding for “result”.)

One more XQuery example.

    ============================================================
    xproc version 2.0;
    expressions xquery;
    input source [ $source as document-node(),
                   $style as document-node(),
                   $schema as document-node()?,
                   $version as xs:decimal ];

    output [ $result as document-node() ];

    $source -> xslt() >> $xsltout

    $xsltout -> [ $source=ref("result"), "schema.xsd" ] -> validate() >> $validprimary

    $xsltout -> [ $source=ref("secondary") ]
             -> iterate { [ref("result"), "schema.xsd"] -> validate() } >> $validsecondary

    [ $validprimary, $validsecondary ] -> join() -> [ $result=ref("result") ]
    ============================================================

    xproc version 2.0;
    expressions xquery;
    input source [ $source as document-node(),
                   $style as document-node(),
                   $schema as document-node()?,
                   $version as xs:decimal ];

    output [ $result as document-node() ];

    $source -> xslt() >> $xsltout

$xsltout is a named reference to a binding context.

    $xsltout -> [ $source=ref("result"), "schema.xsd" ] -> validate() >> $validprimary

This flow starts with that binding context and maps from it. The resulting binding
context is also given a name.

    $xsltout -> [ $source=ref("secondary") ]
             -> iterate { [ref("result"), "schema.xsd"] -> validate() } >> $validsecondary

The iterate operator takes each object that appears on a port named
source and applies the following flow to it. The resulting output
binding contexts are merged together, with matching names forming
sequences.
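
The merge step is the interesting part, so here’s a Python sketch of
it. The function name iterate and its signature are hypothetical. It
also assumes that each per-object flow sees the current object under
the port’s own name, which is one plausible reading and not something
settled above.

```python
def iterate(context, flow, port="source"):
    """Apply `flow` to each object on `port` and merge the outputs."""
    merged = {}
    for obj in context[port]:
        # Assumption: the flow sees one object at a time under the port name.
        output = flow({port: obj})
        for name, value in output.items():
            # Matching names across iterations form a sequence.
            merged.setdefault(name, []).append(value)
    return merged
```

Every name in the merged context is bound to a sequence, even if only
one iteration produced it.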

We also name the output binding context that it produces.

    [ $validprimary, $validsecondary ] -> join() -> [ $result=ref("result") ]

Finally, join reads from the first two bindings in its input binding context
and places the things it reads into a single sequence on its result output.
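
As a Python sketch, under the simplifying assumption that each of the
two bindings holds either a single object or a sequence of them. The
helper as_sequence is made up, and “first two” is taken to mean the
order in which the bindings were added.

```python
def as_sequence(value):
    """Treat a single object as a one-item sequence."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def join(context):
    # Relies on dicts preserving insertion order (Python 3.7+) to decide
    # which bindings are the "first two".
    first, second = list(context.values())[:2]
    return {"result": as_sequence(first) + as_sequence(second)}
```

A context with fewer than two bindings makes the unpacking raise a
ValueError, which seems like a reasonable way for the step to fail.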

A closing thought. There’s a potential quoting issue with expressions
in () and expressions in {{ }}. I propose that if the first character
after “(” or “{{” is a Unicode “punctuation open” character then the
end of the expression is delimited by the corresponding Unicode
“punctuation close” character followed by either “)” or “}}”. No space
is allowed between the bracket and the Unicode character.

So:

Ok : ( 3 + 4 > 6 )
Bad: ( (3 + 4) < 4 )
Fix: (⟨ (3 + 4) < 4 ⟩)

We could imagine more complex quoting rules about balanced delimiters,
but screw it. If you need a delimiter in the expression, change the
damned delimiter.
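
For what it’s worth, the quoting rule is only a few lines of Python.
This is a sketch under assumptions: the function name read_expression
is invented, and so is the PAIRS table, because Python’s unicodedata
module reports a character’s category (“Ps” is punctuation open) but
not its corresponding close character. A real implementation would
need a fuller table.

```python
import unicodedata

# Hypothetical pairing table of open -> close punctuation.
PAIRS = {"(": ")", "[": "]", "{": "}", "\u27e8": "\u27e9"}


def read_expression(text, opener="(", closer=")"):
    """Return the expression at the start of `text`, without delimiters."""
    if not text.startswith(opener):
        raise ValueError("expected " + opener)
    body = text[len(opener):]
    end_marker = closer
    first = body[:1]
    if first in PAIRS and unicodedata.category(first) == "Ps":
        # Quoted form: the expression ends at close punctuation + closer.
        end_marker = PAIRS[first] + closer
        body = body[1:]
    end = body.find(end_marker)
    if end < 0:
        raise ValueError("unterminated expression")
    return body[:end].strip()
```

The unquoted form stops at the first closer it sees, which is exactly
why an expression containing one has to switch delimiters. Pass "{{"
and "}}" as opener and closer for native language expressions.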

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 512 761 6676
www.marklogic.com

Received on Friday, 22 April 2016 02:04:57 UTC