Thinking about port set expressions and block expressions from Norman Walsh on 2016-04-15 (public-xml-processing-model-wg@w3.org from April 2016)

From: Norman Walsh <ndw@nwalsh.com>
Date: Fri, 15 Apr 2016 10:29:01 -0500
To: public-xml-processing-model-wg@w3.org
Message-ID: <87fuumc2ma.fsf@nwalsh.com>

It's been a long time since we talked about the signatures for steps.
Suppose we used square brackets there too:

  declare step xslt($mode as xs:QName, $params as map())
        [$source, $stylesheet; $result, $secondary]
  {
    ...
  }

  Aside: we seem to have some inconsistency about whether port names
  have $-prefixes or not.

I'm not sure it's syntactically the best thing ever, but ...

Now we can say that a port set expression provides a mapping of
readable ports for the step that follows:

  [source="doc.xml", "style.xsl"] -> xslt()

(I kind of like the arrow for readability, but I think Henry's right:
it has no purpose except readability.)

I imagine that the semantics of that mapping are something like this:

Given a set of readable ports and a set of input ports, you match up
the named ones first, then you match up the remaining ones in order.

So each of the following binds source and stylesheet for the following
xslt() step just as you'd expect:

  [source="doc.xml", "style.xsl"] -> xslt()
  [source="doc.xml", stylesheet="style.xsl"] -> xslt()
  [stylesheet="style.xsl", source="doc.xml"] -> xslt()
  ["doc.xml", "style.xsl"] -> xslt()
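
As a rough sketch of that two-pass rule (in Python rather than pipeline
syntax), the four expressions above can be modelled as lists of
(name, value) pairs. The function name bind_ports and the pair
representation are assumptions made for illustration only; the sketch
also assumes every name used really is an input port of the step.

```python
# Illustrative sketch only: bind_ports and the (name, value) pair
# representation are assumptions, not proposed XProc syntax or API.

def bind_ports(entries, input_ports):
    """Bind port set entries to a step's input ports.

    entries: list of (name, value) pairs; name is None for an unnamed entry.
    input_ports: the step's input port names in declaration order.
    Assumes every name used is one of input_ports.
    """
    # Pass 1: named entries bind to the port of the same name.
    bound = {name: value for name, value in entries if name is not None}
    # Pass 2: unnamed entries fill the remaining ports in declaration order.
    remaining = [port for port in input_ports if port not in bound]
    unnamed = [value for name, value in entries if name is None]
    bound.update(zip(remaining, unnamed))
    return bound


XSLT_INPUTS = ["source", "stylesheet"]
expected = {"source": "doc.xml", "stylesheet": "style.xsl"}

examples = [
    [("source", "doc.xml"), (None, "style.xsl")],
    [("source", "doc.xml"), ("stylesheet", "style.xsl")],
    [("stylesheet", "style.xsl"), ("source", "doc.xml")],
    [(None, "doc.xml"), (None, "style.xsl")],
]
for entries in examples:
    assert bind_ports(entries, XSLT_INPUTS) == expected
```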

For a case where the named and ordinal ones are "out of order":

  [stylesheet="style.xsl", "doc.xml"] -> xslt()

I think there are two alternatives. We can say that this binding works
by matching up the stylesheet port by name and then filling the source
port ordinally from the remaining entries, or we can make it an error
because you have to "go backward" to make it work. (I can't think of a
concise way to express the error condition, but I think I can see it
pretty clearly.)
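
One possible reading of "going backward", sketched in Python: scan the
entries left to right, keep a cursor just past the most recently bound
port, and flag an unnamed entry when the only ports left for it are
declared before the cursor. This is a guess at the condition, not a
settled definition; the function name goes_backward and the
(name, value) pair representation are assumptions, and the sketch
assumes every name used is an input port of the step.

```python
# Illustrative sketch only: one possible reading of "going backward".

def goes_backward(entries, input_ports):
    """Return True if an unnamed entry could only bind to a port declared
    before the port bound by the entry that precedes it.

    entries: list of (name, value) pairs; name is None for an unnamed entry.
    input_ports: the step's input port names in declaration order.
    Assumes every name used is one of input_ports.
    """
    named = {name for name, _ in entries if name is not None}
    filled = set()   # ports bound so far in the left-to-right scan
    cursor = 0       # index just past the most recently bound port
    for name, _ in entries:
        if name is not None:
            filled.add(name)
            cursor = input_ports.index(name) + 1
            continue
        # Ports an unnamed entry may use: not yet filled, not claimed by name.
        free = [i for i, port in enumerate(input_ports)
                if port not in filled and port not in named]
        ahead = [i for i in free if i >= cursor]
        if ahead:
            filled.add(input_ports[ahead[0]])
            cursor = ahead[0] + 1
        elif free:
            return True   # only ports behind the cursor are left
        # Otherwise the entry is an extra with nowhere to go; not flagged.
    return False


XSLT_INPUTS = ["source", "stylesheet"]
assert goes_backward([("stylesheet", "style.xsl"), (None, "doc.xml")], XSLT_INPUTS)
assert not goes_backward([("source", "doc.xml"), (None, "style.xsl")], XSLT_INPUTS)
```

Under this reading an extra trailing entry is not an error, only an
unnamed entry that would have to reach back past the cursor.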

I'm naturally inclined to make it an error, but I'm not sure that's
the right thing, because I don't think we want to make extra ports an
error. I think both:

  [source="doc.xml", "style.xsl", "alt.xml"] -> xslt()
  [source="doc.xml", alt="alt.xml", stylesheet="style.xsl"] -> xslt()

bind source to "doc.xml" and stylesheet to "style.xsl" in the XSLT
step. The extra binding is just ignored; there's no way for the XSLT
step to read it.

Another interesting case is when the names don't match up at all:

  [result="doc.xml", "style.xsl"]

I think the right answer here is to say that names which don't match
any input port of the following step are ignored, and the entry is
treated as if it were unnamed. So the preceding port set expression is
exactly equivalent to ["doc.xml", "style.xsl"] for a following xslt()
step; if the following step has a 'result' input port, then it gets
"doc.xml".

Absent ports are just treated as empty:

  [stylesheet="style.xsl"] -> xslt()

binds the stylesheet to "style.xsl" and leaves the source input empty.
I suppose you could also do that this way: [(), "style.xsl"] though
I'm not sure we've worked out what kinds of expressions can go in a
port set expression.
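
The lenient rules above (extra bindings ignored, non-matching names
ignored, absent ports empty) can be sketched together in Python. The
function name bind_ports, the (name, value) pair representation, and
the use of an empty tuple to stand for () are all assumptions made for
illustration; this is the "no error" alternative, not a settled
design.

```python
# Illustrative sketch only: names and representation are assumptions.

EMPTY = ()  # stands in for the empty sequence ()

def bind_ports(entries, input_ports):
    """Bind (name, value) entries to input ports under the lenient rules.

    entries: list of (name, value) pairs; name is None for an unnamed entry.
    input_ports: the step's input port names in declaration order.
    """
    bound = {}
    leftovers = []
    # Pass 1: a name that matches an input port binds by name; any other
    # entry (unnamed, or with a non-matching name) is kept as unnamed.
    for name, value in entries:
        if name in input_ports:
            bound[name] = value
        else:
            leftovers.append(value)
    # Pass 2: leftovers fill the unbound ports in declaration order;
    # zip stops at the shorter list, so extra leftovers are dropped.
    free = [port for port in input_ports if port not in bound]
    bound.update(zip(free, leftovers))
    # Ports that are still unbound are treated as empty.
    for port in input_ports:
        bound.setdefault(port, EMPTY)
    return bound


XSLT_INPUTS = ["source", "stylesheet"]
both = {"source": "doc.xml", "stylesheet": "style.xsl"}

# Extra bindings are ignored.
assert bind_ports([("source", "doc.xml"), (None, "style.xsl"),
                   (None, "alt.xml")], XSLT_INPUTS) == both
assert bind_ports([("source", "doc.xml"), ("alt", "alt.xml"),
                   ("stylesheet", "style.xsl")], XSLT_INPUTS) == both
# A non-matching name is ignored; the value is bound ordinally.
assert bind_ports([("result", "doc.xml"), (None, "style.xsl")],
                  XSLT_INPUTS) == both
# Absent ports are empty; [(), "style.xsl"] gives the same binding.
only_style = bind_ports([("stylesheet", "style.xsl")], XSLT_INPUTS)
assert only_style == {"source": EMPTY, "stylesheet": "style.xsl"}
assert bind_ports([(None, EMPTY), (None, "style.xsl")],
                  XSLT_INPUTS) == only_style
```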

The result of a step is, I think, a port set expression that binds
the output ports to the results. So the xslt() step produces a binding
that's equivalent to this:

   [result="result.xml", secondary="secondary.xml"]

By the rule that says that port names that don't match are simply
ignored, we can still say that:

   xslt() -> store(href="output.xml")

would store "result.xml" into "output.xml" but would do nothing with
the secondary output. If you want to remap things, you have to put in
a port set expression.

   xslt() -> [source=port("secondary")] -> store("secondary.xml")

And it occurs to me that a port set expression doesn't even need
the port() function if we say that it can refer to the names of
readable ports directly:

   xslt() -> [$secondary] -> store("secondary.xml")
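
A small Python sketch of the chaining described above: the output of
xslt() is modelled as a list of (name, value) pairs, and store() is
assumed to have a single input port named source. The helper
bind_ports, the variable names, and the pair representation are
assumptions made for illustration only.

```python
# Illustrative sketch only: names and representation are assumptions.

def bind_ports(entries, input_ports):
    """Match names first, then fill remaining ports in order; entries
    whose names do not match are treated as unnamed, extras are dropped."""
    bound = {n: v for n, v in entries if n in input_ports}
    leftovers = [v for n, v in entries if n not in input_ports]
    free = [port for port in input_ports if port not in bound]
    bound.update(zip(free, leftovers))
    return bound


STORE_INPUTS = ["source"]  # assumed input ports of store()

# The binding produced by the xslt() step.
xslt_output = [("result", "result.xml"), ("secondary", "secondary.xml")]

# xslt() -> store(...): neither name matches, so the first output is
# bound ordinally and the secondary output is dropped.
assert bind_ports(xslt_output, STORE_INPUTS) == {"source": "result.xml"}

# xslt() -> [source=port("secondary")] -> store(...): the port set
# expression builds a new binding from the readable ports.
readable = dict(xslt_output)
remapped = [("source", readable["secondary"])]
assert bind_ports(remapped, STORE_INPUTS) == {"source": "secondary.xml"}

# xslt() -> [$secondary] -> store(...): the same remapping, unnamed.
remapped = [(None, readable["secondary"])]
assert bind_ports(remapped, STORE_INPUTS) == {"source": "secondary.xml"}
```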

All of the preceding is back-formation from my idea for block
expressions, which is to make them anonymous steps.

xproc version = "2.0";
inputs  $source as document-node();
outputs $result as document-node();

[$source] → λ()[$in;$out] { if (xs:decimal($in/*/@version) < 2.0)
                            then [$in,"v1schema.xsd"] → validate-with-xml-schema() ≫ $out
                            else [$in,"v2schema.xsd"] → validate-with-xml-schema() ≫ $out }
          → [$out,"stylesheet.xsl"] → xslt()
≫ $result

It's a little bit of extra syntax, but I think JavaScript and futures
have made anonymous functions commonplace.

They also afford some interesting flexibility. Consider:

xproc version = "2.0";
inputs  $source as document-node();
outputs $result as document-node();
options $minver as xs:decimal := 2.0,
        $v1schema := "v1schema.xsd",
        $v2schema := "v2schema.xsd";

[$source] → λ($vertest as xs:decimal := $minver,
              $v1 as xs:string := $v1schema,
              $v2 as xs:string := $v2schema)
            [$in;$out]
            { if (xs:decimal($in/*/@version) < $vertest)
              then [$in,$v1] → validate-with-xml-schema() ≫ $out
              else [$in,$v2] → validate-with-xml-schema() ≫ $out }
          → [$out,"stylesheet.xsl"] → xslt()
≫ $result

I'm not sure how useful that is, really, but ...
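
For comparison, here is a loose Python analogue of an anonymous step
whose options default to the enclosing pipeline's options. It is a
sketch only: make_block, the stand-in validate_with_xml_schema (which
records the chosen schema and validates nothing), and the dict it
returns are all assumptions, and the document is assumed to carry a
version attribute on its root element.

```python
# Illustrative sketch only: all names below are assumptions.
from decimal import Decimal
import xml.etree.ElementTree as ET


def validate_with_xml_schema(doc, schema):
    # Stand-in step: records which schema would be used, validates nothing.
    return {"document": doc, "validated_against": schema}


def make_block(minver=Decimal("2.0"),
               v1schema="v1schema.xsd",
               v2schema="v2schema.xsd"):
    """Pipeline-level options; returns the anonymous step."""
    # The lambda's own options default to the pipeline options, in the
    # same way that $vertest defaults to $minver above.
    return lambda doc, vertest=minver, v1=v1schema, v2=v2schema: (
        validate_with_xml_schema(doc, v1)
        if Decimal(ET.fromstring(doc).get("version")) < vertest
        else validate_with_xml_schema(doc, v2)
    )


block = make_block()
assert block('<doc version="1.0"/>')["validated_against"] == "v1schema.xsd"
assert block('<doc version="2.0"/>')["validated_against"] == "v2schema.xsd"
# Overriding an option on the anonymous step itself:
assert block('<doc version="2.0"/>',
             vertest=Decimal("3.0"))["validated_against"] == "v1schema.xsd"
```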

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 512 761 6676
www.marklogic.com
Received on Friday, 15 April 2016 15:29:28 UTC