Re: Adjusting Syntax & Semantics - removing the need for port variables, block expressions, and other declarations from Norman Walsh on 2016-04-20 (public-xml-processing-model-wg@w3.org from April 2016)

From: Norman Walsh <ndw@nwalsh.com>
Date: Tue, 19 Apr 2016 19:33:30 -0500
To: public-xml-processing-model-wg@w3.org
Message-ID: <87vb3dnmp1.fsf@nwalsh.com>

Alex Miłowski <alex@milowski.com> writes:
> Here are some random and possibly radical thoughts ...

Thanks, Alex.

> # Conditionals
>
> I've been thinking about conditionals and with the port set expression
> and the concept of current readable ports, we can do away with the
> need for block expressions to wrap a conditional.  A conditional is
> just another expression that can be inserted in a chained sequence
> that produces a set of readable ports.
>
> Here are the rules as I see it:
>
> 1. The condition operates on the current readable port (and whatever
> is lexically in scope).

Do we still have a “current readable port”? I thought we had a set of
ports that came from the preceding step or port set expression.

  [source="doc.xml", lookup="table.xml"] -> my:step()

In what sense is ‘source’ or ‘lookup’ more or less the current
readable port than the other?

Is it equivalent to say that it reads from the first ordinal port for
its context? That might work but it’s awfully limiting not to be able
to have a name for the input in case you want to use it more than
once.
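
To make the question concrete, here is a minimal Python sketch of a port
set that can be read either by ordinal position or by name. The PortSet
class and its first() and named() methods are hypothetical names made up
for illustration; nothing like them appears in the proposal.

```python
class PortSet:
    """Hypothetical model of a port set expression such as
    [source="doc.xml", lookup="table.xml"]."""

    def __init__(self, **ports):
        # Keyword arguments keep their order, so the first port
        # written in the expression is the first ordinal port.
        self._ports = dict(ports)

    def first(self):
        """Return (name, value) of the first ordinal port."""
        name = next(iter(self._ports))
        return name, self._ports[name]

    def named(self, name):
        """Return the value bound to a port, looked up by name."""
        return self._ports[name]


ports = PortSet(source="doc.xml", lookup="table.xml")
default_context = ports.first()   # only the first port is reachable this way
lookup = ports.named("lookup")    # every port is reachable by name
```

Under a "first ordinal port" rule, only first() would be available to the
condition, which is the limitation described above.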

> 2. A single chained sequence is allowed for each branch of the conditional.
>
> 3. A branch's chained sequence operates as if it was embedded in the
> chained sequence.  Whatever branch executes should start and finish as
> if it was used as the replacement for the conditional.
>
> This means we can now do:
>
> $source
>   → if (xs:decimal($source/*/@version) < 2.0)

In the @version test, $source has nothing to do with the input to the
conditional, right? It’s just reading the same $source variable that’s
being fed in.

>        then [schema="v1schema.xsd"] → validate-with-xml-schema()

I don’t understand this at all. I could understand

         then [$source, schema="v1schema.xsd"] → validate-with-xml-schema()

but you seem to be explicitly avoiding that.

>        else [schema="v2schema.xsd"] → validate-with-xml-schema()
>   → [port(),"stylesheet.xsl"]
>   → xslt()
>   ≫ $result
>
> # No Port Variables
>
> Now, secondarily, I think we don't want to conflate values with ports.
> Ports are things that can be streamed and possibly span system
> boundaries.  Variables bind to values for use in expressions or for
> options.  We can compute values from ports and that creates a
> dependency on the flow of data (and system boundaries).
>
> So, maybe ports shouldn't use the $ syntax.  We'll need to bind the
> default context for XPath to the first readable port for our
> expression to work.  We can revisit that complexity later.
>
> Here is the previous chain without $ syntax:
>
> source
>   → if (xs:decimal(/*/@version) < 2.0)

Ok. I think I see. But just to be clear, I could still do this if
I wanted to, right?

  source >> $src
  source -> if (xs:decimal($src/*/@version) …

>        then [schema="v1schema.xsd"] → validate-with-xml-schema()

But I still really, really don’t understand how the default context is
implicitly being inserted into the port bindings for
validate-with-xml-schema(). What’s more, I don’t think you necessarily
always want it to be the first one. What if what was being passed in
was the schema and what varied was the document that I wanted to
parse?
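
One possible reading, sketched in Python under a loud assumption: the
incoming document is bound to the first input port of the step that has
no explicit binding. The proposal does not state this rule; bind_inputs
and VALIDATE_INPUTS are hypothetical names. The port names "source" and
"schema" are the input ports of validate-with-xml-schema in XProc 1.0.

```python
def bind_inputs(declared_inputs, explicit, incoming):
    """Assumed rule: bind `incoming` to the first declared input port
    that has no explicit binding."""
    bindings = dict(explicit)
    for name in declared_inputs:
        if name not in bindings:
            bindings[name] = incoming
            return bindings
    raise ValueError("no unbound input port left for the incoming document")


# Input ports of validate-with-xml-schema, in declaration order.
VALIDATE_INPUTS = ["source", "schema"]

# The schema is fixed and the document flows in.
varying_document = bind_inputs(
    VALIDATE_INPUTS, {"schema": "v1schema.xsd"}, "doc.xml")

# The document is fixed and the schema flows in.
varying_schema = bind_inputs(
    VALIDATE_INPUTS, {"source": "doc.xml"}, "v1schema.xsd")
```

A rule of "always the first port" would only cover the first call; a rule
of "first unbound port" covers both, but the reader has to know the
step's port order to see where the input went.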

>        else [schema="v2schema.xsd"] → validate-with-xml-schema()
>   → [port(),"stylesheet.xsl"]
>   → xslt()
>   ≫ result
>
> # No Pipeline Declarations

I think we’re going to decide that we need some, even if we don’t need
the ones we’ve got today. I’m all for fewer required declarations; I’m
less sanguine about forbidding explicitness; and I’m very reluctant to
forbid the concept of pipeline declarations.

> I don't think we need to differentiate between the declaration of a
> flow and the pipeline.  We conflated these in XProc 1.0.

Good.

> We can learn everything by inspecting the flow.
>
> Implementations should feel free to invoke the most obvious thing
> (e.g. the last flow) or require a user to provide a name on
> invocation.
>
> Doing this gets rid of a top-level declaration but requires a flow
> wrapper for more complicated pipelines and invocation scenarios.
>
> Our example reduces to adding a version label:
>
> xproc version = "2.0";
>
> source
>   → if (xs:decimal(/*/@version) < 2.0)
>        then [schema="v1schema.xsd"] → validate-with-xml-schema()
>        else [schema="v2schema.xsd"] → validate-with-xml-schema()
>   → [port(),"stylesheet.xsl"]
>   → xslt()
>   ≫ result
>
> An implementation can deduce that "source" and "result" are unbound
> and need the user to bind them for invocation.

Mmmmmm. I might come around, but I’ll have to think about that for a
while. It’s going to lead to some odd error messages. Imagine that
you’ve got a pipeline that has several flows and reuses “source” a few
times. In one case, you misspell it “sourc”. Instead of getting a
“reference to undeclared input on line 653” you’re going to get
“binding required for port ‘sourc’ on line 653”. I guess we’ll get
used to that, but…
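
A small Python sketch of why the message changes, assuming an
implementation simply treats every port that is read but never produced
as an input the user must bind. The function names and the lists of
port names are hypothetical.

```python
def unbound_ports(reads, writes):
    """Inference without declarations: any port that is read but never
    written inside the pipeline is assumed to need a user binding."""
    produced = set(writes)
    return sorted({name for name in reads if name not in produced})


def undeclared_reads(reads, declared_inputs):
    """Check with declarations: any port read that was never declared
    is reported as an error."""
    return sorted(set(reads) - set(declared_inputs))


# A pipeline that reads "source" twice and misspells it once.
reads = ["source", "source", "sourc"]
writes = ["result"]

# Without declarations the typo looks like one more input to bind.
need_binding = unbound_ports(reads, writes)

# With a declared input the typo is caught as an undeclared reference.
typos = undeclared_reads(reads, ["source"])
```

In the first case the misspelling is indistinguishable from a legitimate
second input; in the second it is the only thing reported.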

Is the goal to allow brevity or require brevity?

> # No Options
>
> Because we have no pipeline declaration, if you want options, you need
> to wrap things in a flow:
>
> xproc version = "2.0";
>
> [ source : document-node() ] flow($mode : xs:string = '') [ result : document-node() ]
> {
> source
>   → if (xs:decimal(/*/@version) < 2.0)
>        then [schema="v1schema.xsd"] → validate-with-xml-schema()
>        else [schema="v2schema.xsd"] → validate-with-xml-schema()
>   → [port(),"stylesheet.xsl"]
>   → xslt(mode=$mode)
>   ≫ result
> }
>
> Now an implementation needs a parameter to invoke the flow.  This also
> presumes it would pick the last flow.

Uuuuuhhhmmmmm. I see the mathematical elegance, but I’m not sure it’s
exactly user friendly.

> # Cleaning Up Flow Declarations
>
> The syntax for anonymous flows:
>
>    [] flow(...) [] { ... }
>
> and named:
>
>    declare flow [] name(...) [] { ... }
>
> is possibly confusing.

It’s also not winning me over aesthetically.

Given the choice between:

   [ source : document-node() ]
   flow($mode : xs:string = '')
   [ result : document-node() ] { … }

and, for example,

   [ source : document-node(); result : document-node()]
   flow($mode : xs:string = '') { … }

I find the former more of a challenge.

> I think distinguishing between operation (step) parameters and flow
> input/output ports is really important.  Such operations have a
> signature of pre-conditions that consists of input ports, the
> invocation with parameters, and post conditions that manifest as the
> readable ports.  That's why I've ordered the syntax as above.
>
> I wonder whether the chain operator might make it more readable:
>
>    [ ] → flow(...) → []

That strikes me as potentially confusing. It’s just too easy to read
that as some kind of flow in parentheses. Maybe a different delimiter:

     [ ] :: flow(...) :: []

What happens if we just use more words:

  declare flow
    requires inputs  [ source ]
    produces outputs [ result ]
  {
    source
      → if (xs:decimal(/*/@version) < 2.0)
           then [schema="v1schema.xsd"] → validate-with-xml-schema()
           else [schema="v2schema.xsd"] → validate-with-xml-schema()
      → [port(),"stylesheet.xsl"]
      → xslt(mode=$mode)
      ≫ result
  }

  declare flow named validate
    requires inputs  [ source ]
    produces outputs [ result ]
  { … }



                                        Be seeing you,
                                          norm

-- 
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 512 761 6676
www.marklogic.com
Received on Wednesday, 20 April 2016 00:34:04 UTC