Adjusting Syntax & Semantics - removing the need for port variables, block expressions, and other declarations from Alex Miłowski on 2016-04-19 (public-xml-processing-model-wg@w3.org from April 2016)

From: Alex Miłowski <alex@milowski.com>
Date: Tue, 19 Apr 2016 16:20:11 -0700
To: XProc WG <public-xml-processing-model-wg@w3.org>
Message-ID: <CABp3FNLAR8Z2iUmONVNmcCGro+YwwWoFR3fyzymmRGUM_TyjhQ@mail.gmail.com>
Here are some random and possibly radical thoughts ...


# Conditionals

I've been thinking about conditionals.  With the port set expression
and the concept of current readable ports, we can do away with the
need for block expressions to wrap a conditional.  A conditional is
just another expression that can be inserted in a chained sequence
and that produces a set of readable ports.

Here are the rules as I see them:

1. The condition operates on the current readable port (and whatever
is lexically in scope).

2. A single chained sequence is allowed for each branch of the conditional.

3. A branch's chained sequence operates as if it were embedded in the
chained sequence.  Whichever branch executes should start and finish as
if it were used as the replacement for the conditional.

This means we can now do:

$source
  → if (xs:decimal($source/*/@version) < 2.0)
       then [schema="v1schema.xsd"] → validate-with-xml-schema()
       else [schema="v2schema.xsd"] → validate-with-xml-schema()
  → [port(),"stylesheet.xsl"]
  → xslt()
  ≫ $result
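
To make rule 3 concrete, here is a rough Python model of the idea (not
XProc, and not a proposal for an implementation).  The names Ports,
Stage, chain, conditional, validate and xslt are all made up for the
illustration, and a "port set" is assumed to be just a dictionary.  The
point is only that the conditional has the same shape as any other
stage in the chain:

```python
from typing import Any, Callable, Dict

Ports = Dict[str, Any]            # assumption: a port set is a plain dict
Stage = Callable[[Ports], Ports]  # a stage maps readable ports to readable ports


def chain(*stages: Stage) -> Stage:
    """Compose stages left to right, playing the role of the → operator."""
    def run(ports: Ports) -> Ports:
        for stage in stages:
            ports = stage(ports)
        return ports
    return run


def conditional(test: Callable[[Ports], bool], then: Stage, otherwise: Stage) -> Stage:
    """A conditional is itself a stage: exactly one branch runs in its place."""
    def run(ports: Ports) -> Ports:
        return then(ports) if test(ports) else otherwise(ports)
    return run


# Hypothetical stand-ins for the real steps; they only record what was applied.
def validate(schema: str) -> Stage:
    return lambda ports: {**ports, "validated-with": schema}


def xslt(stylesheet: str) -> Stage:
    return lambda ports: {**ports, "transformed-with": stylesheet}


pipeline = chain(
    conditional(lambda p: p["version"] < 2.0,
                validate("v1schema.xsd"),
                validate("v2schema.xsd")),
    xslt("stylesheet.xsl"),
)
```

Calling pipeline({"version": 1.0}) runs the v1 branch and then the
xslt stand-in, with no wrapper block around the conditional.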


# No Port Variables

Now, secondarily, I think we don't want to conflate values with ports.
Ports are things that can be streamed and possibly span system
boundaries.  Variables bind to values for use in expressions or for
options.  We can compute values from ports and that creates a
dependency on the flow of data (and system boundaries).

So, maybe ports shouldn't use the $ syntax.  We'll need to bind the
default context for XPath to the first readable port for our
expression to work.  We can revisit that complexity later.

Here is the previous chain without $ syntax:

source
  → if (xs:decimal(/*/@version) < 2.0)
       then [schema="v1schema.xsd"] → validate-with-xml-schema()
       else [schema="v2schema.xsd"] → validate-with-xml-schema()
  → [port(),"stylesheet.xsl"]
  → xslt()
  ≫ result


# No Pipeline Declarations

I don't think we need to differentiate between the declaration of a
flow and the pipeline.  We conflated these in XProc 1.0.

We can learn everything by inspecting the flow.

Implementations should feel free to invoke the most obvious thing
(e.g. the last flow) or require a user to provide a name on
invocation.

Doing this gets rid of a top-level declaration but requires a flow
wrapper for more complicated pipelines and invocation scenarios.

Our example reduces to adding a version label:

xproc version = "2.0";

source
  → if (xs:decimal(/*/@version) < 2.0)
       then [schema="v1schema.xsd"] → validate-with-xml-schema()
       else [schema="v2schema.xsd"] → validate-with-xml-schema()
  → [port(),"stylesheet.xsl"]
  → xslt()
  ≫ result

An implementation can deduce that "source" and "result" are unbound
and need the user to bind them for invocation.
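
As a sketch of how that deduction might work (in Python, with a
deliberately simplified model that is my assumption, not part of the
proposal): treat each chained sequence as a pair of the port name at
its head and the port name after ≫, then compare what is read with
what is written.  The function name unbound_ports is hypothetical.

```python
def unbound_ports(chains):
    """chains: list of (head_port, tail_port) pairs, one per chained sequence.

    Returns (inputs, outputs): ports that are read but never written inside
    the flow, and ports that are written but never read inside the flow.
    """
    read = {head for head, _ in chains}
    written = {tail for _, tail in chains}
    inputs = sorted(read - written)    # must be bound by the caller
    outputs = sorted(written - read)   # exposed to the caller
    return inputs, outputs


# The single chain from the example: source ... ≫ result
single = unbound_ports([("source", "result")])

# Two chains joined through an intermediate port (a made-up name)
joined = unbound_ports([("source", "validated"), ("validated", "result")])
```

In both cases the only unbound names are "source" on the input side
and "result" on the output side; the intermediate port is bound
inside the flow and never surfaces.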


# No Options

Because we have no pipeline declaration, if you want options, you need
to wrap things in a flow:

xproc version = "2.0";

[ source : document-node() ] flow($mode : xs:string = '') [ result : document-node() ]
{
source
  → if (xs:decimal(/*/@version) < 2.0)
       then [schema="v1schema.xsd"] → validate-with-xml-schema()
       else [schema="v2schema.xsd"] → validate-with-xml-schema()
  → [port(),"stylesheet.xsl"]
  → xslt(mode=$mode)
  ≫ result
}

Now an implementation needs a parameter to invoke the flow.  This also
presumes it would pick the last flow.

When that is ambiguous, halt and catch fire ... require the user to
name things and invoke them by name:

xproc version = "2.0";

declare flow [ source : document-node() ] validate() [ result : document-node() ]
{
source
  → if (xs:decimal(/*/@version) < 2.0)
       then [schema="v1schema.xsd"] → validate-with-xml-schema()
       else [schema="v2schema.xsd"] → validate-with-xml-schema()
  ≫ result
}

declare flow [ source : document-node() ] transform($mode : xs:string = '') [ result : document-node() ]
{
source
  → validate()
  → [port(),"stylesheet.xsl"]
  → xslt(mode=$mode)
  ≫ result
}
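
One possible invocation policy, sketched in Python under my own
assumptions: flows are kept in declaration order, an explicit name
wins, and otherwise the last flow declared is chosen.  select_flow and
the two stand-in functions are hypothetical and only mimic the
validate/transform pair above.

```python
def select_flow(flows, name=None):
    """flows: list of (name_or_None, callable) in declaration order."""
    if name is not None:
        for flow_name, fn in flows:
            if flow_name == name:
                return fn
        raise LookupError(f"no flow named {name!r}")
    if not flows:
        raise LookupError("no flows declared")
    return flows[-1][1]  # default policy: the last flow declared


# Stand-ins for the two declared flows; they just build a trace string.
def validate(source):
    return f"validated({source})"


def transform(source, mode=""):
    return f"xslt({validate(source)}, mode={mode!r})"


flows = [("validate", validate), ("transform", transform)]
```

With this policy select_flow(flows) yields transform, and
select_flow(flows, "validate") yields validate.  An implementation
could just as well refuse to guess and always require the name.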


# Cleaning Up Flow Declarations

The syntax for anonymous flows:

   [] flow(...) [] { ... }

and named:

   declare flow [] name(...) [] { ... }

is possibly confusing.

I think distinguishing between operation (step) parameters and flow
input/output ports is really important.  Such operations have a
signature of pre-conditions that consists of input ports, the
invocation with parameters, and post-conditions that manifest as the
readable ports.  That's why I've ordered the syntax as above.

I wonder whether the chain operator might make it more readable:

   [ ] → flow(...) → []

I think we would want a delimiter on the outside:

   ([ ] → flow(...) → []) { ... }

and

   declare flow ([ ] → name(...) → []) { ... }

and now we have a type signature expression of

    [ ] → flow(...) → []

for all flows - named or otherwise.
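
For what it's worth, that uniform signature maps onto a very small
data structure.  A Python sketch, where the class name FlowSignature,
its field names, and the comma used to separate multiple ports are all
my assumptions (the examples here only ever show one port per
bracket):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FlowSignature:
    inputs: List[str]                                # pre-conditions: input ports
    outputs: List[str]                               # post-conditions: readable ports
    params: List[str] = field(default_factory=list)  # invocation parameters
    name: Optional[str] = None                       # None models an anonymous flow

    def __str__(self) -> str:
        ins = "[ " + ", ".join(self.inputs) + " ]"
        outs = "[ " + ", ".join(self.outputs) + " ]"
        call = f"{self.name or 'flow'}({', '.join(self.params)})"
        return f"{ins} → {call} → {outs}"


named = FlowSignature(["source"], ["result"], ["$mode"], name="transform")
anonymous = FlowSignature(["source"], ["result"], ["$mode"])
```

Here str(named) gives "[ source ] → transform($mode) → [ result ]" and
str(anonymous) gives the same thing with "flow" in place of the name,
so named and anonymous flows differ only in one optional field.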

The anonymous example becomes (dropping types for compactness):

xproc version = "2.0";

([ source ] → flow($mode) → [ result ])
{
  source
    → if (xs:decimal(/*/@version) < 2.0)
         then [schema="v1schema.xsd"] → validate-with-xml-schema()
         else [schema="v2schema.xsd"] → validate-with-xml-schema()
    → [port(),"stylesheet.xsl"]
    → xslt(mode=$mode)
    ≫ result
}


The named example:

xproc version = "2.0";

declare flow ([ source ] → validate() → [ result ])
{
  source
    → if (xs:decimal(/*/@version) < 2.0)
         then [schema="v1schema.xsd"] → validate-with-xml-schema()
         else [schema="v2schema.xsd"] → validate-with-xml-schema()
    ≫ result
}

declare flow ([ source ] → transform($mode) → [ result ])
{
  source
    → validate()
    → [port(),"stylesheet.xsl"]
    → xslt(mode=$mode)
    ≫ result
}


-- 
--Alex Miłowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
Received on Tuesday, 19 April 2016 23:20:39 UTC