another attempt at synthesis from James Fuller on 2016-04-27 (public-xml-processing-model-wg@w3.org from April 2016)

From: James Fuller <jim@webcomposite.com>
Date: Wed, 27 Apr 2016 10:53:37 +0200
To: XProc WG <public-xml-processing-model-wg@w3.org>
Message-ID: <CAEaz5msa95qxOHpFc6rqbkdsHb-X_B9AoSy==upKKv49243ffQ@mail.gmail.com>
after digesting the recent emails from Norm and Alex, here is another
attempt to synthesise/clarify:

  flow = graph of components connected via bindings
  component = step with input/output bindings
  input bindings  =  set of named bindings
  output bindings  =  set of named bindings
  binding = ( variable | pipe )
  pipe =
  variable =
  value = atomised (variable | pipe )
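
One way to read the model above, sketched in Python purely as an illustration. The message leaves `pipe =` and `variable =` undefined, so the fields on `Pipe` and `Variable` below are placeholders invented for this sketch, as is the `edges()` helper; none of this is proposed syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Variable:
    """A binding that holds a value (hypothetical shape)."""
    value: object


@dataclass(frozen=True)
class Pipe:
    """A binding that reads another component's output (hypothetical shape)."""
    component: str  # name of the producing component
    output: str     # name of the output binding on that component


# binding = ( variable | pipe )
Binding = Union[Variable, Pipe]


@dataclass
class Component:
    """component = step with input/output bindings."""
    step: str
    inputs: Dict[str, Binding] = field(default_factory=dict)
    outputs: Dict[str, Binding] = field(default_factory=dict)


@dataclass
class Flow:
    """flow = graph of components connected via bindings."""
    components: Dict[str, Component] = field(default_factory=dict)

    def edges(self) -> List[Tuple[str, str]]:
        """Return (producer, consumer) pairs, one per Pipe input binding."""
        return [
            (binding.component, name)
            for name, comp in self.components.items()
            for binding in comp.inputs.values()
            if isinstance(binding, Pipe)
        ]
```

In this reading the graph is never stored separately: the edges fall out of the pipe bindings, which matches "graph of components connected via bindings".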

and another attempt to rewrite my last example in long form:

1    xproc version = "2.0";
2
3    inputs [ $source as document-node()];
4    outputs [ $result as document-node()];
5
6    $p:expression-language = "com.marklogic:xquery:1.0-ml";
7    $version as xs:decimal = 3.0;
8
9    [$source=ref($source)] ->
10     → (xs:decimal(/*/@version) < $version){[$source=ref($source), $schema="v1schema.xsd"] → validate-with-xml-schema()}
11          (xs:decimal(/*/@version) > $version){[$source=ref($source), $schema="v1schema.xsd"] → validate-with-xml-schema()}
12     → [$source=ref($source), $stylesheet="stylesheet.xsl"]
13     → xslt()
14     ≫ [$myresult as document-node() = ref($result)]
15
16  "http://www.example.org/someotherdata.xml"
17    ≫ $result
18
19  [$source=ref($myresult)]
20   → add-attribute(match="/",attribute-name="id",attribute-value="el{count($myresult)}")
21   ≫ $result

the following analysis attempts to identify which bindings are in scope
at each numbered line of the flow above.

1
2
3    outer $source
4    outer $source, outer $result
5    outer $source, outer $result
6    $p:expression-language (for brevity ignored later on)
7    $version
8    outer $source, outer $result
9    $version, outer $source, outer $result, inner $source set by
     ref(), inner $source
10   $version, inner $source set by ref(), inner $schema set by uri
     ref, inner $result
11   $version, inner $source set by ref(), inner $schema set by uri
     ref, inner $result
12   $version, inner $source set by ref(), inner $stylesheet set by uri ref
13   $version, inner $source, inner $stylesheet, inner $result, inner $secondary
14   $version, outer $source, outer $result, inner $result, inner
     $secondary, $myresult
15   $version, outer $source, outer $result, $myresult
16   $version, outer $source, outer $result, $myresult, inner $source
     set by uri ref, inner $result
17   $version, outer $source, outer $result, $myresult, inner $result
18   $version, outer $source, outer $result, $myresult
19   $version, outer $source, outer $result, $myresult
20   $version, inner $source, inner $result
21   $version, outer $source, outer $result, inner $result, $myresult

this analysis hints at a potentially clean separation between operators, e.g. this

  $source ->

is equiv. to

 [$source = ref($source)] ->

where ref($source) refers to outer $source.

the converse is where things could have got complicated.

  >> [$myresult as document-node() = ref($result)]

but that is easily fixed: we could just say that ref(), when used on the
rhs (after >>), always refers to inner bindings.
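
The two scoping rules for ref() can be sketched as a small lookup, here in Python as an illustration only. The function name `resolve_ref`, the `side` argument and the use of plain dicts for the outer and inner scopes are all assumptions made for this sketch, not part of the proposal.

```python
from typing import Dict


def resolve_ref(name: str, side: str,
                outer: Dict[str, object], inner: Dict[str, object]) -> object:
    """Resolve ref($name) according to where it appears.

    side == "input":  ref() sits in `[...] ->` and reads the outer bindings.
    side == "output": ref() sits in `>> [...]` and reads the inner bindings.
    """
    if side == "input":
        scope = outer
    elif side == "output":
        scope = inner
    else:
        raise ValueError(f"unknown side: {side!r}")
    if name not in scope:
        raise KeyError(f"${name} is not bound in the {side}-side scope")
    return scope[name]


# Hypothetical scopes around the xslt() step of the long-form example.
outer = {"source": "flow input document", "result": "flow output"}
inner = {"result": "xslt primary output"}

print(resolve_ref("source", "input", outer, inner))   # outer $source
print(resolve_ref("result", "output", outer, inner))  # inner $result
```

The point of the sketch is that the same name, `$result`, resolves differently depending only on which side of the step the ref() appears.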

Probing this a bit further (and 'Pálit od boku', i.e. shooting from the hip)

  "someuri.xml" >> $result

is

  [$source=doc("someuri.xml")] -> identity() ->[$result= ref($result)]

FWIW, I think we need to allow doc() in the binding expressions
(instead of representing it as a document-get() step).

You could then pipe output bindings to each other

  $source >> $output

which is

  [$source=$source] -> identity() ->[$output= ref($result)]

and it follows that

  $output1 >> $output2

is

  [$source=$output1] -> identity() ->[$output2= ref($result)]
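
The three expansions above follow one pattern, which can be sketched as a string rewrite. This is a Python illustration only: `desugar_pipe` is a hypothetical helper, it emits normalised spacing rather than the exact spacing used above, and it treats a quoted left-hand side as a URI to be loaded with doc(), which assumes doc() is allowed in binding expressions.

```python
def desugar_pipe(lhs: str, target: str) -> str:
    """Expand `lhs >> $target` into the long form that goes through identity()."""
    if len(lhs) >= 2 and lhs.startswith('"') and lhs.endswith('"'):
        # A quoted URI literal is loaded with doc().
        source = f"doc({lhs})"
    elif lhs.startswith("$"):
        # A binding name is passed through unchanged.
        source = lhs
    else:
        raise ValueError(f"cannot desugar left-hand side: {lhs!r}")
    return f"[$source={source}] -> identity() -> [{target}=ref($result)]"


print(desugar_pipe('"someuri.xml"', "$result"))
print(desugar_pipe("$source", "$output"))
print(desugar_pipe("$output1", "$output2"))
```

In every case the rhs ref($result) names the inner output binding of identity(), not the outer $result of the flow.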

though what really is going on 'under the covers' is our ordinal story.

  >> $myresult

is internally

  >> [$myresult = ref(1)]

and

  >> [$myresult = ref("result")]

is provided as 'sugar'. This allows us to leave behind minted names
for input/output bindings.

We know multiple output bindings are potentially the most complicated case

  xslt() >> $result

as this internally 'means'

  >> [$result= ref(1)]

so then

  >> $result, $secondary

is valid ... or even the following might have some charm

  >> $result
  >> $secondary

to repeat myself, both are internally equiv. to

  >> [$result = ref(1), $secondary= ref(2)]

where ref on rhs refers to inner output bindings.
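
A minimal sketch of the ordinal story, in Python and for illustration only. Both function names are hypothetical, and the sketch assumes that a step declares its output bindings in a fixed order (for xslt(), `result` then `secondary`), so that the named form ref("result") is just a lookup of a 1-based position.

```python
from typing import Dict, List, Union


def resolve_outputs(targets: List[str]) -> Dict[str, int]:
    """Expand `>> $a, $b` into {a: 1, b: 2}, i.e. [$a = ref(1), $b = ref(2)]."""
    return {t.lstrip("$"): i for i, t in enumerate(targets, start=1)}


def ref(key: Union[int, str], declared: List[str]) -> int:
    """Return the 1-based ordinal for ref(n) or for the sugar ref("name").

    `declared` lists the step's output bindings in declaration order.
    """
    if isinstance(key, int):
        if not 1 <= key <= len(declared):
            raise IndexError(f"no output binding at ordinal {key}")
        return key
    # Named form: look the name up to get its position.
    return declared.index(key) + 1  # raises ValueError for unknown names


# Assumed declaration order for xslt() outputs.
xslt_outputs = ["result", "secondary"]

print(resolve_outputs(["$result", "$secondary"]))
print(ref("result", xslt_outputs), ref(2, xslt_outputs))
```

Under this reading the names on the lhs are free choices of the flow author; only the position decides which inner output is bound.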

I am less interested in the n+1 unknown number of output ports but I
could see this working with some kind of map of bindings

  >> [ $result[] as document-node()]

that's about enough mental games.

------------------------------

Stepping back, the short form of the flow example would be

xproc version = "2.0";
inputs [ $source as document-node()];
outputs [ $result as document-node()];

$version as xs:decimal = 3.0;

$source->
  → (xs:decimal(/*/@version) < $version){[$schema="v1schema.xsd"] → validate-with-xml-schema()}
    (xs:decimal(/*/@version) > $version){[$schema="v1schema.xsd"] → validate-with-xml-schema()}
  → [$stylesheet="stylesheet.xsl"]
  → xslt()
 ≫ $myresult as document-node()

"http://www.example.org/someotherdata.xml"
  ≫ $result

$myresult
  → add-attribute(match="/",attribute-name="id",attribute-value="el{count($myresult)}")
  ≫ $result

where heuristics for determining scope for bindings are:

  [ ... ] -> sets input bindings, and ref() refers to outer bindings

  >> [...] sets output bindings, and ref() refers to inner bindings

thoughts ?


thx, J

open issues
* as I've written it, the branching logic is a bit tedious; we should be
able to refine it over time
* it's not entirely clear how we can set multiple bindings to outer $result
* denoting pipe as input/output is part of the flow's functional signature
but might also imply read/write constraints which we currently do not
surface
* ordinal is the real story and probably still has a lot of dark corners
Received on Wednesday, 27 April 2016 08:54:05 UTC