Re: Moving away from an expression language from Alex Miłowski on 2016-02-13 (public-xml-processing-model-wg@w3.org from February 2016)

From: Alex Miłowski <alex@milowski.com>
Date: Sat, 13 Feb 2016 15:25:35 +0100
To: XProc WG <public-xml-processing-model-wg@w3.org>
Message-ID: <CABp3FNLYvjP=cKdhgs-HjCEHbESKeGxfpPwKT9X+rD62XtkmkA@mail.gmail.com>
I've been thinking about this more and removing the expression
language complicates the simple cases far too much.  Yet, I really
don't want to require embedding XPath for environments where there is
no need for it.  Meanwhile, in those environments, the expression
language might be something else (e.g., a chunk of JavaScript).

We can still conceptualize an expression as something like a step that
acts on inputs and produces outputs.  In all our uses, the results of
expressions have contextual uses (e.g., controlling the flow via a
conditional).

First, let's observe that in scripting languages, it is common to
evaluate expressions with some syntax or function call.  We can
certainly choose to evaluate expressions as step invocations where the
step is passed the script as a parameter:

  eval("xs:decimal($1/*/@version) < 2.0")

We can shorten this by a specialized syntax:

  (`xs:decimal($1/*/@version) < 2.0`)

and then we need to define the parameters:

  [$x as document-node()](`xs:decimal($x/*/@version) < 2.0`)

Now, in JavaScript, that might be:

  [$x as document-node()](`parseFloat(x.documentElement.getAttribute("version"))<2.0`)

Finally, we need a declaration that indicates the scripting language used:

  declare default script "text/javascript";

I'm not sure it is a good idea to mix scripting languages.  We may
want to allow that for extensibility.
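
To make the idea concrete, here is a minimal Python sketch of an
expression treated as a step: the script text plus a declared parameter
list is compiled into a callable, and the compiler is looked up by media
type with a default.  Everything here is an assumption for illustration:
the names COMPILERS, DEFAULT_SCRIPT, compile_python and expression, and
the "text/x-python" media type, are hypothetical and not part of any
proposal.

```python
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, Optional, Sequence

# Hypothetical registry: script media type -> compiler that turns
# (script text, parameter names) into a callable "expression step".
Compiler = Callable[[str, Sequence[str]], Callable[..., Any]]
COMPILERS: Dict[str, Compiler] = {}

# Plays the role of: declare default script "...";
DEFAULT_SCRIPT = "text/x-python"


def compile_python(script: str, params: Sequence[str]) -> Callable[..., Any]:
    """Compile a Python expression into a step taking positional inputs."""
    code = compile(script, "<expression>", "eval")

    def step(*inputs: Any) -> Any:
        if len(inputs) != len(params):
            raise TypeError(f"expected {len(params)} inputs, got {len(inputs)}")
        # Bind the declared parameter names to the inputs.  Note that a
        # restricted eval like this is not a real sandbox.
        env = dict(zip(params, inputs))
        return eval(code, {"__builtins__": {}, "float": float}, env)

    return step


COMPILERS["text/x-python"] = compile_python


def expression(script: str, params: Sequence[str],
               media_type: Optional[str] = None) -> Callable[..., Any]:
    """Build an expression step, falling back to the default script language."""
    return COMPILERS[media_type or DEFAULT_SCRIPT](script, params)


# Rough analogue of [$x as document-node()](`...`) applied to a document.
# ET.fromstring returns the root element, so x.get("version") stands in
# for $x/*/@version.
is_v1 = expression('float(x.get("version")) < 2.0', ["x"])
print(is_v1(ET.fromstring('<doc version="1.0"/>')))  # True
```

Mixing languages would then amount to passing an explicit media type for
one expression instead of relying on the default.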

So, our example 3 becomes:

xproc version = "2.0";

declare default script "application/xquery";

inputs $source as document-node();
outputs $result as document-node();

$source → { if ([$x as document-node()](`xs:int($x/*/@version) < 2.0`)($source))
            then [$source,"v1schema.xsd"] → validate-with-xml-schema() ≫ $result
            else [$source,"v2schema.xsd"] → validate-with-xml-schema() ≫ $result
          }
        → [$1,"stylesheet.xsl"] → xslt() ≫ $result

That syntax is verbose when used inline, but it gets shorter if we
make the current in-scope output ports available to the expression by
default:

$source → { if ((`xs:int($source/*/@version) < 2.0`))
            then [$source,"v1schema.xsd"] → validate-with-xml-schema() ≫ $result
            else [$source,"v2schema.xsd"] → validate-with-xml-schema() ≫ $result
          }
        → [$1,"stylesheet.xsl"] → xslt() ≫ $result
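
As a rough Python sketch of what "in-scope ports by default" could mean
for an implementation: instead of a declared parameter list, the
expression is evaluated against whatever bindings are currently in
scope.  The helper names evaluate_in_scope and choose_schema are
hypothetical, the script is a Python expression rather than XQuery, and
the validation and XSLT steps are left out; only the branch selection is
shown.

```python
import xml.etree.ElementTree as ET
from typing import Any, Mapping


def evaluate_in_scope(script: str, scope: Mapping[str, Any]) -> Any:
    """Evaluate a script with every in-scope port visible by name.

    No parameter list is declared; the script simply refers to the port
    names.  A restricted eval like this is not a real sandbox.
    """
    return eval(script, {"__builtins__": {}, "float": float}, dict(scope))


def choose_schema(source: ET.Element) -> str:
    """Pick the schema for the conditional in the flow above."""
    scope = {"source": source}  # the ports in scope at the conditional
    # source.get("version") stands in for $source/*/@version, since
    # ET.fromstring returns the root element.
    if evaluate_in_scope('float(source.get("version")) < 2.0', scope):
        return "v1schema.xsd"
    return "v2schema.xsd"


print(choose_schema(ET.fromstring('<doc version="1.0"/>')))  # v1schema.xsd
```

The trade-off is that the expression now depends implicitly on names
from the enclosing flow, which is shorter but harder to package on its
own.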

Because we have flow declarations, you can package scripts without a
step declaration:

declare flow check-version()
  inputs $source as document-node()
 outputs $result as xs:boolean
{
   if ((`xs:int($source/*/@version) < 2.0`))
   then true ≫ $result
   else false ≫ $result
}

assuming we have a way for atomic values to be literals (e.g. a boolean value).

Finally, it would be nice to be able to write this:

declare flow check-version()
  inputs $source as document-node()
 outputs $result as xs:boolean
{
   (`xs:int($source/*/@version) < 2.0`) ≫ $result
}
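
In ordinary programming terms the two forms of check-version differ only
in whether the boolean is routed through a conditional or sent straight
to the output port.  A small Python sketch, assuming output ports are
modelled as a dict keyed by port name (my assumption, not part of the
proposal) and that the version attribute is always present:

```python
import xml.etree.ElementTree as ET
from typing import Dict


def check_version_verbose(source: ET.Element) -> Dict[str, bool]:
    """if/then/else form: a literal boolean is written to the result port."""
    if float(source.get("version")) < 2.0:
        return {"result": True}
    else:
        return {"result": False}


def check_version(source: ET.Element) -> Dict[str, bool]:
    """Short form: the expression's value goes directly to the result port."""
    return {"result": float(source.get("version")) < 2.0}


doc = ET.fromstring('<doc version="1.0"/>')
print(check_version(doc) == check_version_verbose(doc))  # True
```

The short form only works if an expression result can be appended to an
output port without an intervening step.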

On Fri, Feb 12, 2016 at 4:01 PM, Alex Miłowski <alex@milowski.com> wrote:
> One of the risks we have is that the use of XPath as an expression
> language makes XProc difficult for non-XML environments where no such
> implementation exists.
>
> We have several uses of XPath:
>
> 1. Projections (e.g., $in//section)
> 2. replace ($in//section) { ...}
> 3. variables
> 4. conditionals
>
> For (1) we make projections a step specific to the data format.
>
> For (2) ... not sure.
>
> For (3): variables are output port variables and we dump let.  You
> need to use a step to do manipulations.  We need to make embedding
> steps or mapping them to implementations possible.
>
> For (4):
>
> I think we can get rid of the expression language by enhancing the
> step description so the expressions can be put into a step as its
> implementation and the step gets used in a flow.
>
> So, we currently have this:
>
> if (xs:decimal($1/*/@version) < 2.0)
> then [$1,"v1schema.xsd"] → validate-with-xml-schema() ≫ @1
> else [$1,"v2schema.xsd"] → validate-with-xml-schema() ≫ @1
>
> we would now have:
>
> step check-version1()
>     inputs $source as document-node()
>   outputs $result as xs:boolean
>   from "my:check-version1" in "script.xq";
>
> $1 → check-version1() ≫ $isv1
> if ($isv1)
> then [$1,"v1schema.xsd"] → validate-with-xml-schema() ≫ @1
> else [$1,"v2schema.xsd"] → validate-with-xml-schema() ≫ @1
>
> and "script.xq" is:
>
> declare function my:check-version1($source) as xs:boolean
> {
>    xs:decimal($source/*/@version) < 2.0
> };
>
>
> This is now two files instead of one.  We can fix this by
> allowing embedding of the script.  It is unclear how the parsing would
> work:
>
> step check-version1()
>     inputs $source as document-node()
>   outputs $result as xs:boolean
>   script "application/xquery"
> {
>    return xs:decimal($source/*/@version) < 2.0
> }
>
> Also, when there is more than one output port, the return will be more
> complicated and need to be a map.  In other languages, it will be a
> similar construct.
>
> We probably want simple literal comparisons to enable steps to return
> enumerated values that then control which flow is executed.
>
>
> --
> --Alex Miłowski
> "The excellence of grammar as a guide is proportional to the paucity of the
> inflexions, i.e. to the degree of analysis effected by the language
> considered."
>
> Bertrand Russell in a footnote of Principles of Mathematics



-- 
--Alex Miłowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
Received on Saturday, 13 February 2016 14:26:04 UTC