RE: New draft XProc 2.0 spec from Toman, Vojtech on 2014-10-29 (public-xml-processing-model-wg@w3.org from October 2014)

From: Toman, Vojtech <vojtech.toman@emc.com>
Date: Wed, 29 Oct 2014 06:52:53 -0400
To: "public-xml-processing-model-wg@w3.org" <public-xml-processing-model-wg@w3.org>
CC: "public-xml-processing-model-comments@w3.org" <public-xml-processing-model-comments@w3.org>
Message-ID: <F3C7EBECE80AC346BE4D1C5A9BB4A41F3043060834@MX11A.corp.emc.com>
Hi all,

First, congratulations (and thanks!) for the draft, it's a big step forward.

I don't know where best to post my comments, whether on Github or here, so I chose e-mail. I am not addressing each proposal individually. I simply read the whole thing from top to bottom, and the comments follow the current document order.

Regards,
Vojtech

---

2.2 Documents:

  "Definition: A document is a representation and its [Media Types]."

  Since the media type is a required document property, wouldn't the
  following be more correct?

  "A document is a representation and its properties. [...] The
  document properties must always include a 'content-type' key which
  identifies the [Media Types] of the representation. [...]"

  Is the base URI part of the document properties?

2.3 Inputs and Outputs:

  "Within a compound step, the declared outputs of the step can be
  connected to any of the various available outputs of contained steps
  in combination with other inputs (see Section 2.5, 'Connections')."

  I find "in combination with other inputs" potentially
  confusing. What about "data sources" instead of "inputs"?

  Typo: "Input ports may specify a *cointent* type, or list of content
  types, that they accept. If an input port provides a set of
  acceptable content types, it is a dynamic error (err:XD1003) if an
  input document that arrives on the port has a content type not in
  that set. "

  Regarding the last part of the above sentence, I think it should say
  something like this (in order to support wildcards and more general
  content types): "if an input document that arrives on the port has a
  content type that does not match any content type in that set."
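
  To make the "match" wording concrete, here is a minimal sketch in
  Python. It assumes wildcard patterns of the form "*/*" and "type/*"
  and ignores media type parameters; the draft does not define these
  matching rules, and the function names are made up for illustration.

```python
from typing import Iterable


def matches(content_type: str, pattern: str) -> bool:
    """Return True if content_type matches pattern.

    Assumption: pattern is '*/*', 'type/*', or an exact 'type/subtype'.
    Parameters such as '; charset=utf-8' are ignored on both sides.
    """
    def split(value: str):
        # Drop parameters, normalize case, split into type and subtype.
        essence = value.split(";", 1)[0].strip().lower()
        main, _, sub = essence.partition("/")
        return main, sub

    c_main, c_sub = split(content_type)
    p_main, p_sub = split(pattern)
    return p_main in ("*", c_main) and p_sub in ("*", c_sub)


def accepted(content_type: str, accepted_types: Iterable[str]) -> bool:
    """True if the content type matches any pattern in the accepted set."""
    return any(matches(content_type, p) for p in accepted_types)
```

  Under these assumptions, a port accepting "text/*" would take
  "text/plain; charset=utf-8" but raise err:XD1003 for "image/png".
  Suffix-based generalizations such as "+xml" are not covered here.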

2.5.1 Namespace Fixup

  I think this section needs to make it clearer that it applies to
  XML documents only.

2.6.1 Initial Environment

  "Definition: An initial environment is a connection for each of the
  readable ports and a set of option bindings used to construct the
  in-scope bindings."

  I think we need to be clearer whether the initial environment is
  used before the processor starts evaluating the pipeline, or whether
  it is used when the processor "enters" the top-level pipeline and
  starts executing its contents. I think the intention is the latter,
  but it took me some time to understand that.

2.8.9 Document properties

  p:document-properties($doc as document-node()) as
  map(xs:string,xs:string)

  As defined, p:document-properties cannot be used on non-XML
  documents, which is kind of disappointing (but understandable). In
  order to access document properties of an arbitrary document, we
  will need a get-properties step, I think.

2.12 Options

  "Some steps require a set of name/value pairs for the operations
  they perform. For example, an author may specify parameters to an
  XSLT transformation or external variables to an XQuery. Such values
  are passed to the step via an option that requires a map item value
  [XSLT 3.0] . The map item contains the mapping of between the names
  and the values whose interpretation is specific to the step."

  Personally, I would say something along these lines: "[...] The usual
  (recommended??) approach is to pass such values via an option that
  requires a map item value [XSLT 3.0]; the map item contains the
  mapping between the names and the values whose interpretation is
  specific to the step. [...]"

  Using maps is just one specific approach that we have adopted in the
  V2 standard step library. But users are free to come up with their
  own solutions in their custom steps.

4.8.1 Syntactic Shortcut for Option Values

  "If the option value includes curly braces, it is treated as an
  attribute value template. The context node for attribute value
  templates in an option shortcut value comes from the default
  readable port for the step on which they occur. If there is no such
  port, the context node is undefined."

  ... plus the usual bit: "It is a dynamic error (err:XD0026) if the
  expression makes reference to the context node, size, or position
  when the context item is undefined"

  I think we have to be more specific about what constitutes an AVT
  and what is just a constant value + rules for escaping etc. For
  example, what happens if you do <my:step separator="{"/>?
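
  As one possible answer, here is a minimal sketch in Python of the
  XSLT convention for attribute value templates: "{{" and "}}" are
  escaped literal braces, "{...}" encloses an expression, and a lone
  brace is an error. Whether XProc adopts exactly these rules is an
  assumption, and parse_avt is a hypothetical helper. Braces inside
  string literals of an expression are deliberately not handled.

```python
def parse_avt(value: str):
    """Split an attribute value into ('text', s) and ('expr', s) parts.

    Assumes the XSLT-style rules: '{{' and '}}' are literal braces,
    '{...}' is an expression, and an unmatched brace is an error.
    """
    parts = []
    text = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "{":
            if value.startswith("{{", i):
                # Escaped opening brace: emit a literal '{'.
                text.append("{")
                i += 2
                continue
            end = value.find("}", i + 1)
            if end == -1:
                raise ValueError("unmatched '{' at offset %d" % i)
            if text:
                parts.append(("text", "".join(text)))
                text = []
            parts.append(("expr", value[i + 1:end]))
            i = end + 1
        elif ch == "}":
            if value.startswith("}}", i):
                # Escaped closing brace: emit a literal '}'.
                text.append("}")
                i += 2
                continue
            raise ValueError("unmatched '}' at offset %d" % i)
        else:
            text.append(ch)
            i += 1
    if text:
        parts.append(("text", "".join(text)))
    return parts
```

  Under these rules separator="{" is rejected, and the author would
  have to write separator="{{" to get a literal opening brace.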

5.1 p:input

  The grammar still uses the media-types attribute but the text that
  follows talks about content-types

5.13 p:document

  If the document has a non-XML content type, is the behavior
  implementation-defined?

7.1.4 p:cast-content-type

  What if the document content type says "application/octet-stream"
  but I know that it is in fact "application/xml"? In this case I
  really cannot use p:cast-content-type to simply fix the content type
  - the data will end up in a c:data wrapper...

  Instead of having the c:data wrapping and the base64
  encoding/decoding logic in p:cast-content-type, I think I would
  rather have the step simply change the content-type document
  property (without any checks, wrappers etc.) and then have special
  purpose base64 encode/decode steps (maybe with a "wrapper"
  option). The p:unescape-markup is close to this.
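
  A minimal sketch of that separation, in Python. The Document class
  and the three functions are hypothetical stand-ins for a document
  with properties, a property-only cast, and dedicated base64 steps;
  none of them come from the draft.

```python
import base64
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Document:
    """Hypothetical model: a representation plus its properties."""
    data: bytes
    properties: dict = field(default_factory=dict)


def set_content_type(doc: Document, content_type: str) -> Document:
    """Relabel the document; the representation is left untouched."""
    props = dict(doc.properties)
    props["content-type"] = content_type
    return replace(doc, properties=props)


def base64_encode(doc: Document) -> Document:
    """Separate, explicit encoding step (no wrapper element here)."""
    return replace(doc, data=base64.b64encode(doc.data))


def base64_decode(doc: Document) -> Document:
    """Inverse of base64_encode."""
    return replace(doc, data=base64.b64decode(doc.data))
```

  With this split, fixing a mislabeled "application/octet-stream"
  document is a single set_content_type call that never touches the
  bytes, and encoding happens only when explicitly requested.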

  Also, what about supporting sequences of documents? The way it is
  specified now, you will have to use a for loop.

  I am also thinking that it might be practical to add a
  "cast-content-type" attribute to p:pipe for a quick (and compact)
  way of setting/fixing the content-type. You would not have to
  pollute your pipeline graph with extra steps.

7.1.23 p:set-properties

  I think we will need p:get-properties as well.

  I wonder: do the values of the properties have to be strings?
  Suppose I am processing images - it might be useful to pass the
  thumbnails with them as well (I admit this is a completely made up
  use case, but you should get the idea)?

7.1.25 p:split-sequence (and maybe other steps as well, but this will
be for a longer discussion)

  Why can't you apply this step to non-XML documents? It might still
  be useful to be able to do things like splitting the sequence based
  on the even/odd position etc. But of course this would require
  amendments to the XDM in XProc.
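
  To illustrate, a position-based split needs nothing from the
  documents themselves. The Python sketch below treats documents as
  opaque values; the callable "test" stands in for the step's XPath
  test expression, which is an assumption made for illustration only.

```python
def split_sequence(documents, test):
    """Partition documents by a predicate on the 1-based position.

    Returns (matched, not_matched), mirroring the two output ports.
    The documents are never inspected, so they need not be XML.
    """
    matched, not_matched = [], []
    for position, doc in enumerate(documents, start=1):
        (matched if test(position) else not_matched).append(doc)
    return matched, not_matched
```

  For example, split_sequence(images, lambda p: p % 2 == 1) would send
  the odd-positioned images to "matched" and the rest to "not-matched".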

7.1.26: p:store

  I think the serialization options apply only to XML media types

7.1.33: p:xslt

  We will need to say something about the content types that flow out
  of the step and what they are based on (output properties in the
  stylesheet?)


> -----Original Message-----
> From: Norman Walsh [mailto:ndw@nwalsh.com]
> Sent: Wednesday, October 29, 2014 2:21 AM
> To: public-xml-processing-model-wg@w3.org
> Cc: public-xml-processing-model-comments@w3.org
> Subject: New draft XProc 2.0 spec
> 
> Hello folks,
> 
> Following the editing efforts that Alex and I put in over the last couple of
> days at TPAC and with the WG's consent, we have adopted five proposals as
> part of the new status quo draft:
> 
> Norm attempted to address:
> 
>     https://github.com/xproc/specification/issues/29
> 
>        2.2 Integrate non-XML documents into pipelines
> 
>     https://github.com/xproc/specification/issues/46
> 
>        3.2 Associate arbitrary metadata with documents
> 
>     https://github.com/xproc/specification/issues/33
> 
>        2.6 Allow attribute value templates
> 
>     https://github.com/xproc/specification/issues/39
> 
>        2.7.6 Syntax: allow curly brace expansion in p:inline
> 
>     https://github.com/xproc/specification/issues/84
> 
>        add <p:option name="version"/> for p:xquery
> 
>     https://github.com/xproc/specification/issues/83
> 
>        add <p:option name="version"/> for p:validate-with-xml-schema
> 
> Alex attempted to address
> 
>     https://github.com/xproc/specification/issues/67
> 
>        Editorialize the term "connection"
> 
> Alex also incorporated updates with respect to parameters as options.
> 
> The new draft, with diffs from the previous status quo draft can be found at
> 
>   https://xproc.github.io/specification/langspec/xproc20/head/
> 
> In addition, diffs for the individual proposals can be seen at
> 
>   https://ndw.github.io/specification/
> 
> and
> 
>   https://alexmilowski.github.io/specification/
> 
>                                         Be seeing you,
>                                           norm
> 
> --
> Norman Walsh
> Lead Engineer
> MarkLogic Corporation
> Phone: +1 512 761 6676
> www.marklogic.com
Received on Wednesday, 29 October 2014 10:53:44 UTC