- From: Toman, Vojtech <vojtech.toman@emc.com>
- Date: Wed, 29 Oct 2014 06:52:53 -0400
- To: "public-xml-processing-model-wg@w3.org" <public-xml-processing-model-wg@w3.org>
- CC: "public-xml-processing-model-comments@w3.org" <public-xml-processing-model-comments@w3.org>
Hi all,
First, congratulations (and thanks!) for the draft, it's a big step forward.
I don't know where best to post my comments, whether on GitHub or here, so I chose e-mail. I am not addressing each proposal individually. I simply read the whole thing from top to bottom, and the comments follow the current document order.
Regards,
Vojtech
---
2.2 Documents:
"Definition: A document is a representation and its [Media Types]."
Since the media type is a required document property, wouldn't the
following be more correct?
"A document is a representation and its properties. [...] The
document properties must always include a 'content-type' key which
identifies the [Media Types] of the representation. [...]"
Is the base URI part of the document properties?
2.3 Inputs and Outputs:
"Within a compound step, the declared outputs of the step can be
connected to any of the various available outputs of contained steps
in combination with other inputs (see Section 2.5, 'Connections')."
I find "in combination with other inputs" potentially
confusing. What about "data sources" instead of "inputs"?
Typo: "Input ports may specify a *cointent* type, or list of content
types, that they accept. If an input port provides a set of
acceptable content types, it is a dynamic error (err:XD1003) if an
input document that arrives on the port has a content type not in
that set. "
Regarding the last part of the above sentence, I think it should say
something like this (in order to support wildcards and more general
content types): "if an input document that arrives on the port has a
content type that does not match any content type in that set."
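A minimal Python sketch of what "matching" could mean here. The function name content_type_matches and the wildcard forms (text/*, application/*+xml) are assumptions made for illustration; the draft does not define them.

```python
from fnmatch import fnmatchcase


def content_type_matches(actual: str, accepted: list[str]) -> bool:
    """Return True if `actual` matches any pattern in `accepted`.

    Assumptions (not from the draft): patterns use shell-style `*`
    wildcards, media type parameters such as `; charset=...` are
    ignored, and the comparison is case-insensitive.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base = actual.split(";", 1)[0].strip().lower()
    return any(fnmatchcase(base, pattern.strip().lower()) for pattern in accepted)
```

With this reading, a port declared as accepting text/* would take text/plain and text/csv, and err:XD1003 would be raised only when no pattern in the set matches.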
2.5.1 Namespace Fixup
I think this section needs to make it clearer that it applies to
XML documents only.
2.6.1 Initial Environment
"Definition: An initial environment is a connection for each of the
readable ports and a set of option bindings used to construct the
in-scope bindings."
I think we need to be clearer whether the initial environment is
used before the processor starts evaluating the pipeline, or whether
it is used when the processor "enters" the top-level pipeline and
starts executing its contents. I think the intention is the latter,
but it took me some time to understand that.
2.8.9 Document properties
p:document-properties($doc as document-node()) as
map(xs:string,xs:string)
As defined, p:document-properties cannot be used on non-XML
documents, which is kind of disappointing (but understandable). In
order to access document properties of an arbitrary document, we
will need a get-properties step, I think.
2.12 Options
"Some steps require a set of name/value pairs for the operations
they perform. For example, an author may specify parameters to an
XSLT transformation or external variables to an XQuery. Such values
are passed to the step via an option that requires a map item value
[XSLT 3.0] . The map item contains the mapping of between the names
and the values whose interpretation is specific to the step."
Personally, I would say something along these lines: "[...] The usual
(recommended??) approach is to pass such values via an option that
requires a map item value [XSLT 3.0]; the map item contains the
mapping between the names and the values whose interpretation is
specific to the step. [...]"
Using maps is just one specific approach that we have adopted in the
V2 standard step library. But users are free to come up with their
own solutions in their custom steps.
4.8.1 Syntactic Shortcut for Option Values
"If the option value includes curly braces, it is treated as an
attribute value template. The context node for attribute value
templates in an option shortcut value comes from the default
readable port for the step on which they occur. If there is no such
port, the context node is undefined."
... plus the usual bit: "It is a dynamic error (err:XD0026) if the
expression makes reference to the context node, size, or position
when the context item is undefined"
I think we have to be more specific about what constitutes an AVT
and what is just a constant value + rules for escaping etc. For
example, what happens if you do <my:step separator="{"/>?
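One possible answer is to adopt the XSLT convention, where "{{" and "}}" stand for literal braces and an unmatched brace is an error. The Python sketch below illustrates that convention only; the function name parse_avt is hypothetical, and the sketch deliberately ignores braces inside string literals or comments of the embedded expression.

```python
def parse_avt(value: str) -> list[tuple[str, str]]:
    """Split an attribute value into ("text", ...) and ("expr", ...) parts.

    Follows the XSLT convention: "{{" and "}}" are escaped literal
    braces; an unmatched "{" or "}" raises ValueError.
    Simplification: braces inside string literals or comments of the
    embedded expression are not handled.
    """
    parts: list[tuple[str, str]] = []
    text = ""
    i = 0
    while i < len(value):
        ch = value[i]
        if value.startswith("{{", i) or value.startswith("}}", i):
            text += ch  # an escaped pair yields one literal brace
            i += 2
        elif ch == "{":
            end = value.find("}", i + 1)
            if end == -1:
                raise ValueError("unmatched '{' in attribute value template")
            if text:
                parts.append(("text", text))
                text = ""
            parts.append(("expr", value[i + 1:end]))
            i = end + 1
        elif ch == "}":
            raise ValueError("unmatched '}' in attribute value template")
        else:
            text += ch
            i += 1
    if text:
        parts.append(("text", text))
    return parts
```

Under this convention, separator="{" would be rejected, and an author who wants a literal brace would write separator="{{". Whether XProc should follow the same rule is exactly the point that needs to be specified.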
5.1 p:input
The grammar still uses the media-types attribute, but the text that
follows talks about content-types.
5.13 p:document
If the document has a non-XML content type, is the behavior
implementation-defined?
7.1.4 p:cast-content-type
What if the document content type says "application/octet-stream"
but I know that it is in fact "application/xml"? In this case I
really cannot use p:cast-content-type to simply fix the content type
- the data will end up in a c:data wrapper...
Instead of having the c:data wrapping and the base64
encoding/decoding logic in p:cast-content-type, I think I would
rather have the step simply change the content-type document
property (without any checks, wrappers etc.) and then have special
purpose base64 encode/decode steps (maybe with a "wrapper"
option). The p:unescape-markup is close to this.
Also, what about supporting sequences of documents? The way it is
specified now, you will have to use a for loop.
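A rough Python sketch of the split suggested above: one operation that only relabels the content type and a separate, explicit base64 step, both working on a whole sequence. The Document class, the function names relabel and base64_encode, and the "my-key" property are hypothetical and are not taken from the draft.

```python
import base64
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Document:
    """Hypothetical model: a representation plus its properties."""
    data: bytes
    properties: dict = field(default_factory=dict)


def relabel(docs, content_type):
    """Set only the content-type property; the data is left untouched."""
    return [
        replace(d, properties={**d.properties, "content-type": content_type})
        for d in docs
    ]


def base64_encode(docs):
    """Separate, explicit encoding step (no implicit wrapper element)."""
    return [replace(d, data=base64.b64encode(d.data)) for d in docs]
```

With this division of labour, fixing a mislabelled application/octet-stream document is a pure property change, and encoding happens only when the pipeline author asks for it.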
I am also thinking that it might be practical to add a
"cast-content-type" attribute to p:pipe for a quick (and compact)
way of setting/fixing the content-type. You would not have to
pollute your pipeline graph with extra steps.
7.1.23 p:set-properties
I think we will need p:get-properties as well.
I wonder: do the values of the properties have to be strings?
Suppose I am processing images - it might be useful to pass the
thumbnails with them as well (I admit this is a completely made up
use case, but you should get the idea)?
7.1.25 p:split-sequence (and maybe other steps as well, but this will
be for a longer discussion)
Why can't you apply this step to non-XML documents? It might still
be useful to be able to do things like splitting the sequence based
on the even/odd position etc. But of course this would require
amendments to the XDM in XProc.
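A small Python sketch of the position-only case, to show that it needs nothing from the documents themselves. The function split_sequence and its test(position, last) callback are illustrative assumptions; the two result lists mirror the matched and not-matched outputs of the step.

```python
def split_sequence(docs, test):
    """Partition `docs` by `test(position, last)`; positions are 1-based.

    The documents are treated as opaque values, so they need not be XML.
    """
    last = len(docs)
    matched, not_matched = [], []
    for position, doc in enumerate(docs, start=1):
        (matched if test(position, last) else not_matched).append(doc)
    return matched, not_matched
```

For example, split_sequence(images, lambda position, last: position % 2 == 1) would send the odd-positioned documents to the first list and the rest to the second, without ever inspecting their content.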
7.1.26: p:store
I think the serialization options apply only to XML media types.
7.1.33: p:xslt
We will need to say something about the content types that flow out
of the step and what they are based on (output properties in the
stylesheet?)
> -----Original Message-----
> From: Norman Walsh [mailto:ndw@nwalsh.com]
> Sent: Wednesday, October 29, 2014 2:21 AM
> To: public-xml-processing-model-wg@w3.org
> Cc: public-xml-processing-model-comments@w3.org
> Subject: New draft XProc 2.0 spec
>
> Hello folks,
>
> Following the editing efforts that Alex and I put in over the last couple of
> days at TPAC and with the WG's consent, we have adopted five proposals as
> part of the new status quo draft:
>
> Norm attempted to address:
>
> https://github.com/xproc/specification/issues/29
>
> 2.2 Integrate non-XML documents into pipelines
>
> https://github.com/xproc/specification/issues/46
>
> 3.2 Associate arbitrary metadata with documents
>
> https://github.com/xproc/specification/issues/33
>
> 2.6 Allow attribute value templates
>
> https://github.com/xproc/specification/issues/39
>
> 2.7.6 Syntax: allow curly brace expansion in p:inline
>
> https://github.com/xproc/specification/issues/84
>
> add <p:option name="version"/> for p:xquery
>
> https://github.com/xproc/specification/issues/83
>
> add <p:option name="version"/> for p:validate-with-xml-schema
>
> Alex attempted to address
>
> https://github.com/xproc/specification/issues/67
>
> Editorialize the term "connection"
>
> Alex also incorporated updates with respect to parameters as options.
>
> The new draft, with diffs from the previous status quo draft can be found at
>
> https://xproc.github.io/specification/langspec/xproc20/head/
>
> In addition, diffs for the individual proposals can be seen at
>
> https://ndw.github.io/specification/
>
> and
>
> https://alexmilowski.github.io/specification/
>
> Be seeing you,
> norm
>
> --
> Norman Walsh
> Lead Engineer
> MarkLogic Corporation
> Phone: +1 512 761 6676
> www.marklogic.com
Received on Wednesday, 29 October 2014 10:53:45 UTC