- From: Toman, Vojtech <vojtech.toman@emc.com>
- Date: Wed, 29 Oct 2014 06:52:53 -0400
- To: "public-xml-processing-model-wg@w3.org" <public-xml-processing-model-wg@w3.org>
- CC: "public-xml-processing-model-comments@w3.org" <public-xml-processing-model-comments@w3.org>
Hi all,

First, congratulations (and thanks!) for the draft, it's a big step forward.

I don't know where to best post my comments, whether on Github or here, so I chose e-mail. I am not addressing each proposal individually. I simply read the whole thing from top to bottom, and the comments follow the current document order.

Regards,
Vojtech

---

2.2 Documents:

"Definition: A document is a representation and its [Media Types]."

Since the media type is a required document property, wouldn't the following be more correct?

"A document is a representation and its properties. [...] The document properties must always include a 'content-type' key which identifies the [Media Types] of the representation. [...]"

Is the base URI part of the document properties?

2.3 Inputs and Outputs:

"Within a compound step, the declared outputs of the step can be connected to any of the various available outputs of contained steps in combination with other inputs (see Section 2.5, 'Connections')."

I find "in combination with other inputs" potentially confusing. What about "data sources" instead of "inputs"?

Typo: "Input ports may specify a *cointent* type, or list of content types, that they accept. If an input port provides a set of acceptable content types, it is a dynamic error (err:XD1003) if an input document that arrives on the port has a content type not in that set."

Regarding the last part of the above sentence, I think it should say something like this (in order to support wildcards and more general content types): "if an input document that arrives on the port has a content type that does not match any content type in that set."

2.5.1 Namespace Fixup

I think this section needs to make it clearer that it applies to XML documents only.

2.6.1 Initial Environment

"Definition: An initial environment is a connection for each of the readable ports and a set of option bindings used to construct the in-scope bindings."
I think we need to be clearer whether the initial environment is used before the processor starts evaluating the pipeline, or whether it is used when the processor "enters" the top-level pipeline and starts executing its contents. I think the intention is the latter, but it took me some time to understand that.

2.8.9 Document properties

p:document-properties($doc as document-node()) as map(xs:string,xs:string)

As defined, p:document-properties cannot be used on non-XML documents, which is kind of disappointing (but understandable). In order to access document properties of an arbitrary document, we will need a get-properties step, I think.

2.12 Options

"Some steps require a set of name/value pairs for the operations they perform. For example, an author may specify parameters to an XSLT transformation or external variables to an XQuery. Such values are passed to the step via an option that requires a map item value [XSLT 3.0] . The map item contains the mapping of between the names and the values whose interpretation is specific to the step."

Personally, I would phrase it along these lines:

"[...] The usual (recommended??) approach is to pass such values via an option that requires a map item value [XSLT 3.0]; the map item contains the mapping between the names and the values whose interpretation is specific to the step. [...]"

Using maps is just one specific approach that we have adopted in the V2 standard step library. But users are free to come up with their own solutions in their custom steps.

4.8.1 Syntactic Shortcut for Option Values

"If the option value includes curly braces, it is treated as an attribute value template. The context node for attribute value templates in an option shortcut value comes from the default readable port for the step on which they occur. If there is no such port, the context node is undefined." ...
plus the usual bit: "It is a dynamic error (err:XD0026) if the expression makes reference to the context node, size, or position when the context item is undefined"

I think we have to be more specific about what constitutes an AVT and what is just a constant value, plus the rules for escaping etc. For example, what happens if you do <my:step separator="{"/>?

5.1 p:input

The grammar still uses the media-types attribute, but the text that follows talks about content-types.

5.13 p:document

If the document has a non-XML content type, is the behavior implementation-defined?

7.1.4 p:cast-content-type

What if the document content type says "application/octet-stream" but I know that it is in fact "application/xml"? In this case I really cannot use p:cast-content-type to simply fix the content type: the data will end up in a c:data wrapper...

Instead of having the c:data wrapping and the base64 encoding/decoding logic in p:cast-content-type, I think I would rather have the step simply change the content-type document property (without any checks, wrappers etc.) and then have special-purpose base64 encode/decode steps (maybe with a "wrapper" option). The p:unescape-markup step is close to this.

Also, what about supporting sequences of documents? The way it is specified now, you will have to use a for loop.

I am also thinking that it might be practical to add a "cast-content-type" attribute to p:pipe for a quick (and compact) way of setting/fixing the content-type. You would not have to pollute your pipeline graph with extra steps.

7.1.23 p:set-properties

I think we will need p:get-properties as well.

I wonder: do the values of the properties have to be strings? Suppose I am processing images: it might be useful to pass the thumbnails with them as well (I admit this is a completely made-up use case, but you should get the idea).

7.1.25 p:split-sequence (and maybe other steps as well, but this will be for a longer discussion)

Why can't you apply this step to non-XML documents?
It might still be useful to be able to do things like splitting the sequence based on the even/odd position etc. But of course this would require amendments to the XDM in XProc.

7.1.26: p:store

I think the serialization options apply only to XML media types.

7.1.33: p:xslt

We will need to say something about the content types that flow out of the step and what they are based on (output properties in the stylesheet?)

> -----Original Message-----
> From: Norman Walsh [mailto:ndw@nwalsh.com]
> Sent: Wednesday, October 29, 2014 2:21 AM
> To: public-xml-processing-model-wg@w3.org
> Cc: public-xml-processing-model-comments@w3.org
> Subject: New draft XProc 2.0 spec
>
> Hello folks,
>
> Following the editing efforts that Alex and I put in over the last couple of
> days at TPAC and with the WG's consent, we have adopted five proposals as
> part of the new status quo draft:
>
> Norm attempted to address:
>
> https://github.com/xproc/specification/issues/29
>
> 2.2 Integrate non-XML documents into pipelines
>
> https://github.com/xproc/specification/issues/46
>
> 3.2 Associate arbitrary metadata with documents
>
> https://github.com/xproc/specification/issues/33
>
> 2.6 Allow attribute value templates
>
> https://github.com/xproc/specification/issues/39
>
> 2.7.6 Syntax: allow curly brace expansion in p:inline
>
> https://github.com/xproc/specification/issues/84
>
> add <p:option name="version"/> for p:xquery
>
> https://github.com/xproc/specification/issues/83
>
> add <p:option name="version"/> for p:validate-with-xml-schema
>
> Alex attempted to address
>
> https://github.com/xproc/specification/issues/67
>
> Editorialize the term "connection"
>
> Alex also incorporated updates with respect to parameters as options.
>
> The new draft, with diffs from the previous status quo draft can be found at
>
> https://xproc.github.io/specification/langspec/xproc20/head/
>
> In addition, diffs for the individual proposals can be seen at
>
> https://ndw.github.io/specification/
>
> and
>
> https://alexmilowski.github.io/specification/
>
> Be seeing you,
> norm
>
> --
> Norman Walsh
> Lead Engineer
> MarkLogic Corporation
> Phone: +1 512 761 6676
> www.marklogic.com
Received on Wednesday, 29 October 2014 10:53:45 UTC