Re: New draft XProc 2.0 spec

"Toman, Vojtech" <> writes:
> 2.2 Documents:
>   "Definition: A document is a representation and its [Media Types]."
>   Since the media type is a required document property, wouldn't the
>   following be more correct?
>   "A document is a representation and its properties. [...] The
>   document properties must always include a 'content-type' key which
>   identifies the [Media Types] of the representation. [...]"


>   Is the base URI part of the document properties?

Yes, I think we all agreed to that but it didn't get mentioned
explicitly. I've just corrected that.

> 2.3 Inputs and Outputs:
>   "Within a compound step, the declared outputs of the step can be
>   connected to any of the various available outputs of contained steps
>   in combination with other inputs (see Section 2.5, 'Connections')."
>   I find "in combination with other inputs" potentially
>   confusing. What about "data sources" instead of "inputs"?

I tried to clarify that without adding a new term which we'd have to
define.

>   Typo: "Input ports may specify a *cointent* type, or list of content
>   types, that they accept. If an input port provides a set of
>   acceptable content types, it is a dynamic error (err:XD1003) if an
>   input document that arrives on the port has a content type not in
>   that set. "
>   Regarding the last part of the above sentence, I think it should say
>   something like this (in order to support wildcards and more general
>   content types): "if an input document that arrives on the port has a
>   content type that does not match any content type in that set."

Fixed and fixed.
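
To make the "does not match any content type in that set" wording
concrete, here is a minimal sketch of wildcard matching. The function
names and the "*+xml" suffix rule are assumptions made for
illustration; the draft does not define this algorithm.

```python
def parse_content_type(value):
    """Split 'type/subtype; params' into a lowercase (type, subtype) pair."""
    main = value.split(";", 1)[0].strip().lower()
    type_, sep, subtype = main.partition("/")
    if not sep or not type_ or not subtype:
        raise ValueError(f"not a content type: {value!r}")
    return type_, subtype


def matches(doc_type, accepted):
    """True if doc_type matches one accepted pattern; '*' is a wildcard."""
    d_type, d_sub = parse_content_type(doc_type)
    a_type, a_sub = parse_content_type(accepted)
    if a_type not in ("*", d_type):
        return False
    if a_sub in ("*", d_sub):
        return True
    # Assumed rule, not from the draft: '*+xml' accepts any '+xml' subtype.
    if a_sub.startswith("*+"):
        return d_sub.endswith(a_sub[1:])
    return False


def check_input(doc_type, accepted_types):
    """Raise the dynamic error if no accepted content type matches."""
    if not any(matches(doc_type, a) for a in accepted_types):
        raise RuntimeError(f"err:XD1003: {doc_type} not accepted")
```

With plain set membership, "image/png" would be rejected by a port
that accepts "image/*"; with matching, it is accepted.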

> 2.5.1 Namespace Fixup
>   I think this section needs to make it more clear that it applies to
>   XML documents only

Ok. This is one of the sections that clearly needs to be moved
somewhere else when we get around to reviewing the overall structure
of the document.

> 2.6.1 Initial Environment
>   "Definition: An initial environment is a connection for each of the
>   readable ports and a set of option bindings used to construct the
>   in-scope bindings."
>   I think we need to be clearer whether the initial environment is
>   used before the processor starts evaluating the pipeline, or whether
>   it is used when the processor "enters" the top-level pipeline and
>   starts executing its contents. I think the intention is the latter,
>   but it took me some time to understand that.

I don't disagree, but I also don't see any straightforward way to say
that. Alex, what do you think?

> 2.8.9 Document properties
>   p:document-properties($doc as document-node()) as
>   map(xs:string,xs:string)
>   As defined, p:document-properties cannot be used on non-XML
>   documents which is kind of disappointing (but understandable). In
>   order to access document properties of an arbitrary document, we
>   will need a get-properties step, I think.

Ugh. You're right. But it should work on any document. I've stuck in
an editorial note that we need to see if we can figure something out.

> 2.12 Options
>   "Some steps require a set of name/value pairs for the operations
>   they perform. For example, an author may specify parameters to an
>   XSLT transformation or external variables to an XQuery. Such values
>   are passed to the step via an option that requires a map item value
>   [XSLT 3.0]. The map item contains the mapping between the names
>   and the values whose interpretation is specific to the step."
>   Personally, I would say something along these lines: "[...] The usual
>   (recommended??) approach is to pass such values via an option that
>   requires a map item value [XSLT 3.0]; the map item contains the
>   mapping between the names and the values whose interpretation is
>   specific to the step. [...]"
>   Using maps is just one specific approach that we have adopted in the
>   V2 standard step library. But users are free to come up with their
>   own solutions in their custom steps.

Ok. I tried to say that.

> 4.8.1 Syntactic Shortcut for Option Values
>   "If the option value includes curly braces, it is treated as an
>   attribute value template. The context node for attribute value
>   templates in an option shortcut value comes from the default
>   readable port for the step on which they occur. If there is no such
>   port, the context node is undefined."
>   ... plus the usual bit: "It is a dynamic error (err:XD0026) if the
>   expression makes reference to the context node, size, or position
>   when the context item is undefined"
>   I think we have to be more specific about what constitutes an AVT
>   and what is just a constant value + rules for escaping etc. For
>   example, what happens if you do <my:step separator="{"/>?

That's an error. Curly braces are treated as AVTs. But I'm sure we can
say that more clearly.
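
A rough sketch of the distinction, assuming the XSLT convention that
"{{" and "}}" escape literal braces. It is a simplification (it does
not handle braces inside string literals within an expression), and
parse_avt is a made-up name, not anything in the draft.

```python
def parse_avt(value):
    """Split an option value into ('text', s) and ('expr', s) parts."""
    parts, text, i = [], [], 0
    while i < len(value):
        ch = value[i]
        if value.startswith("{{", i) or value.startswith("}}", i):
            text.append(ch)  # a doubled brace is an escaped literal brace
            i += 2
        elif ch == "{":
            end = value.find("}", i + 1)
            if end == -1:
                raise ValueError("unmatched '{' in option value")
            if text:
                parts.append(("text", "".join(text)))
                text = []
            parts.append(("expr", value[i + 1:end]))
            i = end + 1
        elif ch == "}":
            raise ValueError("unmatched '}' in option value")
        else:
            text.append(ch)
            i += 1
    if text:
        parts.append(("text", "".join(text)))
    return parts
```

Under these rules separator="{" is rejected, and the way to pass a
literal brace would be separator="{{".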

> 5.1 p:input
>   The grammar still uses the media-types attribute but the text that
>   follows talks about content-types


> 5.13 p:document
>   If the document has a non-XML content type, the behavior is
>   implementation defined?

Kinda. I tried to clarify that.

> 7.1.4 p:cast-content-type

I elevated this one to an issue. I have reservations about making it
easy to "lie" about the content type.

> 7.1.23 p:set-properties
>   I think we will need p:get-properties as well.
>   I wonder: do the values of the properties have to be strings?
>   Suppose I am processing images - it might be useful to pass the
>   thumbnails with them as well (I admit this is a completely made up
>   use case, but you should get the idea)?

I'd rather finesse the function so that it can operate on any kind
of document.

At the moment, the values have to be strings.

> 7.1.25 p:split-sequence (and maybe other steps as well, but this will
> be for a longer discussion)
>   Why can't you apply this step to non-XML documents? It might still
>   be useful to be able to do things like splitting the sequence based
>   on the even/odd position etc. But of course this would require
>   amendments to the XDM in XProc.

If we're going to want to do this kind of thing, I predict we're
pretty quickly going to talk ourselves back into the other binary
proposal, the one where they were all XML documents, but a document of
the form (paraphrasing from memory)

   <p:binary href="some:magic-uri-scheme"/>

was a stand-in for the binary content that was stored by the
implementation in some way.

We went back and forth about how to deal with binary.

> 7.1.26: p:store
>   I think the serialization options apply only to XML media types


> 7.1.33: p:xslt
>   We will need to say something about the content types that flow out
>   of the step and what they are based on (output properties in the
>   stylesheet?)

Issue #117,

Thanks, Vojtech!

                                        Be seeing you,

Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 512 761 6676

Received on Tuesday, 18 November 2014 23:20:17 UTC