Re: Supporting non-XML data in XProc

"Toman, Vojtech" <vojtech.toman@emc.com> writes:
> 4. To allow for non-XML support in XPath, we will introduce a number of XDM
>    extensions:

Do we have to make them extensions, or can this be an implementation detail?
I can imagine, for example, having my own DocumentSuperNode type that is
passed between steps. It wraps either an XdmNode in the case of XML or a
BinaryNode in the case of non-XML.
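
As a rough sketch of that wrapper idea (Python, standard library only): one type flows between steps and carries either a parsed XML tree or raw bytes. XmlPayload stands in for an XdmNode, BinaryPayload for the BinaryNode above, and wrap() and its media-type test are assumptions made purely for illustration, not part of any XProc processor.

```python
from dataclasses import dataclass
from typing import Union
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class XmlPayload:
    """Stand-in for an XdmNode: a parsed XML tree."""
    root: ET.Element


@dataclass(frozen=True)
class BinaryPayload:
    """Stand-in for the BinaryNode mentioned above: raw bytes."""
    data: bytes


@dataclass(frozen=True)
class DocumentSuperNode:
    """What flows between steps: a media type plus one of two payloads."""
    media_type: str
    payload: Union[XmlPayload, BinaryPayload]

    @property
    def is_xml(self) -> bool:
        return isinstance(self.payload, XmlPayload)


def wrap(media_type: str, raw: bytes) -> DocumentSuperNode:
    """Parse XML media types into a tree; keep everything else as bytes."""
    # Assumption: "XML" means application/xml, text/xml, or any +xml type.
    base = media_type.split(";", 1)[0].strip().lower()
    if base in ("application/xml", "text/xml") or base.endswith("+xml"):
        return DocumentSuperNode(base, XmlPayload(ET.fromstring(raw)))
    return DocumentSuperNode(base, BinaryPayload(raw))
```

With this shape, steps that only move documents around never need to look inside the payload; only steps that evaluate XPath have to care which variant they received.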

> 5. Shimming
>
>    While evaluating a pipeline, the XProc processor performs the following
>    algorithm when data appears on a port of a step:
[...]
>    [[Note: Some aspects of the above algorithm, especially the fall-back
>    behavior, may be questionable. This definitely needs some discussion.]]

On a first reading, the rules you propose seem reasonable.

>    An important aspect of the above algorithm is that it applies not only to the
>    input ports, but also to the output ports: before the data appears on an
>    output port, it is converted to the appropriate media type.

Am I right that this is only an issue for compound steps? If I write
an (atomic) extension step that asserts it produces application/xml
and at runtime it actually produces image/jpeg, is that an "error"
that the XProc processor is supposed to detect and correct?
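
To make the question concrete, here is a minimal Python sketch of what "detect and correct" could look like at an output port. Everything in it is hypothetical: MediaTypeMismatch is not an error defined by XProc, and the converters table is just an illustrative lookup of (from, to) pairs.

```python
class MediaTypeMismatch(Exception):
    """Hypothetical dynamic error; not an error code from the XProc spec."""


def check_output(declared: str, actual: str, converters: dict):
    """Return the converter to apply, or None if no conversion is needed.

    `converters` maps (from_type, to_type) pairs to callables and is
    purely illustrative.
    """
    if actual == declared:
        return None  # the step kept its promise; nothing to do
    converter = converters.get((actual, declared))
    if converter is None:
        # "Detect" without "correct": no shim is available.
        raise MediaTypeMismatch(
            f"step declared {declared} but produced {actual}"
        )
    return converter  # "correct": shim the output to the declared type
```

Whether the processor should attempt the second branch at all for atomic steps is exactly what the question above is asking.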

>    The media type conversion applies only to the p:input and p:output
>    elements. It does not take place when the XProc processor processes the
>    p:with-option, p:with-param, and p:variable elements, nor the
>    p:xpath-context, p:iteration-source, and p:viewport-source elements. It also
>    does not apply when the XProc processor evaluates the test expressions of
>    p:choose/p:when elements. In these cases, the XPath expressions use the
>    original data as the context item.

How tricky is that? From an XPath expression, is a binary document
just an empty document node? Does "/foo" return false, "count(//foo)"
return 0, etc.? What does string-length() return?

>    The kinds of mappings between different media types that the XProc
>    processor supports are left implementation-defined.
>
>    [[Note: I think that to make the "shimming" feature interoperable and
>    actually useful at all, it should not be left too implementation-defined. I
>    think we would have to define a bunch of mappings between common media types
>    that the users can rely on. The downside of this is that this might be quite
>    hard and it might shift the focus of this specification into a whole
>    different direction.

How many different mappings does your implementation support today?

I wonder if we can mitigate the interop problem by having a mechanism
for the pipeline to declare what mappings it needs? At least then a
processor can reject a pipeline statically with a reasonable error
message: "error: pipeline requires unsupported image/png to text/plain
conversion."
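
A minimal sketch of that static check, in Python. It assumes the pipeline's declared mappings and the processor's supported mappings can both be expressed as (from, to) media-type pairs; how a pipeline would actually declare them is left open, and PipelineRejected is a hypothetical name.

```python
class PipelineRejected(Exception):
    """Hypothetical static error raised before the pipeline runs."""


def unsupported_mappings(required, supported):
    """Return the required (from, to) pairs the processor cannot perform,
    in the order the pipeline declared them."""
    supported = set(supported)
    return [pair for pair in required if pair not in supported]


def check_pipeline(required, supported):
    """Reject the pipeline statically if any declared mapping is missing."""
    missing = unsupported_mappings(required, supported)
    if missing:
        src, dst = missing[0]  # report the first missing mapping
        raise PipelineRejected(
            f"error: pipeline requires unsupported {src} to {dst} conversion."
        )
```

For example, a pipeline that declares `[("image/png", "text/plain")]` against a processor that only supports `{("text/plain", "application/xml")}` would be rejected before any step executes.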

>    A radical approach might be not to support shimming at all and simply say
>    that if data of incompatible media type arrives, you get a dynamic
>    error. Conversion between different media types can be left to
>    special-purpose custom (or standardized?) steps.]]

That seems less user-friendly.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 512 761 6676
www.marklogic.com

Received on Friday, 14 September 2012 20:30:52 UTC