
RE: Another approach to supporting non-XML data

From: Toman, Vojtech <vojtech.toman@emc.com>
Date: Fri, 5 Oct 2012 03:32:08 -0400
To: XProc WG <public-xml-processing-model-wg@w3.org>
Message-ID: <F3C7EBECE80AC346BE4D1C5A9BB4A41F2EE72F949B@MX11A.corp.emc.com>
I like the idea, as it is simple and does not break or change too many things in the language. In a way, it also extends a principle that users are already used to: passing around URI references (typically "file:...") instead of forcing non-XML data (base64-encoded, wrapped, ...) through the pipeline.

I actually think that if we changed steps such as p:store, p:xsl-formatter, etc. so that:

- they return <c:data href="..."/> instead of <c:result>...</c:result>
- the "result" output port is primary

then we could get almost transparent support for non-XML data: steps that understand the <c:data href="..."/> representation would retrieve the referenced data, and steps that don't would treat the <c:data/> element as an XML document like any other. For example, most steps in the current standard step library would treat it as XML, but a potential new p:zip step might actually dereference the href to get at the external content.
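A minimal Python sketch of the input-handling rule described above, assuming documents are represented as ElementTree elements and that a hypothetical `resolve` callable stands in for the processor's URI resolution; `read_input` is an illustrative name, not part of any XProc API. A step that understands c:data dereferences the href; anything else passes through as ordinary XML:

```python
import xml.etree.ElementTree as ET
from typing import Callable, Union

C_NS = "http://www.w3.org/ns/xproc-step"  # namespace bound to the "c:" prefix
C_DATA = f"{{{C_NS}}}data"

# Hypothetical: whatever the processor uses to turn an href into bytes.
Resolver = Callable[[str], bytes]


def read_input(doc: ET.Element, resolve: Resolver) -> Union[bytes, ET.Element]:
    """Input handling for a step that understands <c:data href="..."/>.

    A c:data reference is dereferenced and its external content returned;
    any other document is returned unchanged, as ordinary XML.
    """
    href = doc.get("href")
    if doc.tag == C_DATA and href is not None:
        return resolve(href)
    return doc
```

A step that never does anything like `read_input` simply receives the <c:data/> element as an XML document, which is what makes the approach nearly transparent to the existing step library.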

By making the "result" output port of p:store etc. primary, we would get rid of most of the current problems associated with passing around URI references and ensuring the correct runtime sequencing of steps.
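To illustrate the sequencing argument, here is a hypothetical Python sketch of p:store with a primary "result" port that emits <c:data href="..."/>. `store_step` and `size_step` are invented names, and writing to a local file path is an assumption made for the example. Because the downstream step reads the store step's output, it cannot run before the file has been written:

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.request import urlopen

C_NS = "http://www.w3.org/ns/xproc-step"  # namespace bound to the "c:" prefix


def store_step(content: bytes, path: str) -> ET.Element:
    """Hypothetical p:store whose primary "result" port emits <c:data href>.

    The file is written first and only then is the reference returned, so
    any step consuming this output necessarily runs after the write.
    """
    target = Path(path).resolve()
    target.write_bytes(content)
    ref = ET.Element(f"{{{C_NS}}}data")
    ref.set("href", target.as_uri())  # absolute file: URI
    return ref


def size_step(ref: ET.Element) -> int:
    """Hypothetical downstream step that dereferences the href it receives."""
    with urlopen(ref.get("href")) as response:
        return len(response.read())
```

Calling `size_step(store_step(data, "out.bin"))` orders the two steps purely through the data flow, with no separate dependency declaration needed.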

Having a standardized "binary" URI scheme may not be necessary, although it would probably be beneficial for interoperability. For instance, in our XProc processor we support "transient:..." URIs for storing content that is available during the lifetime of the pipeline.
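One way a processor might back such a scheme, sketched in Python under stated assumptions: `TransientStore`, `pipeline_run`, and the "transient:<n>" URI layout are hypothetical and do not describe how any existing processor implements "transient:" URIs. Content is registered for one pipeline run and discarded when the run ends:

```python
import itertools
from contextlib import contextmanager
from typing import Dict, Iterator


class TransientStore:
    """Hypothetical per-pipeline store behind "transient:" URIs.

    The "transient:<counter>" layout is an illustrative assumption.
    """

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}
        self._ids = itertools.count(1)

    def put(self, content: bytes) -> str:
        """Register content and return a URI that resolves to it."""
        uri = f"transient:{next(self._ids)}"
        self._items[uri] = content
        return uri

    def get(self, uri: str) -> bytes:
        """Resolve a URI previously returned by put()."""
        try:
            return self._items[uri]
        except KeyError:
            raise LookupError(f"unknown or expired URI: {uri}") from None

    def clear(self) -> None:
        self._items.clear()


@contextmanager
def pipeline_run() -> Iterator[TransientStore]:
    """Scope a store to one pipeline run; its content is dropped afterwards."""
    store = TransientStore()
    try:
        yield store
    finally:
        store.clear()
```

Tying the store's lifetime to a context manager is just one design choice; the point is only that such URIs need not be resolvable outside the run that created them.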

I also wonder how important streamability of non-XML data is.

Regards,
Vojtech

--
Vojtech Toman
Consultant Software Engineer
EMC | Information Intelligence Group
vojtech.toman@emc.com
http://developer.emc.com/xmltech

> -----Original Message-----
> From: Alex Milowski [mailto:alex@milowski.com]
> Sent: Thursday, October 04, 2012 8:01 PM
> To: XProc WG
> Subject: Another approach to supporting non-XML data
> 
> I've been thinking about a different approach to non-XML data that has
> two basic properties:
> 
> 1. XML always flows between steps.
> 
> 2. Binary data streams are accessible via "internally resolvable" URIs.
> 
> For example, if a p:http-request step returns an entity body with
> content type "image/jpeg", the response could be constructed as:
> 
> <c:response status="200">
>   <c:body content-type="image/jpeg" href="binary:1234"/>
> </c:response>
> 
> where "binary:1234" is some implementation-defined URI that is
> resolvable within the processor.  In theory, that allows implementors
> to hook into the URI handling of the implementation language to
> actually resolve and read the binary data within third-party tools.
> 
> Similarly, inputs or p:load could return a reference via c:data:
> 
> <c:data content-type="image/jpeg" href="binary:xyzzy"/>
> 
> This still allows Unicode character streams to be wrapped as we've
> done in the past.  A user of XProc would probably want the ability to
> control whether Unicode character streams are wrapped or referenced.
> 
> This avoids the need to change/extend the XDM to support non-XML data.
> 
> We don't need XPath extension functions as the media type is just an
> attribute.
> 
> Steps that consume binary data would expect a c:data (or similar)
> element to appear on the input port with a resolvable reference to
> binary data.  We would probably still want the ability to annotate
> inputs on step declarations with certain expectations.
> 
> It also makes data generated via data URIs uniform with other data, as this:
> 
> <c:data content-type="image/png"
>   href="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="/>
> 
> is the same as:
> 
> <c:data content-type="image/png"
>   href="http://upload.wikimedia.org/wikipedia/commons/3/31/Red-dot-5px.png"/>
> 
> is the same as:
> 
> <c:data content-type="image/png" href="binary:1234"/>
> 
> where the internally resolvable URI is generated from something like a
> p:load on the same image resource.
> 
> 
> Did someone say resource manager?  Henry?
> 
> 
> --
> --Alex Milowski
> "The excellence of grammar as a guide is proportional to the paucity of
> the inflexions, i.e. to the degree of analysis effected by the language
> considered."
> 
> Bertrand Russell in a footnote of Principles of Mathematics
> 
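The equivalence Alex describes can be illustrated with a single resolver that maps every c:data href to bytes, whatever its scheme. This Python sketch assumes a plain dictionary standing in for the processor's store of internally resolvable URIs (the "binary:" scheme is implementation-defined); `parse_data_uri` and `resolve_href` are hypothetical helpers, and only the data: decoding follows a published format (RFC 2397):

```python
import base64
from typing import Dict, Tuple
from urllib.parse import unquote_to_bytes
from urllib.request import urlopen


def parse_data_uri(uri: str) -> Tuple[str, bytes]:
    """Decode a data: URI (RFC 2397) into (media type, bytes)."""
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("malformed data: URI (missing comma)")
    params = header.split(";")
    is_base64 = len(params) > 1 and params[-1].lower() == "base64"
    if is_base64:
        params = params[:-1]
    # RFC 2397 defaults to text/plain (charset parameter ignored here).
    media_type = params[0] or "text/plain"
    raw = unquote_to_bytes(payload)
    return media_type, (base64.b64decode(raw) if is_base64 else raw)


def resolve_href(href: str, internal: Dict[str, bytes]) -> bytes:
    """Resolve a c:data href to bytes regardless of its URI scheme.

    `internal` is a hypothetical stand-in for the processor's table of
    internally resolvable URIs such as "binary:1234".
    """
    scheme = href.split(":", 1)[0].lower()
    if scheme == "data":
        return parse_data_uri(href)[1]
    if scheme == "binary":
        return internal[href]
    # http:, https:, file:, ... go through the standard library's URL opener.
    with urlopen(href) as response:
        return response.read()
```

Under this sketch, a step consuming c:data never needs to know which of the three forms it was given; that is the uniformity argued for in the quoted message, with the "binary:" entry assumed to come from loading the same image.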
Received on Friday, 5 October 2012 07:32:53 GMT
