
Another approach to supporting non-XML data

From: Alex Milowski <alex@milowski.com>
Date: Thu, 4 Oct 2012 11:00:58 -0700
Message-ID: <CABp3FNLA6ovV8fhRkRcBpidgsbguAMQvXQ0d9bM2gEggdR1w2A@mail.gmail.com>
To: XProc WG <public-xml-processing-model-wg@w3.org>

I've been thinking about a different approach to non-XML data that has
two basic properties:

1. XML always flows between steps.

2. Binary data streams are accessible via "internally resolvable" URIs.

For example, if a p:http-request step returns an entity body with
content type "image/jpeg", the response could be constructed as:

<c:response status="200">
  <c:body content-type="image/jpeg" href="binary:1234"/>
</c:response>

where "binary:1234" is some implementation-defined URI that is
resolvable within the processor.  In theory, that allows implementors
to hook into the URI handling of the implementation language so that
third-party tools can resolve and read the binary data themselves.
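
As a rough illustration of what "internally resolvable" could mean, here
is a minimal Python sketch of a processor-side registry. The class name
BinaryStore, its methods, and the counter-based URI naming are all
hypothetical; nothing here is defined by XProc or by any implementation.

```python
import io
import itertools


class BinaryStore:
    """Hypothetical in-processor registry mapping binary: URIs to bytes."""

    def __init__(self):
        self._blobs = {}
        self._ids = itertools.count(1)

    def register(self, data: bytes) -> str:
        """Store a byte string and return a fresh implementation-defined URI."""
        uri = f"binary:{next(self._ids)}"
        self._blobs[uri] = data
        return uri

    def open(self, uri: str) -> io.BytesIO:
        """Return a readable stream for a previously registered URI."""
        if uri not in self._blobs:
            raise KeyError(f"unknown internal URI: {uri}")
        return io.BytesIO(self._blobs[uri])


store = BinaryStore()
# Pretend these bytes came back as an image/jpeg entity body.
uri = store.register(b"\xff\xd8\xff\xe0")
payload = store.open(uri).read()
```

Only the URI string would travel in the XML between steps; the bytes
stay inside the processor until some step asks for them.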

Similarly, inputs or p:load could return a reference via c:data:

<c:data content-type="image/jpeg" href="binary:xyzzy"/>

This still allows Unicode character streams to be wrapped as we've
done in the past.  A user of XProc would probably want the ability to
control whether Unicode character streams are wrapped or referenced.

This avoids the need to change/extend the XDM to support non-XML data.

We don't need XPath extension functions as the media type is just an attribute.

Steps that consume binary data would expect a c:data (or similar)
element to appear on the input port with a resolvable reference to
binary data.  We would probably still want the ability to annotate
inputs on step declarations with certain expectations.
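
A sketch of how a consuming step might inspect its input, assuming the
c prefix is bound to the XProc step namespace
(http://www.w3.org/ns/xproc-step). The function name
read_data_reference and the expected_prefix check are hypothetical
stand-ins for whatever expectations a step declaration might carry.

```python
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"


def read_data_reference(xml_text, expected_prefix=None):
    """Return (content_type, href) from a c:data element.

    expected_prefix is a hypothetical way to state an expectation such
    as "image/"; it is not part of any XProc syntax.
    """
    elem = ET.fromstring(xml_text)
    if elem.tag != f"{{{C_NS}}}data":
        raise ValueError(f"expected c:data, got {elem.tag}")
    content_type = elem.get("content-type")
    href = elem.get("href")
    if href is None:
        raise ValueError("c:data has no href to resolve")
    if expected_prefix and not (content_type or "").startswith(expected_prefix):
        raise ValueError(f"unexpected media type: {content_type}")
    return content_type, href


doc = (
    '<c:data xmlns:c="http://www.w3.org/ns/xproc-step" '
    'content-type="image/jpeg" href="binary:xyzzy"/>'
)
media_type, href = read_data_reference(doc, expected_prefix="image/")
```

The media type is read as a plain attribute, so no extension function
is involved.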

It also makes data generated via data URIs uniform, in that this:

<c:data content-type="image/png"
href="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg=="/>

is the same as:

<c:data content-type="image/png"
href="http://upload.wikimedia.org/wikipedia/commons/3/31/Red-dot-5px.png"/>

is the same as:

<c:data content-type="image/png" href="binary:1234"/>

where the internally resolvable URI is generated by something like a
p:load on the same image resource.
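
To make the uniformity concrete, here is a hedged Python sketch of a
single resolve function that treats the schemes alike. The blobs
dictionary stands in for the processor's internal table and is purely
hypothetical; data: and other schemes are handed to the standard
library's urllib. The http case is not exercised here since it needs
network access.

```python
import base64
from urllib.request import urlopen

# The base64 payload from the data URI above, joined onto one line.
PNG_B64 = (
    "iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/"
    "w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="
)
DATA_URI = "data:image/png;base64," + PNG_B64

# Hypothetical internal table, filled as if p:load had read the image.
blobs = {"binary:1234": base64.b64decode(PNG_B64)}


def resolve(uri, internal):
    """Return the bytes behind an href, whatever its scheme."""
    scheme = uri.partition(":")[0].lower()
    if scheme == "binary":
        # Internally resolvable reference: look it up in the processor.
        return internal[uri]
    # data:, http:, file: and so on go through the language's own
    # URI handling.
    with urlopen(uri) as response:
        return response.read()


from_data = resolve(DATA_URI, blobs)
from_internal = resolve("binary:1234", blobs)
```

A step that calls resolve on the href never needs to know which of the
three forms it was given.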


Did someone say resource manager?  Henry?


-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
Received on Thursday, 4 October 2012 18:01:27 GMT