Re: The first five minutes ... a thought experiment (long) from Alex Milowski on 2014-02-20 (xproc-dev@w3.org from February 2014)

From: Alex Milowski <alex@milowski.com>
Date: Thu, 20 Feb 2014 11:10:25 +0000
To: XProc Dev <xproc-dev@w3.org>
Message-ID: <CABp3FN+9G84HpuG8ckCwWPymFo21y41UyeRf94H8nS=0BthNaA@mail.gmail.com>

On Tue, Feb 18, 2014 at 11:46 AM, James Fuller <jim@webcomposite.com> wrote:

>
> Documents flowing through a pipeline is a fundamental concept in xproc
> eg. data flowing through pipe whose connections to steps are defined
> by bindings. This is classic data flow language, though the decision
> in v1 was to only allow XML documents flow through.
>
> In XProc vnext we are considering allowing item()* with non xml
> documents flowing through pipes, which would address your requirement
> (I think)
>

The requirement, as stated [1], is:

"Experience has shown that real-world pipelines often involve non-XML
documents. Several workarounds have been invented for special cases.
The limitation that V1.0 can only pass XML between steps makes some
pipelines difficult, if not impossible, to write.

Providing the ability to allow non-XML documents to flow between steps
opens up the possibility of writing simple pipelines to work with
images, JSON, Turtle, EPUB, etc."

It isn't saying "item()*" because those things do not currently have
representations in the XDM.  While we could extend the XDM to handle
these items, I think that would be difficult to justify and certainly
a large task to get right.

I'm inclined to just say "they are documents with metadata" and let
implementors do the right thing.  This allows for use cases like
processing large video files with XProc where the binary is a
reference to a stream handle that steps process using streaming.  This
allows the application of language technology to the audio tracks for
automatic captioning or other video annotations.
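
To make the "documents with metadata" idea concrete, here is a minimal
sketch in Python. It is only an illustration under my own assumptions,
not anything taken from the XProc drafts: the names Document,
open_stream, chunks and byte_count_step are hypothetical, and an
in-memory buffer stands in for a large video file. The point is that a
step only ever sees metadata plus a handle it can stream from.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Callable, Dict, Iterator


@dataclass
class Document:
    """Hypothetical non-XML document: metadata plus a way to open its bytes."""
    metadata: Dict[str, str]
    open_stream: Callable[[], BinaryIO]  # called only when a step needs the content

    def chunks(self, size: int = 64 * 1024) -> Iterator[bytes]:
        """Yield the content in fixed-size chunks without loading it all at once."""
        with self.open_stream() as stream:
            while True:
                block = stream.read(size)
                if not block:
                    break
                yield block


def byte_count_step(doc: Document) -> Document:
    """Hypothetical step: measure the content by streaming, then annotate metadata."""
    total = sum(len(block) for block in doc.chunks(size=4))
    # The binary itself is passed along untouched, as the same stream handle.
    return Document({**doc.metadata, "byte-count": str(total)}, doc.open_stream)


# In-memory stand-in for a large video; a real implementation might open a file.
video = Document({"content-type": "video/mp4"}, lambda: io.BytesIO(b"0123456789"))
result = byte_count_step(video)
print(result.metadata)
```

Nothing here requires the whole binary to be instantiated as a value;
how the handle is backed (file, socket, cache) is left to the
implementation.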

You can't do that with an XDM value right now, as the model
typically requires the whole value to be represented.  There are
certainly ways to specify such things without requiring complete
instantiation, but that will require a considerable amount of time to
get correct.  The result would be that either we'd get it wrong or
XProc V2 would take far too long.  We are also not the WG in charge of
the XDM.

As such, I'd rather that non-XML documents become a bit more abstract
to allow implementors to do whatever is necessary to meet the
requirements we set forth in the specification. Whether an
implementation can handle large binaries will just be a feature or
quality-of-implementation question.

[1] http://www.w3.org/TR/xproc-v2-req/#non-xml-documents

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics

Received on Thursday, 20 February 2014 11:11:02 UTC