Re: The first five minutes ... a thought experiment (long) from James Fuller on 2014-02-20 (xproc-dev@w3.org from February 2014)

From: James Fuller <james.fuller.2007@gmail.com>
Date: Thu, 20 Feb 2014 12:31:57 +0100
To: Alex Milowski <alex@milowski.com>
Cc: XProc Dev <xproc-dev@w3.org>
Message-ID: <CADsXk-Z5PkEc--mNLDCr8vA-Z0NjTh2k6yFZ49bKb3+Z64-xQw@mail.gmail.com>

Hello Alex,

oh sure, I (personally) agree that the document boundary is needed ... but
that does not necessarily imply there is nothing we can do to make life
easier in this scenario.

Also the requirements doc not saying item()* does not mean we cannot
discuss at WG level ... IMHO discussion is still open on all requirements
and perhaps at next telcon I will flesh out an approach (or not) ... at
this stage we should listen to the community, tell them our timeline and be
careful to balance off what is most important with available resource/time.

I would put to the list, between the 2 options ... non xml docs or atomic
xdm values, which is more important ? I hope people will agree that non xml
docs are more important right now (esp if we can facilitate making life
easier in other scenarios) ... but now is the time for people to speak up.

thx, J




On Thu, Feb 20, 2014 at 12:10 PM, Alex Milowski <alex@milowski.com> wrote:

> On Tue, Feb 18, 2014 at 11:46 AM, James Fuller <jim@webcomposite.com>
> wrote:
>
> >
> > Documents flowing through a pipeline is a fundamental concept in xproc
> > eg. data flowing through pipes whose connections to steps are defined
> > by bindings. This is classic data flow language, though the decision
> > in v1 was to only allow XML documents to flow through.
> >
> > In XProc vnext we are considering allowing item()* with non xml
> > documents flowing through pipes, which would address your requirement
> > (I think)
> >
>
> The requirement, as stated [1], is:
>
> "Experience has shown that real-world pipelines often involve non-XML
> documents. Several workarounds have been invented for special cases.
> The limitation that V1.0 can only pass XML between steps makes some
> pipelines difficult, if not impossible, to write.
>
> Providing the ability to allow non-XML documents to flow between steps
> opens up the possibility of writing simple pipelines to work with
> images, JSON, Turtle, EPUB, etc."
>
> It isn't saying "item()*" because those things do not currently have
> representations in the XDM.  While we could extend the XDM to handle
> these items, I think that would be difficult to justify and certainly
> a large task to get right.
>
> I'm inclined to just say "they are documents with metadata" and let
> implementors do the right thing.  This allows for use cases like
> processing large video files with XProc where the binary is a
> reference to a stream handle that steps process using streaming.  This
> allows the application of language technology to the audio tracks for
> automatic captioning or other video annotations.
>
> You can't do that with an XDM value right now as the model
> typically requires the whole value to be represented.  There are
> certainly ways to specify such things without requiring complete
> instantiation but that will require a sufficient amount of time to get
> correct.  The result would be either we'd get it wrong or XProc V2
> would take far too long.  We are also not the WG in charge of the XDM.
>
> As such, I'd rather that non-XML documents become a bit more abstract
> to allow implementors to do whatever is necessary to meet the
> requirements we set forth in the specification. Whether an
> implementation can handle large binaries will just be a feature or
> quality of implementation question.
>
>
> [1] http://www.w3.org/TR/xproc-v2-req/#non-xml-documents
>
> --
> --Alex Milowski
> "The excellence of grammar as a guide is proportional to the paucity of the
> inflexions, i.e. to the degree of analysis effected by the language
> considered."
>
> Bertrand Russell in a footnote of Principles of Mathematics
>
>

Received on Thursday, 20 February 2014 11:32:24 UTC