Re: Monday Afternoon Tech Plenary Notes from Erik Bruchez on 2006-03-01 (public-xml-processing-model-wg@w3.org from March 2006)

From: Erik Bruchez <ebruchez@orbeon.com>
Date: Wed, 01 Mar 2006 14:55:52 +0100
CC: public-xml-processing-model-wg@w3.org
Message-ID: <4405A7E8.8060701@orbeon.com>
One minor comment:

   s/Eric/Erik

Also, I don't know what my "In XPL everything is in scope" means, as 
that is not the case in XPL. I don't remember what I meant to say.

-Erik

Alex Milowski wrote:
> 
> 
> XPL Presentation (See presentation)
> 
>  * Michael: (Clarification)
>          The 'infosetref' attribute represents the binding and the names
>          are internal to the component
>          The 'name' attribute is the formal parameter name.
> 
>  * Eric: the p:input and p:output declare the name of the inputs and
>    outputs that are used to invoke the process and handle the results
> 
>  * Norm: (Clarification)
>   It is the pipeline processor that looks at the inputs and outputs?
> 
>  * Eric: the inputs and outputs are evaluated in a lazy fashion and it
>    back-chains through the steps which eventually leads to the input of
>    the pipeline.
> 
>  * Alex: (Clarification)
>    How does back chaining work with conditionals?
> 
>  * Eric: The output of conditionals needs to have the same infoset name.
> 
>  * Eric: XHTML example (use case 5.15: Content-Dependent
>          Transformations)
>     - one of the use cases.
>     - one of the steps rewrites the QNames for presentation in IE
>     - one of the steps deals with HTML serialization
>     - the output for serialization uses an internal root element node
>       for representation of text and binary (character encoded)
> 
>  * Eric: Iteration example:
>     - lets you iterate over a document via an XPath expression
>     - the current() function gives you the current item being
>       iterated
>     - gives you the ability to process large XML documents
>  * Murray: Does each of the steps have its own XML vocabulary (e.g. HTTP
>            serializer)?
>  * Eric: Yes.
>  * Richard: Do they require their own namespaces?
>  * Eric: No, it isn't required, as it is contextual to the
>    component.  Having another namespace adds declarations to the
>    document.
> 
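To make the lazy, back-chaining evaluation described above concrete, here is a minimal sketch in Python. It is not XPL syntax or the Orbeon implementation; the Step class and the step names are hypothetical. Each step receives reader functions for its inputs, and an upstream step runs only if a reader is actually called.

```python
class Step:
    """A pipeline step whose output is computed only when it is requested."""

    def __init__(self, name, func, *inputs):
        self.name = name
        self.func = func      # callable taking one zero-argument reader per input
        self.inputs = inputs  # upstream Step objects
        self.runs = 0         # how many times this step actually executed
        self._cached = None

    def read(self):
        # Back-chain: upstream steps execute only when their reader is called.
        if self.runs == 0:
            self.runs += 1
            self._cached = self.func(*(step.read for step in self.inputs))
        return self._cached


# Hypothetical pipeline: "choose" declares two inputs but reads only one.
source = Step("source", lambda: "<doc/>")
cheap = Step("cheap", lambda read: "cheap:" + read(), source)
expensive = Step("expensive", lambda read: read().upper(), source)
choose = Step("choose", lambda read_cheap, read_expensive: read_cheap(), cheap, expensive)

result = choose.read()
```

Pulling the final output runs "source" and "cheap" once each and never runs "expensive", because "choose" never asks for that input.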
> GUI Tool Sub-thread:
> 
>  * Richard: Do you have a GUI tool?
>  * Eric: No.
>  * Richard: we should define the tool in terms of a graph
>  * Norm & Michael expressed concern with this as they wouldn't
>    want to require a GUI tool.  Starting with a graph could
>    ignore the XML representation.
> 
> Norm's SXPipe:
> 
>  * http://norman.walsh.name/2004/06/20/sxpipe
> 
>  * Stages are executed in order.  It is handed a DOM and returns a
>    DOM.
> 
>  * In example, skip attribute allows steps to be skipped.  If statically
>    evaluated to true, the step isn't executed.
> 
>  * Impl: two methods: init & run.  Init is passed the element that
>    represents the stage.  1700 lines of Java.
> 
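The SXPipe model noted above (stages run in order, DOM in and DOM out, a skip attribute, init and run methods) can be sketched roughly as follows. This is Python rather than Java, and the pipeline vocabulary, the SetAttribute stage and the literal skip="true" test are assumptions for illustration, not SXPipe's actual syntax or code.

```python
from xml.dom import minidom


class SetAttribute:
    """Hypothetical stage: sets one attribute on the document element."""

    def init(self, config):
        # config is the element that describes this stage in the pipeline document
        self.name = config.getAttribute("name")
        self.value = config.getAttribute("value")

    def run(self, doc):
        # A stage is handed a DOM and returns a DOM.
        doc.documentElement.setAttribute(self.name, self.value)
        return doc


STAGES = {"set-attribute": SetAttribute}  # element name -> stage class (assumed)


def run_pipeline(pipeline_xml, doc):
    pipeline = minidom.parseString(pipeline_xml)
    for config in pipeline.documentElement.childNodes:
        if config.nodeType != config.ELEMENT_NODE:
            continue  # ignore whitespace between stage elements
        if config.getAttribute("skip") == "true":
            continue  # a skipped stage is never initialized or run
        stage = STAGES[config.tagName]()
        stage.init(config)
        doc = stage.run(doc)
    return doc


PIPELINE = """<pipeline>
  <set-attribute name="a" value="1"/>
  <set-attribute name="b" value="2" skip="true"/>
  <set-attribute name="c" value="3"/>
</pipeline>"""

result = run_pipeline(PIPELINE, minidom.parseString("<doc/>"))
```

After the run, the document element carries attributes a and c but not b, since the second stage was skipped.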
> 
> (Alex's presentation here)
> 
> 
> Richard's presentation:
> 
>  * I want to replace what we do today without a pipeline with an XML
>    pipeline.
> 
>  * lxgrep - produces a tree fragment (multiple root elements possible)
>    via an XPath
> 
>  * lxprintf - formats XPath matches as plain text
> 
>      -e element   For each element
> 
>  * lxreplace - replaces elements/attributes
>      -n   Renames an element
> 
>  * lxsort - sorts elements by values identified by an XPath
> 
>  * lxviewport - runs a unix command on everything that matches an
>    element (like subtree in smallx, viewport in MT pipelines)
> 
>  * lxtransduce - ??
> 
>  * want to make these pipelines more declarative so people can use them
>    without writing code.
> 
>  * XSLT is also available
> 
> Rui Lupis: (see presentation)
> 
>  * APP: Architecture for XML Processing
> 
>  * Complex processing support for digital libraries - both developers and
>    producers
> 
>  * Always a need for some manual purposes.
> 
>  * Tiers: a set of pipelines working on disjoint inputs
> 
>  * Pipeline: acyclic digraph of processors
> 
>  * Processor: defined by a URI that differentiates an interface vs
>    implementation vs usage.
> 
>  * Processing language:
> 
>      Project: an RDF document
> 
>      Pipeline: mapped to a linear sequence of components
> 
>      Registry: An RDF document that registers components & their inputs
>                and outputs
> 
>  * Pros:
>     * Separation of concerns lets you interchange components without
>       touching the pipelines.
>     * It's an implementation-neutral language
>     * and others
> 
>  * Cons:
>     * No iteration/test
>     * RDF based
>     * Doesn't support generation of XSLT stylesheets
>     * Doesn't support chunking
> 
>  * Thoughts:
>     * Good to have multiple levels of composition (not just xinclude)
>     * Indirection is good for batch processing
> 
>  Alex: The model is that you define a particular step in the registry
>        that is a binding, for example, of an XSLT transform to its
>        input+parameters to its output.  A pipeline then points to that
>        step and the step can be re-used in other pipelines.
> 
>  * If the registry changes, the pipeline doesn't have to change.
> 
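A rough illustration of the registry indirection described above, using plain Python dictionaries instead of RDF. The step name, stylesheet file names and parameters are made up for the example; the point is only that the pipeline refers to a step by name and the registry supplies the binding.

```python
# Registry: binds a named step to a processor, its stylesheet and parameters.
# All names and values here are hypothetical.
REGISTRY = {
    "to-html": {
        "processor": "xslt",
        "stylesheet": "to-html.xsl",
        "parameters": {"theme": "plain"},
    },
}

# The pipeline only points at registered steps by name.
PIPELINE = ["to-html"]


def resolve(pipeline, registry):
    """Look up each step name at run time, so the pipeline never embeds bindings."""
    return [(name, registry[name]) for name in pipeline]


before = resolve(PIPELINE, REGISTRY)

# Changing the registry changes what runs; the pipeline itself is untouched.
REGISTRY_V2 = {"to-html": {**REGISTRY["to-html"], "stylesheet": "to-html-v2.xsl"}}
after = resolve(PIPELINE, REGISTRY_V2)
```

The same PIPELINE list resolves to a different stylesheet once the registry entry is replaced.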
> 
> Infosets:
> 
>   Murray:
>     * stdin & stdout
>     * then there are parameters
>     * then there is the notion of input & output
>     * then there is the notion of an infoset on the side
>     * then there is the notion of artifacts
>     * e.g. on a server you might want to store things in a cache
> 
>   Norm:
>     * storing on a filesystem can be abstracted to the idea that outputs
>       have a URI and a processor can decide to write them out to disk
>       if it wants.  Whether that happens isn't a relevant problem.
> 
>   Richard:
>     * It is quite likely an implementation will need to buffer things
>       if you have a pipeline that isn't just a straight line.
> 
>   Eric:
>     * In XPL everything is in scope
> 
>   Richard:
>     * there is no guarantee that you read things at the same rate, so
>       you have to buffer
> 
>   Murray:
>     There's still an output being buffered & cached.  If as an output
>     you produce foo.infoset and later you consume foo.infoset, then you
>     need to store that.
> 
>   Eric: you could have an implementation that buffers things to memory or
>    alternatively to a disk cache if it is too big
> 
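The buffering strategy just mentioned (memory first, disk if the output is too big) can be sketched with Python's standard tempfile.SpooledTemporaryFile, which keeps data in memory until it exceeds max_size and then moves it to a temporary file. The function names, the 1024-byte threshold and the sample data are assumptions for illustration only.

```python
import tempfile


def buffer_output(chunks, max_in_memory=1024):
    """Store a step's serialized output so it can be read more than once."""
    # Held in memory up to max_in_memory bytes, then spilled to a temp file.
    buf = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    for chunk in chunks:
        buf.write(chunk)
    return buf


def read_all(buf):
    """Each consumer rewinds and reads at its own pace."""
    buf.seek(0)
    return buf.read()


# Hypothetical producer: 200 small serialized items, more than 1024 bytes total.
produced = (b"<item n='%d'/>" % n for n in range(200))
buf = buffer_output(produced)

first = read_all(buf)   # first downstream consumer
second = read_all(buf)  # second consumer sees the same bytes
```

Both consumers get identical content even though the producer ran only once, which is what a non-linear pipeline needs.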
>   Murray:
>     Before today, I was thinking this was like a unix pipe.
>     They could be bringing in separate things, but there is still just a
>     pipeline.
>     Most things talked about today don't seem like pipelines.
> 
>   Richard:
>     My stuff is a unix pipeline... but that's "just an implementation
>     hack" that uses shell programming.
> 
>   Eric: The reason you want to serialize is?
> 
>   Richard: Because I have a bunch of programs that run on files.  I want
>    a language that I can still compile to scripts that serialize to
>    files.
> 
>    There are other things that things like schema validation might do
>    that may not be able to be serialized
> 
>   MSM: It is possible to define a non-standard PSVI serialization
> 
>   Eric: You can always do this by wrapping components that always
>         serialize
> 
>   Norm:
>     * there are simple components where one document comes in and one
>       goes out
>     * there are other ways to think about things like XSLT:
>         - there is one input and an ancillary input (the stylesheet) and
>           one output
>         - but this isn't always fixed
> 
>   Alex: Having a primary input is necessary for streaming
>         implementations.
> 
>   Murray: In what case is the stylesheet the input?
> 
>   Norm: I have a report that is coming out and the report is always the
>         same (the input document), but the XSLT is what is generated by
>         the pipeline.
> 
>   MSM: Why is there emphasis on backward chaining?
> 
>   Eric: (diagram on chart w/ parallel steps that start from the same
>         start and are aggregated at the end)
> 
>      Back chaining is because a step can optionally decide not to get an
>      input.  It isn't that easy for a user to understand.
> 
>      Specifying order is natural and is a problem.  Users do have
>      problems with [controlling] order.  You have this problem with XSLT.
> 
>   Richard: what drives things in XSLT is apply-templates--and that is
>            not backward chaining.
> 
>            parallel paths are the 1% case
> 
>   Alex: There is a whole body of knowledge that deals with network flows
>         and we should be in compliance with those known concepts and
>         algorithms.
> 
>   All: [to alex] You're going to have to prove that you need stdin for
>        optimization.
> 
>
Received on Wednesday, 1 March 2006 13:58:28 UTC