Monday Afternoon Tech Plenary Notes from Alex Milowski on 2006-02-28 (public-xml-processing-model-wg@w3.org from February 2006)

From: Alex Milowski <alex@milowski.org>
Date: Tue, 28 Feb 2006 08:03:12 -0800
To: public-xml-processing-model-wg@w3.org
Message-ID: <44047440.9070303@milowski.org>
XPL Presentation (See presentation)

  * Michael: (Clarification)
          The 'infosetref' attribute represents the binding and the
          names are internal to the component
          The 'name' attribute is the formal parameter name.

  * Eric: the p:input and p:output declare the name of the inputs and
    outputs that are used to invoke the process and handle the results

  * Norm: (Clarification)
    Is it the pipeline processor that looks at the inputs and outputs?

  * Eric: the inputs and outputs are evaluated in a lazy fashion and it
    back-chains through the steps which eventually leads to the input of
    the pipeline.

  * Alex: (Clarification)
    How does back chaining work with conditionals?

  * Eric: The output of conditionals needs to have the same infoset name.

  * Eric: XHTML example (use case 5.15: Content-Dependent
          Transformations)
     - one of the use cases.
     - one of the steps rewrites the QNames for presentation in IE
     - one of the steps deals with HTML serialization
     - the output for serialization uses an internal root element node
       for representation of text and binary (character encoded)

  * Eric: Iteration example:
     - lets you iterate over a document via an XPath expression
     - the current() function gives you the current item being
       iterated
     - gives you the ability to process large XML documents

  * Murray: Does each of the steps have its own XML vocabulary (e.g. HTTP
            serializer)?
  * Eric: Yes.
  * Richard: Do they require their own namespaces?
  * Eric: No, that isn't required, as the vocabulary is contextual to the
    component.  Having another namespace adds declarations to the
    document.
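
A rough illustration of the iteration example described above, in Python rather than XPL syntax. It assumes xml.etree.ElementTree from the standard library; the function name for_each, the use of a plain element name in place of the XPath expression, and the sample document are all made up for this sketch. The `current` argument plays the role of current().

```python
import io
import xml.etree.ElementTree as ET

def for_each(source, tag, step):
    """Apply `step` to each element named `tag`, one at a time.

    `tag` is a plain element name standing in for the XPath expression;
    `step` receives the current item, playing the role of current().
    """
    results = []
    # iterparse reads the document incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            results.append(step(elem))
            elem.clear()  # free the item's children once it has been processed
    return results

# Hypothetical sample document.
doc = io.BytesIO(b"<orders><order id='1'><total>10</total></order>"
                 b"<order id='2'><total>32</total></order></orders>")

totals = for_each(doc, "order", lambda current: int(current.findtext("total")))
```

Handling one item at a time is what would make large documents tractable; a real implementation would presumably accept a full XPath expression.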

GUI Tool Sub-thread:

  * Richard: Do you have a GUI tool?
  * Eric: No.
  * Richard: we should define the tool in terms of a graph
  * Norm & Michael expressed concern with this, as they wouldn't
    want to require a GUI tool, and starting with a graph could
    ignore the XML representation.

Norm's SXPipe:

  * http://norman.walsh.name/2004/06/20/sxpipe

  * Stages are executed in order.  It is handed a DOM and returns a
    DOM.

  * In example, skip attribute allows steps to be skipped.  If statically
    evaluated to true, the step isn't executed.

  * Impl: two methods: init & run.  Init is passed the element that
    represents the stages.  1700 lines of Java.
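
A minimal sketch of the stage model as described in these notes, written in Python rather than Java and not taken from the SXPipe source. The element and attribute names (pipeline, stage, type, skip), the SetAttribute stage, and the treatment of skip as a literal "true" are assumptions for illustration only.

```python
from xml.dom import minidom

class Stage:
    """Hypothetical stage interface: init() sees its config element,
    run() is handed a DOM and returns a DOM."""
    def init(self, element):
        self.element = element

    def run(self, doc):
        return doc

class SetAttribute(Stage):
    """Example stage: sets one attribute on the document element."""
    def init(self, element):
        self.name = element.getAttribute("name")
        self.value = element.getAttribute("value")

    def run(self, doc):
        doc.documentElement.setAttribute(self.name, self.value)
        return doc

STAGE_TYPES = {"set-attribute": SetAttribute}  # hypothetical stage lookup

def run_pipeline(config_xml, doc):
    """Execute the stages in document order, skipping any marked skip="true"."""
    config = minidom.parseString(config_xml)
    for element in config.getElementsByTagName("stage"):
        if element.getAttribute("skip") == "true":
            continue  # a skipped stage is never initialised or run
        stage = STAGE_TYPES[element.getAttribute("type")]()
        stage.init(element)
        doc = stage.run(doc)
    return doc

CONFIG = """<pipeline>
  <stage type="set-attribute" name="status" value="draft"/>
  <stage type="set-attribute" name="status" value="final" skip="true"/>
  <stage type="set-attribute" name="lang" value="en"/>
</pipeline>"""

result = run_pipeline(CONFIG, minidom.parseString("<doc/>"))
```

Here the second stage is skipped, so the status attribute keeps the value set by the first stage.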


(Alex's presentation here)


Richard's presentation:

  * I want to replace what we do today without a pipeline with an XML
    pipeline.

  * lxgrep - produces a tree fragment (multiple root elements possible)
    via an XPath

  * lxprintf - formats XPath matches as plain text

      -e element   For each element

  * lxreplace - replaces elements/attributes
      -n   Renames an element

  * lxsort - sorts elements by values identified by an XPath

  * lxviewport - runs a unix command on everything that matches an
    element (like subtree in smallx, viewport in MT pipelines)

  * lxtransduce - ??

  * want to make these pipelines more declarative so people can use them
    without writing code.

  * XSLT is also available
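
To make the flavour of these tools concrete, here is a sketch of Python analogues of lxgrep, lxprintf and lxsort. These are not the tools themselves and do not reproduce their command-line options; the function names, signatures and sample data are hypothetical, and ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

def grep(root, path):
    """lxgrep-like: return all matches; the fragment may have several roots."""
    return root.findall(path)

def printf(root, path, fmt, *fields):
    """lxprintf-like: format each match as one line of plain text."""
    return [fmt % tuple(match.findtext(f, default="") for f in fields)
            for match in root.findall(path)]

def sort_children(parent, key_path):
    """lxsort-like: reorder children by the text found at key_path."""
    parent[:] = sorted(parent, key=lambda c: c.findtext(key_path, default=""))
    return parent

# Hypothetical sample data.
doc = ET.fromstring(
    "<books>"
    "<book><title>Pipes</title><year>2006</year></book>"
    "<book><title>Infosets</title><year>2004</year></book>"
    "</books>")

fragment = grep(doc, ".//title")                       # two <title> roots
lines = printf(doc, "book", "%s (%s)", "title", "year")
sort_children(doc, "title")                            # Infosets, then Pipes
```

Each function takes a tree and returns a tree, a fragment or text, which is what would let them be chained in a declarative pipeline instead of a shell script.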

Rui Lupis: (see presentation)

  * APP: Architecture for XML Processing

  * Complex processing support for digital libraries - both developers and
    producers

  * There is always a need for some manual steps.

  * Tiers: a set of pipelines working on disjoint inputs

  * Pipeline: acyclic digraph of processors

  * Processor: defined by a URI that differentiates an interface vs
    implementation vs usage.

  * Processing language:

      Project: an RDF document

      Pipeline: mapped to a linear sequence of components

      Registry: An RDF document that registers components & their inputs
                and outputs

  * Pros:
     * Separation of concerns lets you interchange components without
       touching the pipelines.
     * It's an implementation-neutral language
     * and others

  * Cons:
     * No iteration/test
     * RDF based
     * Doesn't support generation of XSLT stylesheets
     * Doesn't support chunking

  * Thoughts:
     * Good to have multiple levels of composition (not just xinclude)
     * Indirection is good for batch processing

  Alex: The model is that you define a particular step in the registry
        that is a binding, for example, of an XSLT transform to its
        input+parameters and to its output.  A pipeline then points to
        that step and the step can be re-used in other pipelines.

  * If the registry changes, the pipeline doesn't have to change.
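
The indirection can be sketched as follows. This assumes plain Python dictionaries in place of the RDF documents APP actually uses, and string functions in place of real processors such as XSLT; every name here (PROCESSORS, make_registry, PIPELINE, run) is hypothetical.

```python
from typing import Callable, Dict, List

# Stand-in processors; a real registry would bind XSLT transforms and the like.
PROCESSORS: Dict[str, Callable[[str, dict], str]] = {
    "upper": lambda text, params: text.upper(),
    "wrap": lambda text, params: "<%s>%s</%s>" % (params["tag"], text, params["tag"]),
}

def make_registry(tag: str) -> Dict[str, dict]:
    """Registry: step name -> binding of a processor to its parameters."""
    return {
        "normalize": {"processor": "upper", "params": {}},
        "package": {"processor": "wrap", "params": {"tag": tag}},
    }

# The pipeline refers to steps by name only, never to processors directly.
PIPELINE: List[str] = ["normalize", "package"]

def run(pipeline: List[str], registry: Dict[str, dict], document: str) -> str:
    """Resolve each step through the registry and apply it in sequence."""
    for step_name in pipeline:
        binding = registry[step_name]
        document = PROCESSORS[binding["processor"]](document, binding["params"])
    return document

first = run(PIPELINE, make_registry("doc"), "hello")
second = run(PIPELINE, make_registry("record"), "hello")  # same pipeline, new registry
```

Only the registry differs between the two runs; PIPELINE itself is untouched.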


Infosets:

   Murray:
     * stdin & stdout
     * then there are parameters
     * then there is the notion of input & output
     * then there is the notion of an infoset on the side
     * then there is the notion of artifacts
     * e.g. on a server you might want to store things in a cache

   Norm:
     * storing on a filesystem can be abstracted to the idea that outputs
       have a URI and a processor can decide to write them out to disk
       if it wants.  Whether that happens isn't a relevant problem.

   Richard:
     * It is quite likely an implementation will need to buffer things
       if you have a pipeline that isn't just a straight line.

   Eric:
     * In XPL everything is in scope

   Richard:
     * there is no guarantee that you read things at the same rate, so
       you have to buffer

   Murray:
     There's still an output being buffered & cached.  If you produce
     foo.infoset as an output and later consume foo.infoset, then you
     need to store that.

   Eric: you could have an implementation that buffers things to memory or
    alternatively to a disk cache if it is too big
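
   One way to picture the memory-or-disk buffering just described, as a sketch only: Python's tempfile.SpooledTemporaryFile keeps data in memory until it exceeds max_size and then moves it to a temporary file on disk. The BufferedInfoset class, its method names and the size thresholds are assumptions made for this illustration.

```python
import io
import tempfile

class BufferedInfoset:
    """Holds one serialized output so that consumers can read it later,
    possibly more than once and at different rates."""

    def __init__(self, max_in_memory=1024):
        # Stays in memory up to max_in_memory bytes, then spills to disk.
        self._file = tempfile.SpooledTemporaryFile(max_size=max_in_memory)

    def write(self, data):
        self._file.seek(0, io.SEEK_END)  # always append at the end
        self._file.write(data)

    def read_all(self):
        self._file.seek(0)               # each consumer starts from the top
        return self._file.read()

    def close(self):
        self._file.close()

small = BufferedInfoset(max_in_memory=1024)
small.write(b"<doc/>")                            # fits in memory

large = BufferedInfoset(max_in_memory=16)
large.write(b"<doc>" + b"x" * 100 + b"</doc>")    # exceeds the threshold

first_reader = large.read_all()
second_reader = large.read_all()                  # a second consumer, later
```

   The consumers never need to know whether the data lived in memory or on disk.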

   Murray:
     Before today, I was thinking this was like a unix pipe.
     They could be bringing in separate things, but there is still just a
     pipeline.
     Most things talked about today don't seem like pipelines.

   Richard:
     My stuff is a unix pipeline.... but that's "just an implementation
     hack" that uses shell programming.

   Eric: The reason you want to serialize is?

   Richard: Because I have a bunch of programs that run on files.  I want
    a language that I can still compile to scripts that serialize to
    files.

    There are other things that steps like schema validation might
    produce that may not be serializable.

   MSM: It is possible to define a non-standard PSVI serialization

   Eric: You can always do this by wrapping components that always
         serialize

   Norm:
     * there are simple components where one document comes in and one
       goes out
     * there are other ways to think about things like XSLT:
         - there is one input and an ancillary input (the stylesheet) and
           one output
         - but this isn't always fixed

   Alex: Having a primary input is necessary for streaming
         implementations.

   Murray: In what case is the stylesheet the input?

   Norm: I have a report that is coming out and the report is always the
         same (the input document), but the XSLT is what is generated by
         the pipeline.

   MSM: Why is there emphasis on backward chaining?

   Eric: (diagram on chart w/ parallel steps that start from the same
         start and are aggregated at the end)

      Back chaining is because a step can optionally decide not to get an
      input.  It isn't that easy for a user to understand.

      Specifying order is natural and is a problem.  Users do have
      problems with [controlling] order.  You have this problem with XSLT.
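
   A small sketch of what back chaining with parallel steps might look like, assuming a demand-driven evaluator in Python. This is not XPL's algorithm; the Pipeline class, the step names and the "want-index" flag are invented for illustration. The point is that the final step pulls its inputs, so a branch it never asks for never runs.

```python
class Pipeline:
    """Demand-driven ("back-chaining") evaluation over named infosets."""

    def __init__(self, inputs):
        self.cache = dict(inputs)  # pipeline inputs are available up front
        self.producers = {}        # output name -> function(pull) -> value
        self.ran = []              # steps that actually executed, in order

    def step(self, output, fn):
        self.producers[output] = fn

    def pull(self, name):
        """Return the named infoset, running its producer on first demand."""
        if name not in self.cache:
            self.cache[name] = self.producers[name](self.pull)
            self.ran.append(name)
        return self.cache[name]

p = Pipeline({"source": "<doc/>", "want-index": False})

# Two parallel steps that start from the same input ...
p.step("body", lambda pull: "body(" + pull("source") + ")")
p.step("index", lambda pull: "index(" + pull("source") + ")")

# ... aggregated at the end by a step that may decide not to get one input.
p.step("result", lambda pull: pull("body")
       + (pull("index") if pull("want-index") else ""))

result = p.pull("result")
```

   No execution order is stated anywhere; it falls out of which inputs are demanded, which is also why it can be hard for a user to predict.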

   Richard: what drives things in XSLT is apply-templates--and that is
            not backward chaining.

            parallel paths are the 1% case

   Alex: There is a whole body of knowledge that deals with network flows
         and we should be in compliance with those known concepts and
         algorithms.

   All: [to Alex] You're going to have to prove that you need stdin for
        optimization.


-- 
--Alex Milowski
Received on Tuesday, 28 February 2006 15:48:52 UTC