Minutes for the XML Processing Model WG F2F 2006 August 4 - Afternoon from Alex Milowski on 2006-08-06 (public-xml-processing-model-wg@w3.org from August 2006)

From: Alex Milowski <alex@milowski.org>
Date: Sun, 06 Aug 2006 12:41:06 -0700
To: public-xml-processing-model-wg <public-xml-processing-model-wg@w3.org>
Message-ID: <44D645D2.6050704@milowski.org>
1. Extension elements/attributes:

   * proposal (norm): Elements and attributes that are not in the xproc
                      namespace or standard are gracefully ignored.
   * suggestion (Jeni): a "must understand" attribute that means if you
     don't understand, don't run my pipeline.

   * Murray: if you understand other elements and attributes, you can use
     them.

   * Richard: Other than documentation, what are the other use cases?

   * Norm: take as a first measure: "they are ignored".  If I don't
     understand the element/attribute, ignore it.

   Consensus:  You ignore elements and attributes that you don't
               understand.

        Note: You can put extension elements into a when and then use the
              choose to detect that that extension element isn't
              available.
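
   A minimal Python sketch of the consensus above, assuming a
   placeholder namespace URI (the real xproc namespace is not given in
   these minutes) and a hypothetical helper name strip_unknown.  It
   shows one way a processor might drop what it does not understand
   before running a pipeline.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this illustration.
XPROC_NS = "http://example.org/ns/xproc"


def strip_unknown(elem):
    """Drop foreign-namespace attributes and child elements, recursively."""
    known = "{" + XPROC_NS + "}"
    # Unprefixed attributes carry no namespace and are kept as-is.
    for name in list(elem.attrib):
        if name.startswith("{") and not name.startswith(known):
            del elem.attrib[name]
    for child in list(elem):
        if child.tag.startswith(known):
            strip_unknown(child)
        else:
            elem.remove(child)  # ignored together with its whole subtree
    return elem


pipeline = ET.fromstring(
    '<p:pipeline xmlns:p="http://example.org/ns/xproc" xmlns:x="urn:x"'
    ' x:hint="1" name="a"><x:doc>notes</x:doc><p:step name="s"/></p:pipeline>'
)
strip_unknown(pipeline)
```

   After the call, only the p:step child and the unprefixed name
   attribute remain on the pipeline element.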

2. Declaring extension components

    Proposal:

       This can occur in a pipeline library:

       <declare-component name="">
           <declare-input .../>*
           <declare-output .../>*
           <declare-parameter .../>*
       </declare-component>

     Alex: It would be inconvenient if you want to write a pipeline that
           uses a custom component and no pipeline library but you have
           to then put it in a separate document.


3. Serialization control

    * Action: Alex will write a proposal.

4. Documentation

    * Option 1: Documentation is an extension element so we don't have to
                do anything.
    * Option 2: We provide a special element for embedding documentation.

    Straw poll: Is anyone opposed to having a documentation element?  No
                objections (there was some mumbling).

    Henry: Here document elements may not have documentation elements.

    Jeni: Here documents should be able to have multiple elements to
          represent sequences.

    Norm: Don't know how to interpret:

    <input ...>
      <doc/>
      <!-- x -->
      <?x?>
      <doc/>
    </input>

    Henry: Allowing documentation anywhere seems like the right answer
           here.

    Murray: Writing content models that say documentation is allowed here
            is not impossible.

    Alex: Documentation could have a ref to an id of the subject of the
          documentation.

    Murray: Suggestion: Let Norm look at this when writing the draft and
            come up with a solution.

5. Error behavior:

    * Case 1: Component failure: The pipeline always has to produce
              output even in the case of errors.  The pipeline has to
              recover gracefully and still produce output in the case of
              component failure.

      Example:  A step uses a URL that doesn't exist and it gets a 404.


     Jeni: All components have an error output.

     Alex: Error recovery is almost like a choose where the test on
           one branch is "everything succeeded" and the other is the
           failure. The output signature is the same.

     Jeni: Do we have to introduce new constructs?

           Question: What information is available in the recovery?

     Henry: The primary input to recovery branch is the error output from
            the failure.

     Richard: Must be a special construct.  Like a conditional but has
              the same outputs.  There must not be things that have
              outputs and produce no outputs.

     Murray: There are processes where there is no end to the output.

     Alex: You can use viewport along with a trap construct to process
           the stream.

     Norm: If you have errors, do you get any output from the trap?
     Alex: No.

           Trap is like a gate.  It waits till all the outputs are done
           before the output is let out.

           We will have to spend some time thinking about how the failed
           component communicates with the outside world.

     Richard:  The trap construct has two sequences...  (confusion
               amassed).

     Example:

     <try>
        <somewrapper ...>
           <declare-output port="foo" ref="x!y"/>
           <step>...</step>
           ...
        </somewrapper>
        <catch>
            <declare-output port="foo" ref="a!b"/>
             <step>...</step>
        </catch>
     </try>

     There was some discussion of alternatives to group.


     Alex: What can be caught?  Is it only component errors or
           pipeline-level dynamic errors?

     Norm: We are probably going to say that processor error handling is
           implementation defined.

     Richard: Components should produce errors on their output.

     Alex: We could just have a stderr that we can access inside a
           catch--much like the error port on components as suggested by
           Jeni.  Inside a catch, you can use this to get the error
           information from the component.

     Henry: There is a difference between errors and failures. ... there
            are a number of components where non-failure XML is going to
            appear on the error port.

     Jeni: You can get various kinds of errors from XSLT (e.g. static,
           compile time, messages, dynamic errors).  Such a component has
           a number of error ports that could be named.  You could label
           these as error ports and have them go to the bit bucket.

     Richard: I propose you have an output port that can be marked as
              optional.

          1. Allow output ports be able to be marked as optional.
          2. Allow a distinguished required error output port.
          3. The try/catch mechanism provides a straight-forward way
             to hook up to the error port.

     Henry: ... that port is a must-consume and is a run-time
            must-consume.  If anything occurs on that port, then the
            pipeline fails.

     *: No, some implementations may do that but that isn't required.

     Murray: We need to define a component to handle stderr

     Richard: Why do we need a component?  Those errors come out on
              the error port...

     Norm: (pretty robot diagram excluded)

        * all the components have a stderr output and they are all
          connected to the pipeline's stderr output.

        * from the spec's point of view, our job is done when we specify
          the output of the pipeline.  Implementations define what to
          do with those outputs when the pipeline is executed.

     Proposal:

       1. We will have a try/catch construct.

       2. Should outputs be allowed to not be hooked up and "pour out
          onto the floor"?

           - marginal preference for pouring onto the floor...
           - objections: none

       3. Should components be able to declare that an output is not
          required to be used?

           - yes

       4. All components will have an error output port.

           - yes

       5. The error port is available in the catch

       6. The "floor" is implementation defined.

       7. Other documents produced on error ports will flow out of
          the stderr port of the pipeline.

       8. The error input on the catch contains the sequence of documents
          on the error port from the component that failed.

    Example:

      <try>
         <declare-output port="out"/>
         <somewrapper>
            <choose>
               <when test="/foo" ref="x!y">
               </when>
               <when test="/foo2" ref="x!y">
               </when>
               <otherwise>
               </otherwise>
            </choose>
         </somewrapper>
         <catch>
             <declare-output port="out"/>
             ...
         </catch>
      </try>

    It would be good for the error XML to indicate:
       * the name of the step
       * an indication of whether it was that step that "died"

6. Standard Components

    Defining aggregation:

       Possible definitions:
          1. Multiple sequences of documents in and a single sequence on
             a single output port.
          2. Multiple documents on separate ports and a single document
             on a single output port.
          3. Sequence of documents on multiple ports and a single
             document out.
          4. etc. default="undefined"

       Norm: I want the order to be implementation defined.

       Need two components:

         1. Sequence Aggregation:

            <declare-component name="sequence-aggregate">
                <declare-input port="*" sequence="yes"/>
                <declare-output port="result" sequence="yes"/>
                <declare-parameter name="order" default="undefined"/>
            </declare-component>

            Aggregates all the input sequences into one sequence of
            documents.

         2. Document Aggregation

            <declare-component name="document-aggregate">
                <declare-input port="*" sequence="yes"/>
                <declare-output port="result" sequence="no"/>
                <declare-parameter name="sequence-wrapper"/>
                <declare-parameter name="doc-wrapper"/>
                <declare-parameter name="row-wrapper"/>
                <declare-parameter name="order" default="undefined"/>
                <declare-parameter name="transpose"/>
            </declare-component>

            Aggregates all the input documents into a single document
            output whose document element is the name specified in
            'doc-wrapper'.  The rest of the parameters are TBD.
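
            A minimal Python sketch of the two aggregation components,
            assuming inputs are modelled as a dict from port name to a
            list of elements.  The function names are hypothetical.
            Ports are visited in dict order, which is only one of the
            choices an implementation could make while the order stays
            undefined; the parameters marked TBD are left out.

```python
import xml.etree.ElementTree as ET


def sequence_aggregate(inputs):
    """Concatenate the sequences on all input ports into one sequence."""
    result = []
    for port in inputs:  # port order here is an arbitrary choice
        result.extend(inputs[port])
    return result


def document_aggregate(inputs, doc_wrapper):
    """Wrap every input document in one element named doc_wrapper."""
    wrapper = ET.Element(doc_wrapper)
    for doc in sequence_aggregate(inputs):
        wrapper.append(doc)
    return wrapper


inputs = {
    "p1": [ET.fromstring("<a/>"), ET.fromstring("<b/>")],
    "p2": [ET.fromstring("<c/>")],
}
sequence = sequence_aggregate(inputs)
combined = document_aggregate(inputs, "all")
```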

      Identity Component

         <declare-component name="identity">
            <declare-input port="input" sequence="yes"/>
            <declare-output port="result" sequence="yes"/>
         </declare-component>

      Select Component ("Dis-aggregate"):

         <declare-component name="select">
             <declare-input port="input" sequence="yes"/>
             <declare-output port="result" sequence="yes"/>
             <declare-parameter name="xpath" required="yes"/>
         </declare-component>

         Applies the xpath to all the input documents and returns
         a sequence of documents--one for each matching element.  The
         XPath has to identify an element or document node.
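
         A minimal Python sketch of the select behaviour, assuming a
         hypothetical function name and the limited XPath subset that
         Python's xml.etree.ElementTree supports (relative paths only,
         evaluated from the document element).

```python
import copy
import xml.etree.ElementTree as ET


def select(documents, xpath):
    """Return one new document per element matched in each input."""
    result = []
    for doc in documents:
        for match in doc.findall(xpath):
            # Copy so each match stands alone as its own document.
            result.append(copy.deepcopy(match))
    return result


docs = [
    ET.fromstring("<r><item id='1'/><x><item id='2'/></x></r>"),
    ET.fromstring("<r><item id='3'/></r>"),
]
items = select(docs, ".//item")
```

         With these inputs, items holds three separate item documents,
         taken from the inputs in document order.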

      The "document" component

         Receives two inputs and always replicates one of them (the
         "always" input) as its output.

         <pipeline name="document">
            <declare-input port="null"/>
            <declare-input port="always"/>
            <declare-output ref="document!always"/>
         </pipeline>

-- 
--Alex Milowski
Received on Sunday, 6 August 2006 19:49:33 UTC