
Minutes for XProc WG telcon of 12 Jan 2006

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Thu, 12 Jan 2006 12:21:30 -0500
To: public-xml-processing-model-wg@w3.org
Message-ID: <87slrtnyzp.fsf@nwalsh.com>
See also: http://www.w3.org/XML/XProc/2006/01/12-minutes.html

                                   - DRAFT -

                            XML Processing Model WG

12 Jan 2006

   Agenda[2]

   See also: IRC log[3]

Attendees

   Present
           Henry, Richard, Rui, Erik, Alessandro, Norm, Jeni, Paul, Andrew,
           Alex

   Regrets
           Robin, Michael (partial)

   Chair
           Norm

   Scribe
           Norman Walsh

Contents

     * Topics
         1. Administrivia
         2. Technical: email followup
         3. Requirements
     * Summary of Action Items

     ----------------------------------------------------------------------


   <scribe> Scribe: Norman Walsh

   <scribe> ScribeNick: Norm

   Date: 12 Jan 2006

   <richard> http://www.w3.org/XML/XProc/2006/01/12-agenda.html[4]

  Administrivia

   Accept this agenda?

   Accepted.

   Accept last week's minutes:
   http://www.w3.org/XML/XProc/2006/01/05-minutes.html[5]

   Accepted.

   Norm reminds the group about the plenary and the hotel arrangements.

  Technical: email followup

   Kinds of iteration?

   Alex: We're getting more technical; if we start doing that now, are we
   going to lose the requirements/use-cases thread?
   ... The question of what passes between processes is an important one at
   this stage.
   ... The Core WG said "infosets", but now we need to support XDM and other
   augmented forms.

   Norm: The ability to pass around both infosets and augmented infosets is
   a requirement in my mind.

   Jeni: I think there's a community that just wants to pass serialized XML
   around
   ... We ought to have a "should" or "may" requirement around those ideas.

   Richard: How is that different from an infoset?

   Jeni: I think some folks care about whether things are represented by an
   entity or a Unicode character

   Richard: So you're assuming the components aren't normal XML processors?

   Jeni: The kind of pipeline I have in mind is one where someone takes a
   document that isn't well-formed XML, smartens it up into XML, and then
   reports that as parsed XML to the next stage. Then, later on, someone
   creates some XML that is just a stream of characters (e.g., changing
   particular characters into images).

   Richard: So you'd be able to pass things around that aren't really XML?

   Jeni: From my use cases, I think processes should be able to consume and
   produce things which aren't XML (especially HTML)
   ... Taking non-XML and turning it into XML is important.

   Alex: Maybe we'll have to look at serialization more closely.
   ... Maybe some of the other things should be doable on the end of a
   pipeline.

   Norm: I imagined non-XML only at the ends, but there's nothing that would
   prevent someone from gluing several together, I suppose.

   Erik: Talking about non-XML stuff is a little scary because it's more like
   Unix pipes and is a little more complex. We need to be careful.
   ... Certainly it's important to some people to have some things, like
   entities, preserved, but if none of the existing data models do that, we
   should investigate why.
   ... In XPL, we only deal with XML infosets. If a component is trying to
   read data which is not XML, then either the component accesses the
   information externally (not through a connection in the pipeline) or you
   can encapsulate the information in some XML format (e.g., base64 encoded).
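   As a rough illustration of the second option Erik describes, the
   following Python sketch wraps arbitrary bytes in an XML element as base64
   text and recovers them. The <data> element and its encoding attribute are
   hypothetical names chosen for this example; they are not taken from XPL
   or Cocoon.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(payload: bytes) -> str:
    """Encapsulate arbitrary bytes in a hypothetical <data> element as base64."""
    elem = ET.Element("data", {"encoding": "base64"})
    # The base64 alphabet contains no characters that need XML escaping.
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_binary(xml_text: str) -> bytes:
    """Recover the original bytes from the hypothetical <data> envelope."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "data" or elem.get("encoding") != "base64":
        raise ValueError("not a base64 data envelope")
    # An empty payload serializes as <data ... />, whose text parses as None.
    return base64.b64decode(elem.text or "")
```

   Because the envelope is ordinary XML, any infoset-based component can
   pass it along unchanged; only the components that care about the payload
   need to decode it.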

   Alex: We need to be very careful not to try to take on more than we can
   handle

   Norm: Jeni only said "could" or "should". Let's see if we can get a better
   handle on the issues when we have more information (later in the process)

   Richard: If we're dealing with both plain infosets and augmented infosets,
   then we could have an "uninterpreted text" mode as well, though we
   wouldn't have any standard components that work on it.

   Rui: We can look at the way Cocoon handles this issue.

   Erik: Cocoon handles this by using generators or serializers following a
   model similar to what I said about XPL above.

  Requirements

   Alex: I got as far as getting myself set up with XML Spec. I haven't
   written any new content, but I have a proposal about how it should be
   laid out.

   ->
   http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/0041.html[6]

   Alex: I'd like feedback on that layout before I proceed.
   ... Do we need a terminology section?
   ... There's a lot of terminology out there, we should define what we mean
   by things so that we don't confuse readers.
   ... The previous document had a section on "design principles" but those
   sound like "requirements" to me.
   ... I think we could introduce the idea that "design principles" are just
   very broad requirements.

   MSM: Design principles are not simply broad requirements in the following
   way: there have been some people active in W3C WGs who have said a
   "requirement" is (a) a crisp, verifiable statement and (b) a do-or-die
   thing; if you don't meet the requirements, you don't ship.
   ... For people who take that view, "keep it short and simple" isn't crisp
   enough. Short you could manage, but "simple" would be untestable.
   ... But equally, it's not exactly a do-or-die situation. If you set a
   target of 20 pages and the normative prose turns out to be 21 pages, you
   typically call that a success.
   ... If no one in our readership is going to interpret requirement as
   above, then Alex's proposal is fine. But there are those people in the
   world.

   Alex: In that case, I would put "we process infosets" as a hard
   requirement.

   <MSM> +1

   Norm: Does that all sound ok to folks then?

   Yes.

   Discussion of requirements in Alex's document:

   http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/att-0041/xproc-requirements.html[7]

   1. The language must be rich enough to address practical interoperability
   concerns.

   Design principle

   2. The language should be as small and simple as possible.

   Design principle

   3. The language must allow the inputs, outputs, and other parameters of a
   component to be specified.

   Requirement.

   4. The language must define the basic minimal set of mandatory input
   processing options and associated error reporting options required to
   achieve interoperability.

   There's some confusion about this one.

   Editor will refactor.

   5. Given a set of components and a set of documents, the language must
   allow the order of processing to be specified.

   Requirement.

   6. It should be relatively easy to implement a conformant implementation
   of the language, but it should also be possible to build a sophisticated
   implementation that can perform parallel operations, lazy or greedy
   processing, and other optimizations.

   Confusion? Design principles or requirements?

   Editor will refactor.

   7. The model should be extensible enough so that applications can define
   new processes and make them a component in a pipeline.

   Requirement.

   Richard: I think we should be careful not to use "extensibility" and
   "interoperability" without being fairly precise about what we mean.

   8. The model must provide mechanisms for addressing error handling and
   fallback behaviors.

   Requirement.

   MSM: Are we talking about candidate requirements or requirements we've
   accepted?

   <richard> these are all "candidate" requirements at this stage, surely

   Norm: I think we get to start over and we get to pick if these are
   requirements we accept or not after we believe we have a common
   understanding of what they mean

   9. The model could allow conditional processing so that different
   components are selected depending on run-time evaluation.

   MSM: Is "run-time evaluation" clear enough to count as crisp?

   Alex: No, I think these will all get longer.

   Requirement.

   10. The model should not prohibit the existence of streaming pipelines.

   Requirement.

   Richard: we should be clear that you should be able to write pipelines
   that can be streamed rather than that every pipeline must be streamable.
   ... Some things that you might want to do with pipelines cannot be
   streamed.
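   Richard's distinction, that some pipelines can be streamed even though
   not every step can be, can be sketched with Python generators standing in
   for components. This is a hypothetical illustration, not a proposed
   processing model: double() handles one item at a time, whereas
   sort_desc() must read all of its input before it emits anything, so any
   pipeline that includes it has to buffer.

```python
from typing import Iterable, Iterator, List


def source(n: int, log: List[str]) -> Iterator[int]:
    """Emit items one at a time, recording each item as it is produced."""
    for i in range(n):
        log.append(f"produce {i}")
        yield i


def double(items: Iterable[int]) -> Iterator[int]:
    """A streamable step: each output depends only on the current input."""
    for x in items:
        yield x * 2


def sort_desc(items: Iterable[int]) -> Iterator[int]:
    """A non-streamable step: sorting must consume all input before emitting."""
    yield from sorted(items, reverse=True)
```

   In this sketch, pulling the first result through double(source(...))
   produces only one upstream item, while pulling the first result through
   sort_desc(...) forces the whole input to be produced first.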

   MSM: Can we imagine an option where I ask if this pipeline is streamable
   and fail if it isn't?

   Erik: I'm not sure I understand the question. Should you have an option to
   ask the pipeline engine if a pipeline is streamable?

   MSM: I would like the option of having the processor tell me if I've
   failed to write a streaming pipeline.

   Erik: This sounds like something specific to a particular implementation.

   MSM: It may be infeasible in general.

   Erik: The idea is to leave the door open to allow some processors to
   optimize something to be streaming.

   MSM: If it's that difficult to tell, then I'm concerned about it being a
   requirement as opposed to a design goal.

   Richard: Something like a general XSLT transformation cannot possibly be
   guaranteed to be streamable. There are some cases where the streamability
   is determined by the components.
   ... But if there are conditionals in the language then it may also not be
   possible to stream on that basis (e.g., a condition that cannot be
   determined until some stage has finished).
   ... As we proceed through, we shouldn't put anything in that prevents a
   streaming pipeline.

   Alex: We can mark this as a possible new requirement and debate it as we
   proceed.

   <ebruchez> I think that's too specific a requirement

   <MSM> I am having trouble imagining a language construct that would not
   only be non-streamable but would successfully prohibit the writing of
   streamable pipelines. Is the req as formulated by Core a nop?

   11. The model should allow multiple inputs and multiple outputs for a
   component.

   Requirement.

   12. The model should allow any data set conforming to one of the W3C
   standards, such as XML 1.1, XSLT 1.0, XML Query 1.0, etc., to be specified
   as an input or output of a component.

   I'd be inclined to state it broadly as a design principle.

   <richard> Michael - a rule that downstream components must not start
   unless it is guaranteed that no upstream component will abort would be an
   example of such a construct

   Alex: That boils down to specific ones for known languages.

   Norm: I think we may be able to answer the question more generally, but
   I'm ok with that.

   13. Information should be passed between components in a standard way, for
   example, as one of the data sets conforming to an industry standard.

   Richard: I think that means it should use things like SAX and DOM

   MSM: Except that neither SAX nor DOM is a data set.

   Alex: we could refactor that to say that we don't want to preclude ...
   some list of known ways to pass infosets.

   Richard: The Core WG may have been trying to express that it didn't want
   us to invent a *new* way
   ... The fact that the Core WG included it doesn't mean everyone there
   agreed with it.

   Editor will refactor.

   14. The language should be expressed in XML. It should be possible to
   author and manipulate documents expressed in the pipeline language using
   standard XML tools.

   Requirement.

   15. The pipeline language should be declarative, not based on APIs.

   Erik: I would argue that XPL is declarative
   ... You really are declaring linking of components together and leaving it
   to the implementation to do the work

   Richard: The idea here is that the language for expressing the connections
   between components should be declarative.

   16. The model should be neutral with respect to implementation language.

   Requirement.

   Norm: Do you have enough to make a first pass?

   Alex: Yes. If you look at this list and see things missing, we should add
   them.
   ... I'll take a first stab at it from the minutes of the preceding
   meetings.

   <scribe> ACTION: Alex to produce document by c.o.b. 17 Jan 2006 [recorded
   in http://www.w3.org/2006/01/12-xproc-minutes.html#action01[8]]

   ADJOURNED

Summary of Action Items

   [NEW] ACTION: Alex to produce document by c.o.b. 17 Jan 2006 [recorded in
   http://www.w3.org/2006/01/12-xproc-minutes.html#action01[9]]
   [End of minutes]

     ----------------------------------------------------------------------

   [1] http://www.w3.org/
   [2] http://www.w3.org/XML/XProc/2006/01/12-agenda.html
   [3] http://www.w3.org/2006/01/12-xproc-irc
   [4] http://www.w3.org/XML/XProc/2006/01/12-agenda.html
   [5] http://www.w3.org/XML/XProc/2006/01/05-minutes.html
   [6] http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/0041.html
   [7] http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/att-0041/xproc-requirements.html
   [8] http://www.w3.org/2006/01/12-xproc-minutes.html#action01
   [9] http://www.w3.org/2006/01/12-xproc-minutes.html#action01
   [10] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
   [11] http://dev.w3.org/cvsweb/2002/scribe/

    Minutes formatted by David Booth's scribe.perl[10] version 1.127 (CVS
    log[11])
    $Date: 2006/01/12 17:09:17 $


                                        Be seeing you,
                                          norm

-- 
Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.

Received on Thursday, 12 January 2006 17:29:41 GMT
