Minutes for XProc WG telcon of 12 Jan 2006 from Norman Walsh on 2006-01-12 (public-xml-processing-model-wg@w3.org from January 2006)

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Thu, 12 Jan 2006 12:21:30 -0500
To: public-xml-processing-model-wg@w3.org
Message-ID: <87slrtnyzp.fsf@nwalsh.com>

See also: http://www.w3.org/XML/XProc/2006/01/12-minutes.html

- DRAFT -

XML Processing Model WG

12 Jan 2006

Agenda[2]

See also: IRC log[3]

Attendees

Present
Henry, Richard, Rui, Erik, Alessandro, Norm, Jeni, Paul, Andrew,
Alex

Regrets
Robin, Michael_(partial)

Chair
Norm

Scribe
Norman Walsh

Contents

* Topics
1. Administrivia
2. Technical: email followup
3. Requirements
* Summary of Action Items

----------------------------------------------------------------------

<scribe> Scribe: Norman Walsh

<scribe> ScribeNick: Norm

Date: 12 Jan 2006

<richard> http://www.w3.org/XML/XProc/2006/01/12-agenda.html[4]

Administrivia

Accept this agenda?

Accepted.

Accept last week's minutes:
http://www.w3.org/XML/XProc/2006/01/05-minutes.html[5]

Accepted.

Norm reminds the group about the plenary and the hotel arrangements

Technical: email followup

Kinds of iteration?

Alex: We're getting more technical; if we start doing that now, are we
going to lose the requirements/use-cases thread?
... The question of what passes between processes is an important one at
this stage.
... Core WG said "infosets" but now we need to support XDM and other
augmented forms

Norm: the ability to pass around infosets and augmented infosets are both
requirements in my mind

Jeni: I think there's a community that just wants to pass serialized XML
around
... We ought to have a should or maybe requirement around those ideas

Richard: How is that different from an infoset?

Jeni: I think some folks care about whether things are represented by an
entity or a Unicode character

Richard: So you're assuming the components aren't normal XML processors?

Jeni: The kind of pipeline I have in mind is one where someone takes a non
well-formed XML document, smartens it up into XML, and then can report
that as parsed XML to the next stage. Then later on, create some XML that
is just a stream of characters (e.g., change particular characters into
images)

Richard: So you'd be able to pass things around that aren't really XML?

Jeni: From my use cases, I think processes should be able to consume and
produce things which aren't XML (especially HTML)
... Taking non-XML and turning it into XML is important.

Alex: Maybe we'll have to look at serialization more closely.
... Maybe some of the other things should be doable on the end of a
pipeline.

Norm: I imagined non-XML only at the ends but there's nothing that would
prevent someone from glueing several together I suppose.

Erik: Talking about non-XML stuff is a little scary because it's more like
Unix pipes and is a little more complex. We need to be careful.
... Certainly it's important to some people to have some things, like
entities, preserved, but if none of the existing data models do that, we
should investigate why.
... In XPL, we only deal with XML infosets. If a component is trying to
read data which is not XML, then either the component accesses the
information externally (not through a connection in the pipeline) or you
can encapsulate the information in some XML format (e.g., base64 encoded)
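
As a rough illustration of the encapsulation approach Erik describes, the
sketch below wraps non-XML bytes in an XML element as base64 text so that
they can travel through a pipeline that only carries XML. The element name
"binary" and its attributes are hypothetical; they are not taken from XPL
or from any proposal discussed on the call.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(data: bytes, content_type: str) -> ET.Element:
    """Encapsulate non-XML bytes in a hypothetical <binary> wrapper element."""
    elem = ET.Element(
        "binary", {"content-type": content_type, "encoding": "base64"}
    )
    # base64 output is plain ASCII, so it is always safe as XML character data.
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


def unwrap_binary(elem: ET.Element) -> bytes:
    """Recover the original bytes from the wrapper element."""
    return base64.b64decode(elem.text or "")


# Input that is not well-formed XML (unclosed <br>) still survives the trip.
payload = b"<p>not well-formed<br></p>"
wrapped = wrap_binary(payload, "text/html")
serialized = ET.tostring(wrapped, encoding="unicode")
roundtrip = unwrap_binary(ET.fromstring(serialized))
```

The wrapped form is ordinary XML, so a stage that does not understand the
payload can pass it along untouched.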

Alex: We need to be very careful not to try to take on more than we can
handle

Norm: Jeni only said "could" or "should". Let's see if we can get a better
handle on the issues when we have more information (later in the process)

Richard: If we're dealing with both plain infosets and augmented infosets,
then we could have an "uninterpreted text" mode as well. Though we wouldn't
have any standard components that work on them.

Rui: We can look at the way Cocoon handles this issue

Erik: Cocoon handles this by using generators or serializers following a
model similar to what I said about XPL above.

Requirements

Alex: I got as far as getting myself set up with XML Spec. I haven't done
any new content, but I have a proposal about how it should be laid out.

->
http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/0041.html[6]

Alex: I'd like feedback on that layout before I proceed.
... Do we need a terminology section?
... There's a lot of terminology out there, we should define what we mean
by things so that we don't confuse readers.
... The previous document had a section on "design principles" but those
sound like "requirements" to me.
... I think we could introduce the idea that "design principles" are just
very broad requirements.

MSM: Design principles are not simply broad requirements in the following
way: there have been some people active in W3C WGs who have said a
"requirement" is (a) a crisp, verifiable statement and (b) is a do-or-die
thing; if you don't meet the requirements you don't ship.
... For people who take that view, "keep it short and simple" isn't crisp
enough. Short you could manage, but "simple" would be untestable.
... But equally, it's not exactly a do-or-die situation. If you set a
target of 20 pages and the normative prose turns out to be 21 pages, you
typically call that a success.
... If no one in our readership is going to interpret requirement as
above, then Alex's proposal is fine. But there are those people in the
world.

Alex: In that case, I would put "we process infosets" as a hard
requirement.

<MSM> +1

Norm: Does that all sound ok to folks then?

Yes.

Discussion of requirements in Alex's document:

http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/att-0041/xproc-requirements.html[7]

1. The language must be rich enough to address practical interoperability
concerns.

Design principle

2. The language should be as small and simple as possible.

Design principle

3. The language must allow the inputs, outputs, and other parameters of a
component to be specified.

Requirement.

4. The language must define the basic minimal set of mandatory input
processing options and associated error reporting options required to
achieve interoperability.

There's some confusion about this one

Editor will refactor.

5. Given a set of components and a set of documents, the language must
allow the order of processing to be specified.

Requirement.

6. It should be relatively easy to implement a conformant implementation
of the language, but it should also be possible to build a sophisticated
implementation that can perform parallel operations, lazy or greedy
processing, and other optimizations.

Confusion? Design principles or requirements?

Editor will refactor.

7. The model should be extensible enough so that applications can define
new processes and make them a component in a pipeline.

Requirement.

Richard: I think we should be careful not to use "extensibility" and
"interoperability" without being fairly precise about what we mean.

8. The model must provide mechanisms for addressing error handling and
fallback behaviors.

Requirement.

MSM: Are we talking about candidate requirements or requirements we've
accepted?

<richard> these are all "candidate" requirements at this stage, surely

Norm: I think we get to start over and we get to pick if these are
requirements we accept or not after we believe we have a common
understanding of what they mean

9. The model could allow conditional processing so that different
components are selected depending on run-time evaluation.

MSM: Run-time evaluation is clear enough to count as crisp?

Alex: No, I think these will all get longer.

Requirement.

10. The model should not prohibit the existence of streaming pipelines.

Requirement.

Richard: we should be clear that you should be able to write pipelines
that can be streamed rather than that every pipeline must be streamable.
... Some things that you might want to do with pipelines cannot be
streamed.

MSM: Can we imagine an option where I ask if this pipeline is streamable
and fail if it isn't?

Erik: I'm not sure I understand the question. Should you have an option to
ask the pipeline engine if a pipeline is streamable?

MSM: I would like the option of having the processor tell me if I've
failed to write a streaming pipeline.

Erik: This sounds like something specific to a particular implementation.

MSM: It may be infeasible in general.

Erik: The idea is to leave the door open to allow some processors to
optimize something to be streaming.

MSM: If it's that difficult to tell, then I'm concerned about it being a
requirement as opposed to a design goal.

Richard: Something like a general XSLT transformation cannot possibly be
guaranteed to be streamable. There are some cases where the streamability
is determined by the components.
... But if there are conditionals in the language then it may also not be
possible to stream on that basis (e.g., a condition that cannot be
determined until some stage has finished).
... As we proceed through, we shouldn't put anything in that prevents a
streaming pipeline.
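
To make Richard's distinction concrete, here is a minimal sketch using
Python generators as stand-ins for pipeline components. The stage names
(uppercase, sort_all, counting_source) are invented for illustration and
do not correspond to any component discussed by the WG. One stage emits
output as soon as it reads an item; the other has to read all of its input
before it can emit anything, so a pipeline containing it cannot be fully
streamed.

```python
from typing import Iterable, Iterator, List


def counting_source(items: Iterable[str], log: List[str]) -> Iterator[str]:
    """Yield items one at a time, recording each read in `log`."""
    for item in items:
        log.append(f"read {item}")
        yield item


def uppercase(events: Iterable[str]) -> Iterator[str]:
    """Streamable stage: each item is emitted as soon as it is read."""
    for event in events:
        yield event.upper()


def sort_all(events: Iterable[str]) -> Iterator[str]:
    """Non-streamable stage: all input is consumed before any output."""
    yield from sorted(events)


# Pull just the first output item from each pipeline and compare how much
# input had to be read to produce it.
stream_log: List[str] = []
first_streamed = next(uppercase(counting_source(["b", "a", "c"], stream_log)))

sort_log: List[str] = []
first_sorted = next(sort_all(counting_source(["b", "a", "c"], sort_log)))
```

After the first output item, stream_log holds one read while sort_log holds
all three, which is the property the components, rather than the pipeline
language, determine.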

Alex: We can mark this as a possible new requirement and debate it as we
proceed.

<ebruchez> I think that's too specific of a requirement

<MSM> I am having trouble imagining a language construct that would not
only be non-streamable but would successfully prohibit the writing of
streamable pipelines. Is the req as formulated by Core a nop?

11. The model should allow multiple inputs and multiple outputs for a
component.

Requirement.

12. The model should allow any data set conforming to one of the W3C
standards, such as XML 1.1, XSLT 1.0, XML Query 1.0, etc., to be specified
as an input or output of a component.

I'd be inclined to state it broadly as a design principle.

<richard> Michael - a rule that downstream components must not start
unless it is guaranteed that no upstream component will abort would be an
example of such a construct

Alex: That boils down to specific ones for known languages.

Norm: I think we may be able to answer the question more generally, but
I'm ok with that.

13. Information should be passed between components in a standard way, for
example, as one of the data sets conforming to an industry standard.

Richard: I think that means it should use things like SAX and DOM

MSM: Except that neither SAX nor DOM is a data set.

Alex: we could refactor that to say that we don't want to preclude ...
some list of known ways to pass infosets.

Richard: The Core WG may have been trying to express that it didn't want
us to invent a *new* way
... The fact that the Core WG included it doesn't mean everyone there
agreed with it.

Editor will refactor.

14. The language should be expressed in XML. It should be possible to
author and manipulate documents expressed in the pipeline language using
standard XML tools.

Requirement.

15. The pipeline language should be declarative, not based on APIs.

Erik: I would argue that XPL is declarative
... You really are declaring linking of components together and leaving it
to the implementation to do the work

Richard: The idea here is that the language for expressing the connections
between components should be declarative.
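
One way to read "declarative, not based on APIs" is sketched below: the
pipeline is a small XML document that only names the components and their
order, and a separate engine decides how to run them. The vocabulary
(pipeline, step, the component attribute) and the two toy components are
assumptions made up for this example; they are not XPL syntax or anything
the WG has agreed on.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical component registry: names map to implementations that the
# pipeline author never calls directly.
COMPONENTS: Dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "upper": str.upper,
}

# The pipeline itself is data: it declares which components are connected
# and in what order, and says nothing about how to invoke them.
PIPELINE_XML = """
<pipeline>
  <step component="strip"/>
  <step component="upper"/>
</pipeline>
"""


def run(pipeline_xml: str, document: str) -> str:
    """Toy engine: feed each step's output to the next step in order."""
    root = ET.fromstring(pipeline_xml)
    for step in root.findall("step"):
        component = COMPONENTS[step.get("component")]
        document = component(document)
    return document


result = run(PIPELINE_XML, "  <doc/>  ")
```

Because the description is plain XML, it can be authored and manipulated
with standard XML tools, in line with item 14 above.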

16. The model should be neutral with respect to implementation language.

Requirement.

Norm: Do you have enough to make a first pass?

Alex: Yes. If you look at this list and see things missing, we should add
them.
... I'll take a first stab at it from the minutes of the preceding
meetings.

<scribe> ACTION: Alex to produce document by c.o.b. 17 Jan 2006 [recorded
in http://www.w3.org/2006/01/12-xproc-minutes.html#action01[8]]

ADJOURNED

Summary of Action Items

[NEW] ACTION: Alex to produce document by c.o.b. 17 Jan 2006 [recorded in
http://www.w3.org/2006/01/12-xproc-minutes.html#action01[9]]
[End of minutes]

----------------------------------------------------------------------

[1] http://www.w3.org/
[2] http://www.w3.org/XML/XProc/2006/01/12-agenda.html
[3] http://www.w3.org/2006/01/12-xproc-irc
[4] http://www.w3.org/XML/XProc/2006/01/12-agenda.html
[5] http://www.w3.org/XML/XProc/2006/01/05-minutes.html
[6]
http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/0041.html
[7]
http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/att-0041/xproc-requirements.html
[8] http://www.w3.org/2006/01/12-xproc-minutes.html#action01
[9] http://www.w3.org/2006/01/12-xproc-minutes.html#action01
[10] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
[11] http://dev.w3.org/cvsweb/2002/scribe/

Minutes formatted by David Booth's scribe.perl[10] version 1.127 (CVS
log[11])
$Date: 2006/01/12 17:09:17 $

Be seeing you,
norm

--
Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.

Received on Thursday, 12 January 2006 17:29:41 UTC