XProc Minutes 5 July 2007 from Norman Walsh on 2007-07-05 (public-xml-processing-model-wg@w3.org from July 2007)

From: Norman Walsh <ndw@nwalsh.com>
Date: Thu, 05 Jul 2007 13:41:42 -0400
To: public-xml-processing-model-wg@w3.org
Message-ID: <878x9uhhnt.fsf@nwalsh.com>
See http://www.w3.org/XML/XProc/2007/07/05-minutes

W3C[1]

                                   - DRAFT -

                            XML Processing Model WG

Meeting 73, 5 Jul 2007

   Agenda[2]

   See also: IRC log[3]

Attendees

   Present
           Norm, Mohamed, Rui, Paul, Henry, Murray, Andrew

   Regrets
           Richard, Alessandro

   Chair
           Norm

   Scribe
           Norm

Contents

     * Topics
         1. Accept this agenda?
         2. Accept minutes from the previous meeting?
         3. Next meeting: telcon 12 July 2007
         4. Review of 6 July 2007 Working Draft
         5. Step library issues
         6. Any other business?
     * Summary of Action Items

     ----------------------------------------------------------------------

  Accept this agenda?

   -> http://www.w3.org/XML/XProc/2007/07/05-agenda

   Accepted.

  Accept minutes from the previous meeting?

   -> http://www.w3.org/XML/XProc/2007/06/28-minutes

   Accepted.

  Next meeting: telcon 12 July 2007

   Richard's regrets continue; probably regrets from Mohamed, Henry until 16
   August.

  Review of 6 July 2007 Working Draft

   -> http://www.w3.org/XML/XProc/docs/WD-xproc-20070706/

   Murray: On some fourth level headings, the formatting looks a bit odd.

   <scribe> ACTION: Norm to do something about the formatting of fourth level
   headings [recorded in
   http://www.w3.org/2007/07/05-xproc-minutes.html#action01[7]]

   Murray: In particular, since we have an element name in there, having it
   in u/c is a problem.

   Mohamed: Some small editorial problems that I sent to Alex didn't get
   incorporated.
   ... and error codes are in an odd order.

   <scribe> ACTION: Norm to sort the error codes in the appendix [recorded in
   http://www.w3.org/2007/07/05-xproc-minutes.html#action02[8]]

   Mohamed: What about p:map?

   Norm: Yes, we still need to talk about that, but I don't think it'll get
   in this draft.

   Mohamed: We have a schematron reference but no schematron step.

   Norm: I thought we had agreed to have a schematron step.

   Henry: Seems reasonable to me, along with XSLT2 and XSL Formatter.

   Mohamed: We may also want to have an NVDL step.

   Norm: Yes.
   ... I'd like someone to propose how the NVDL step would work.

   Murray: What about an appendix for the WG members?

   Norm: Sure.

   Proposal: We'll publish this as a public Working Draft tomorrow.

   Accepted.

  Step library issues

   ->
   http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2007May/0318.html

   Norm: Let's struggle on in Alex's absence.
   ... What about parsing HTML?

   Henry: I seem to recall that if we said the content-type was text/html,
   then you get an implementation defined mapping from HTML to XHTML.

   Norm: Should we do it that way?

   Henry: There was an implicit reference to the HTTP request step that it by
   default produces escaped markup.

   Norm: I hope that's wrong.

   Henry: We have an unescape markup step because we know that Atom, RSS,
   NewsML, etc can encapsulate documents with escaped markup.
   ... So it seems that p:http-request and p:unescape-markup have this
   problem.
   ... but what do save/serialize have to do with this?
   ... I'd like to split receiving and producing.
   ... How about: it's implementation defined if any media types other than
   application/xml or application/foo+xml are allowed. Processors are not
   required to support any other media types. But if they do, then it's
   implementation defined what mechanism they use to get from the ones they
   support to XML.
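
   Henry's proposal can be read as a two-way test on the media type. The
   sketch below is illustrative only and is not taken from the draft: the
   names is_xml_media_type, to_xml and converters are hypothetical, and the
   UTF-8 decoding of XML bodies is a simplifying assumption. It accepts
   application/xml and application/foo+xml directly, and defers everything
   else to whatever converters an implementation chooses to supply.

```python
from typing import Callable, Dict

# Hypothetical converter type: raw response bytes in, XML text out.
Converter = Callable[[bytes], str]


def is_xml_media_type(media_type: str) -> bool:
    """Return True for application/xml and application/<name>+xml."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    base = media_type.split(";", 1)[0].strip().lower()
    if base == "application/xml":
        return True
    main, _, sub = base.partition("/")
    # Require a non-empty name before the "+xml" suffix.
    return main == "application" and sub.endswith("+xml") and len(sub) > len("+xml")


def to_xml(media_type: str, body: bytes, converters: Dict[str, Converter]) -> str:
    """Return XML text for a body, or fail if the media type is unsupported."""
    if is_xml_media_type(media_type):
        # Assumption: the body is UTF-8; a real processor would sniff the encoding.
        return body.decode("utf-8")
    base = media_type.split(";", 1)[0].strip().lower()
    converter = converters.get(base)
    if converter is None:
        # No other media type is required to be supported.
        raise ValueError(f"unsupported media type: {base}")
    # How the converter gets from its input to XML is implementation defined.
    return converter(body)
```

   A processor with no converters registered would reject text/html outright;
   one that registers a converter for text/html decides for itself how the
   HTML becomes XML.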

   Murray: Are we still talking about infosets?

   Henry: Yes, that's why this problem arises.

   Murray: So it's implementation defined how you build an infoset from
   something that isn't XML.

   Norm: I'm happy with Henry's proposal as a starting point.

   Murray: I'm worried about how many different kinds of
   implementation-defined we're going to get.
   ... In GRDDL, we have an issue called faithful infosets. This arises
   because in GRDDL, we're talking about XPath node trees and there are
   questions about validation and XInclude, etc.
   ... This seems to create another faithful infoset issue.

   Scribe stepped away, a few minutes lost

   Henry: The things you can depend on are the minimal common subset that
   the infoset more or less defines.
   ... It's true that there's more in the XPath 2.0 datamodel, but you can't
   get at it from our language.

   Norm: I'm sympathetic because of web services like Flickr that allow users
   to get comments.

   Murray: I think everything needs to be able to filter to XML or you need
   to have a specific component that's for loading non-XML things.

   Henry: I think Murray is right, but we're going to cheat just a little bit
   and say there are two.
   ... I'm happy that if you want to inject HTML into your pipeline and
   guarantee that it's XML then you have to use http-request.

   Norm: We have load, basically only to support DTD validation.

   <Zakim> MoZ, you wanted to ask Murray on the difference between XPath node
   trees and infosets and to

   Mohamed: I have a problem with components that translate from HTML to XML.

   Norm: I want it to be implementation defined.

   Mohamed: Norm, you said HTML to XHTML, but maybe we just meant HTML to
   XML.

   Henry: Yes, I think that was my fault. All we need is XML.

   Murray outlines a recent GRDDL use case about faithfulness of a
   representation

   Murray: My initial thought was that there should be a "garbage-in" step
   that could reach out and bring anything in.

   Norm: I think implementors will provide this if we don't.

   Henry: The way I read this, you can specify that you require an
   application/xhtml+xml media type and that will cause the pipeline to fail
   if you don't get it.

   Murray: I do an http-request and what I get back is an HTML document. I
   run some kind of process over that and I get some result. That result may
   be successful or not successful.
   ... What comes out of http-request will be the result.
   ... But presumably I as the author of the pipeline want to know a couple
   of things.

   Norm: I think you can find all of those things by looking at the headers
   and body you get back.

   Henry: If you're using tidy, I'll expect implementations to fail if tidy
   throws errors.

   Norm: I agree.

   Henry: If you're using tagsoup, then you know you'll always get an output.

   <Zakim> MoZ, you wanted to speaks about the difference between p:parameter
   namespace=""... and p:option without namespace@

   Mohamed: Are we sure that the parameters of the header will be available
   to the next step?
   ... The http-request step will ask with some parameters, the result will
   be one of those.

   Murray: So the http-request does a get and there are some headers.

   Norm: You get those back in the headers.

   <Zakim> ht, you wanted to register a concern about the architecture of
   p:http-request

   Henry: If no one else is worrying about this, that's ok, because I'm only
   looking at this in detail now.
   ... Had we already discussed doing this using two output ports instead?
   ... I'd like to be able to write a take-my-chances pipeline where the
   primary output is a sequence of documents.
   ... And only if I care about the minutiae do I look at the port.

   Norm: I'm not sure how that would handle multipart related.

   Henry: An alternative would be to say that there is an option that says
   "take my chances"
   ... I want a sequence of documents or fail, don't bother me with all this
   stuff.

   Norm: That's not on the table now, but if you can fire off a quick message
   before you go on vacation, that would be good.

   <Zakim> MoZ, you wanted to ask the question why p:store/!result is not
   primary but not p:xslformatter/!result

   Norm: Oversight, I agree.

   Mohamed: What is the default for required on option?

   Norm: "no"

   Mohamed: It's written explicitly in some places.

   Norm: Are we satisfied that we've given editorial direction to Alex?

   Norm attempts to describe the serialization problem that probably caused
   Alex to lump them together.

  Any other business?

   None.

   Adjourned

Summary of Action Items

   [NEW] ACTION: Norm to do something about the formatting of fourth level
   headings [recorded in
   http://www.w3.org/2007/07/05-xproc-minutes.html#action01[10]]
   [NEW] ACTION: Norm to sort the error codes in the appendix [recorded in
   http://www.w3.org/2007/07/05-xproc-minutes.html#action02[11]]
    
   [End of minutes]

     ----------------------------------------------------------------------

   [1] http://www.w3.org/
   [2] http://www.w3.org/XML/XProc/2007/07/05-agenda
   [3] http://www.w3.org/2007/07/05-xproc-irc
   [7] http://www.w3.org/2007/07/05-xproc-minutes.html#action01
   [8] http://www.w3.org/2007/07/05-xproc-minutes.html#action02
   [10] http://www.w3.org/2007/07/05-xproc-minutes.html#action01
   [11] http://www.w3.org/2007/07/05-xproc-minutes.html#action02
   [12] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
   [13] http://dev.w3.org/cvsweb/2002/scribe/

    Minutes formatted by David Booth's scribe.perl[12] version 1.128 (CVS
    log[13])
    $Date: 2007/07/05 17:39:35 $
Received on Thursday, 5 July 2007 17:41:51 UTC