Re: Issue #3306 from Jeni Tennison on 2006-06-05 (public-xml-processing-model-wg@w3.org from June 2006)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Mon, 05 Jun 2006 12:50:46 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <44841A96.2050209@jenitennison.com>
Hi,

Richard Tobin wrote:
>> 0. Say that queries over document sequences aren't supported in XProc
> 
> If we did this, then there is a simple workaround which is to have a
> standard component that takes a sequence of documents and returns a
> single document containing them as children of the root element.  You
> can then do a query on that.

Yep. The same kind of workaround can be applied when you want to do a 
query over two (or more) sequences. You end up with huge documents and 
the user has to do a bit more work, but it's not the end of the world.

I think we need two standard components here:

  - p:aggregate
      default input:    document sequence
      'wrapper' param:  QName
      default output:   single document

      Creates a new document with a document element named by the wrapper
      parameter. Its children are deep copies of the document elements in
      the input document sequence. Each of these elements is given an
      xml:base attribute to indicate its original base URI.

  - p:concatenate
      'seq1' input:     document sequence
      'seq2' input:     document sequence
      'wrapper' param:  QName
      default output:   single document

      Creates a new document with the document element named by the
      wrapper parameter. Its children are deep copies of the document
      elements in the 'seq1' input, followed by deep copies of the
      document elements in the 'seq2' input. Each of these elements is
      given an xml:base attribute to indicate its original base URI.
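A minimal Python sketch of the two proposed components follows, using the standard library's xml.etree.ElementTree. Several things in it are assumptions for illustration: each input document is paired with its base URI (ElementTree does not track base URIs itself), and the `load` helper and example URIs are hypothetical. `concatenate` is written as `aggregate` applied to seq1 followed by seq2, which gives the output order described above.

```python
import copy
import xml.etree.ElementTree as ET

# ElementTree spells the xml: prefix as this namespace URI (Clark notation).
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def aggregate(docs, wrapper):
    """Sketch of the proposed p:aggregate component.

    docs    -- sequence of (base_uri, ElementTree) pairs (assumed
               representation; ElementTree records no base URI).
    wrapper -- name of the new document element, e.g. "wrapper"
               or "{http://example.org/ns}wrapper".
    """
    root = ET.Element(wrapper)
    for base_uri, doc in docs:
        child = copy.deepcopy(doc.getroot())  # deep copy: inputs stay unchanged
        child.set(XML_BASE, base_uri)         # record the original base URI
        root.append(child)
    return ET.ElementTree(root)


def concatenate(seq1, seq2, wrapper):
    """Sketch of the proposed p:concatenate: all of seq1, then all of seq2."""
    return aggregate(list(seq1) + list(seq2), wrapper)


def load(base_uri, text):
    # Hypothetical helper: parse a string and pair it with a base URI.
    return (base_uri, ET.ElementTree(ET.fromstring(text)))


a = [load("http://example.org/a.xml", "<item n='1'/>")]
b = [load("http://example.org/b.xml", "<item n='2'/>"),
     load("http://example.org/c.xml", "<other/>")]

result = concatenate(a, b, "wrapper")
# A query over two sequences becomes a query over one document.
print([e.get("n") for e in result.getroot().findall("item")])  # ['1', '2']
print(ET.tostring(result.getroot(), encoding="unicode"))
```

Because each copied element carries an absolute xml:base, relative references inside it should still resolve against the original document's location after wrapping.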

>> 1. Say that XProc inputs and outputs are actually *sets* of documents
> 
> Document sequences are going to be very common, as I said above I
> think that queries on document sequences are much rarer.  We shouldn't
> let the sequence-query tail wag the sequence dog.

I agree that if document sequences are a useful notion then it doesn't 
make sense not to have them just to make it easier to perform queries. 
I'm just not sure whether the reason we've talked about ports accepting 
document *sequences* was simply that we want them to accept more than 
one document, and a sequence is the default option. Is there a greater 
rationale behind using sequences? Hence my questions:

>> Do people have examples of components that produce sequences of 
>> documents where (a) the order of the documents within that sequence 
>> matters and/or (b) the sequence can contain duplicate documents?
> 
> Can you construct duplicate documents at all in the pipeline?  I think
> we had agreed some time ago that the pipeline itself has copying
> semantics: if a component modifies an input document (assuming the
> implementation provides a way to do that) it doesn't affect other
> components that have the same document as input.  It would be
> consistent to say that no standard components generate sequences with
> the same ("eq") document twice.  I suppose a user-written component
> under a given implementation might be able to generate a sequence with
> duplicate documents

I agree that components shouldn't be able to modify the documents they 
get as input.
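The copying semantics Richard describes can be sketched as a hypothetical fan-out step that hands each consuming component its own deep copy. A component that (improperly) modifies its input in place then cannot affect its siblings or the original. The function names are illustrative only, not part of any XProc draft, and ElementTree elements stand in for documents.

```python
import copy
import xml.etree.ElementTree as ET


def fan_out(doc, consumers):
    # Hypothetical pipeline step: feed one document to several components.
    # Each consumer receives its own deep copy (copying semantics).
    return [consume(copy.deepcopy(doc)) for consume in consumers]


def stamp(doc):
    # A badly behaved component that mutates its input in place.
    doc.set("stamped", "yes")
    return doc


def read_stamp(doc):
    # A well-behaved component that only reads its input.
    return doc.get("stamped")


source = ET.fromstring("<doc/>")
outputs = fan_out(source, [stamp, read_stamp])
print(outputs[1])              # None: read_stamp never saw the mutation
print(source.get("stamped"))   # None: the original document is untouched
```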

I think there are two separate questions here:

1. Can a component return as output the same (unmodified) document 
that it receives as input, or must it always copy any documents it 
receives? It might be more efficient for implementations if components 
like 'filter', 'union' and 'identity' didn't have to create copies.

2. Can a document sequence contain the same document twice?
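The two questions can be made concrete with a small Python sketch (hypothetical names; ElementTree elements stand in for documents). Question 1 is the choice between `identity_sharing` and `identity_copying`; question 2 is whether `has_duplicates` may ever be true for a sequence passed between components. "Same document" here means node identity, as tested by XPath 2.0's `is` operator, not equal content.

```python
import copy
import xml.etree.ElementTree as ET


def identity_sharing(docs):
    # Question 1, option A: pass the very same document objects through.
    return list(docs)


def identity_copying(docs):
    # Question 1, option B: always return fresh deep copies.
    return [copy.deepcopy(d) for d in docs]


def has_duplicates(docs):
    # Question 2: is any document present twice? Compares object identity
    # (like XPath 2.0 "is"), so documents with equal content still count
    # as distinct.
    return len({id(d) for d in docs}) != len(docs)


a = ET.fromstring("<a/>")
b = ET.fromstring("<b/>")
seq = [a, b]

shared = identity_sharing(seq)
copied = identity_copying(seq)
print(shared[0] is a)                         # True: no copy was made
print(copied[0] is a)                         # False: a new document
print(has_duplicates([a, b]))                 # False
print(has_duplicates([a, a]))                 # True: same document twice
print(has_duplicates([a, copy.deepcopy(a)]))  # False: equal content only
```

Sharing saves a copy per document, but it is only safe given the rule above that components never modify their inputs. It is also presumably what would let a component such as 'union', given two sequences that share a document, produce a sequence containing that document twice, which is why the two questions are linked.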

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Monday, 5 June 2006 11:51:07 UTC