- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Mon, 05 Jun 2006 12:50:46 +0100
- To: public-xml-processing-model-wg@w3.org
Hi,

Richard Tobin wrote:
>> 0. Say that queries over document sequences aren't supported in XProc
>
> If we did this, then there is a simple workaround which is to have a
> standard component that takes a sequence of documents and returns a
> single document containing them as children of the root element. You
> can then do a query on that.

Yep. The same kind of workaround can be applied when you want to do a
query over two (or more) sequences. You end up with huge documents and
the user has to do a bit more work, but it's not the end of the world.

I think we need two standard components here:

  - p:aggregate
    default input: document sequence
    'wrapper' param: QName
    default output: single document

    Creates a new document with a document element named by the wrapper
    parameter. Its children are deep copies of the document elements in
    the input document sequence. Each of these elements is given an
    xml:base attribute to indicate its original base URI.

  - p:concatenate
    'seq1' input: document sequence
    'seq2' input: document sequence
    'wrapper' param: QName
    default output: single document

    Creates a new document with a document element named by the wrapper
    parameter. Its children are deep copies of the document elements in
    the 'seq1' input, followed by deep copies of the document elements
    in the 'seq2' input. Each of these elements is given an xml:base
    attribute to indicate its original base URI.

>> 1. Say that XProc inputs and outputs are actually *sets* of documents
>
> Document sequences are going to be very common, as I said above I
> think that queries on document sequences are much rarer. We shouldn't
> let the sequence-query tail wag the sequence dog.

I agree that if document sequences are a useful notion then it doesn't
make sense to drop them just to make it easier to perform queries.
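To make the p:aggregate and p:concatenate semantics proposed above a bit more concrete, here is a minimal Python sketch. It is not XProc and not an agreed design. The Document pair (ElementTree does not track base URIs itself), the function names, and the use of ElementTree's "{namespace}local" notation for the wrapper QName are all assumptions made for the illustration.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Iterable, NamedTuple

# Clark-notation name of the xml:base attribute.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


class Document(NamedTuple):
    """Hypothetical stand-in for a pipeline document: base URI plus document element."""
    base_uri: str
    root: ET.Element


def _wrap(wrapper: str, docs: Iterable[Document]) -> ET.Element:
    """Build a wrapper element whose children are deep copies of each document element."""
    result = ET.Element(wrapper)
    for doc in docs:
        child = copy.deepcopy(doc.root)    # never touch the input document
        # Simplification: any xml:base already on the element is overwritten.
        child.set(XML_BASE, doc.base_uri)  # record the original base URI
        result.append(child)
    return result


def aggregate(docs: Iterable[Document], wrapper: str) -> ET.Element:
    """Sketch of p:aggregate: one document sequence in, one document out."""
    return _wrap(wrapper, docs)


def concatenate(seq1: Iterable[Document], seq2: Iterable[Document],
                wrapper: str) -> ET.Element:
    """Sketch of p:concatenate: the copies from seq1 are followed by those from seq2."""
    return _wrap(wrapper, [*seq1, *seq2])
```

A query over one or more sequences can then be run against the single element that aggregate() or concatenate() returns, for example with its findall() method.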
I'm just not sure whether the reason we've talked about ports accepting
document *sequences* was simply because we want them to accept more
than one document and a sequence is the default option. Is there a
greater rationale behind using sequences? Hence my questions:

>> Do people have examples of components that produce sequences of
>> documents where (a) the order of the documents within that sequence
>> matters and/or (b) the sequence can contain duplicate documents?
>
> Can you construct duplicate documents at all in the pipeline? I think
> we had agreed some time ago that the pipeline itself has copying
> semantics: if a component modifies an input document (assuming the
> implementation provides a way to do that) it doesn't affect other
> components that have the same document as input. It would be
> consistent to say that no standard components generate sequences with
> the same ("eq") document twice. I suppose a user-written component
> under a given implementation might be able to generate a sequence with
> duplicate documents

I agree that components shouldn't be able to modify the documents they
get as input. I think there are two separate questions here:

1. Can a component return as an output the same (unmodified) document
   that it receives as input, or must it always copy any documents it
   receives? It might be more efficient for implementations if
   components like 'filter', 'union' and 'identity' didn't have to
   create copies.

2. Can a document sequence contain the same document twice?

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Monday, 5 June 2006 11:51:07 UTC