RE: Threading and ordering

I have seen people create steps with two (or more) input/output ports, where one port carries the "main" data being processed and the others carry additional information about the individual documents. The documents on these ports correspond 1-to-1, so this approach relies on them arriving in exactly the same order.

If the order cannot be guaranteed, I think the document metadata feature proposed for XProc V2 might help in some cases, but in general, anyone who wants to implement any kind of pair-wise operation would be in trouble.
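
To make the pair-wise problem concrete, here is a minimal sketch in Python (not XProc). The dictionary representation of documents and the "id" field are hypothetical; the keyed variant only illustrates how per-document metadata might be used to re-associate documents, not how V2 would actually do it.

```python
def pair_by_position(main_docs, aux_docs):
    """Pair documents purely by position.

    Only correct if both sequences arrive in the same order.
    """
    return list(zip(main_docs, aux_docs))


def pair_by_key(main_docs, aux_docs, key):
    """Pair documents by an explicit key, independent of arrival order.

    `key` is a hypothetical stand-in for some per-document metadata.
    """
    aux_by_key = {key(doc): doc for doc in aux_docs}
    return [(doc, aux_by_key[key(doc)]) for doc in main_docs]


main_docs = [{"id": 1, "body": "a"}, {"id": 2, "body": "b"}]
# The auxiliary port delivers the same documents' info, but reordered.
aux_docs = [{"id": 2, "info": "for b"}, {"id": 1, "info": "for a"}]

by_position = pair_by_position(main_docs, aux_docs)
by_key = pair_by_key(main_docs, aux_docs, key=lambda doc: doc["id"])
```

With the reordered auxiliary sequence, the positional pairing silently matches document 1 with the information for document 2, while the keyed pairing still lines them up.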

Ordering of connections is also important for parameters. Without a predictable order, you cannot rely on consistent parameter overriding. Again, this problem should go away if we replace or drop the current parameters in V2.
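
As a rough illustration of the overriding issue, here is a small Python sketch. It assumes a simple "last value wins" rule when the same parameter name arrives from several connections; the function name and the parameter names are made up for the example.

```python
def merge_parameters(connections):
    """Merge parameter sets in connection order.

    Assumption: a later connection overrides an earlier one
    when both supply the same parameter name.
    """
    merged = {}
    for params in connections:
        merged.update(params)
    return merged


defaults = {"mode": "draft", "lang": "en"}
overrides = {"mode": "final"}

in_declared_order = merge_parameters([defaults, overrides])
in_swapped_order = merge_parameters([overrides, defaults])
```

The same two parameter sets give "mode" the value "final" in one order and "draft" in the other, which is why an unpredictable connection order makes overriding unreliable.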

Relaxing the ordering would also have an impact on some of the standard steps, for instance p:pack, p:split-sequence (the "initial-only" option), p:wrap-sequence, or p:xquery (again, I have seen people pass a fixed-order sequence of documents to that step: the initial context item is the primary data to query, and the others are auxiliary resources used by the query).

So I think I agree with Romain and Norm that having an option to indicate that the order does not matter is probably the most sensible way to go.
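
The buffering cost Norm describes in the quoted message below can be sketched as follows. This is plain Python, not how any particular XProc processor works; it assumes each document is tagged with its 0-based position in the intended order, and the function name is hypothetical.

```python
def deliver_in_order(arrivals):
    """Deliver documents in sequence order and report peak buffering.

    `arrivals` yields (index, document) pairs in whatever order the
    documents happen to be produced. Returns the documents in index
    order and the largest number of documents held back at once.
    """
    pending = {}      # documents that arrived too early
    next_index = 0    # index of the next document we may deliver
    delivered = []
    peak = 0
    for index, doc in arrivals:
        if index != next_index:
            pending[index] = doc
            peak = max(peak, len(pending))
            continue
        delivered.append(doc)
        next_index += 1
        # Flush any buffered documents that are now next in line.
        while next_index in pending:
            delivered.append(pending.pop(next_index))
            next_index += 1
    return delivered, peak


# Worst case: the first document in the intended order arrives last.
worst_docs, worst_peak = deliver_in_order(
    [(3, "d"), (2, "c"), (1, "b"), (0, "a")]
)
# Best case: documents already arrive in order, nothing is buffered.
best_docs, best_peak = deliver_in_order(
    [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
)
```

In the worst case above, three of the four documents are held back before anything can be delivered, which matches the "all but one" observation. If the author declares that order does not matter, the buffer is not needed at all.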

Regards,
Vojtech

--
Vojtech Toman
Consultant Software Engineer
EMC | Information Intelligence Group
vojtech.toman@emc.com
http://developer.emc.com/xmltech

> -----Original Message-----
> From: Norman Walsh [mailto:ndw@nwalsh.com]
> Sent: Sunday, September 29, 2013 11:49 AM
> To: Romain Deltour
> Cc: public-xml-processing-model-comments@w3.org
> Subject: Re: Threading and ordering
> 
> Romain Deltour <rdeltour@gmail.com> writes:
> > That said, this is not a *strict* dependence, we could certainly find
> > a workaround if the XProc spec was to change. Another option would be
> > to keep the default behavior and add an option to explicitly declare
> > when the order doesn't matter, e.g. using an extra attribute on the
> > p:input and p:output ?
> 
> Yes, that's about where I've come to in thinking about it. If the order
> matters, some (in the worst case, all but one) documents will have to
> be buffered so that they can be delivered in the right order.
> 
> Giving pipeline authors a way to indicate that order doesn't matter
> will potentially make some pipelines consume less memory and run
> faster.
> 
>                                         Be seeing you,
>                                           norm
> 
> --
> Norman Walsh
> Lead Engineer
> MarkLogic Corporation
> Phone: +1 512 761 6676
> www.marklogic.com

Received on Monday, 30 September 2013 09:52:19 UTC