- From: Dimitre Novatchev <dnovatchev@gmail.com>
- Date: Wed, 14 Jan 2026 13:23:21 -0800
- To: Christian Grün <cg@basex.org>
- Cc: Norm Tovey-Walsh <norm@saxonica.com>, "public-xslt-40@w3.org" <public-xslt-40@w3.org>
- Message-ID: <CAK4KnZeUE___RFDe62P=m=5n1OrMJoTnefzbuMJ1yjP0yDD-+w@mail.gmail.com>
An update:
Christian and I discussed the Generators implementation of the N-way merge.
He proposed a monstrous test where each of the 3 data sources is an array
of 100 Billion values (100_000_000_000) and the test must print the first
100_000 (one hundred thousand values).
Initially the Generators implementation needed 6 seconds to do this.
Now, after reasoning and improving, it takes just 1.2 seconds, a 5 times
speed improvement and all this without loss of generality. In the code of
gn:merge-sorted-generators one needs to just replace
gn:make-generator-from-array with gn:make-generator, and pass to this
function the specific data-provider for the datatype of each data source -
and the different data sources can be of different types!
Here is the latest code:
declare function gn:merge-sorted-generators($arGens as array(f:generator))
as f:generator
{
if(array:empty($arGens)) then gn:empty-generator()
else
let $starts := $arGens => array:for-each(fn($gen){$gen =>
gn:value()}),
$minVal := min($starts),
$firstMinInd := ($starts => array:index-where(fn($val){$val eq
$minVal}))[1],
$firstMinSource := $arGens($firstMinInd),
$newArOfGens := $arGens => array:remove($firstMinInd),
$genTrimmed := $firstMinSource => gn:skip(1),
$newArOfGens2 := if(gn:some($genTrimmed)) then $newArOfGens =>
array:append($genTrimmed)
else $newArOfGens,
$result := f:generator(initialized := true(), end-reached :=
false(),
get-current := fn($this as f:generator)
{$minVal},
move-next := fn($this as f:generator)
{gn:merge-sorted-generators($this?inputGens )}
) => map:put("inputGens", $newArOfGens2 )
return
$result
};
And the test itself:
let
$ar1 := array {1 to 100_000_000_000},
$ar2 := array {2 to 100_000_000_001},
$ar3 := array {3 to 100_000_000_002},
$gn1 := gn:make-generator-from-array($ar1),
$gn2 := gn:make-generator-from-array($ar2),
$gn3 := gn:make-generator-from-array($ar3),
$ArOfGens := [$gn1, $gn2, $gn3]
return
gn:merge-sorted-generators($ArOfGens) => gn:subrange(1, 100000) =>
gn:to-array()
[image: image.png]
On Wed, Jan 14, 2026 at 6:58 AM Dimitre Novatchev <dnovatchev@gmail.com>
wrote:
> Hi Christian,
>
> The N-way merge is used when the N sorted lists are so big/long that they
> cannot fit individually or together in the RAM of the computer.
>
> In this case the N generators will be created using the
> *gn:make-generator* function passing to it a provider that feeds it with
> one value at a time. And in fact, any of the N different data sources can
> be of different type (XML, JSON, etc) and have its own, unique data
> provider - all this is configurable.
>
> So, yes - if all the data is small and can fit into the RAM of the
> computer, there is no reason why we would want to use the N-way merging.
>
> The power of Generators is dealing with streaming data of unbounded
> volume. This is why N-way merge is used as an essential part of external
> sorting.
>
> > I guess it may be simple (and more persuasive) to reformulate it in a
> way that it works with infinite input? I tried for a while, but I failed.
>
> Please, read about the *gn:make-generator
> <https://qt4cg.org/pr/2247/xpath-functions-40/Overview.html#generator-make-generator>*
> function and its protocol with its data-provider. Also see the examples to
> it, and if there is something that is not clear, I would be glad to explain.
>
> I would be glad to answer any other questions you or the people around you
> might have.
>
> Thanks,
> Dimitre.
>
> On Wed, Jan 14, 2026 at 3:21 AM Christian Grün <cg@basex.org> wrote:
>
>> Hi Dimitre,
>>
>> > I really don't understand this statement. We have 138 examples/tests.
>> Today's proposal had 5-6 examples, at least half of which deal with
>> integers (even / odd/, etc.
>>
>> The proposal provides many examples, which are certainly helpful to
>> understand what the specific functions are supposed to do. What we still
>> miss (at least when I think of the feedback that I got from first reviewers
>> and possible future users) are examples that demonstrate that (and how)
>> generators will simplify tasks. An N-way merge could be a nice use case
>> indeed. The current example can also be answered without generators and
>> with a simple combination of join and sort:
>>
>> array:sort(array:join(($ar1, $ar2, $ar3)))
>>
>> I guess it may be simple (and more persuasive) to reformulate it in a way
>> that it works with infinite input? I tried for a while, but I failed.
>>
>> Best,
>> Christian
>>
>>
>
>
--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs.Will they write
all patents, too? :)
-------------------------------------
Sanity is madness put to good use.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.
Attachments
- image/png attachment: image.png
Received on Wednesday, 14 January 2026 21:23:38 UTC