Re: QT4CG meeting 148 draft minutes, 13 January 2026

An update:

Christian and I discussed the Generators implementation of the N-way merge.
He proposed a monstrous test where each of the 3 data sources is an array
of 100 Billion values (100_000_000_000) and the test must print the first
100_000 (one hundred thousand values).

Initially the Generators implementation needed 6 seconds to do this.

Now, after reasoning and improving, it takes just 1.2 seconds, a 5 times
speed improvement and all this without loss of generality. In the code of
gn:merge-sorted-generators one needs to just replace
gn:make-generator-from-array with gn:make-generator, and pass to this
function the specific data-provider for the datatype of each data source -
and the different data sources can be of different types!

Here is the latest code:

declare function gn:merge-sorted-generators($arGens as array(f:generator))
as f:generator
{
  if(array:empty($arGens)) then gn:empty-generator()
    else
      let $starts := $arGens => array:for-each(fn($gen){$gen =>
gn:value()}),
          $minVal := min($starts),
          $firstMinInd := ($starts => array:index-where(fn($val){$val eq
$minVal}))[1],
          $firstMinSource := $arGens($firstMinInd),
          $newArOfGens := $arGens => array:remove($firstMinInd),
          $genTrimmed := $firstMinSource => gn:skip(1),
          $newArOfGens2 := if(gn:some($genTrimmed)) then $newArOfGens =>
array:append($genTrimmed)
                              else $newArOfGens,
         $result := f:generator(initialized := true(), end-reached :=
false(),
                             get-current := fn($this as f:generator)
                                              {$minVal},
                             move-next := fn($this as f:generator)

{gn:merge-sorted-generators($this?inputGens )}
                            ) => map:put("inputGens", $newArOfGens2 )

       return
          $result
};

And the test itself:

   let
       $ar1 := array {1 to 100_000_000_000},
       $ar2 := array {2 to 100_000_000_001},
       $ar3 := array {3 to 100_000_000_002},
       $gn1 := gn:make-generator-from-array($ar1),
       $gn2 := gn:make-generator-from-array($ar2),
       $gn3 := gn:make-generator-from-array($ar3),

       $ArOfGens := [$gn1, $gn2, $gn3]
     return
       gn:merge-sorted-generators($ArOfGens) => gn:subrange(1, 100000) =>
gn:to-array()


[image: image.png]

On Wed, Jan 14, 2026 at 6:58 AM Dimitre Novatchev <dnovatchev@gmail.com>
wrote:

> Hi Christian,
>
> The N-way merge is used when the N sorted lists are so big/long that they
> cannot fit individually or together in the RAM of the computer.
>
> In this case the N generators will be created using the
> *gn:make-generator* function passing to it a provider that feeds it with
> one value at a time. And in fact, any of the N different data sources can
> be of different type (XML, JSON, etc) and have its own, unique data
> provider - all this is configurable.
>
> So, yes - if all the data is small and can fit into the RAM of the
> computer, there is no reason why we would want to use the N-way merging.
>
> The power of Generators is dealing with streaming data of unbounded
> volume. This is why N-way merge is used as an essential part of external
> sorting.
>
> >  I guess it may be simple (and more persuasive) to reformulate it in a
> way that it works with infinite input? I tried for a while, but I failed.
>
> Please, read about the *gn:make-generator
> <https://qt4cg.org/pr/2247/xpath-functions-40/Overview.html#generator-make-generator>*
> function and its protocol with its data-provider. Also see the examples to
> it, and if there is something that is not clear, I would be glad to explain.
>
> I would be glad to answer any other questions you or the people around you
> might have.
>
> Thanks,
> Dimitre.
>
> On Wed, Jan 14, 2026 at 3:21 AM Christian Grün <cg@basex.org> wrote:
>
>> Hi Dimitre,
>>
>> > I really don't understand this statement. We have 138 examples/tests.
>> Today's proposal had 5-6 examples, at least half of which deal with
>> integers (even / odd/, etc.
>>
>> The proposal provides many examples, which are certainly helpful to
>> understand what the specific functions are supposed to do. What we still
>> miss (at least when I think of the feedback that I got from first reviewers
>> and possible future users) are examples that demonstrate that (and how)
>> generators will simplify tasks. An N-way merge could be a nice use case
>> indeed. The current example can also be answered without generators and
>> with a simple combination of join and sort:
>>
>>   array:sort(array:join(($ar1, $ar2, $ar3)))
>>
>> I guess it may be simple (and more persuasive) to reformulate it in a way
>> that it works with infinite input? I tried for a while, but I failed.
>>
>> Best,
>> Christian
>>
>>
>
>

-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs.Will they write
all patents, too? :)
-------------------------------------
Sanity is madness put to good use.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.

Received on Wednesday, 14 January 2026 21:23:38 UTC