Re: QT4CG meeting 145 draft agenda, 9 December 2025

On the subject of generators, I wanted to bring a bit of hack perspective
from my understanding and minor usage.

I did some work using mutating maps to hold state, inspired by and thanks
to Dimitre's work on generators.
I got quite a bit of help from Dimitre on the implementation, that I pared
down for the specific usage of making an array slice selector.

The generator-inspired approach solved a problem for which I was not able
to find a satisfactory non-mutating solution. Via the generator-inspired
approach, the somewhat arcane defaults and handling of negative indices and
steps from the definition of RFC9535 are evaluated only once and then
stored in the generator, along with the current value and the methods for
mutating to the next values until there are no more. (I feel great affinity
with the generator approach because, much like me, it is inherently lazy.)

Whereas I am convinced that generators are both immensely useful and
quasi-fundamental, what I have not yet seen, certainly due to my great
ignorance, is the reason for which generators cannot be a separate library.
Perhaps there is a fundamental operation within the implementation of
generators that requires a special primitive that is not currently
available in QT4, in which case said primitive could be added to QT4. It
would be good, then, to look at such a primitive, if indeed it exists.

If I am understanding correctly, the argument for adding generators to the
base definition of QT4 is that, in that way, generators are sure to be
treated as fundamental. The argument for having generators as a separate
library is that it can be a separate, parallel, somewhat independent
initiative from QT4, hence a reduced effort for base QT4 functions and an
even earlier QT4 release. As a fervent albeit non-practicing QT4
enthusiast, I would tend to vote for the separate library.

Looking at https://qt4cg.org/pr/2247/xpath-functions-40/autodiff.html , I
am struck by the consideration and attentiveness that has gone into this
work. Inasmuch, I'm afraid that my suggestion can be taken as being
flippant and ungrateful, but I would protest that I have nothing but
admiration for this work that I, personally, find inspiring. I feel,
nonetheless, that the importance of generators and the important effort
required to define them can be better separated from the mostly simpler
primitives that make up the base QT4 functions in the sole but vital
interest of simplifying the QT4 definition and delivery effort.

Please accept this expression of my respectuous sentiments

-alan


On Tue, Dec 9, 2025 at 3:42 AM Liam R. E. Quin <liam@fromoldbooks.org>
wrote:

> On Mon, 08 Dec 2025 11:11:02 +0000
> Norm Tovey-Walsh <norm@saxonica.com> wrote:
>
> > Hello,
> >
> > This week, we plan to begin with a discussion of the generators
> > proposal. I hope that we will be joined by both Dimitre Novatchev and
>
> A few notes about generators for (optional) background reading, in text
> (here and pdf (attached).
>
> First a tiny bit of background.
>
> AS we know, the way to do “looping” in both XSLT and XQuery is through
> recursion. And if you know the common pattern for writing a recursive
> function or template, and follow it carefully, chances are high that
> the processor will turn your expression into a loop internally, using
> tail call optimization.
>
> This is great, because at each stage of recursion you can explore a new
> input value, and computing the results happens only at that time, not
> all in advance, so e.g. if the calculation is really expensive and you
> only need a few values processed, but don’t in advance know how many,
> you can terminate early.
>
> Example: Consider all of the nodes reachable in this RDF graph starting
> at a particular node, but only until you reach a particular result
> (e.g. can you get to Mornington Crescent from here?).
>
> But if you don’t get the template or function right, you run out of
> stack space, or reach some limit on stack size or recursion depth
> imposed by the processor. This can be as low as 50 nested calls in some
> implementations.
>
> So along came xsl:iterate. This XSLT instruction is the equivalent of a
> recursive template, but in a way that can always be optimized. It also
> supports early termination, which is tricky to get right with a
> recursive template.
>
> There is nothing in xsl:iterate you couldn’t write yourself with a
> recursive named template and a lot of care. And yet, in a detailed and
> exhaustive survey of all XSLT 3 users in the entire universe  [1] it
> was the second most popular feature introduced in 3.0.
>
> [1] OK, I asked a few people.
>
> But what about pure XPath, or what about XQuery?
>
> There’s nothing similar built in. And as with recursive XSLT templates
> or functions, there is no guaranteed optimization of recursive
> functions.
>
> So, on to generators...
>
> I maintain an online version of a 32-volume dictionary of biography
> from 1812 or so. It covers over 9500 people, and often lists the most
> important books they wrote.
>
> Many of the books are online in TEI XML, it turns out. The Early
> English Books Online project transcribed a lot of them (using slave lab…
> er I mean graduate students). Over sixty thousand books in all.
>
> So I wrote a query (using BaseX) to match up authors of books in EEBO to
> people in the 1812 biographical dictionary. It found two or three
> thousand authors who were likely the same (you have to allow for
> spelling variations, and for dates being a year or two different partly
> because of calendar changes).
>
> Now for each author I’d like to look up each book in turn in the
> British Library Short Title Catalogue (at least for the 18th Century
> English authors, but that’s more than half of them I think), and be
> able to give visitors to the site better references with the more usual
> names of the books, and, ultimately, link to online images as well as
> the EEBO transcription.
>
> So, I have 9,500 biographies in Chalmers, many with books they
> supposedly wrote, and I can look up the books they wrote in the
> catalogue. If I look up all of them at once the British Library will
> hate me, but in any case I don’t need to. Some books will appear more
> than once (e.g. multiple authors, or a mention in a biography rather
> than an assertion or authorship). Many will fall outside the date
> period for the catalogue. And when I’ve found a likely match for one, I
> want to fetch the full catalogue record, save it in a cache for later
> processing, and move on.
>
> Consider using xsl:iterate, but you are going to use a Web API on each
> iteration, and maybe you will stop when you have found a dozen matches
> for each author: the goal is partly to build confidence in the name of
> the author and their details, and fetching sixty works doesn’t do that
> much better than fetching a dozen, if the details match well enough.
>
> So, 9,600 times 12 is 115,200 queries at most.
>
> Now, if you’ve worked with XSLT 3 streaming, you know that you have to
> avoid constructing long sequences of data from your input.
>
> Don’t write
>   xsl:for-each select=”/dictionary/letter-group/entry/title”
> because that makes a list of 9,960 entry titles before it starts to
> process them. Instead, you write, e.g.
>   xsl:apply-templates select=”/dictionary/letter-group/entry/title”
> so that the processor can handle one title at a time.
>
> In XQuery we don’t have template matching. We can only do the for-each.
>
> I can rewrite this as a recursive function, keeping an integer to track
> position in the list, or using ../following::entry[1]/title to fetch
> the next title given the current one.
>
> This is easy to get wrong, and if I try to process the 60,000+ books in
> TEI XML this way I’m in real danger of running out of memory. But what
> if I could write something like,
>
> let $g := generator:new( {
>   “current” : $doc/dictionary/letter-group[1]/entry[1]/title ,
>   “next” : fn() {
>     generator:new(
>       “current” :  $g/../following::entry[1]/title.
>       “next” : $g(“next”) (: copy the next function :)
> } )
> return process-books-by($g)
>
> where process-books-by() is a function that processes $g→current() and
> then calls itself with $g→next() each time?
>
> This is a lot less to write, and the generator class (OK, we don’t have
> classes, the generator code) is written once and (we hope) tested.
> (this example doesn’t use exactly the proposed generator API)
>
> All that’s going on is that I have a map, an object if you prefer,
> except we don’t have objects, that knows how to find the next value,
> and returns the empty sequence when there are no more values. The next
> function returns a new generator with the current value set to the next
> one to process.
>
> But this is exactly what an iterator looks like in procedural
> languages. The generator proposal defines the functional counterpart.
>
> When I’ve found a list of candidates from the remote catalogue, I have
> a text similarity problem – which book is the best match (if any) from
> the list? Does it match any of the 60,000 EEBO books?
>
> The thing about text similarity is you want to sort candidates based on
> a likelihood criterion, then do the more expensive similarity test on
> the smallest possible solution space, the fewest books possible,
> because it’s computationally expensive. So when you find a good match
> you want to stop, but you don’t know how many to process in advance,
> and the full list of EEBO books uses tens or hundreds of gigabytes in
> the database.
>
> So this is another good case for generators: choose some candidates,
> process them until you find one you think is likely the right answer,
> then stop.
>
> Again, you could do all this with a recursive function, or maybe with
> one of the new-fangled qt4 while-do-done() things. But you can’t use
> fold-left as you can’t construct the sequence to process, and even
> xsl:iterate is tricky if you don’t know what you’re iterating over
> exactly in advance.
>
> This example does not use exactly the proposed generator API, because
> in a generator the current value can be the empty sequence, it doesn’t
> have to mean ‘there is no more’.
>
> Some examples of good candidates for standardization might include:
>
> 1. Things many people need but have to write for themselves
>
> 2. Things people tend to write incorrectly, causing problems (e.g.
> contains-token() for matching HTML class values)
>
> 3. Things that are hard enough to write that most people will simply
> use a different language or wander off or fail or hate us :-)  --- e.g.
> replace() could be written in terms of string processing functions, and
> the file:archive module for reading/writing zip files could be written
> with file:read-binary and file:write-binary and simply implementing
> Lempel-Ziv and Huffman compression in XPath, and then you could process
> epub files in XSLT or XQuery, but few people would do that. Having the
> extension makes things feasible.
>
> 4. Things that extend the reach of the language – this is social more
> than technical.
>
> Generators meet all of these criteria, which is why I support the
> work.
>
> Hope this helps a little,
>
> liam
>
> --
> Liam Quin: Delightful Computing - Training and Consultancy in
> XSLT / XML Markup / Typography / CSS / Accessibility / and more...
> Outreach for the GNU Image Manipulation Program
> Vintage art digital files - fromoldbooks.org
>

Received on Tuesday, 9 December 2025 09:36:12 UTC