Improving algorithms in specs

Hello spec editors, contributors, and spec authoring tools authors,

Reffy [1], the crawler that powers the cross-references database used in 
Bikeshed and Respec, now also extracts algorithms from specs (see 
extracts in Webref [2]). This makes analyses and automated detection of 
problems possible, at least in theory. First automated analysis of 
algorithm steps that run in parallel reveals that it's easy to get 
things wrong when a Promise or an event is involved [3]. Dom and I are 
planning to file spec issues to report detected problems, on a 
semi-automated basis using Strudy [4], as we already do for broken 
links.

Looking beyond this first analysis, various improvements could be 
considered to make it easier to detect bugs early on and write correct 
algorithms. Some are listed below. We're seeking input on what's doable 
or worth exploring. If you have stories of bugs that you uncovered and 
fixed in algorithms that could perhaps be detected automatically, we're 
also interested!

Some possible improvements to consider:

1. Converge on a common way to flag, define, and reference algorithms.
Reffy currently needs to rely on a number of heuristics to extract 
algorithms and extraction is far from perfect (some limitations are 
noted as TODO notes in [5]).

a) Flagging algorithms
Bikeshed supports an `algorithm` attribute to flag algorithms. Respec 
supports `class="algorithm"`, but that is more used to identify the 
actual algorithm steps. These mechanisms are used inconsistently for 
now, or not at all.

b) Defining algorithms
Infra describes conventions for algorithms [6], including ways to define 
them. These conventions are also used inconsistently or not at all. From 
a spec authoring perspective, it's not clear what definition type to use 
(`abstract-op` looks good and is sometimes used, but there are lots of 
definitions that use the `dfn` type). There is also no common mechanism 
to precisely define the parameters that algorithms may take.

c) Referencing algorithms
Different specs use different wording to call algorithms and pass 
parameters. This makes it harder to "follow" variables, inputs and 
outputs, across algorithms.

2. Converge on constructs.
Specs have mostly converged on using possibly nested ordered lists to 
describe algorithm steps. That's good! There remain divergences on the 
way common constructs such as "if/then/else" or "try/catch" get written. 
Similarly, some specs mix different operations within a single step 
while others use more atomic specs. There's a balance to be found 
between readability and formalism. More readability is better for 
readers and is usually more compact. More formalism is better for 
automated parsing and analyses.

3. Converge on terms and actions used in algorithms.
While exploring heuristics to detect algorithm steps and distinguish 
them from a simple list of items, I started building a list of actions 
that appear at the beginning of steps [7]. That reveals the many verbs 
that specs use to tell implementers what needs to be done. Some have 
formal definitions. The meaning of others is more fuzzy. Ideally, the 
meaning of all operations would be normatively defined somewhere. For 
example, steps that run in parallel should mainly reference terms 
defined in Infra (such steps are not supposed to manipulate observable 
JavaScript objects).

To ease the burden on spec authors, it would be fantastic if progress on 
the above did not entail more work on their part. That is what spec 
authoring tools excel at, and another reason why we're seeking input.

Making algorithms more clearly identified in specs could also help 
provide more "visual" ways to read them inline. For a starting point, 
see the algorithm explorer that Dom started to visualize spec algorithms 
[8]. Marcos proposed a breakout session at TPAC 2024 on "Bringing more 
UI utilities to specs" [9]. We could perhaps discuss visualization of 
algorithms as flowcharts within that breakout.

Thanks,
François and Dom.

[1] https://github.com/w3c/reffy
[2] https://github.com/w3c/webref/tree/main/ed/algorithms
[3] https://github.com/w3c/strudy/issues/663
[4] https://github.com/w3c/strudy
[5] 
https://github.com/w3c/reffy/blob/main/src/browserlib/extract-algorithms.mjs#L37
[6] https://infra.spec.whatwg.org/#algorithms
[7] 
https://github.com/w3c/reffy/blob/main/src/browserlib/extract-algorithms.mjs#L97-L278
[8] https://dontcallmedom.github.io/algo-explorer/
[9] https://github.com/w3c/tpac2024-breakouts/issues/14

Received on Thursday, 29 August 2024 17:01:10 UTC