WebIDL crawler

Dear TAG,

François (cc'd) and I have recently been working on a set of tools
designed to crawl WebIDL data from the Web Platform specifications that
use it.

More specifically, these tools try to extract WebIDL fragments from the
latest version of each spec they can identify (even when given URLs that
do not point to the latest version), and at the same time extract the
normative references that these specs declare. The WebIDL fragments are
then parsed into a JSON AST using webidl2.js.
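Here is one way such extraction can work. Specs written with ReSpec or Bikeshed usually mark their WebIDL blocks as `<pre class="idl">` elements. The following Python sketch is illustrative only, not code from the tools themselves. It uses just the standard library to collect the text of those blocks, assumes that markup convention, and ignores IDL embedded any other way. The resulting strings are what would then be handed to a parser such as webidl2.js.

```python
from html.parser import HTMLParser


class IdlBlockExtractor(HTMLParser):
    """Collect the text of every <pre class="idl"> element in a page."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True (default) decodes &lt; / &gt;
        self.fragments = []
        self._in_idl = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Assumption: IDL blocks are <pre> elements whose class list has "idl".
        classes = (dict(attrs).get("class") or "").split()
        if tag == "pre" and "idl" in classes:
            self._in_idl = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "pre" and self._in_idl:
            self._in_idl = False
            self.fragments.append("".join(self._buffer).strip())

    def handle_data(self, data):
        # Text inside nested markup (e.g. <a> links) is kept; the tags are dropped.
        if self._in_idl:
            self._buffer.append(data)


def extract_idl(html: str) -> list[str]:
    parser = IdlBlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.fragments
```

For example, `extract_idl('<pre class="idl">interface <a href="#x">Foo</a> {};</pre>')` returns `['interface Foo {};']`.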

This lets us build a complete map of WebIDL usage across the
specifications of the Open Web Platform (OWP), which in turn has enabled
us to run various analyzers:
* we can check that each spec's normative references are consistent
with the WebIDL names it imports
* we can detect duplicate or missing definitions (for instance, the tool
spotted the duplicate definition of "Credential" in WebAuthn and the
Credential Management API, a bug the TAG had already reported
separately)
* we can easily detect specs with invalid WebIDL fragments
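The first two checks above reduce to set operations over a per-spec summary of defined names, used names and normative references. Below is an illustrative Python sketch of that, not the tools' own code. The summary shape (`defines`, `uses`, `normative_refs`) is hypothetical. The sketch also assumes that partial definitions and WebIDL built-in types such as `DOMString` have already been filtered out. Detecting invalid fragments is left out, since it amounts to catching parser errors.

```python
from collections import defaultdict


def analyze(specs: dict[str, dict[str, set[str]]]) -> dict[str, object]:
    """Report duplicate definitions, missing definitions and unreferenced imports.

    Each value of `specs` is assumed (hypothetically) to look like
    {"defines": {...}, "uses": {...}, "normative_refs": {...}}.
    """
    definers = defaultdict(set)  # IDL name -> specs that define it
    for spec, info in specs.items():
        for name in info["defines"]:
            definers[name].add(spec)

    duplicates = {name: sorted(s) for name, s in definers.items() if len(s) > 1}

    missing = defaultdict(list)  # undefined name -> specs that use it
    unreferenced = []            # (using spec, name, defining spec)
    for spec, info in specs.items():
        for name in sorted(info["uses"]):
            owners = definers.get(name, set())
            if not owners:
                missing[name].append(spec)
            elif spec not in owners and not (owners & info["normative_refs"]):
                # The name comes from another spec, but none of the specs
                # defining it is listed as a normative reference.
                for owner in sorted(owners):
                    unreferenced.append((spec, name, owner))

    return {
        "duplicates": duplicates,
        "missing": dict(missing),
        "unreferenced": unreferenced,
    }
```

Consider hypothetical input where `spec-a` and `spec-b` both define `Credential`, and `spec-c` uses `BInit` from `spec-b` without referencing `spec-b`. Here `analyze` reports `Credential` as a duplicate and `("spec-c", "BInit", "spec-b")` as an unreferenced import.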

See, for instance, a recently produced report (it contains known false
positives):
https://github.com/tidoust/reffy/wiki/Report-per-anomaly-(20160711)

On top of that, we also built a more general explorer of WebIDL usage
across specifications:
https://dontcallmedom.github.io/webidlpedia

This explorer lists all the defined WebIDL names (interfaces,
dictionaries, typedefs, enums), with information on which specs define
them and which specs make use of them.
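Such an index can be built in a single pass over the parsed fragments. In this illustrative Python sketch, each spec maps to a list of simplified definitions carrying `type`, `name` and the names they reference (`uses`). That shape is a hypothetical stand-in for the actual webidl2.js AST, from which the referenced names would first have to be collected.

```python
from collections import defaultdict


def build_index(parsed: dict[str, list[dict]]) -> dict[str, dict]:
    """Map each IDL name to its kinds, defining specs and using specs.

    `parsed` is a hypothetical simplification, e.g.
    {"spec-a": [{"type": "dictionary", "name": "FooInit", "uses": ["BarInit"]}]}
    """
    index = defaultdict(lambda: {"types": set(), "defined_in": set(), "used_by": set()})
    for spec, definitions in parsed.items():
        for definition in definitions:
            entry = index[definition["name"]]
            entry["types"].add(definition["type"])
            entry["defined_in"].add(spec)
            # Names referenced but never defined still get an entry,
            # with an empty "defined_in" set.
            for used in definition.get("uses", []):
                index[used]["used_by"].add(spec)
    return dict(index)
```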

An interesting way to look at these lists is to sort them by
"popularity" (i.e. how heavily other specs use each name):
https://dontcallmedom.github.io/webidlpedia/?full=popularity
It might be particularly worth exploring in more depth the patterns that
lead some dictionaries and enums to have no usage at all.
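One possible reading of "popularity" is the number of specs, other than the defining one, that reference a name. The illustrative Python sketch below ranks names that way. As an assumption, it treats a dictionary or enum as having zero usage when no spec at all references it, including the one that defines it. The input shapes are hypothetical.

```python
def rank_by_popularity(definitions: dict[str, tuple[str, str]],
                       usages: dict[str, set[str]]):
    """Rank IDL names by the number of *other* specs that use them.

    definitions: name -> (kind, defining spec), e.g. ("enum", "spec-a")
    usages:      name -> set of specs that reference the name
    """
    counts = {
        name: len(usages.get(name, set()) - {spec})
        for name, (_kind, spec) in definitions.items()
    }
    # Most used first; ties broken alphabetically.
    ranking = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Dictionaries and enums that no spec references at all.
    unused = sorted(
        name for name, (kind, _spec) in definitions.items()
        if kind in ("dictionary", "enum") and not usages.get(name)
    )
    return ranking, unused
```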

A similar view lists the strings used as enum values across
specifications:
https://dontcallmedom.github.io/webidlpedia/?enums=popularity
That view could hopefully help bring more consistency to these names
across specifications.
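A view like this can be produced by counting, for each enum value, the specs that use it. The illustrative Python sketch below does that and also groups values that differ only in case or in `-`/`_` separators. That grouping is one hypothetical heuristic for surfacing naming inconsistencies, and the input format is also an assumption.

```python
import re
from collections import defaultdict


def enum_value_report(enums: list[tuple[str, str, list[str]]]):
    """Count specs per enum value and flag likely spelling variants.

    `enums` is a hypothetical list of (spec, enum name, values) tuples.
    """
    specs_per_value = defaultdict(set)
    for spec, _enum_name, values in enums:
        for value in values:
            specs_per_value[value].add(spec)

    # Most widely used values first; ties broken alphabetically.
    popularity = sorted(
        ((value, len(specs)) for value, specs in specs_per_value.items()),
        key=lambda item: (-item[1], item[0]),
    )

    # Heuristic: values equal after lowercasing and removing "-"/"_".
    variants = defaultdict(set)
    for value in specs_per_value:
        variants[re.sub(r"[-_]", "", value.lower())].add(value)
    inconsistent = sorted(sorted(group) for group in variants.values() if len(group) > 1)
    return popularity, inconsistent
```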

There are obviously many other ways to exploit the collected data, for
instance by exploring which specs use extended attributes of particular
platform relevance (e.g. [SecureContext]).
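As an example of such an exploration, the illustrative Python sketch below lists where a given extended attribute appears on definitions or their members. The field names (`extAttrs`, `members`, `name`) echo those used by webidl2.js, but the exact shape assumed here is a simplification. Real ASTs can also carry extended attributes on arguments and types, which this sketch does not visit.

```python
def find_ext_attr(parsed: dict[str, list[dict]], attr: str = "SecureContext"):
    """Return (spec, definition name, member name or None) for each hit.

    `parsed` maps a spec to simplified definitions such as
    {"name": "Foo", "extAttrs": [{"name": "SecureContext"}],
     "members": [{"name": "bar", "extAttrs": []}]}  (assumed shape).
    """
    def has_attr(node: dict) -> bool:
        return any(ea.get("name") == attr for ea in node.get("extAttrs", []))

    hits = []
    for spec, definitions in parsed.items():
        for definition in definitions:
            if has_attr(definition):
                hits.append((spec, definition["name"], None))
            for member in definition.get("members", []):
                if has_attr(member):
                    hits.append((spec, definition["name"], member.get("name")))
    return hits
```

The set of specs using the attribute is then `{spec for spec, _, _ in hits}`.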

Likewise, there is probably quite a bit more that can be extracted and
analyzed from the list of normative references that the tool collects.

The tools are available at:
https://github.com/tidoust/reffy
https://github.com/dontcallmedom/webidlpedia

François and I will likely keep working on these tools as time permits;
we also welcome pull requests on the repos. Should the tools be of
interest to the TAG in its operations, we would also be happy to discuss
how they could be improved in that direction.

Thanks,

Dom & François

Received on Friday, 15 July 2016 08:00:11 UTC