
Re: How to find all spec editors in my company?

From: Dominique Hazael-Massieux <dom@w3.org>
Date: Thu, 6 May 2021 18:51:28 +0200
To: Jeffrey Yasskin <jyasskin@google.com>, Marcos Caceres <marcosc@w3.org>
Cc: spec-prod <spec-prod@w3.org>
Message-ID: <c8187eac-edc0-efd4-4486-8c36b609adef@w3.org>
On 05/05/2021 at 07:19, Dominique Hazael-Massieux wrote:
> Reffy is made to run on the list of specs maintained in browser-specs
> [5] - if your crawl needs to run on a different list, some further
> customization might be needed (happy to help with them).

I've ended up hacking my way through this [1] (very much a
work in progress), which has made it possible to extract editors and
their affiliations from 313 specs, with a few misses (whose affiliations
appear as "undetermined" in the attached data, available both as JSON
and CSV).
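To answer the original question (which editors belong to a given company) from the CSV export, a short filter is enough. The sketch below is hypothetical: the column names `spec`, `editor` and `affiliation`, and the helper names `editors_by_affiliation` and `editors_in_company`, are assumptions rather than the actual headers of the attached file or functions from Reffy. Empty affiliations are grouped with the "undetermined" entries.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names: adjust to the headers of the actual CSV export.
SPEC, EDITOR, AFFILIATION = "spec", "editor", "affiliation"


def editors_by_affiliation(csv_text: str) -> dict[str, set[tuple[str, str]]]:
    """Group (editor, spec) pairs by the editor's affiliation."""
    groups: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Empty affiliations are treated like the "undetermined" entries.
        affiliation = (row.get(AFFILIATION) or "").strip() or "undetermined"
        editor = (row.get(EDITOR) or "").strip()
        spec = (row.get(SPEC) or "").strip()
        groups[affiliation].add((editor, spec))
    return dict(groups)


def editors_in_company(csv_text: str, company: str) -> list[str]:
    """Return the sorted, de-duplicated editor names for one affiliation."""
    pairs = editors_by_affiliation(csv_text).get(company, set())
    return sorted({editor for editor, _spec in pairs})


if __name__ == "__main__":
    # Made-up sample rows, not taken from the attached data.
    sample = (
        "spec,editor,affiliation\n"
        "spec-a,Alice Example,Example Corp\n"
        "spec-b,Alice Example,Example Corp\n"
        "spec-b,Bob Sample,Other Inc\n"
        "spec-c,Carol Test,undetermined\n"
    )
    print(editors_in_company(sample, "Example Corp"))  # ['Alice Example']
```

Matching is on the exact affiliation string; if the same company may be spelled differently across specs, normalizing those names first would be needed.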

This is still very much an ad hoc process. More importantly, it
extracts data "only" from 313 specs in browser-specs [2], which means
both that there are a few browser-specs specs from which it couldn't
extract the information, and, more significantly, that it isn't looking at
the many known W3C specs that aren't in browser-specs, let alone at
the many other specs (e.g. from CGs) that aren't in browser-specs either.

It would be relatively easy to add the known W3C specs that aren't in
browser-specs; it would be much harder to get data from other CG specs, since I
don't think we have a good mechanism to track their existence at this
point (although the data collected by the CG monitor [3] might be a
starting point).

I'll wait to see if this data is useful and used before looking into
making the whole thing more robust.

Dom


1. https://github.com/w3c/reffy/blob/spec-crawler/src/cli/extract-editors.js,
   with the extracted data post-processed with
   https://gist.github.com/dontcallmedom/290986d35a8991a163f805e1692ff53a
2. https://github.com/w3c/browser-specs
3. https://w3c.github.io/cg-monitor/

Received on Thursday, 6 May 2021 16:52:02 UTC
