Data Set with Glyph IDs

The data set with glyph IDs is now available here:
https://drive.google.com/file/d/1rZNbW7Gxk4ipjompzajgufOIRKv4hnZu/view?usp=sharing

The updated data set is in the data_set.12-05-2020.v6.sampled_10.zip file.

This version is sampled to retain 1 in 10 sequences. I'm currently
processing the full unsampled version, which should be available in a few
days. The zip file also includes a copy of all of the font files that were
used to derive the glyph IDs.

Additionally, this version includes some other improvements to font
matching and to the extraction of text codepoints from the source pages.

Commit
https://github.com/w3c/PFE-analysis/commit/2ef1c132ca89093ccbc5bd3ca71c83b79ad36609
adds the glyph_ids field to the proto definition.

Received on Tuesday, 28 July 2020 22:26:36 UTC