- From: Doug Schepers <schepers@w3.org>
- Date: Wed, 21 May 2014 19:29:53 -0400
- To: "public-webplatform-tests@w3.org" <public-webplatform-tests@w3.org>
- Message-ID: <537D36F1.6000606@w3.org>
Hi, folks–

We originally thought that we had gotten all the compat table data from MDN using Frozenice's importer [1], but it turns out that data was incomplete. The importer relied on the feeds from various topic tags (e.g. HTML, CSS, SVG), but those feeds were limited to around 500 pages.

So, Pat Tressel volunteered to create a crawler/scraper (based on Apache Nutch) that would retrieve the full list of pages that have compatibility information; she made some progress with this, but ran into deployment problems (if I understand correctly).

I talked today with Frozenice, Pat, and Renoir, and we came up with a plan that doesn't get all the data, but does get most of the data we're interested in for the short term.

I visited the index pages for each of the main topics that seem to have compat tables: CSS (including properties and selectors), HTML Elements, HTML Attributes, SVG Elements, SVG Attributes, DOM Interfaces, JavaScript APIs, and JavaScript. I manually collected all the URLs to the pages for those topics (thanks to a clever console hack from Renoir that made it a snap) and collated them (see the attached file 'page-list.txt').

We can use this list of pages as a poor man's crawler for Frozenice's importer. (Fro says "I think we only need to get rid of https://github.com/webplatform/mdn-compat-importer/blob/master/index.js#L25 and put the master list into reader.links instead".)

Pat did more research on long-tail MDN pages that may be candidates for other useful compat-table info (and also on pages that turned out to be dead ends), which I'm attaching as 'seed-page-list.txt'. We should look more at that for the next phases.

So, for the first phase, we'll go with the pages listed in the page-list attachment; please go through those and see if there are any conspicuous inclusions or exclusions. Hopefully, we can have meaningful results in a week or two.

[1] https://github.com/webplatform/mdn-compat-importer

Regards-
-Doug
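The page-list idea above amounts to replacing the importer's feed-based discovery with a fixed list of URLs. Before that list goes into the importer (Fro's suggestion is `reader.links`; the importer's internals are not shown here), it helps to clean it and count pages per topic, which also makes conspicuous gaps easier to spot. The Python sketch below is illustrative only; the importer itself is Node.js. It assumes, hypothetically, that 'page-list.txt' holds one URL per line and that blank lines or lines starting with `#` should be ignored. The function names `load_page_list` and `topic_counts` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def load_page_list(text: str) -> list[str]:
    """Turn a page list (assumed: one URL per line) into a de-duplicated,
    order-preserving list of MDN page URLs."""
    seen = set()
    links = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumed format: skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        parts = urlsplit(line)
        # Keep only MDN pages; anything else is likely a collation mistake.
        if parts.netloc != "developer.mozilla.org":
            continue
        # Drop query strings and fragments (e.g. '#Browser_compatibility')
        # so the same page is not fetched twice.
        url = f"{parts.scheme}://{parts.netloc}{parts.path}"
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


def topic_counts(links: list[str]) -> Counter:
    """Count pages per topic, using the two path segments after
    '/<locale>/docs/' (e.g. 'Web/CSS') as a rough topic key."""
    counts = Counter()
    for url in links:
        segments = urlsplit(url).path.strip("/").split("/")
        key = "/".join(segments[2:4]) if len(segments) >= 4 else "/".join(segments)
        counts[key] += 1
    return counts
```

For example, a list containing `https://developer.mozilla.org/en-US/docs/Web/CSS/color` twice (once with a `#Browser_compatibility` fragment) yields that page only once, and `topic_counts` then reports it under `Web/CSS`. A topic with a surprisingly small count is a quick hint that part of an index page was missed.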
Attachments
- text/plain attachment: page-list.txt
- text/plain attachment: seed-page-list.txt
Received on Wednesday, 21 May 2014 23:30:06 UTC