- From: Janet Swisher <jswisher@mozilla.com>
- Date: Sat, 11 Jan 2014 17:03:39 -0600
- To: Pat Tressel <ptressel@myuw.net>, Doug Schepers <schepers@w3.org>
- CC: WebPlatform Community <public-webplatform@w3.org>
- Message-ID: <52D1CDCB.9040109@mozilla.com>
Hi Pat,

On 1/11/14 9:41 AM, Pat Tressel wrote:
> Doug --
>
> We need some coding help to convert HTML tables into JSON, for our
> compatibility data project.
> Does the script you have already crawl the site and pull out the
> tables intact?
>
>
> It looks like the MDN compatibility info is easily findable in their
> pages. (I spot-checked their HTML, CSS, and JavaScript references, and
> all seem to have very regular structures.) The desktop and mobile
> tables have id = compat-desktop and id = compat-mobile, respectively.
> Not all pages have both desktop and mobile tables, though. All the
> pages I looked at only had a "Basic support" row -- I wonder if some
> have additional rows.

The MDN compatibility tables are built with macros/templates, so unless a page is very neglected (and therefore not updated to use the standard macros), its table should have an identical structure. Some tables do have multiple rows, where features were added over time. My worst-case example of a hairy compatibility table is https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Input#Browser_compatibility

>
> Or, we could operate on the original pages. I'm a bit hesitant to run
> a crawler without permission plus a look at MDN's robots.txt, so if
> you've already got the complete set of tables, that may be better. (I
> recently ran wget to download a student's work from their web site,
> and apparently that violated their hosting site's robots.txt, and the
> site blocked me! Don't want that to happen again...)

I think there was discussion of doing this at the time of day when MDN has the least traffic. I can talk to Mozilla's IT folks to a) give them a heads-up and b) find out the best time to do it.

--
Janet Swisher <mailto:jswisher@mozilla.com>
Mozilla Developer Network <https://developer.mozilla.org>
Developer Engagement Community Organizer
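Since the thread says the compatibility tables carry id = compat-desktop and id = compat-mobile and share one macro-generated structure, a table-to-JSON converter can key off those ids. The following is a minimal sketch using only Python's standard html.parser, not the script mentioned in the thread. It assumes plain <tr>/<th>/<td> rows with the browser names in the first row; the class and function names are hypothetical, the sample HTML is made up, and rowspan/colspan (likely in "hairy" tables such as the Input page) are not expanded.

```python
import json
from html.parser import HTMLParser

# Table ids taken from the thread; everything else here is an assumption.
TABLE_IDS = ("compat-desktop", "compat-mobile")


class CompatTableParser(HTMLParser):
    """Collects cell text from <table> elements whose id is in TABLE_IDS.

    Assumes plain <tr>/<th>/<td> rows; rowspan/colspan are NOT expanded,
    and rows of any table nested inside a compat table are mixed in.
    """

    def __init__(self):
        super().__init__()
        self.tables = {}        # table id -> list of rows (lists of cell text)
        self.current_id = None  # id of the compat table being read, if any
        self.depth = 0          # <table> nesting depth inside that table
        self.row = None         # cells of the row being read
        self.cell = None        # text fragments of the cell being read

    def _flush_cell(self):
        # Close the open cell (if any), collapsing runs of whitespace.
        if self.cell is not None and self.row is not None:
            self.row.append(" ".join("".join(self.cell).split()))
        self.cell = None

    def _flush_row(self):
        # Close the open row (if any) and keep it if it has cells.
        self._flush_cell()
        if self.row:
            self.tables[self.current_id].append(self.row)
        self.row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.current_id is not None:
                self.depth += 1
            elif dict(attrs).get("id") in TABLE_IDS:
                self.current_id = dict(attrs)["id"]
                self.tables[self.current_id] = []
                self.depth = 1
        elif self.current_id is None:
            return  # outside any compat table: ignore
        elif tag == "tr":
            self._flush_row()
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self._flush_cell()
            self.cell = []

    def handle_endtag(self, tag):
        if self.current_id is None:
            return
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr":
            self._flush_row()
        elif tag == "table":
            self.depth -= 1
            if self.depth == 0:
                self._flush_row()
                self.current_id = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def tables_to_json(html):
    """Map each compat table to {feature: {browser: version-text}}."""
    parser = CompatTableParser()
    parser.feed(html)
    parser.close()
    result = {}
    for table_id, rows in parser.tables.items():
        if not rows:
            continue
        header, *body = rows  # first row assumed to hold the browser names
        browsers = header[1:]
        result[table_id] = {row[0]: dict(zip(browsers, row[1:])) for row in body}
    return json.dumps(result, indent=2, sort_keys=True)


# Made-up sample in the shape described in the thread.
sample = """
<table id="compat-desktop">
  <tr><th>Feature</th><th>Chrome</th><th>Firefox (Gecko)</th></tr>
  <tr><td>Basic support</td><td>1.0</td><td>1.0 (1.7 or earlier)</td></tr>
</table>
"""
print(tables_to_json(sample))
```

Pages with only one of the two tables simply yield one key, and tables with several feature rows (features added over time) yield one entry per row.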
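On the robots.txt concern raised in the thread: before any crawl, the candidate URLs can be checked against the site's rules and a request delay chosen. This is a minimal sketch using Python's standard urllib.robotparser. The robots.txt text, the user-agent string, and the 10-second fallback delay are all hypothetical and are not MDN's actual rules; a real run would point RobotFileParser at the live file with set_url() and read(), and would still need the off-peak timing arranged with Mozilla's IT folks.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt only; this is NOT MDN's real file.
EXAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 5
Disallow: /admin/
"""


def plan_crawl(robots_txt, user_agent, urls, default_delay=10):
    """Return (urls the rules allow, seconds to wait between requests).

    default_delay is a hypothetical fallback used when robots.txt sets
    no Crawl-delay for this user agent.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    allowed = [u for u in urls if rp.can_fetch(user_agent, u)]
    delay = rp.crawl_delay(user_agent)
    return allowed, delay if delay is not None else default_delay


allowed, delay = plan_crawl(
    EXAMPLE_ROBOTS,
    "compat-table-bot",  # hypothetical user-agent string
    [
        "https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Input",
        "https://developer.mozilla.org/admin/",
    ],
)
print(allowed, delay)
```

Honoring Crawl-delay between fetches, and skipping anything the rules disallow, is the kind of restraint that avoids the blocking Pat ran into with wget.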
Received on Saturday, 11 January 2014 23:04:08 UTC