- From: David Kirstein <frozenice@frozenice.de>
- Date: Sun, 12 Jan 2014 13:20:30 +0100
- To: "'Pat Tressel'" <ptressel@myuw.net>, "'Doug Schepers'" <schepers@w3.org>
- Cc: "'WebPlatform Community'" <public-webplatform@w3.org>
- Message-ID: <001e01cf0f90$b01335f0$1039a1d0$@frozenice.de>
Alright, see here: https://github.com/webplatform/mdn-compat-importer

I put some of the current issues and general info into the README. I hope this serves as a quick start, and that JS as the development language will appeal to a lot of people here. If anyone (Pat?) wants to hack away at the issues, that would be grrrreat!

-fro

From: David Kirstein [mailto:frozenice@frozenice.de]
Sent: Sonntag, 12. Januar 2014 02:16
To: 'Pat Tressel'; 'Doug Schepers'
Cc: 'WebPlatform Community'
Subject: RE: Converting MDN Compat Data to JSON

Heya,

If you can wait until tomorrow, I can put a NodeJS version of the old script up -- well, slightly improved. I'll need to get some sleep first, as it's already a bit late here in Germany. ;)

It basically generates a list of pages (via tags), grabs the compat tables in HTML format, and converts them into a nice JS object. That already works for basic tables; I'll add support for prefixes. That <input> table is a beast, though!

The main thing left to do here is converting this internal JS object into something that resembles the spec Doug linked... and some 'minor' things, like taming the <input> compat table Janet linked and maybe finding a better way to generate the list of pages (tag lists are limited to 500 entries; after removing duplicates from the lists for ['CSS', 'HTML5', 'API', 'WebAPI'], I counted about 1.2k pages or so).

No worries, no harm has been or will be done to the MDN servers! There are sensible delays between requests, and the tool has caches, so no MDN requests are actually needed to work on the HTML -> JS conversion or the conversion to WPD format. I'll bundle a filled cache with gracefully requested data, so there should be enough data to work on for the start. :)

Pat, can I interest you in working on the HTML parser or the conversion to WPD format? The HTML parsing is really easy, as it's written in JS and uses a jQuery-like library, as you will see from my code.
Catch me here or on IRC; if I'm not responding, I'm probably still sleeping! ;)

-fro

From: ptressel@uw.edu [mailto:ptressel@uw.edu] On Behalf Of Pat Tressel
Sent: Sonntag, 12. Januar 2014 01:10
To: Doug Schepers
Cc: WebPlatform Community
Subject: Re: Converting MDN Compat Data to JSON

> Unfortunately, MDN doesn't expose their compatibility data as JSON, so we'll need to convert their HTML tables into JSON that matches our data model [2]. We already have a script that collects the data (again, as HTML tables) from their site, but we need someone who can reformat and normalize that data. The language used for this task is not important: it could be JavaScript, Python, Ruby, Perl, PHP, or even C. I believe that the best approach may use RegEx, but there might be a better way.

...

I'd be inclined to use Python and Beautiful Soup. The latter works on intact web pages -- I'm not sure about isolated elements, but it would be simple enough to tack on a minimal set of <html>, <head>, <body> tags.

...

(I'd be equally inclined to use JavaScript and jQuery if I were set up to use them outside a browser. ... The "right" tool is probably XSLT. But it would probably be faster to get it working in Python / Beautiful Soup. ;-)

-- Pat
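The HTML-table-to-JS-object step discussed in the thread above could be sketched roughly like this. To be clear, this is not the actual mdn-compat-importer code: the real tool uses a jQuery-like library, whereas this standalone sketch uses plain regexes (one of the approaches floated above) so it runs in Node without dependencies. The table layout and the shape of the output object are assumptions for illustration only.

```javascript
// Hedged sketch: convert a simplified MDN-style compat table (as an HTML
// string) into a plain JS object mapping feature -> browser -> version.
// Assumes the first header cell is the feature column and the first body
// cell of each row names the feature; real MDN tables are messier.

const tableHtml = `
<table>
  <tr><th>Feature</th><th>Chrome</th><th>Firefox</th><th>IE</th></tr>
  <tr><td>Basic support</td><td>1.0</td><td>4.0</td><td>9</td></tr>
</table>`;

// Extract the trimmed inner text of every <tag>...</tag> in a row.
function cells(rowHtml, tag) {
  const re = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`, 'g');
  const out = [];
  let m;
  while ((m = re.exec(rowHtml)) !== null) out.push(m[1].trim());
  return out;
}

function parseCompatTable(html) {
  const rows = html.match(/<tr[^>]*>[\s\S]*?<\/tr>/g) || [];
  const browsers = cells(rows[0], 'th').slice(1); // drop "Feature" column
  const result = {};
  for (const row of rows.slice(1)) {
    const [feature, ...versions] = cells(row, 'td');
    result[feature] = {};
    browsers.forEach((name, i) => { result[feature][name] = versions[i]; });
  }
  return result;
}

console.log(JSON.stringify(parseCompatTable(tableHtml), null, 2));
```

A regex pass like this is fine for well-behaved tables, but nested markup inside cells (prefix notes, footnote links) is exactly where a real HTML parser or jQuery-like library earns its keep -- which is presumably why the actual tool went that route.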
Attachments
- application/pkcs7-signature attachment: smime.p7s
Received on Sunday, 12 January 2014 12:20:58 UTC