- From: Pat Tressel <ptressel@myuw.net>
- Date: Sat, 11 Jan 2014 16:10:06 -0800
- To: Doug Schepers <schepers@w3.org>
- Cc: WebPlatform Community <public-webplatform@w3.org>
>> Unfortunately, MDN doesn't expose their compatibility data as JSON, so
>> we'll need to convert their HTML tables into JSON that matches our data
>> model [2]. We already have a script that collects the data (again, as HTML
>> tables) from their site, but we need someone who can reformat and normalize
>> that data.
>>
>> The language used for this task is not important: it could be JavaScript,
>> Python, Ruby, Perl, PHP, or even C. I believe that the best approach may
>> use RegEx, but there might be a better way.
>>
>
> ...
> I'd be inclined to use Python and Beautiful Soup. The latter works on
> intact web pages -- I'm not sure about isolated elements, but it would be
> simple enough to tack on a minimal set of <html>, <head>, <body> tags.
> ...
> (I'd be equally inclined to use JavaScript and jQuery if I were set up to
> use them outside a browser. ...

The "right" tool is probably XSLT. But it would probably be faster to get
it working in Python / Beautiful Soup. ;-)

-- Pat
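P.S. For whoever picks this up, here's a rough sketch of what I mean by the
Beautiful Soup approach. It assumes each scraped fragment is a bare
MDN-style compat <table> whose first row names the browsers and whose first
column names the feature -- the function name and the JSON shape are just
illustrative, not our actual data model [2]:

    import json
    from bs4 import BeautifulSoup

    def compat_table_to_json(html_fragment):
        # Beautiful Soup copes with an isolated <table>; no need to wrap
        # it in <html>/<head>/<body> first.
        soup = BeautifulSoup(html_fragment, "html.parser")
        table = soup.find("table")
        rows = table.find_all("tr")

        # First row: "Feature", then one column per browser.
        headers = [c.get_text(strip=True) for c in rows[0].find_all(["th", "td"])]

        records = []
        for row in rows[1:]:
            cells = [c.get_text(" ", strip=True) for c in row.find_all(["th", "td"])]
            records.append({
                "feature": cells[0],
                # Browser name -> raw support text, e.g. "4.0", "(Yes)".
                "support": dict(zip(headers[1:], cells[1:])),
            })
        return json.dumps(records, indent=2)

Normalizing the raw cell text ("(Yes)", version numbers, footnote markers)
into whatever the data model actually wants would still be the bulk of the
work, but the HTML-to-structure part really is that short.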
Received on Sunday, 12 January 2014 00:10:34 UTC