Converting MDN Compat Data to JSON

Hi, folks–

We need some coding help to convert HTML tables into JSON, for our 
compatibility data project.

As I've explained elsewhere [1], we have several goals for our browser 
compatibility information:

1) collect the most accurate data we can, from multiple trusted sources
2) store all the data in JSON, available for anyone to use via our API
3) use a MediaWiki extension to automatically populate the right pages 
with their relevant data

We've made some progress on this, such as developing a data model [2], 
but we stalled as the holidays approached. I'd like to find help to 
bring us across the finish line.

We should do this in multiple passes. The first pass will simply be to 
populate the pages with at least one source of data; the best match for 
our page structure is MDN.

Unfortunately, MDN doesn't expose their compatibility data as JSON, so 
we'll need to convert their HTML tables into JSON that matches our data 
model [2]. We already have a script that collects the data (again, as 
HTML tables) from their site, but we need someone who can reformat and 
normalize that data.

The language used for this task is not important: it could be 
JavaScript, Python, Ruby, Perl, PHP, or even C. I believe that the best 
approach may use regular expressions, but there might be a better way.
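
To make the task concrete, here is a rough sketch of the table-to-JSON 
step in Python. It uses the standard library's html.parser rather than 
regular expressions, simply because that copes better with nested markup 
inside cells. The sample table, the function name table_to_records, and 
the output shape ("feature" / "support") are all made up for 
illustration; they are not the real MDN markup and do not follow the 
data model [2].

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every cell, row by row, from an HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None outside <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Join fragments (e.g. around links) and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Hypothetical helper: first row = browser names, first column = feature."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    browsers = header[1:]
    return [
        {"feature": row[0], "support": dict(zip(browsers, row[1:]))}
        for row in body
    ]


# Invented sample input, loosely shaped like a compatibility table.
SAMPLE = """
<table>
  <tr><th>Feature</th><th>Chrome</th><th>Firefox (Gecko)</th></tr>
  <tr><td>Basic support</td><td>1.0</td><td><a href="#">4.0</a> (2.0)</td></tr>
</table>
"""

print(json.dumps(table_to_records(SAMPLE), indent=2))
```

The real work would be in the normalization step: mapping MDN's 
free-form cell contents (version numbers, notes, prefixes) onto the 
fields of the data model, which this sketch does not attempt.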

So, I'm asking you all to help in one of a few ways:

1) If you think you might know how to do this, and have time and energy 
to see it through, please let us know!

2) If you think you might know someone who can help, please introduce us!

3) If you can't do the task and don't know anyone who could, please help 
me refine this message so we can put the call out, explaining what we are 
doing and what we need.

[1] http://lists.w3.org/Archives/Public/public-webplatform-tests/2013OctDec/0000.html
[2] http://www.ronaldmansveld.nl/webplatform/compat_tables_datamodel.html

Regards-
-Doug

Received on Saturday, 11 January 2014 09:21:15 UTC