Re: Converting MDN Compat Data to JSON from Doug Schepers on 2014-01-12 (public-webplatform@w3.org from January 2014)

From: Doug Schepers <schepers@w3.org>
Date: Sun, 12 Jan 2014 12:31:05 -0500
To: David Kirstein <frozenice@frozenice.de>
CC: 'Max Polk' <maxpolk@gmail.com>, 'Pat Tressel' <ptressel@myuw.net>, 'Webplatform List' <public-webplatform@w3.org>
Message-ID: <52D2D159.6050306@w3.org>
Yeah, absolutely, Github is the place we're doing that now (though we'll 
likely have our own git repo, mirrored on Github, for the future).

Max, I'll add you, if you tell me your github account.

Regards-
-Doug

On 1/12/14 9:45 AM, David Kirstein wrote:
> Well, the WebPlatform organization (https://github.com/webplatform)
> seems like the logical choice. There already are several projects there
> and more will come.
>
> I’m sure Doug will figure something out. :)
>
> -fro
>
> *From:*Max Polk [mailto:maxpolk@gmail.com]
> *Sent:* Sunday, 12 January 2014 15:14
> *To:* Pat Tressel
> *Cc:* Webplatform List; frozenice@frozenice.de; Doug Schepers
> *Subject:* Re: Converting MDN Compat Data to JSON
>
> We are in need of a place, perhaps in github, for all the tools we are
> building.  Some Python and shell scripting I have in a private repo, but
> webplatform needs a dedicated space for it and someone to keep it
> organized, so others later can use them for future needs.
>
> Is there already such a place?
>
> On Jan 12, 2014 7:21 AM, Pat Tressel <ptressel@myuw.net
> <mailto:ptressel@myuw.net>> wrote:
>
> Hi, David!
>
>     If you can wait until tomorrow, I can put a NodeJS version of the
>     old script up, well, slightly improved. I’ll need to get some sleep
>     first, as it’s already a bit late here in Germany. ;)
>
> Sure.  ;-)
>
>     It basically generates a list of pages (via tags), grabs the compat
>     tables in HTML format and converts them to a nice JS object.
>
>     That’s already working for basic tables, I’ll add support for prefixes.
>
> Why don't we look at it first, in case it is possible to avoid the work
> -- see below...
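
The table-to-object step described above can be sketched roughly as follows. This is only an illustration in Python, not David's NodeJS script: the class and function names are hypothetical, the sample table is made up, and it assumes the "expected format" is a header row of browser names followed by one row per feature.

```python
from html.parser import HTMLParser


class CompatTableParser(HTMLParser):
    """Collect the cell text of every <tr> in the input as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def compat_table_to_dict(html):
    """Map feature name -> {browser name: support string}."""
    parser = CompatTableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return {}
    header, *body = parser.rows
    browsers = header[1:]  # assumption: the first column holds the feature name
    return {row[0]: dict(zip(browsers, row[1:])) for row in body if row}


# Made-up sample in the assumed layout, not real MDN data.
SAMPLE = """
<table>
  <tr><th>Feature</th><th>Chrome</th><th>Firefox (Gecko)</th></tr>
  <tr><td>Basic support</td><td>1.0</td><td>1.0 (1.7)</td></tr>
</table>
"""

print(compat_table_to_dict(SAMPLE))
```

Pages with footnotes, prefixes, or no table at all would need extra handling that this sketch does not attempt.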
>
>     That <input> table is a beast, though!
>
> At least <input> has a table that's in the expected format.  Here's one
> without a table:
>
> https://developer.mozilla.org/en-US/docs/Web/API/Document
>
> Ooo, this one has footnotes to its compatibility table...
>
> https://developer.mozilla.org/en-US/docs/Web/API/Element
>
> Could write out lists of pages without the table, and of pages that have
> the table but don't match expectations in some other way.
>
>     The main thing left to do here is converting this internal JS object
>     into something that resembles the spec Doug linked... and some
>     ‘minor’ things, like taming the <input> compat table Janet linked
>     and maybe finding a better way to generate the list of pages (tag
>     lists are limited to 500 entries; after removing duplicates from the
>     lists for ['CSS', 'HTML5', 'API', 'WebAPI'], I counted about 1.2k pages).
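
Merging the per-tag page lists and dropping duplicates, as mentioned above, might look something like this. It is a minimal sketch: the function name is hypothetical, and which page carries which tag in the sample is invented for illustration.

```python
def merge_tag_lists(pages_by_tag):
    """Merge per-tag page lists into one list, keeping first-seen order."""
    seen = set()
    merged = []
    for pages in pages_by_tag.values():
        for page in pages:
            if page not in seen:
                seen.add(page)
                merged.append(page)
    return merged


# Invented tag assignments; only the tag names and page paths come from the thread.
pages_by_tag = {
    "CSS": [],
    "HTML5": ["/en-US/docs/Web/API/Document"],
    "API": ["/en-US/docs/Web/API/Document", "/en-US/docs/Web/API/Element"],
    "WebAPI": ["/en-US/docs/Web/API/Element"],
}

merged = merge_tag_lists(pages_by_tag)
print(len(merged), merged)
```

Because each tag list is capped at 500 entries, a merged list built this way can still miss pages; it only removes the overlap between lists.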
>
> I was about to ask "What does tag mean in this context?"  :D  But I see
> the "Tags" section (i.e. class = tag-list) at the bottom of pages.  So,
> you're not crawling the relevant parts of the site looking for pages
> with an <a> tag with href #Browser_compatibility?
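
The alternative Pat asks about, checking a page for an <a> tag whose href is #Browser_compatibility, could be sketched like this. The names are hypothetical, and it assumes the link appears with exactly that href value.

```python
from html.parser import HTMLParser


class CompatLinkFinder(HTMLParser):
    """Set .found to True if any <a href="#Browser_compatibility"> is seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") == "#Browser_compatibility":
            self.found = True


def has_compat_link(html):
    """Return True when the page links to a browser compatibility section."""
    finder = CompatLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.found


print(has_compat_link('<a href="#Browser_compatibility">Browser compatibility</a>'))
print(has_compat_link('<a href="#Syntax">Syntax</a>'))
```

A check like this only says that a page links to a compatibility section, not that the section holds a table in the expected format.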
>
>     No worries, no harm has or will be done to the MDN servers! There
>     are sensible delays between requests and the tool has caches, so
>     there are actually no MDN-requests needed to work on the HTML -> JS
>     conversion or the conversion to WPD format. I’ll bundle a filled
>     cache with gracefully requested data, so there should be enough data
>     to work on for the start. :)
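
The "delays between requests plus a cache" idea above can be sketched as follows. This is an assumption-laden illustration, not the actual tool: the class name, the one-second default delay, and the in-memory cache are all hypothetical (the tool described evidently keeps a cache that can be bundled, which this sketch does not do). The fetch function is passed in so the example runs without touching any server.

```python
import time


class PoliteFetcher:
    """Fetch pages through a cache, pausing between real requests."""

    def __init__(self, fetch, delay_seconds=1.0, sleep=time.sleep):
        self._fetch = fetch          # callable: url -> page body
        self._delay = delay_seconds  # pause before every request after the first
        self._sleep = sleep          # injectable so tests need not really wait
        self._cache = {}             # url -> page body
        self._requests = 0           # number of real requests made so far

    def get(self, url):
        # Cached pages cost no request and no delay.
        if url in self._cache:
            return self._cache[url]
        if self._requests > 0:
            self._sleep(self._delay)
        body = self._fetch(url)
        self._requests += 1
        self._cache[url] = body
        return body


def fake_fetch(url):
    """Stand-in for a real HTTP request; returns a made-up body."""
    return "<html>" + url + "</html>"


fetcher = PoliteFetcher(fake_fetch, delay_seconds=0.0)
first = fetcher.get("https://developer.mozilla.org/en-US/docs/Web/API/Element")
second = fetcher.get("https://developer.mozilla.org/en-US/docs/Web/API/Element")
print(first is second)  # the second call is served from the cache
```

With a filled cache, work on the HTML conversion never reaches the fetch function at all, which is the property the paragraph above relies on.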
>
>     Pat, can I interest you in working on the HTML parser or the
>     conversion to WPD format? The HTML parsing is really easy, as it’s
>     written in JS and uses a jQuery-like library, as you will see from
>     my code.
>
> Maybe I'm not understanding something, as it seems we should not need to
> do any parsing.  If we get a page with XMLHttpRequest, then its
> responseXML is a document object, which is already parsed.  We could use
> document methods or actual jQuery ;-) at that point to select the
> elements containing the compatibility info.  Or we could use XSLT to
> both select and reformat the info, but that would probably be harder
> than writing procedural code for the reformatting.  Ok, cancel the XSLT
> suggestion.
>
> So, what am I missing?  Are we just using "parse" in different senses?
>
>     Catch me here or on IRC; if I’m not responding, I’m probably still
>     sleeping! ;)
>
> Your nick is in the channel but I won't ping you just yet, as it's 4am
> my time, so I'm about to go fall over....zzzz...
>
> -- Pat
>
Received on Sunday, 12 January 2014 17:31:14 UTC