W3C home > Mailing lists > Public > public-webplatform@w3.org > January 2014

Re: Converting MDN Compat Data to JSON

From: Pat Tressel <ptressel@myuw.net>
Date: Sun, 12 Jan 2014 04:21:16 -0800
Message-ID: <CABT-+2qZdC+LoZhNrRrBHqG8wDJEe+wgVWySL3nZh5fsUxWd3A@mail.gmail.com>
To: David Kirstein <frozenice@frozenice.de>
Cc: Doug Schepers <schepers@w3.org>, WebPlatform Community <public-webplatform@w3.org>
Hi, David!


> If you can wait until tomorrow, I can put a NodeJS version of the old
> script up, well slightly improved. I'll need to get some sleep first, as
> it's already a bit late here in Germany. ;)
>

Sure.  ;-)


> It basically generates a list of pages (via tags), grabs the compat tables
> in HTML format and converts them to a nice JS object.
>
> That's already working for basic tables, I'll add support for prefixes.
>

Why don't we look at it first, in case it is possible to avoid the work --
see below...


> That <input> table is a beast, though!
>

At least <input> has a table that's in the expected format.  Here's one
without a table:

https://developer.mozilla.org/en-US/docs/Web/API/Document

Ooo, this one has footnotes to its compatibility table...

https://developer.mozilla.org/en-US/docs/Web/API/Element

I could write out lists of the pages that don't have the table, and of the
pages that have the table but don't match expectations in some other way.
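Something like this is what I have in mind, just a sketch (the field names
on the page records are illustrative, not the shape David's tool actually
uses):

```javascript
// Sketch: split crawled pages into the two problem lists described above --
// pages with no compat table at all, and pages whose table doesn't match
// the expected format. The `hasCompatTable` / `matchesExpectedFormat`
// fields are hypothetical placeholders for whatever checks the tool runs.
function classifyPages(pages) {
  const noTable = [];
  const unexpectedFormat = [];
  for (const page of pages) {
    if (!page.hasCompatTable) {
      noTable.push(page.url);
    } else if (!page.matchesExpectedFormat) {
      unexpectedFormat.push(page.url);
    }
  }
  return { noTable, unexpectedFormat };
}
```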


> The main thing left to do here is converting this internal JS object into
> something that resembles the spec Doug linked... and some 'minor' things,
> like taming the <input> compat table Janet linked and maybe find a better
> way to generate the list of pages (tag lists are limited to 500 entries,
> removing duplicates from the lists of ['CSS', 'HTML5', 'API', 'WebAPI'], I
> counted about 1.2k pages or so).
>

I was about to ask "What does tag mean in this context?"  :D  But I see the
"Tags" section (i.e. class = tag-list) at the bottom of pages.  So, you're
not crawling the relevant parts of the site looking for pages with an <a>
tag whose href is "#Browser_compatibility"?

> No worries, no harm has or will be done to the MDN servers! There are
> sensible delays between requests and the tool has caches, so there are
> actually no MDN-requests needed to work on the HTML -> JS conversion or the
> conversion to WPD format. I'll bundle a filled cache with gracefully
> requested data, so there should be enough data to work on for the start. :)
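That polite-fetch idea (delay between real requests, cache everything so
offline work never re-hits MDN) can be sketched roughly like this -- the
names and the injected `fetchFn` are my illustration, not David's code:

```javascript
// Sketch of a rate-limited, cached fetch helper. `fetchFn` is injected so
// the HTML -> JS conversion can run entirely offline against a filled
// cache, with no requests to MDN at all.
const cache = new Map();
let lastRequest = 0;
const DELAY_MS = 2000; // a sensible delay between real requests

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function politeFetch(url, fetchFn) {
  if (cache.has(url)) return cache.get(url); // cache hit: no MDN request
  const wait = lastRequest + DELAY_MS - Date.now();
  if (wait > 0) await sleep(wait); // be gentle to the server
  lastRequest = Date.now();
  const body = await fetchFn(url);
  cache.set(url, body);
  return body;
}
```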
>

> Pat, can I interest you in working on the HTML parser or the conversion to
> WPD format? The HTML parsing is really easy, as it's written in JS and uses
> a jQuery-like library, as you will see from my code.
>

Maybe I'm not understanding something, as it seems we should not need to do
any parsing.  If we get a page with XMLHttpRequest, then its responseXML is
a document object, which is already parsed.  We could use document methods
or actual jQuery ;-) at that point to select the elements containing the
compatibility info.  Or we could use XSLT to both select and reformat the
info, but that would probably be harder than writing procedural code for
the reformatting.  Ok, cancel the XSLT suggestion.
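In other words, once the rows are selected out of the document (via
responseXML + querySelectorAll, or jQuery), all that's left is procedural
reformatting -- something like this sketch, where the output shape is just
illustrative, not the actual spec Doug linked:

```javascript
// Sketch of the reformatting step only. Assumes the header cells and body
// rows of a compatibility table have already been extracted as arrays of
// cell text (e.g. via document methods on responseXML); the resulting
// object shape is a placeholder, not the real WPD format.
function rowsToCompat(headerCells, rows) {
  // headerCells: ["Feature", "Chrome", "Firefox (Gecko)", ...]
  const browsers = headerCells.slice(1);
  const compat = {};
  for (const cells of rows) {
    const feature = cells[0];
    compat[feature] = {};
    browsers.forEach((browser, i) => {
      compat[feature][browser] = cells[i + 1]; // e.g. "1.0", "(Yes)", "?"
    });
  }
  return compat;
}
```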

So, what am I missing?  Are we just using "parse" in different senses?

> Catch me here or on IRC; if I'm not responding, I'm probably still
> sleeping! ;)
>

Your nick is in the channel but I won't ping you just yet, as it's 4am my
time, so I'm about to go fall over....zzzz...

-- Pat
Received on Sunday, 12 January 2014 12:21:43 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:20:56 UTC