Re: WebPlatform Browser Support phased approach? from Ronald Mansveld on 2013-11-01 (public-webplatform-tests@w3.org from October to December 2013)

From: Ronald Mansveld <ronald@ronaldmansveld.nl>
Date: Fri, 01 Nov 2013 17:47:42 +0000
To: Doug Schepers <schepers@w3.org>
Cc: <public-webplatform-tests@w3.org>, Janet Swisher <jswisher@mozilla.com>
Message-ID: <46e35895655c9489f5badfa7b739f9f8@webmail.byte.nl>
It's OK. I ran into Jean-Yves here at the London office, and he
brought me into contact with some of his American colleagues. A bug has
been filed to make the data available as JSON, but it seems their raw
data really are the HTML tables, so either way that data has to be
parsed, either on their side or on ours. There have been talks about
extracting the data into a machine-readable format, but for now that is
likely to stay in the future.

As for the MediaWiki extension: can you send me an example or spec of 
the precise JSON-formatting it expects?

What might be a solution for now:

- Use MediaWiki and JSON (and all benefits) when CIU/H5T data is 
available
- Bypass MediaWiki and show the MDN table if no CIU/H5T data is 
available

That way a lot of properties would still use the MW extension; only
the entries that we don't have CIU/H5T data for would have to resort to
the MDN fallback.
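
To make the two cases above concrete, here is a minimal sketch of the
decision in Python. The function name, the "mode" labels and the shape
of the data are all assumptions made up for illustration; they do not
describe the actual MediaWiki extension or its JSON format.

```python
from typing import Optional


def choose_renderer(ciu: Optional[dict],
                    h5t: Optional[dict],
                    mdn_table_html: Optional[str]) -> dict:
    """Decide how the compatibility section of one page gets rendered.

    All names and return values here are hypothetical placeholders.
    """
    # Collect whichever structured (JSON-like) sources exist for this page.
    structured = {}
    for source_name, data in (("caniuse", ciu), ("html5test", h5t)):
        if data:
            structured[source_name] = data

    if structured:
        # CIU/H5T data is available: go through MediaWiki with JSON.
        return {"mode": "mediawiki-json", "payload": structured}
    if mdn_table_html:
        # No CIU/H5T data: bypass MediaWiki and show the MDN table as is.
        return {"mode": "mdn-table", "payload": mdn_table_html}
    # Nothing known about this page at all.
    return {"mode": "none", "payload": None}
```

A page with manually provided MDN-JSON would simply be passed in as one
more structured source, so it would take the first branch.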

By simply looking at the analytics-data for the pages, we can always 
decide to manually provide MDN-JSON for pages with high request-rates, 
until a good parser has been written.


I've come a long way parsing the MDN data to JSON. The main problem is
that some of the key data is lumped together in one table cell, so it's
hard to extract that data correctly. I am trying, however; I'm just not
sure about the right way to do so. (Part of the current solution is
replacing a <br> with a text node containing a specific string, so I
have a textual marker in the nodeValue that I can split the text on.)
Parsing this data really does feel like clutching at straws at times...
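
For illustration, the <br>-marker trick looks roughly like this when
sketched in Python with only the standard library. This is not my
actual scraper: the marker string, the helper names and the sample cell
contents are assumptions, and real MDN cells may well contain markup
this does not handle.

```python
from html.parser import HTMLParser

# Hypothetical marker; any string that cannot occur in the cell text works.
MARKER = "||BR||"


class CellTextExtractor(HTMLParser):
    """Collect the text of one table cell, turning each <br> into MARKER."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <br/>, via handle_startendtag.
        if tag == "br":
            self.parts.append(MARKER)

    def handle_data(self, data):
        self.parts.append(data)


def split_cell(cell_html):
    """Return the cell's text as a list, split where the <br>s were."""
    parser = CellTextExtractor()
    parser.feed(cell_html)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.parts)
    return [piece.strip() for piece in text.split(MARKER) if piece.strip()]
```

With a made-up cell such as '<td>4.0 <span>-webkit</span><br>5.0</td>',
split_cell gives one entry for the prefixed version and one for the
non-prefixed version, which can then be put into JSON separately.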

Let me know if the fallback option would be feasible (I'm not too
familiar with the current set-up of the servers etc., so I can't really
make a call on that one), or whether I should continue parsing the table
to JSON.


Ronald




Doug Schepers wrote on 2013-11-01 17:32:
> Hi, Ronald–
> 
> Thanks for the update!
> 
> First, I didn't realize that Janet Swisher (Mozilla, and one of the
> founders of this project) didn't know you were working on MDN data, or
> she would have introduced you to someone at Mozilla. Maybe that's the
> contact you made already. In any case, she can confirm that.
> 
> I'm okay with keeping the data as tables for now, if that makes your
> life easier. But I do want to note that it would be much better to
> have it as JSON, because that's what the MediaWiki extension is
> expecting.
> 
> If we keep the data as tables, we will need to rewrite the MediaWiki
> extension to deal with that instead; it would also make it difficult
> (maybe prohibitively so) to make the "icon view" at the top of the
> page, since we'd need to parse and reformat the data.
> 
> So, there is extra work to be done either way: either we rewrite the
> extension and lose some functionality (for now); or we find a way to
> parse the MDN tables into JSON. I don't know which would be more work.
> I do know that JSON is the final format we want the data in, so I'd
> like to shoot for that if we can.
> 
> I don't want to put all the work on you, especially since you've been
> so awesome on driving this forward. How about as a next step, you
> expose the data you've collected, and someone (me?) looks at making a
> regex that normalizes it, or at least assesses which approach will be
> more work?
> 
> Again, if we can get by with tables with not much work, then I agree
> we should do that.
> 
> Regards-
> -Doug
> 
> On 11/1/13 12:50 PM, Ronald Mansveld wrote:
>> It has been a pretty productive few days, with both ups and downs.
>> 
>> The data from both CIU and H5T have been pretty easy to parse, mostly
>> because this data is already available in JSON-format. MDN-data is a
>> different story though.
>> 
>> At this point, the MDN-data is _not_ available as JSON. I can get a
>> JSON-feed, but that only states that a compatibility-section is
>> available. It doesn't give the data. So, I had to resort to scraping.
>> 
>> However, even though the data may be in a table, which makes the
>> general parsing pretty easy, some of the data isn't that nicely
>> embedded in tags. For instance: the version numbers for prefixed use
>> and non-prefixed use are only separated by a line break.
>> 
>> I've come a long way, but it most certainly isn't yet at the level I
>> want it to be. So I'm actually thinking of not even trying to parse
>> the MDN data, and just using the HTML table as is.
>> 
>> By parsing the CIU and H5T data into tables of the same formatting,
>> we can still have a uniform layout on the site.
>> 
>> 
>> I have been given a contact within MDN, so I'll try to work with them
>> to make the data available as JSON, so we can do a better integration
>> after this first phase.
>> 
>> 
>> Any thoughts/comments?
>> 
>> I'll continue working once I'm back in NL. If no one objects, I'd
>> like to go for the table option, which could be up and running pretty
>> soon. I don't see too many downsides, given that this is just a
>> temporary solution so we can go live with the CSS part of the site,
>> and a more future-proof solution will be built once this is up and
>> running.
>> 
>> 
>> Ronald
>> 
>> 
>> 
>> 
>> Doug Schepers wrote on 2013-10-30 17:59:
>>> Hi, Ronald–
>>> 
>>> Thanks for the update! Looking forward to seeing it.
>>> 
>>> Since we eventually plan to have tests for each assertion, and
>>> results based on running those tests against browsers (versions,
>>> OSs, etc.), it makes the most sense to expand the data from MDN to a
>>> version-range, if that's doable. That will be the most consistent
>>> with our plans.
>>> 
>>> Note that in reality, there are regressions. For example, Chrome has
>>> dropped support for MathML, and other browsers have dropped features
>>> as well (e.g. some SVG stuff). But we'll deal with that once the
>>> infrastructure for reporting test results is more mature.
>>> 
>>> Regards-
>>> -Doug
>>> 
>>> 
>>> On 10/30/13 11:29 AM, Ronald Mansveld wrote:
>>>> OK, I've come a long way so far. There is just one decision to be
>>>> made:
>>>> 
>>>> MDN provides the compat data not per version, but rather as a
>>>> since-version.
>>>> 
>>>> Both caniuse and html5test provide the data per version (where
>>>> available).
>>>> 
>>>> 
>>>> What do we want to use? I can collapse the data from caniuse and
>>>> html5test to a since-version pretty easily. Expanding the data from
>>>> MDN from a since-version up to a complete version-range might be
>>>> doable as well, although I have to rely on the browser data
>>>> provided in the feeds from CIU and H5T to determine what versions
>>>> are available.
>>>> 
>>>> Anyone with arguments for or against either option?
>>>> 
>>>> 
>>>> 
>>>> Ronald
>>>> 
>>>> 
>>>> Doug Schepers wrote on 2013-10-29 06:18:
>>>>> Hi, Ronald–
>>>>> 
>>>>> Since we're going with this phased approach (which I fully
>>>>> support), I think we should do 2 things:
>>>>> 
>>>>> 1) Use the MDN data as the baseline, since they have fairly
>>>>> complete data and a similar feature level as WPD (e.g., they have
>>>>> basically the same page names as we do); this means you'll have to
>>>>> collect this data via MDN's API;
>>>>> 
>>>>> 2) Supplement that baseline data with CanIUse and HTML5Test data
>>>>> where there is an equivalent feature name (e.g. "border-radius");
>>>>> we'll have to wait for QuirksMode and MobileHTML5 data until we
>>>>> have the source for that, but we will launch an "explainer" page
>>>>> that tells about all our data sources and our timeline.
>>>>> 
>>>>> Does this seem like a doable approach?
>>>>> 
>>>>> Regards-
>>>>> -Doug
>>>>> 
>>>>> On 10/23/13 9:24 PM, Julee wrote:
>>>>>> Thanks much, Ronald! And everyone who is sharing their data as
>>>>>> is!
>>>>>> 
>>>>>> I've sent feelers out regarding a work space in London next week.
>>>>>>  Will let you know if I hear anything.
>>>>>> 
>>>>>> In the meantime, do you have a sense of how long it might take
>>>>>> to normalize this phase-1 data? No biggie, just looking to fill
>>>>>> out the CSS-properties schedule.
>>>>>> 
>>>>>> Regards!
>>>>>> 
>>>>>> Julee
>>>>>> ----------------------------
>>>>>> julee@adobe.com
>>>>>> @adobejulee
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Ronald Mansveld <ronald@ronaldmansveld.nl>
>>>>>> Date: Tuesday, October 22, 2013 3:47 PM
>>>>>> To: Alex Komoroske <komoroske@google.com>
>>>>>> Cc: Niels Leenheer <info@html5test.com>, julee <julee@adobe.com>,
>>>>>> "public-webplatform-tests@w3.org" <public-webplatform-tests@w3.org>
>>>>>> Subject: Re: WebPlatform Browser Support phased approach?
>>>>>> 
>>>>>>> Alex Komoroske wrote on 2013-10-22 17:48:
>>>>>>>> I strongly support a phased approach. I'm very excited about
>>>>>>>> the prospect of having a more robust system set up, but as
>>>>>>>> far as the CSS Properties launch goes, it's more important to
>>>>>>>> have _something_, even if it's just a one-time import from a
>>>>>>>> couple of sources.
>>>>>>>> 
>>>>>>> 
>>>>>>> I feel like there is support to do a phased approach, plus we
>>>>>>> have access to a (basic) set of data to get started. Coupled
>>>>>>> with the urgency to get CSS live (which I absolutely support,
>>>>>>> we've been in alpha long enough now ;) ), I think this is
>>>>>>> indeed the right path to follow. Plus, this buys us time to
>>>>>>> come up with a good plan and schemata for the data-exchange we
>>>>>>> want to use in the future.
>>>>>>> 
>>>>>>> 
>>>>>>> Next week I'll be in London. If anyone knows a place for me to
>>>>>>> work, I can start building the first scripts to parse the
>>>>>>> data. I've checked out the Mozilla Open Office, but to me it's
>>>>>>> pretty unclear whether that is still in use, and if so: if and
>>>>>>> how I can use it. Do we have any Mozilla-employees on the list?
>>>>>>> Or do we have Googlers that know if perhaps the Google office
>>>>>>> can be used? Or any Londoners that know of a place?
>>>>>>> 
>>>>>>> Worst case scenario I think I can use the City Business
>>>>>>> Library, but my experience is that libraries are not always the
>>>>>>> best place to work from, especially if you try to work full
>>>>>>> office hours.
>>>>>>> 
>>>>>>> 
>>>>>>> Ronald
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>
Received on Friday, 1 November 2013 17:48:10 UTC