- From: Doug Schepers <schepers@w3.org>
- Date: Tue, 05 Nov 2013 02:12:11 -0500
- To: Ronald Mansveld <ronald@ronaldmansveld.nl>
- CC: public-webplatform-tests@w3.org
Hi, Ronald–

I chatted with Julee about this, and though I agree with her priorities, we agreed that it would probably take as much (or more) time to adapt the MW extension as it would to parse out the MDN output into JSON, and that the latter met our long-term goals better as well.

But I don't want to put too much on your plate. I'd be happy to help write the parser, if you don't have time to do it (if you do want to do it, I certainly won't object, but I don't want to overwork you).

Could you make your existing data and code available on GitHub, so we can start looking at how to solve this problem?

It would be great if you and I could find a few minutes to chat via Skype in the next couple of days.

Regards-
-Doug

On 11/1/13 1:47 PM, Ronald Mansveld wrote:
> It's OK. I've run into Jean-Yves here at the London Office, and he
> brought me into contact with some of his American colleagues. A bug has
> been filed to have the data be available as JSON, but it seems like
> their raw data are indeed the HTML-tables, so either way it would mean
> parsing that data. Either on their side, or on our side. There have been
> talks about extracting that data to a machine-readable format, but for
> now that's likely to be in the future.
>
> As for the MediaWiki extension: can you send me an example or spec of
> the precise JSON-formatting it expects?
>
> What might be a solution for now:
>
> - Use MediaWiki and JSON (and all benefits) when CIU/H5T data is available
> - Bypass MediaWiki and show the MDN table if no CIU/H5T data is available
>
> That way a lot of properties would still have the MW extension, just the
> entries that we don't have CIU/H5T data for would have to resort to the
> MDN-fallback.
>
> By simply looking at the analytics-data for the pages, we can always
> decide to manually provide MDN-JSON for pages with high request-rates,
> until a good parser has been written.
>
>
> I've come a long way parsing the MDN-data to JSON, the main problem is
> that some of the key-data is lumped together in 1 table-cell. So it's
> hard to extract that data in a correct way. I am trying however, just
> not sure about the right way to do so. (Part of the current solution is
> replacing a <br> with a textnode with a specific string, so I have a
> textual marker in the nodeValue where I can split the text on.) Parsing
> this data really does feel like clutching at straws to get somewhere at
> times...
>
> Let me know if the fallback-option would be feasible (I'm not too
> familiar with the current set-up of the servers etc, so I can't really
> make a call on that one), or whether I should continue parsing the
> table to JSON.
>
>
> Ronald
>
>
>
>
> Doug Schepers wrote on 2013-11-01 17:32:
>> Hi, Ronald–
>>
>> Thanks for the update!
>>
>> First, I didn't realize that Janet Swisher (Mozilla, and one of the
>> founders of this project) didn't know you were working on MDN data, or
>> she would have introduced you to someone at Mozilla. Maybe that's the
>> contact you made already. In any case, she can confirm that.
>>
>> I'm okay with keeping the data as tables for now, if that makes your
>> life easier. But I do want to note that it would be much better to
>> have it as JSON, because that's what the MediaWiki extension is
>> expecting.
>>
>> If we keep the data as tables, we will need to rewrite the MediaWiki
>> extension to deal with that instead; it would also make it difficult
>> (maybe prohibitively so) to make the "icon view" at the top of the
>> page, since we'd need to parse and reformat the data.
>>
>> So, there is extra work to be done either way: either we rewrite the
>> extension and lose some functionality (for now); or we find a way to
>> parse the MDN tables into JSON. I don't know which would be more work.
>> I do know that JSON is the final format we want the data in, so I'd
>> like to shoot for that if we can.
>>
>> I don't want to put all the work on you, especially since you've been
>> so awesome on driving this forward. How about as a next step, you
>> expose the data you've collected, and someone (me?) looks at making a
>> regex that normalizes it, or at least assesses which approach will be
>> more work?
>>
>> Again, if we can get by with tables with not much work, then I agree
>> we should do that.
>>
>> Regards-
>> -Doug
>>
>> On 11/1/13 12:50 PM, Ronald Mansveld wrote:
>>> These have been some pretty productive days, with both ups and downs.
>>>
>>> The data from both CIU and H5T have been pretty easy to parse, mostly
>>> because this data is already available in JSON-format. MDN-data is a
>>> different story though.
>>>
>>> At this point, the MDN-data is _not_ available as JSON. I can get a
>>> JSON-feed, but that only states that a compatibility-section is
>>> available. It doesn't give the data. So, I had to resort to scraping.
>>>
>>> However, even though the data may be in a table, which makes the general
>>> parsing pretty easy, some of the data actually isn't that nicely embedded
>>> in tags. For instance: the version-numbers for prefixed use and
>>> non-prefixed use are only separated by a line-break.
>>>
>>> I've come a long way, but it most certainly isn't yet at the level I
>>> want it to be. So I'm actually thinking of not even trying to parse the
>>> MDN-data, and just use the HTML-table as is.
>>>
>>> By parsing the CIU and H5T data into tables of the same formatting, we
>>> still can have a uniform layout on the site.
>>>
>>>
>>> I have been given a contact within MDN, so I'll try to work with them to
>>> make the data available as JSON, so we can do a better integration after
>>> this first phase.
>>>
>>>
>>> Any thoughts/comments?
>>>
>>> I'll continue working once I'm back in NL, if no-one objects, I'd like
>>> to go for the table option, which could be up and running pretty soon.
>>> I don't see too many downsides, given the fact this is just a temporary
>>> solution so we can go live with the CSS-part of the site, and a more
>>> future-proof solution will be built once this is up and running.
>>>
>>>
>>> Ronald
>>>
>>>
>>>
>>>
>>> Doug Schepers wrote on 2013-10-30 17:59:
>>>> Hi, Ronald–
>>>>
>>>> Thanks for the update! Looking forward to seeing it.
>>>>
>>>> Since we eventually plan to have tests for each assertion, and
>>>> results based on running those tests against browsers (versions, OSs,
>>>> etc.), it makes the most sense to expand the data from MDN to a
>>>> version-range, if that's doable. That will be the most consistent with
>>>> our plans.
>>>>
>>>> Note that in reality, there are regressions. For example, Chrome has
>>>> dropped support for MathML, and other browsers have dropped features
>>>> as well (e.g. some SVG stuff). But we'll deal with that once the
>>>> infrastructure for reporting test results is more mature.
>>>>
>>>> Regards-
>>>> -Doug
>>>>
>>>>
>>>> On 10/30/13 11:29 AM, Ronald Mansveld wrote:
>>>>> OK, I've come a long way so far. There is just one decision to be
>>>>> made:
>>>>>
>>>>> MDN provides the compat data not per version, but rather as a
>>>>> since-version.
>>>>>
>>>>> Both caniuse and html5test provide the data per version (where
>>>>> available).
>>>>>
>>>>>
>>>>> What do we want to use? I can collapse the data from caniuse and
>>>>> html5test to a since-version pretty easily. Expanding the data from
>>>>> MDN from a since-version up to a complete version-range might be
>>>>> doable as well, although I have to rely on the browser-data provided
>>>>> in the feeds from CIU and H5T to determine what versions are
>>>>> available.
>>>>>
>>>>> Anyone with arguments towards or against either option?
>>>>>
>>>>>
>>>>>
>>>>> Ronald
>>>>>
>>>>>
>>>>> Doug Schepers wrote on 2013-10-29 06:18:
>>>>>> Hi, Ronald–
>>>>>>
>>>>>> Since we're going with this phased approach (which I fully
>>>>>> support), I think we should do 2 things:
>>>>>>
>>>>>> 1) Use the MDN data as the baseline, since they have fairly
>>>>>> complete data and a similar feature level as WPD (e.g., they have
>>>>>> basically the same page names as we do); this means you'll have to
>>>>>> collect this data via MDN's API;
>>>>>>
>>>>>> 2) Supplement that baseline data with CanIUse and HTML5Test data
>>>>>> where there is an equivalent feature name (e.g. "border-radius");
>>>>>> we'll have to wait for QuirksMode and MobileHTML5 data until we
>>>>>> have the source for that, but we will launch an "explainer" page
>>>>>> that tells about all our data sources and our timeline.
>>>>>>
>>>>>> Does this seem like a doable approach?
>>>>>>
>>>>>> Regards-
>>>>>> -Doug
>>>>>>
>>>>>> On 10/23/13 9:24 PM, Julee wrote:
>>>>>>> Thanks much, Ronald! And everyone who is sharing their data as
>>>>>>> is!
>>>>>>>
>>>>>>> I've sent feelers out regarding a work space in London next week.
>>>>>>> Will let you know if I hear anything.
>>>>>>>
>>>>>>> In the meantime, do you have a sense of how long it might take
>>>>>>> to normalize this phase-1 data? No biggie, just looking to fill
>>>>>>> out the CSS-properties schedule.
>>>>>>>
>>>>>>> Regards!
>>>>>>>
>>>>>>> Julee
>>>>>>> ----------------------------
>>>>>>> julee@adobe.com
>>>>>>> @adobejulee
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Ronald Mansveld <ronald@ronaldmansveld.nl>
>>>>>>> Date: Tuesday, October 22, 2013 3:47 PM
>>>>>>> To: Alex Komoroske <komoroske@google.com>
>>>>>>> Cc: Niels Leenheer <info@html5test.com>, julee <julee@adobe.com>,
>>>>>>> "public-webplatform-tests@w3.org" <public-webplatform-tests@w3.org>
>>>>>>> Subject: Re: WebPlatform Browser Support phased approach?
>>>>>>>
>>>>>>>> Alex Komoroske wrote on 2013-10-22 17:48:
>>>>>>>>> I strongly support a phased approach. I'm very excited about
>>>>>>>>> the prospect of having a more robust system set up, but as
>>>>>>>>> far as the CSS Properties launch goes, it's more important to
>>>>>>>>> have _something_, even if it's just a one-time import from a
>>>>>>>>> couple of sources.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I feel like there is support to do a phased approach, plus we
>>>>>>>> have access to a (basic) set of data to get started. Coupled
>>>>>>>> with the urgency to get CSS live (which I absolutely support,
>>>>>>>> we've been in alpha long enough now ;) ), I think this is
>>>>>>>> indeed the right path to follow. Plus, this buys us time to
>>>>>>>> come up with a good plan and schemata for the data-exchange we
>>>>>>>> want to use in the future.
>>>>>>>>
>>>>>>>>
>>>>>>>> Next week I'll be in London, if anyone knows a place for me to
>>>>>>>> work I can start building the first scripts to parse the
>>>>>>>> data. I've checked out the Mozilla Open Office, but to me it's
>>>>>>>> pretty unclear whether that is still in use, and if so: if and
>>>>>>>> how I can use it. Do we have any Mozilla-employees on the list?
>>>>>>>> Or do we have Googlers that know if perhaps the Google office
>>>>>>>> can be used? Or any Londoners that know of a place?
>>>>>>>>
>>>>>>>> Worst case scenario I think I can use the City Business
>>>>>>>> Library, but my experience is that libraries are not always the
>>>>>>>> best place to work from, especially not if you try to make full
>>>>>>>> office hours.
>>>>>>>>
>>>>>>>>
>>>>>>>> Ronald
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>
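
A minimal sketch of two parsing steps discussed in this thread, offered only as an illustration and not as Ronald's actual code. The first part mirrors the idea of putting a textual marker where a <br> sits inside an MDN table cell and splitting on it; the second expands a since-version into a version range using a list of known versions (such as the browser data in the CIU and H5T feeds). The names `BR_MARKER`, `split_cell` and `expand_since`, the sample cell contents, and the assumption that versions are purely numeric dotted strings are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical marker standing in for <br>; any string that cannot
# appear in the cell text would do.
BR_MARKER = "\x1f"


class CellText(HTMLParser):
    """Collect a cell's text, writing BR_MARKER wherever a <br> occurs."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <br/> via handle_startendtag.
        if tag == "br":
            self.parts.append(BR_MARKER)

    def handle_data(self, data):
        self.parts.append(data)


def split_cell(cell_html):
    """Return the non-empty text pieces of a cell, split at each <br>."""
    parser = CellText()
    parser.feed(cell_html)
    parser.close()  # flush any buffered trailing text
    text = "".join(parser.parts)
    return [piece.strip() for piece in text.split(BR_MARKER) if piece.strip()]


def expand_since(since, known_versions):
    """Expand a since-version into every known version at or after it.

    Assumes versions are numeric dotted strings such as "10" or "3.6".
    """
    def key(version):
        return tuple(int(part) for part in version.split("."))

    return [v for v in sorted(known_versions, key=key) if key(v) >= key(since)]


if __name__ == "__main__":
    print(split_cell("4.0 -webkit<br>26.0"))
    print(expand_since("10", ["9", "10", "11", "8"]))
```

Splitting on a marker keeps the cell's text order intact, so the caller can still decide which piece is the prefixed entry and which is the unprefixed one. Regressions (support being dropped in a later version) are not handled by `expand_since`.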
Received on Tuesday, 5 November 2013 07:12:21 UTC