- From: Alex Russell <slightlyoff@google.com>
- Date: Wed, 27 Mar 2013 15:25:43 +0000
- To: Noah Mendelsohn <nrm@arcanedomain.com>
- Cc: "www-tag@w3.org List" <www-tag@w3.org>, Tantek Çelik <tcelik@mozilla.com>
- Message-ID: <CANr5HFU=hWs41EXdSWCME-JjtrNLhhdMHcLCM5_pe5DMThNnYg@mail.gmail.com>
On Wednesday, March 27, 2013, Noah Mendelsohn wrote:

> (Leaving off most of the cc: list to avoid cross-posted discussion.
> Nothing sensitive here -- feel free to forward if useful.)
>
> This looks very cool. Would it be easy/reasonable/in-the-spirit-of-the-thing
> to extend it to start gathering statistics on JSON, XML, various forms of
> RDF, RDF-a, etc.?

So the way it works is by analyzing nodes as you browse. If you can think
of a lightweight way to characterize an element as being in one of these
buckets, patches welcome!

> For that matter, it would also be >really< interesting to watch things
> like content that will be interpreted differently by the HTML5 sniffing
> rules than by following authoritative metadata.

How would we detect such a thing?

> In general, you seem to be on a very nice slippery slope of building a
> dashboard for the Web's data/content encoding. Are you interested in
> heading further down the slope?

Happy to extend this to gather whatever data can be both truly anonymous
and inexpensively characterized.

> Noah
>
> On 3/27/2013 9:58 AM, Alex Russell wrote:
>
>> Hi all,
>>
>> These lists host many debates about the semantics (or lack thereof) of
>> HTML. Good data that bears on these questions is often hard to come by.
>> This isn't anyone's fault per se, but it sure would be nice if we had
>> better data to use as the baseline for discussions about what should (and
>> shouldn't) be in HTML.next.
>>
>> In the interest of building such a corpus, I've created a small extension
>> to help gather information on the real-world semantics that users
>> encounter on the web: both semantic HTML and extensions to it like
>> Microformats, schema.org <http://schema.org> markup, and ARIA roles and
>> states. Crawlers miss a lot as they (generally) aren't running scripts
>> and interacting deeply with sites, so this anonymizing system attempts to
>> fill that gap by observing the semantic content of pages both when they
>> load and as they change over time.
>>
>> Why cross-post this so broadly? Because I need your help! If you think
>> evolving the web based on data is better than trying to do it without and
>> you happen to use Chrome as your browser, please install the extension:
>>
>> https://chrome.google.com/webstore/detail/meaningless/gmmhpelpfhlofjjolcegdddjadkmincn/details
>>
>> If you're a developer and use another browser, I'd love your help in
>> porting the extension to other platforms (FF, Safari, etc.):
>>
>> https://github.com/slightlyoff/meaningless
>>
>> If you're interested in the data, a sparse reporting front-end is
>> currently in place:
>>
>> http://meaningless-stats.appspot.com/global
>>
>> Help is needed to analyze the data in more meaningful ways, visualize it,
>> etc. Filing tickets and submitting pull requests are the easiest ways to
>> help: https://github.com/slightlyoff/meaningless/issues
>>
>> Thanks for your help and attention.
Received on Wednesday, 27 March 2013 15:26:21 UTC