Meaningless: towards a real-world web semantics observatory

Hi all,

These lists host many debates about the semantics (or lack thereof) of
HTML. Good data that bears on these questions is often hard to come by.
This isn't anyone's fault per se, but it sure would be nice if we had
better data to use as the baseline for discussions about what should (and
shouldn't) be in HTML.next.

In the interest of building such a corpus, I've created a small extension
to help gather information on the real-world semantics that users encounter
on the web: both semantic HTML and extensions to it like Microformats,
schema.org markup, and ARIA roles and states. Crawlers miss a lot as they
(generally) aren't running scripts and interacting deeply with sites, so
this anonymizing system attempts to fill that gap by observing the semantic
content of pages both when they load and as they change over time.
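
To make the idea concrete, here is a minimal sketch of the kind of tally such an observer might keep. It is not the extension's actual code: the ElementLike shape, the tallySemantics name, and the key format are all assumptions made up for illustration, and the sketch only looks at tag names, ARIA role attributes, and itemtype attributes.

```typescript
// Hypothetical sketch; not taken from the extension's source.
// A minimal structural view of an element, so the tally runs without a DOM.
interface ElementLike {
  tagName: string;
  attributes: Record<string, string>;
  children: ElementLike[];
}

// Walk the tree and count tag names, ARIA roles, and itemtype values.
function tallySemantics(
  root: ElementLike,
  counts: Map<string, number> = new Map(),
): Map<string, number> {
  const bump = (key: string): void => {
    counts.set(key, (counts.get(key) ?? 0) + 1);
  };

  bump(`tag:${root.tagName.toLowerCase()}`);

  const role = root.attributes["role"];
  if (role !== undefined) bump(`role:${role}`);

  const itemtype = root.attributes["itemtype"];
  if (itemtype !== undefined) bump(`itemtype:${itemtype}`);

  for (const child of root.children) tallySemantics(child, counts);
  return counts;
}

// A tiny made-up page: one nav landmark, one div acting as a button,
// and one div carrying schema.org microdata.
const page: ElementLike = {
  tagName: "BODY",
  attributes: {},
  children: [
    { tagName: "NAV", attributes: { role: "navigation" }, children: [] },
    { tagName: "DIV", attributes: { role: "button" }, children: [] },
    {
      tagName: "DIV",
      attributes: { itemtype: "http://schema.org/Person" },
      children: [],
    },
  ],
};

const counts = tallySemantics(page);
```

In a browser, a tally like this could presumably be run once at load and again from a MutationObserver callback to pick up changes made by scripts, which is the part a non-scripting crawler would miss.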

Why cross-post this so broadly? Because I need your help! If you think
evolving the web based on data is better than trying to do it without, and
you happen to use Chrome as your browser, please install the extension:

    https://chrome.google.com/webstore/detail/meaningless/gmmhpelpfhlofjjolcegdddjadkmincn/details

If you're a developer and use another browser, I'd love your help in
porting the extension to other platforms (FF, Safari, etc.):

    https://github.com/slightlyoff/meaningless

If you're interested in the data, a sparse reporting front-end is currently
in place:

    http://meaningless-stats.appspot.com/global

Help is needed to analyze the data in more meaningful ways, visualize it,
etc. Filing tickets and submitting pull requests are the easiest ways to
help: https://github.com/slightlyoff/meaningless/issues

Thanks for your help and attention.

Received on Wednesday, 27 March 2013 13:59:17 UTC