Meaningless: towards a real-world web semantics observatory

Hi all,

These lists host many debates about the semantics (or lack thereof) of
HTML. Good data that bears on these questions is often hard to come by.
This isn't anyone's fault per se, but it sure would be nice if we had
better data to use as the baseline for discussions about what should (and
shouldn't) be in

In the interest of building such a corpus, I've created a small extension
to help gather information on the real-world semantics that users encounter
on the web: both semantic HTML and extensions to it like Microformats,
markup, and ARIA roles and states. Crawlers miss a lot as they (generally)
aren't running scripts and interacting deeply with sites, so this
anonymizing system attempts to fill that gap by observing the semantic
content of pages both when they load and as they change over time.

Why cross-post this so broadly? Because I need your help! If you think
evolving the web based on data is better than trying to do it without, and
you happen to use Chrome as your browser, please install the extension:

If you're a developer and use another browser, I'd love your help in
porting the extension to other platforms (FF, Safari, etc.):

If you're interested in the data, a sparse reporting front-end is currently
in place:

Help is needed to analyze the data in more meaningful ways, visualize it,
etc. Filing tickets and submitting pull requests is the easiest way to

Thanks for your help and attention.

Received on Wednesday, 27 March 2013 13:59:17 UTC