
Re: Meaningless: towards a real-world web semantics observatory

From: Alex Russell <slightlyoff@google.com>
Date: Wed, 27 Mar 2013 15:25:43 +0000
Message-ID: <CANr5HFU=hWs41EXdSWCME-JjtrNLhhdMHcLCM5_pe5DMThNnYg@mail.gmail.com>
To: Noah Mendelsohn <nrm@arcanedomain.com>
Cc: "www-tag@w3.org List" <www-tag@w3.org>, Tantek Çelik <tcelik@mozilla.com>
On Wednesday, March 27, 2013, Noah Mendelsohn wrote:

> (Leaving off most of the cc: list to avoid cross-posted discussion.
> Nothing sensitive here -- feel free to forward if useful.)
> This looks very cool. Would it be easy/reasonable/in-the-spirit-of-the-thing
> to extend it to start gathering statistics on JSON, XML, various forms of RDF,
> RDFa, etc?

So the way it works is by analyzing nodes as you browse. If you can think
of a lightweight way to characterize an element as being in one of these
buckets, patches welcome!
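As an illustration of what such a lightweight bucketing could look like, here is a minimal sketch. It is not the extension's actual code: it assumes each element has been reduced to its tag name plus a map of attribute names (lowercase, as the HTML DOM reports them) to values, and `ElementInfo`, `Bucket`, and `classifyElement` are hypothetical names. It only looks for marker tags, attributes, and classes, so it says nothing about whether the markup is valid or meaningful.

```typescript
// Hypothetical sketch, not the extension's code: sort one element into coarse
// "semantics" buckets using only its own tag name and attributes.

type Bucket = "semantic-html" | "microformats" | "microdata" | "rdfa" | "json-ld" | "aria";

interface ElementInfo {
  tagName: string;                    // e.g. "ARTICLE" (DOM reports HTML tags in uppercase)
  attributes: Record<string, string>; // lowercase attribute name -> value
}

// A small, illustrative subset of HTML5 semantic elements.
const SEMANTIC_TAGS = new Set([
  "article", "aside", "figure", "footer", "header", "main", "nav", "section", "time",
]);

// RDFa 1.1 attributes (the RDFa Lite set plus "about").
const RDFA_ATTRS = ["vocab", "typeof", "property", "resource", "prefix", "about"];

// Microdata attributes; schema.org markup is commonly expressed this way.
const MICRODATA_ATTRS = ["itemscope", "itemtype", "itemprop"];

// Microformats2 root classes start with "h-"; a few classic roots are listed too.
const CLASSIC_MF_ROOTS = new Set(["vcard", "vevent", "hentry", "hreview", "hrecipe"]);

function classifyElement(el: ElementInfo): Bucket[] {
  const tag = el.tagName.toLowerCase();
  const names = Object.keys(el.attributes);
  const buckets: Bucket[] = [];

  if (SEMANTIC_TAGS.has(tag)) buckets.push("semantic-html");

  // Split the class list on whitespace and drop empty tokens.
  const classes = (el.attributes["class"] ?? "").split(/\s+/).filter(Boolean);
  if (classes.some((c) => c.startsWith("h-") || CLASSIC_MF_ROOTS.has(c))) {
    buckets.push("microformats");
  }

  if (names.some((n) => MICRODATA_ATTRS.includes(n))) buckets.push("microdata");
  if (names.some((n) => RDFA_ATTRS.includes(n))) buckets.push("rdfa");

  // JSON-LD is embedded as a script block rather than as attributes.
  if (tag === "script" && (el.attributes["type"] ?? "").toLowerCase() === "application/ld+json") {
    buckets.push("json-ld");
  }

  // ARIA: an explicit role or any aria-* state/property attribute.
  if (names.includes("role") || names.some((n) => n.startsWith("aria-"))) {
    buckets.push("aria");
  }

  return buckets;
}
```

Each check looks only at one element's own tag and attributes, which keeps it cheap enough to run on every node; standalone JSON or XML responses that never become HTML elements would presumably need a separate, response-level check.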

> For that matter, it would also be >really< interesting to watch things
> like content that will be interpreted differently by the HTML5 sniffing
> rules than by following authoritative metadata.

How would we detect such a thing?

> In general, you seem to be on a very nice slippery slope of building a
> dashboard for the Web's data/content encoding. Are you interested in
> heading further down the slope?

Happy to extend this to gather whatever data can be both truly anonymous
and inexpensively characterized.
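A minimal sketch of the kind of anonymous, inexpensive aggregation this implies, modelled on the announcement quoted below (observing pages when they load and as they change). It assumes, hypothetically, that only feature counts are kept: each element's tag name, ARIA role tokens, and the names of `aria-*` attributes, never URLs, text, or other attribute values. `NodeLike` and `SemanticTally` are illustrative names, not part of the extension; `NodeLike` stands in for DOM elements so the logic runs outside a browser.

```typescript
// Hypothetical sketch, not the extension's code: tally coarse, anonymous
// features of a page at load and as subtrees are added later.

// Minimal stand-in for a DOM element so the logic can run outside a browser.
interface NodeLike {
  tagName: string;
  attributes: Record<string, string>; // attribute name -> value
  children: NodeLike[];
}

class SemanticTally {
  // Only aggregate counts are stored: no URLs, text, or attribute values
  // apart from ARIA role tokens.
  private counts = new Map<string, number>();
  // Nodes already counted, so a subtree reported twice is not double-counted.
  private seen = new WeakSet<NodeLike>();

  private bump(key: string): void {
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  // Walk a subtree iteratively, counting each unseen node once.
  observeSubtree(root: NodeLike): void {
    const stack: NodeLike[] = [root];
    while (stack.length > 0) {
      const node = stack.pop()!;
      if (this.seen.has(node)) continue;
      this.seen.add(node);
      // Every tag is counted; narrowing to "semantic" tags can happen later.
      this.bump(`tag:${node.tagName.toLowerCase()}`);
      for (const name of Object.keys(node.attributes)) {
        if (name === "role") {
          // role may hold several space-separated tokens.
          for (const token of node.attributes[name].split(/\s+/).filter(Boolean)) {
            this.bump(`role:${token}`);
          }
        } else if (name.startsWith("aria-")) {
          this.bump(`attr:${name}`);
        }
      }
      for (const child of node.children) stack.push(child);
    }
  }

  // Counts keyed by feature, with keys in sorted order for stable reports.
  snapshot(): Record<string, number> {
    const out: Record<string, number> = {};
    for (const key of Array.from(this.counts.keys()).sort()) {
      out[key] = this.counts.get(key)!;
    }
    return out;
  }
}
```

In a content script, `observeSubtree` could be called once on the document root at load and again for each element in a `MutationRecord`'s `addedNodes` reported by a `MutationObserver`, assuming an adapter maps each DOM element to a single `NodeLike` object so the `WeakSet` can skip subtrees that were already counted.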

> Noah
> On 3/27/2013 9:58 AM, Alex Russell wrote:
>> Hi all,
>> These lists host many debates about the semantics (or lack thereof) of
>> HTML. Good data that bears on these questions is often hard to come by.
>> This isn't anyone's fault per se, but it sure would be nice if we had
>> better data to use as the baseline for discussions about what should (and
>> shouldn't) be in HTML.next.
>> In the interest of building such a corpus, I've created a small extension
>> to help gather information on the real-world semantics that users
>> encounter on the web: both semantic HTML and extensions to it like
>> Microformats, schema.org markup, and ARIA roles and states. Crawlers
>> miss a lot because they (generally) don't run scripts or interact
>> deeply with sites, so this anonymizing system attempts to fill that gap by
>> observing the semantic content of pages both when they load and as they
>> change over time.
>> Why cross-post this so broadly? Because I need your help! If you think
>> evolving the web based on data is better than trying to do it without,
>> and you happen to use Chrome as your browser, please install the extension:
>> https://chrome.google.com/webstore/detail/meaningless/gmmhpelpfhlofjjolcegdddjadkmincn/details
>> If you're a developer and use another browser, I'd love your help in
>> porting the extension to other platforms (FF, Safari, etc.):
>> https://github.com/slightlyoff/meaningless
>> If you're interested in the data, a sparse reporting front-end is
>> currently in place:
>> http://meaningless-stats.appspot.com/global
>> Help is needed to analyze the data in more meaningful ways, visualize it,
>> etc. Filing tickets and submitting pull requests are the easiest ways to
>> help: https://github.com/slightlyoff/meaningless/issues
>> Thanks for your help and attention.
Received on Wednesday, 27 March 2013 15:26:21 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 22:56:54 UTC