- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Tue, 08 Nov 2011 11:40:17 -0500
- To: RDFa WG <public-rdfa-wg@w3.org>
I started a page for the new Web Crawl Regexes that will measure RDFa usage in the wild, and give us a better idea if the RDFa Lite changes we're thinking of making will break existing content out there: The page is hosted in the Data Driven Standards WG wiki, so you'll have to join that group if you want to edit the wiki:

http://www.w3.org/community/data-driven-standards/wiki/Data-in-html-crawl-design

There isn't much there right now, but it's a start. The plan is to turn these regexes into a Hadoop map/reduce job and run it on the Amazon Elastic Map Reduce infrastructure on the Common Crawl dataset (5 billion web pages, tens of terabytes of web page data).

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
Founder/CEO - Digital Bazaar, Inc.
blog: Standardizing Payment Links - Why Online Tipping has Failed
http://manu.sporny.org/2011/payment-links/

Received on Tuesday, 8 November 2011 16:40:57 UTC