Re: ANN: WebDataCommons.org - Offering 3.2 billion quads current RDFa, Microdata and Microformat data extracted from 65.4 million websites

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Tue, 17 Apr 2012 21:29:02 +0200
Cc: Chris Bizer <chris@bizer.de>, Peter Mika <pmika@yahoo-inc.com>, "public-vocabs@w3.org Vocabularies" <public-vocabs@w3.org>, public-lod@w3.org
Message-Id: <D7CF5067-4825-4E3F-8FF9-683B0B34C742@ebusiness-unibw.org>
To: Dan Brickley <danbri@danbri.org>
That would be a nice first step. The next would be to stop claiming that the stats show the actual status of the "data web" ;-)


On Apr 17, 2012, at 9:23 PM, Dan Brickley wrote:

> How about adding a disclaimer line to the webdatacommons.org site like
> 
> "Note that many database-backed sites contain a huge long tail of
> rarely-visited, rarely-linked pages (e.g. product catalogues) that
> increasingly contain useful structured data. It is best not to
> assume that this collection contains a complete, deep crawl of every
> site it touches."
> 
> Dan

--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
* Project Main Page: http://purl.org/goodrelations/
Received on Tuesday, 17 April 2012 19:29:33 GMT