Re: ANN: WebDataCommons.org - Offering 3.2 billion quads current RDFa, Microdata and Microformat data extracted from 65.4 million websites

From: Dan Brickley <danbri@danbri.org>
Date: Tue, 17 Apr 2012 21:23:41 +0200
Message-ID: <CAFfrAFo9Sp-QmD1xZ6yonisumpGfF0Hf_WPhM+WztO1yCmEV9Q@mail.gmail.com>
To: Martin Hepp <martin.hepp@ebusiness-unibw.org>, Chris Bizer <chris@bizer.de>
Cc: Peter Mika <pmika@yahoo-inc.com>, "public-vocabs@w3.org Vocabularies" <public-vocabs@w3.org>, public-lod@w3.org
How about adding a disclaimer line to the webdatacommons.org site, like:

"Note that many database-backed sites contain a huge long tail of
rarely-visited, rarely-linked pages (e.g. product catalogues) that
increasingly contain useful structured data. It is best not to
assume that this collection contains a complete, deep crawl of every
site it touches."

Dan
Received on Tuesday, 17 April 2012 19:24:13 GMT
