Re: ANN: WebDataCommons.org - Offering 3.2 billion quads current RDFa, Microdata and Microformat data extracted from 65.4 million websites

How about adding a disclaimer line to the webdatacommons.org site like

"Note that many database-backed sites contain a huge long tail of
rarely-visited, rarely-linked pages (e.g. product catalogues), which
increasingly contain useful structured data. It is best not to
assume that this collection contains a complete, deep crawl of every
site it touches."

Dan

Received on Tuesday, 17 April 2012 19:24:13 UTC