- From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
- Date: Mon, 2 Jul 2012 11:43:06 +0200
- To: Hannes Mühleisen <muehleis@inf.fu-berlin.de>
- Cc: public-vocabs@w3.org
Dear Hannes:

I would like to stress again what was discussed on various mailing lists in April 2012, i.e. that the data basis for webdatacommons.org is highly problematic, since the underlying CommonCrawl corpus does not include the majority of deep links into dynamic Web applications and thus misses the core of where RDFa and Microdata typically sit.

See
http://yfrog.com/h3z75np
http://lists.w3.org/Archives/Public/public-lod/2012Apr/0117.html
http://lists.w3.org/Archives/Public/public-lod/2012Apr/0103.html

Best
Martin

On Jul 2, 2012, at 9:19 AM, Hannes Mühleisen wrote:

> Hello Vocabulary Enthusiasts,
>
> we have recently completed a study on vocabulary usage on Web pages using the Microdata and RDFa encodings. We have analyzed both vocabulary as well as class and property usage frequencies and property co-occurrence for two web crawls. These crawls contained 93 Million URLs with data using both encodings from 2012, and 14 Million URLs from 2009/2010. The results are available at http://webdatacommons.org/vocabulary-usage-analysis/index.html .
>
> We hope our findings are useful in giving a small insight in what vocabularies (or parts thereof) are used to annotate entities within HTML pages.
>
> Regards,
>
> Hannes Mühleisen
>

--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail: hepp@ebusiness-unibw.org
phone: +49-(0)89-6004-4217
fax: +49-(0)89-6004-4620
www: http://www.unibw.de/ebusiness/ (group)
     http://www.heppnetz.de/ (personal)
skype: mfhepp
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
* Project Main Page: http://purl.org/goodrelations/
Received on Monday, 2 July 2012 09:43:35 UTC