Re: Vocabulary Usage on Web Pages - Analysis Results

From: Bernard Vatant <bernard.vatant@mondeca.com>
Date: Mon, 2 Jul 2012 11:06:24 +0200
Message-ID: <CAK4ZFVF3fALdBNW4sxudwZH9LOYVH2KrHtF=GNP9RQKQY7qntQ@mail.gmail.com>
To: Hannes Mühleisen <muehleis@inf.fu-berlin.de>
Cc: public-vocabs@w3.org

Hello Hannes,

Interesting stuff. Among the top namespaces I see
http://rdf.data-vocabulary.org/
I must admit I did not know it. Looking at http://data-vocabulary.org/,
it seems that it has not been updated since 2010, and it looks to me
largely redundant with, or a prefiguration of, schema.org.

@schema.org folks: since http://data-vocabulary.org/ seems to belong to
Google, wouldn't it be a great idea to put a note on this vocabulary
along the lines of "this vocabulary is obsolete, use schema.org instead"?

Or am I missing something?

Bernard

2012/7/2 Hannes Mühleisen <muehleis@inf.fu-berlin.de>

> Hello Vocabulary Enthusiasts,
>
> we have recently completed a study of vocabulary usage on Web pages using
> the Microdata and RDFa encodings. We analyzed vocabulary usage as well
> as class and property usage frequencies and property co-occurrence for two
> web crawls. These crawls contained 93 million URLs with data using both
> encodings from 2012, and 14 million URLs from 2009/2010. The results are
> available at
> http://webdatacommons.org/vocabulary-usage-analysis/index.html .
>
> We hope our findings give some insight into which
> vocabularies (or parts thereof) are used to annotate entities within HTML
> pages.
>
> Regards,
>
> Hannes Mühleisen
>
>


-- 
*Bernard Vatant*
Vocabularies & Data Engineering
Tel: +33 (0)9 71 48 84 59
Skype: bernard.vatant
Linked Open Vocabularies <http://labs.mondeca.com/dataset/lov>

--------------------------------------------------------
*Mondeca*
3 cité Nollez 75018 Paris, France
www.mondeca.com
Follow us on Twitter: @mondecanews <http://twitter.com/#%21/mondecanews>

Received on Monday, 2 July 2012 09:07:23 GMT
