Re: Corpus Statistics

Hi John:

> Is data available for these questions related to schema.org?
> - How many property instances have currently been marked up?
> - What is the distribution of property instances across types?
> - What is the rate of growth of property instances (by month)?
> - What is the total number of web sites that have created property instance markup?
> - What is the rate of growth of the number of web sites that have created property instance markup?
>  
> I am interested in using the corpus of schema.org markup for a project, and having this data would help me decide whether to proceed. Additionally, making this data available might encourage more web site developers to jump on the schema.org bandwagon.
>  


I think Chris Bizer et al. have a paper on the topic at the upcoming ISWC 2013 conference:

     Christian Bizer, Kai Eckert, Robert Meusel, Hannes Mühleisen, Michael Schuhmacher and Johanna Völker. Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis

I have not yet seen the paper, but I assume it is based on http://webdatacommons.org/, which already provides some statistics:

    http://webdatacommons.org/vocabulary-usage-analysis/index.html

Two other papers on the topic are:

     Ashraf, J., Hussain, O. K. and Hussain, F. K. (2013). Empirical analysis of domain ontology usage on the Web: eCommerce domain in focus. Concurrency and Computation: Practice and Experience. doi: 10.1002/cpe.3089 (a likely earlier version: Ashraf, J., Cyganiak, R., O’Riain, S. and Hadzic, M. (2011). Open ebusiness ontology usage: investigating community implementation of GoodRelations. In C. Bizer, T. Heath, T. Berners-Lee and M. Hausenblas (eds.), WWW2011 Workshop on Linked Data on the Web (LDOW2011), 29 March 2011, Hyderabad, India. CEUR Workshop Proceedings.)

     Jamshaid Ashraf: A Framework for Ontology Usage Analysis. ESWC 2012: 813-817

Note, however, that these studies most likely did not have access to a Google-scale crawl of the Web, which may limit how far their findings generalize.

Martin



--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
* Project Main Page: http://purl.org/goodrelations/

Received on Tuesday, 30 July 2013 05:43:17 UTC