Possible choice for the RDFa default profile prefixes

Guys,

I have been working, on and off, with the Sindice guys over the past few weeks to see if we can extract a suitable list of vocabularies for a default profile. The issue was to find a proper and objective way to determine which vocabularies should be on that list. Here is what we came up with:

- Any vocabulary that is defined via a W3C Recommendation or a W3C WG/IG Note is automatically added to the list. This obviously includes rdf, rdfs, and skos, but also void.
- For the rest we rely on search results, and some processing of those results, as performed by the Sindice search engine.
- (If I can also get similar crawl results from other sources, like Yahoo, then we would be able to merge those results somehow. But, at the moment, that is not the case...)

I have collected the results in a table at [1], and have also given some details on how the crawl results were used and processed [2].

Looking at the results, my proposal is, first of all, to rank by the last column (that is the default ranking on [1] and also the criterion used to choose the top 100), because that gives a measure of how widespread the usage of a particular vocabulary is (or is not). (There are some interesting cases like the bio/0.1 one: a large number of domains and a low number of 2nd level domains. Giovanni's analysis is that this is due to places like My Opera, which provides a large number of blogs for users with a local domain.)

Furthermore, the proposal is to draw a line after the rdf.data-vocabulary.org/# one (that is the vocabulary for Google's rich snippets); there is indeed a drop in the numbers of the last column at that point. One could argue for a number of other vocabularies, and the most notable issue is that the GoodRelations ontology, which has significant traction out there, falls outside the list. However, I would like to avoid arbitrary choices and stick to objective numbers; such a discussion in the community might go on for ages and that is not what we want. Also, in my view, the number of prefixes in the default profile should not be large...
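
To make the procedure concrete, here is a minimal Python sketch of the selection as I read it: start from the W3C-defined vocabularies, rank the crawl rows by the last column, and keep everything down to and including the cut-off vocabulary. The names `CrawlRow`, `last_column` and `select_vocabularies`, the example.org URIs, and all the numbers are made up for illustration; the real figures are in the table on [1], and this is not the processing that Sindice actually ran.

```python
from typing import List, NamedTuple


class CrawlRow(NamedTuple):
    vocabulary: str   # vocabulary URI as reported by the crawl
    last_column: int  # value of the last column of the table on [1] (assumed name)


# Rule 1: vocabularies defined via a W3C Recommendation or a WG/IG Note.
W3C_VOCABULARIES = [
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",  # rdf
    "http://www.w3.org/2000/01/rdf-schema#",        # rdfs
    "http://www.w3.org/2004/02/skos/core#",         # skos
    "http://rdfs.org/ns/void#",                     # void
]


def select_vocabularies(rows: List[CrawlRow], w3c: List[str], cutoff: str) -> List[str]:
    """Return the W3C vocabularies plus every crawl row ranked at or above `cutoff`."""
    # Rule 2: rank by the last column, highest value first.
    ranked = [r.vocabulary for r in sorted(rows, key=lambda r: r.last_column, reverse=True)]
    if cutoff not in ranked:
        raise ValueError(f"cut-off vocabulary not found in crawl rows: {cutoff}")
    kept = ranked[: ranked.index(cutoff) + 1]  # the line is drawn *after* the cut-off
    selected = list(w3c)
    for vocabulary in kept:
        if vocabulary not in selected:  # avoid listing a W3C vocabulary twice
            selected.append(vocabulary)
    return selected


# Hypothetical crawl rows: the numbers are invented, not taken from [1].
rows = [
    CrawlRow("http://example.org/vocab-b#", 50),
    CrawlRow("http://rdf.data-vocabulary.org/#", 400),
    CrawlRow("http://example.org/vocab-a#", 900),
]

selected = select_vocabularies(rows, W3C_VOCABULARIES, "http://rdf.data-vocabulary.org/#")
```

With these invented rows, `selected` holds the four W3C vocabularies followed by vocab-a and rdf.data-vocabulary.org, while vocab-b falls below the line. The cut-off is passed in explicitly because where to draw the line is a judgement made by looking at the drop in the numbers, not something the sketch tries to detect.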

The crawl results do not say what the prefix should be; they only give the vocabulary. The choice of the prefix should probably be based on the documentation of the vocabulary. In disputed cases we could of course contact the vocabulary authors.

Opinions?

Ivan

P.S. I want to publicly express my thanks to the Sindice team. They did the work; I was mostly just nagging... :-)

[1] http://www.w3.org/2010/02/rdfa/profile/Sindice-crawl.html
[2] http://www.w3.org/2010/02/rdfa/profile/Sindice-crawl.html#method

----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Monday, 2 May 2011 05:01:40 UTC