Re: Possible choice for the RDFa default profile prefixes from Nathan on 2011-05-02 (public-rdfa-wg@w3.org from May 2011)

From: Nathan <nathan@webr3.org>
Date: Mon, 02 May 2011 16:29:13 +0100
To: Ivan Herman <ivan@w3.org>
CC: W3C RDFWA WG <public-rdfa-wg@w3.org>, Giovanni Tummarello <giovanni.tummarello@deri.org>
Message-ID: <4DBECDC9.6090009@webr3.org>

Ivan Herman wrote:
> Guys,
> 
> I have been working, on and off, with the Sindice guys the past few weeks to see if we can extract a suitable list for vocabularies for a default profile. The issue was to find a proper and objective way to determine what should be in the list of those vocabularies. Here is what we came up with:
> 
> - Any vocabulary that is defined via a W3C Recommendation or a W3C WG/IG Note is automatically added to the list. This obviously includes rdf, rdfs, skos, but also void.
> - For the rest we rely on search results, and some processing of those results, as performed by the Sindice search engine. 
> (- If I can also get a similar crawl result from other sources, like Yahoo, then we would be able to merge those results somehow. But, at the moment, that is not the case...)
> 
> I have collected the results in a table on [1], and have also given some details on how the crawl results were used and processed[2].
> 
> Looking at the results, my proposal is, first of all, to rank along the last column (that is the default ranking on [1] and also the criterion used to choose the top 100), because that gives a measure of the widespread usage (or not) of a particular vocabulary. (There are some interesting cases like the bio/0.1 one: a large number of domains and a low number of 2nd level domains. Giovanni's analysis is that this is based on places like My Opera, which provides a large number of blogs for users with a local domain.) Furthermore, the proposal is to draw a line after the rdf.data-vocabulary.org/# one (that is the vocabulary for Google's rich snippets); there is indeed a drop in the numbers of the last column at that point. One could argue the case for a number of other vocabularies, and the most notable issue is that the GoodRelations ontology, which has significant traction out there, falls outside the list. However, I would like to avoid arbitrary choices and stick to objective numbers; such a discussion in the community might go on for ages and that is not what we want. Also, in my view, the number of prefixes in the default profile should not be large...
> 
> The crawl results do not say what the prefix should be, they only give the vocabulary. The choice of the prefix should probably be based on the documentation of the vocabulary. In disputed cases we could of course contact the vocabulary authors.
> 
> Opinions?

Useful list!

I agree re the w3c specs, other comments are:

foaf, sioc, cc, og (and good relations!) all seem like safe bets, and 
are very much used in rdfa (which has a potentially different audience 
to typical rdf formats).

prefixes for dublin core vocabularies? (could generate incorrect data 
since people often map dc to either uri)

does void have a w3 uri as well as the rdfs.org one now?

wgs84 has several different prefixes known for it, but is used a lot. 
geo and geonames both used a fair bit too, but likewise possible 
prefix-name collisions.
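
To make the collision concern concrete, here is a minimal sketch of how 
one might check a set of observed prefix declarations for ambiguity. The 
OBSERVED pairs and the find_collisions helper are illustrative 
assumptions, not data or tooling from the Sindice crawl; the namespace 
URIs are the published ones for Dublin Core, WGS84 and FOAF.

```python
from collections import defaultdict

# Illustrative (prefix, namespace URI) declarations, as they might be
# seen across documents. NOT taken from the Sindice crawl results.
OBSERVED = [
    ("dc", "http://purl.org/dc/elements/1.1/"),
    ("dc", "http://purl.org/dc/terms/"),
    ("dcterms", "http://purl.org/dc/terms/"),
    ("geo", "http://www.w3.org/2003/01/geo/wgs84_pos#"),
    ("wgs84", "http://www.w3.org/2003/01/geo/wgs84_pos#"),
    ("foaf", "http://xmlns.com/foaf/0.1/"),
]


def find_collisions(pairs):
    """Return (ambiguous_prefixes, multi_prefix_uris).

    ambiguous_prefixes: prefix -> sorted URIs, for prefixes bound to
        more than one URI (a default binding could yield wrong data).
    multi_prefix_uris: URI -> sorted prefixes, for vocabularies known
        under more than one prefix (no single obvious default name).
    """
    by_prefix = defaultdict(set)
    by_uri = defaultdict(set)
    for prefix, uri in pairs:
        by_prefix[prefix].add(uri)
        by_uri[uri].add(prefix)
    ambiguous_prefixes = {
        p: sorted(uris) for p, uris in by_prefix.items() if len(uris) > 1
    }
    multi_prefix_uris = {
        u: sorted(prefixes) for u, prefixes in by_uri.items() if len(prefixes) > 1
    }
    return ambiguous_prefixes, multi_prefix_uris


if __name__ == "__main__":
    ambiguous, multi = find_collisions(OBSERVED)
    print("prefixes bound to several URIs:", ambiguous)
    print("URIs known under several prefixes:", multi)
```

With this sample input, dc shows up as the ambiguous prefix, while the 
wgs84 and dcterms namespaces show up as having more than one prefix in 
use.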

> P.S. I want to publicly express my thanks to the Sindice team. They did the work, I was, mostly, nagging only...:-)

Definitely, most useful data - thanks!

> [1] http://www.w3.org/2010/02/rdfa/profile/Sindice-crawl.html
> [2] http://www.w3.org/2010/02/rdfa/profile/Sindice-crawl.html#method
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
> 

Received on Monday, 2 May 2011 15:30:02 UTC