Re: Possible choice for the RDFa default profile prefixes

On May 2, 2011, at 17:29 , Nathan wrote:

> Ivan Herman wrote:
>> Guys,
>> I have been working, on and off, with the Sindice guys the past few weeks to see if we can extract a suitable list of vocabularies for a default profile. The issue was to find a proper and objective way to determine what should be in that list. Here is what we came up with:
>> - Any vocabulary that is defined via a W3C Recommendation or a W3C WG/IG Note is automatically added to the list. This obviously includes rdf, rdfs, skos, but also void.
>> - For the rest we rely on search results, and some processing of those results, as performed by the Sindice search engine. (If I can also get a similar crawl result from other sources, like Yahoo, then we would be able to merge those results somehow. But, at the moment, that is not the case...)
>> I have collected the results in a table on [1], and have also given some details on how the crawl results were used and processed[2].
>> Looking at the results my proposal is, first of all, to rank along the last column (that is the default ranking on [1] and also the criterion used to choose the top 100), because that gives a measure of the widespread usage (or not) of a particular vocabulary. (There are some interesting cases like the bio/0.1 one: a large number of domains and a low number of 2nd level domains. Giovanni's analysis is that this is based on places like My Opera, which provides a large number of blogs for users with a local domain.) Furthermore, the proposal is to draw a line after the rdf.data-vocabulary.org/# one (that is the vocabulary for Google's rich snippets); there is indeed a drop in the numbers of the last column there. One could argue the case for a number of other vocabularies, and the most notable issue is that the good relations ontology, which has significant traction out there, falls outside the list. However, I would like to avoid arbitrary choices and stick to objective numbers; such a discussion in the community might go on for ages and that is not what we want. Also, in my view, the number of prefixes on the default vocabulary should not be large...
>> The crawl results do not say what the prefix should be, they only give the vocabulary. The choice of the prefix should probably be based on the documentation of the vocabulary. In disputed cases we could of course contact the vocabulary authors.
>> Opinions?
> 
> Useful list!
> 
> I agree re the w3c specs, other comments are:
> 
> foaf, sioc, cc, og (and good relations!) all seem like safe bets, and are very widely used in rdfa (which has a potentially different audience to typical rdf formats).

Out of those, good relations is the only one that does not appear among the top ones numerically. Actually, Giovanni and Martin Hepp have agreed to look into that; it might depend on the breadth of Sindice's crawl. The Sindice guys will do some more crawls in the days/weeks to come to check that. Let us wait with that.

> 
> prefixes for dublin core vocabularies? (could generate incorrect data since people often map dc to either uri)

I will ask Tom Baker about that one. I think that dcterms and dc are the two that they prefer to see but, again, he should tell me.

> 
> does void have a w3 uri as well as the rdfs one now?

The void document is a W3C Note[1], but the URI of the vocabulary has not changed[2].


[1] http://www.w3.org/TR/void/
[2] http://rdfs.org/ns/void#


> 
> wgs84 has several different prefixes known for it, but is used a lot. geo and geonames both used a fair bit too, but likewise possible prefix-name collisions.

And they are much further down on the list (somewhat surprisingly; I suspect that most of the usage comes through dbpedia, i.e., one domain only). As for wgs84:

http://www.w3.org/2003/01/geo/wgs84_pos#

I think I will stick to 'geo'. That is also what prefix.cc gives me.

Note that if somebody defines a prefix locally (to whatever URI), that will override the default setup anyway.
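
To illustrate the override behaviour, here is a minimal sketch, not the RDFa processing rules themselves. The DEFAULT_PREFIXES table and the expand_curie helper are hypothetical names made up for this example, and the table holds only a few entries mentioned in this thread rather than the final profile. The geonames URI in the usage note is likewise just an illustrative local choice.

```python
# Hypothetical default profile: prefix -> vocabulary URI.
# Only a few entries, for illustration; not the final list.
DEFAULT_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "void": "http://rdfs.org/ns/void#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def expand_curie(curie, local_prefixes=None):
    """Expand 'prefix:reference' into a full URI.

    Locally declared prefixes (assumed to be collected by the caller
    from the document) take precedence over the default profile.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError("not a prefixed name: %r" % curie)
    # Later entries win, so local declarations override the defaults.
    mappings = {**DEFAULT_PREFIXES, **(local_prefixes or {})}
    if prefix not in mappings:
        raise KeyError("unknown prefix: %r" % prefix)
    return mappings[prefix] + reference
```

With no local declaration, expand_curie("geo:lat") resolves against the wgs84 URI above. If the author of a document locally binds geo to, say, http://www.geonames.org/ontology#, then expand_curie("geo:name", {"geo": "http://www.geonames.org/ontology#"}) resolves against that URI instead, and the default is simply ignored for that document.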

> 
>> P.S. I want to publicly express my thanks to the Sindice team. They did the work, I was, mostly, nagging only...:-)
> 
> Definitely, most useful data - thanks!
> 
>> [1] http://www.w3.org/2010/02/rdfa/profile/Sindice-crawl.html
>> [2] http://www.w3.org/2010/02/rdfa/profile/Sindice-crawl.html#method
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>> FOAF: http://www.ivan-herman.net/foaf.rdf
> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Monday, 2 May 2011 17:27:27 UTC