- From: Stéphane Corlosquet <scorlosquet@gmail.com>
- Date: Mon, 2 May 2011 11:56:19 -0400
- To: Ivan Herman <ivan@w3.org>
- Cc: W3C RDFWA WG <public-rdfa-wg@w3.org>
- Message-ID: <BANLkTikF9o_2de1b7iPsGRc0esVnVTjNdg@mail.gmail.com>
On Mon, May 2, 2011 at 1:02 AM, Ivan Herman <ivan@w3.org> wrote:
> Guys,
>
> I have been working, on and off, with the Sindice guys the past few weeks
> to see if we can extract a suitable list of vocabularies for a default
> profile. The issue was to find a proper and objective way to determine what
> should be in the list of those vocabularies. Here is what we came up with:
>
> - Any vocabulary that is defined via a W3C Recommendation or a W3C WG/IG
>   Note is automatically added to the list. This obviously includes rdf,
>   rdfs, and skos, but also void.
> - For the rest we rely on search results, and on some processing of those
>   results, as performed by the Sindice search engine.
> - (If I can also get similar crawl results from other sources, like Yahoo,
>   then we would be able to merge those results somehow. But, at the moment,
>   that is not the case...)
>
> I have collected the results in a table on [1], and have also given some
> details on how the crawl results were used and processed [2].
>
> Looking at the results, my proposal is, first of all, to rank along the last
> column (that is the default ranking on [1] and also the criterion used to
> choose the top 100), because that gives a measure of the widespread usage
> (or not) of a particular vocabulary. (There are some interesting cases, like
> the bio/0.1 one: a large number of domains but a low number of 2nd-level
> domains. Giovanni's analysis is that this is due to places like My Opera,
> which provides a large number of blogs for users with a local domain.)
> Furthermore, the proposal is to draw a line after the
> rdf.data-vocabulary.org/# one (the vocabulary for Google's Rich Snippets),
> where there is indeed a drop in the numbers of the last column. One could
> argue for a number of other vocabularies, and the most notable issue is
> that the GoodRelations ontology, which has significant traction out there,
> falls outside the list.
> However, I would like to avoid
> arbitrary choices and stick to objective numbers; such a discussion in the
> community might go on for ages and that is not what we want. Also, in my
> view, the number of prefixes on the default vocabulary should not be
> large...
>
> The crawl results do not say what the prefix should be; they only give the
> vocabulary. The choice of the prefix should probably be based on the
> documentation of the vocabulary. In disputed cases we could of course
> contact the vocabulary authors.
>

prefix.cc should also help decide on prefixes based on popular usage, e.g.
http://prefix.cc/?q=http://xmlns.com/foaf/0.1/

Steph.

>
> Opinions?
>
> Ivan
>
> P.S. I want to publicly express my thanks to the Sindice team. They did the
> work, I was, mostly, nagging only...:-)
>
> [1] http://www.w3.org/2010/02/rdfa/profile/Sindice-crawl.html
> [2] http://www.w3.org/2010/02/rdfa/profile/Sindice-crawl.html#method
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Monday, 2 May 2011 15:58:43 UTC