- From: Dan Brickley <danbri@danbri.org>
- Date: Sat, 11 Sep 2010 07:40:01 +0000
- To: Ed Summers <ehs@pobox.com>
- Cc: public-esw-thes@w3.org
On Sat, Sep 11, 2010 at 2:45 AM, Ed Summers <ehs@pobox.com> wrote:
> On a Friday whim (prompted by Dan Brickley) I downloaded the 2010
> Billion Triple Challenge dataset to look and see how many SKOS
> assertions there are in it, and from what domains. If you are
> interested the results can be found at:
>
> http://gist.github.com/574700

This is great, thanks for doing this!

I'm also having a similar conversation with the Sindice team, and will be offering suggestions for how they can map out the SemWeb vocabulary/data landscape. Now would be a very good time for the SKOS community to figure out what else they might want to learn about large-scale SKOS deployment patterns.

SKOS is interesting in this regard, since it is a bit like a domain vocabulary (dc, foaf, creative commons) and a bit like an infrastructural vocabulary (rdfs, owl, ...). General RDF stats that help dc, foaf, creative commons etc. understand their deployment aren't so directly helpful for individual SKOS scheme creators, since e.g. 'UKAT in SKOS' or 'LCSH in SKOS' show up as very similar triples in RDF.

So what would we like to know about SKOS? For example:

(general questions)
- what non-SKOS properties most commonly point to things of type skos:Concept?
- what non-SKOS properties most commonly apply to skos:Concepts?
- which bits of SKOS are heavily used; are not used; are still used, even though removed from the final spec?
- are people subclassing or superclassing SKOS classes, e.g. skos:Concept?
- are there sub/super-properties declared for SKOS properties?
- how are the internationalisation features of SKOS being used in practice?
- URI patterns: # vs / URIs, 303 redirects; are these being used?
- is SKOS for Web publication of 'traditional thesauri' used differently (data patterns) from SKOS used to capture information from users (tags, blog categories, Wikipedia)?
- how long are prefLabel and other SKOS strings? (some graphs here could help Web designers creating UI to display SKOS content)
- what common mistakes can we find in the data?

(scheme-specific questions)
- given a SKOS scheme/dataset, e.g. UKAT, we might re-ask some of the above questions, e.g. what properties point to concepts from that scheme, or what domains are using it.
- what are the RDF types and most common properties found on objects that have any property whose value is a link to an id.loc.gov LCSH SKOS Concept?

OK, that's just off the top of my head. I'm sure others here must have questions they'd be interested to see answers for. I'm emphasising data questions that involve large aggregations of RDF data, not analytics you'd do on your own local SKOS repository...

No promises that any of these questions can be answered, but finding out what we want to know would be a useful first step.

cheers,

Dan
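Several of the general questions listed above are essentially counting exercises over an aggregated crawl. As a rough, hypothetical illustration (not how the gist linked above was produced), the Python sketch below answers a few of them: which non-SKOS properties point to, or are used on, skos:Concept instances; how often each SKOS property occurs; prefLabel lengths and language tags; and hash vs slash concept URIs. It assumes the data has been flattened to one simple N-Triples statement per line and fits in memory. The regex is a simplification, not a conformant N-Triples parser, only concepts declared via rdf:type are counted, and the function name `skos_stats`, the `SAMPLE` data and the example.org URIs are made up for illustration.

```python
import re
from collections import Counter

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CONCEPT = f"<{SKOS}Concept>"

# Simplified line-based N-Triples pattern (assumption: one well-formed triple
# per line). Subjects/objects keep their <...>, _: or "..." form; predicates
# are returned as bare IRIs.
TRIPLE = re.compile(r'^(<[^>]*>|_:\S+)\s+<([^>]*)>\s+(.+?)\s*\.\s*$')
# Lexical form and optional language tag of a literal.
LITERAL = re.compile(r'^"((?:[^"\\]|\\.)*)"(?:@([A-Za-z0-9-]+))?')


def parse(lines):
    """Yield (subject, predicate, object) tuples, skipping blanks and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = TRIPLE.match(line)
        if m:
            yield m.group(1), m.group(2), m.group(3)


def skos_stats(lines):
    """Hypothetical helper computing a few SKOS deployment statistics."""
    triples = list(parse(lines))
    concepts = {s for s, p, o in triples if p == RDF_TYPE and o == CONCEPT}

    pointing = Counter()     # non-SKOS properties whose value is a skos:Concept
    applied = Counter()      # non-SKOS properties used on a skos:Concept
    skos_terms = Counter()   # usage count of each SKOS property
    label_lengths = []       # prefLabel lengths (escaped form, so approximate)
    label_langs = Counter()  # language tags on prefLabels

    for s, p, o in triples:
        if p.startswith(SKOS):
            local = p[len(SKOS):]
            skos_terms[local] += 1
            if local == "prefLabel":
                m = LITERAL.match(o)
                if m:
                    label_lengths.append(len(m.group(1)))
                    label_langs[m.group(2) or "(none)"] += 1
            continue
        if o in concepts:
            pointing[p] += 1
        if s in concepts and p != RDF_TYPE:
            applied[p] += 1

    # '#' vs '/' style of concept IRIs (blank-node concepts are skipped).
    uri_style = Counter("hash" if "#" in c else "slash"
                        for c in concepts if c.startswith("<"))

    return {
        "pointing": pointing,
        "applied": applied,
        "skos_terms": skos_terms,
        "label_lengths": label_lengths,
        "label_langs": label_langs,
        "uri_style": uri_style,
    }


# Tiny made-up dataset for illustration only.
SAMPLE = [
    '<http://example.org/c1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .',
    '<http://example.org/c1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Cats"@en .',
    '<http://example.org/c1> <http://purl.org/dc/terms/created> "2010" .',
    '<http://example.org/doc> <http://purl.org/dc/terms/subject> <http://example.org/c1> .',
]

if __name__ == "__main__":
    stats = skos_stats(SAMPLE)
    print(stats["pointing"])  # Counter({'http://purl.org/dc/terms/subject': 1})
    print(stats["uri_style"])  # Counter({'slash': 1})
```

On a real crawl the same counters would need a streaming pass rather than holding every triple in a list, and questions such as 303 redirect behaviour can't be answered from the triples alone; they would need HTTP lookups against the concept URIs.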
Received on Saturday, 11 September 2010 07:45:00 UTC