Re: Library of Congress Subject Headings as SKOS Linked Data from Richard Cyganiak on 2008-06-10 (public-swd-wg@w3.org from June 2008)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 11 Jun 2008 00:17:30 +0100
To: Ed Summers <ehs@pobox.com>
Cc: "SWD Working SWD" <public-swd-wg@w3.org>, public-lod@w3.org
Message-Id: <7142E6DC-F481-417E-B38E-3AD9C54FC8B1@cyganiak.de>

Ed,

A very cool service, and exemplary attention to detail!

Of course, I still have a few suggestions! I haven't read through the  
entire thread, so apologies if some of this was mentioned already.

(I saw 303s being mentioned in the thread -- you are doing things the  
right way, there's no need to do 303s at <sh95000541>. It is an  
information resource and therefore 200 is fine. The concept is  
<sh95000541#concept>, a URI that cannot be directly dereferenced via  
HTTP, so you are consistent with httpRange-14, as explained in the  
Cool URIs document. This is one of the nice things about hash URIs.)
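
To make the hash URI point concrete, here is a minimal Python sketch using only the standard library. It shows that a client splits off the fragment before issuing the HTTP request, so only the document URI reaches the server. The helper name `request_target` is made up for illustration and is not part of lcsh.info.

```python
from typing import Tuple
from urllib.parse import urldefrag


def request_target(uri: str) -> Tuple[str, str]:
    """Split a URI into the part sent over HTTP and the fragment kept client-side."""
    document_uri, fragment = urldefrag(uri)
    return document_uri, fragment


# The concept URI from the announcement.
doc, frag = request_target("http://lcsh.info/sh95000541#concept")
print(doc)   # http://lcsh.info/sh95000541
print(frag)  # concept
```

The server only ever sees a request for the document, which is why a plain 200 response is consistent here.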

1. The content-negotiated URI should send a "Vary: Accept" header.  
This helps caches to deal correctly with content-negotiated resources.
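
As a rough sketch of what this could look like, assuming a plain WSGI application (I don't know what lcsh.info actually runs on), the content-negotiated resource would add the header to every response regardless of which variant is chosen. The `app` function, the placeholder bodies, and the crude substring check on the Accept header are all assumptions for illustration only.

```python
def app(environ, start_response):
    """Toy WSGI app: picks a variant from Accept and always sends Vary: Accept."""
    accept = environ.get("HTTP_ACCEPT", "")
    # Placeholder selection logic; real negotiation should honour q values.
    if "application/rdf+xml" in accept:
        content_type, body = "application/rdf+xml", b"<rdf:RDF/>"
    else:
        content_type, body = "application/xhtml+xml", b"<html/>"
    headers = [
        ("Content-Type", content_type),
        # Tells caches that the response depends on the request's Accept header.
        ("Vary", "Accept"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

With this in place, a cache that has stored the XHTML variant should not hand it to a client that asked for RDF/XML.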

2. The correct MIME type for N3 is "text/rdf+n3;charset=utf-8", not
"text/n3". (I think the spec used to recommend text/n3, but this was
changed some time ago.)

3. I would suggest adding a few triples to the RDF/XML and N3  
versions, to link the generic document to its variants, and the  
generic document to the concept. Example (choose your own favourite  
properties):

<sh95000541> foaf:primaryTopic <sh95000541#concept> .
<sh95000541> dcterms:hasFormat <sh95000541.rdf> .
<sh95000541> dcterms:hasFormat <sh95000541.n3> .
<sh95000541> dcterms:hasFormat <sh95000541.json> .
<sh95000541> dcterms:hasFormat <sh95000541.html> .

This helps RDF browsers to relate all those resources.

4. The content negotiation could benefit from a little bit of  
tweaking. You correctly handle q values, which is great. It would be  
even better if there was a slight bias towards the non-HTML formats. I  
would argue that the data variants are quite a bit more useful than  
the HTML variant, as RDF-aware clients can do all sorts of cool stuff
with the RDF that is not possible with HTML. Therefore, a client that
indicates identical preference for HTML and RDF/XML should be served  
RDF/XML. FWIW, Tabulator has a preference of 1.0 for XHTML and 0.8 for  
RDF/XML. It would be great if your algorithm would return RDF/XML in  
this case.
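
One way to get such a bias, sketched in Python under my own assumptions: multiply each client q value by a server-side quality factor and pick the variant with the highest product. The `SERVER_QS` weights and the function names are hypothetical and would need tuning; this is not a description of how lcsh.info negotiates today.

```python
def parse_accept(header):
    """Return {media_type: q} from an Accept header (only the q parameter is read)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media_type] = q
    return prefs


# Hypothetical server-side weights: data formats rank above the HTML variant.
SERVER_QS = {
    "application/rdf+xml": 1.0,
    "text/rdf+n3": 0.9,
    "application/json": 0.9,
    "application/xhtml+xml": 0.7,
}


def client_q(prefs, media_type):
    """Look up the client's q for a type, falling back to type/* and then */*."""
    if media_type in prefs:
        return prefs[media_type]
    major_wildcard = media_type.split("/")[0] + "/*"
    if major_wildcard in prefs:
        return prefs[major_wildcard]
    return prefs.get("*/*", 0.0)


def negotiate(header):
    """Return the best media type, or None if nothing is acceptable."""
    prefs = parse_accept(header) if header else {"*/*": 1.0}
    best, best_score = None, 0.0
    for media_type, server_q in SERVER_QS.items():
        score = client_q(prefs, media_type) * server_q
        if score > best_score:
            best, best_score = media_type, score
    return best


# The Tabulator-style preference described above: XHTML 1.0, RDF/XML 0.8.
print(negotiate("application/xhtml+xml, application/rdf+xml;q=0.8"))
```

With these weights the Tabulator-style header scores 0.8 for RDF/XML against 0.7 for XHTML, so RDF/XML wins. A client that sends "*/*" with a high q could also end up with RDF under the same weights, so the exact numbers are a trade-off.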

(I'm ignoring the availability of RDFa in my argument -- unfortunately  
there is no way for a client to indicate that it supports RDFa AFAIK,  
so it cannot really be factored into the content negotiation equation.)

5. Ideally, you would add the skos:prefLabels of all related concepts  
to the RDF output. This would support navigation in RDF browsers.

Again, great work!

Best,
Richard


On 9 Jun 2008, at 14:54, Ed Summers wrote:

>
> I'd like to announce an experimental linked-data, SKOS representation
> of the Library of Congress Subject Headings (LCSH) [1] ... and also
> ask for some help.
>
> The Library of Congress has been participating in the W3C Semantic Web
> Deployment Working Group, and has converted LCSH from the MARC21 data
> format [2] to SKOS. LCSH is a controlled vocabulary used to index
> materials that have been added to the collections at the Library of
> Congress. It has been in active development since 1898, and was first
> published in 1914 so that other libraries and bibliographic utilities
> could use and adapt it. The lcsh.info service makes 266,857 subject
> headings available as SKOS concepts, which amounts to 2,441,494
> triples that are separately downloadable [3] (since there isn't a
> SPARQL endpoint just yet).
>
> At the last SWDWG telecon some questions came up about the way
> concepts are identified, and made available via HTTP. Since we're
> hoping lcsh.info can serve as an implementation of SKOS for the W3C
> recommendation process we want to make sure we do this right. So I was
> hoping interested members of the linked-data and SKOS communities
> could take a look and make sure the implementation looks correct.
>
> Each concept is identified with a URI like:
>
> http://lcsh.info/sh95000541#concept
>
> When responding to requests for concept URIs, the server content
> negotiates to determine which representation of the concept to return:
>
> - application/xhtml+xml
> - application/json
> - text/n3
> - application/rdf+xml
>
> This is basically the pattern that Cool URIs for the Semantic Web
> discusses as the Hash URI with Content Negotiation [4]. An additional
> point that is worth mentioning is that the XHTML representation
> includes RDFa, that also describes the concept.
>
> At the moment the LCSH/SKOS data is only linked to itself, through
> assertions that involve skos:broader, skos:narrower, and skos:related.
> But the hope is that minting URIs for LCSH will allow it to be mapped
> and/or linked to concepts in other vocabularies: dbpedia, geonames,
> etc.
>
> Any feedback, criticisms, ideas are welcome on either the
> public-lod [5] or public-swd-wg [6] discussion lists.
>
> Thanks for reading this far!
> //Ed
>
> [1] http://lcsh.info
> [2] http://www.loc.gov/marc/
> [3] http://lcsh.info/static/lcsh.nt
> [4] http://www.w3.org/TR/cooluris/#hashuri
> [5] http://lists.w3.org/Archives/Public/public-lod/
> [6] http://lists.w3.org/Archives/Public/public-swd-wg/
>
Received on Tuesday, 10 June 2008 23:18:08 UTC