RE: [open-bibliography] New BNB sample data available

Hi Antoine and all,

Many thanks for the feedback and apologies for the length of this email.

In answer to the questions about this subject data:

>>       <dcterms:subject>
>>         <rdf:Description rdf:about="http://id.loc.gov/authorities/sh2008107012#concept">
>>           <skos:inScheme rdf:resource="http://id.loc.gov/authorities#conceptScheme" />
>>           <skos:prefLabel>Literary landmarks--England--London.</skos:prefLabel>
>>           <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept" />
>>         </rdf:Description>
>>       </dcterms:subject>

And

1. Why does the literal value in <skos:prefLabel>Literary landmarks--England--London.</skos:prefLabel> not exactly match the one served by LC at id.loc.gov for http://id.loc.gov/authorities/sh2008107012#concept?


The answer is that it should. We've matched the LCSH heading contained in the bibliographic record to the LCSH heading in the authority file. The discrepancy comes from punctuation, which is input at the end of the heading in the bib record but is not part of the heading in the authority file. We'll address this in the conversion; it affects the LCSH headings and, I believe, other parts of our output too. [So no, we "are *not* essentially trying to say which of the SKOS prefLabels the BL prefers", as one post tried to second-guess.]
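For illustration only, the kind of clean-up meant here can be sketched in a few lines of Python. The helper name normalize_heading is hypothetical, and the sketch assumes that a single trailing full stop is the only difference between the bib-record form and the authority form; headings that legitimately end in an abbreviation would need an exception list.

```python
import re

# Hypothetical helper: drop the terminal full stop that is input at the end
# of a heading in the bibliographic record but is not part of the LCSH
# authority heading. Assumes a trailing "." (plus any whitespace) is the only
# difference; headings ending in an abbreviation would need special-casing.
_TRAILING_STOP = re.compile(r"\s*\.\s*$")


def normalize_heading(heading: str) -> str:
    """Return the heading without its trailing full stop."""
    return _TRAILING_STOP.sub("", heading)


bib_form = "Literary landmarks--England--London."
authority_form = "Literary landmarks--England--London"
print(normalize_heading(bib_form) == authority_form)  # True
```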

2. Why does our output not include xml:lang="en" in <skos:prefLabel>?
This is because in some cases xml:lang="en", whilst true to the data served by id.loc.gov, is actually not correct. For example, if you look at
<http://id.loc.gov/authorities/sh94003128#concept> for Parque Nacional Torotoro (Bolivia), we have

<skos:prefLabel xml:lang="en">Parque Nacional Torotoro (Bolivia)</skos:prefLabel>

i.e. the label is tagged as English when it is in fact Spanish.

I assume the reason for that is that MARC 21, where these headings originate, doesn't have the granularity to code the language of each data element. So when LC expressed LCSH in SKOS, they couldn't specify the language heading by heading and went for the language of the majority of the headings, which is English.

So we - ok, I ;-) thought we could do "without" the xml:lang attribute since it wasn't "correct" in all cases. I didn't realise the implications.
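To make the trade-off concrete: in RDF the language tag is part of the literal itself, so the choice is between tagging "en", tagging the label's actual language, or leaving the tag off. A minimal Python sketch that writes the prefLabel as an N-Triples line; the function names are hypothetical, and the "es" tag for the Torotoro heading simply reflects that the label is Spanish, not anything id.loc.gov serves.

```python
SKOS_PREF_LABEL = "<http://www.w3.org/2004/02/skos/core#prefLabel>"


def ntriples_literal(text, lang=None):
    """Quote a string as an N-Triples literal, with an optional language tag."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def pref_label_triple(concept_uri, label, lang=None):
    """Hypothetical helper: one skos:prefLabel statement in N-Triples."""
    return f"<{concept_uri}> {SKOS_PREF_LABEL} {ntriples_literal(label, lang)} ."


uri = "http://id.loc.gov/authorities/sh94003128#concept"
label = "Parque Nacional Torotoro (Bolivia)"
print(pref_label_triple(uri, label, "en"))  # as served by id.loc.gov
print(pref_label_triple(uri, label, "es"))  # tagged with the label's real language
print(pref_label_triple(uri, label))        # no tag, as in our current output
```

The three lines produce three distinct RDF literals, which is why dropping the tag is not a neutral choice.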

3. Why are we outputting both the literal value and the resource URI? 
In a very first attempt, we'd only included the resource URI, as you suggest. There were concerns about the two being out of sync, e.g. when an LCSH heading is updated. In fact, this is one of the uses of those URIs: enabling easier updating of bibliographic data.

But we got some advice to the contrary. Some linked data platforms index the literal values to improve searching; it was also pointed out that there may be a risk of the dataset we link to "disappearing".

There are other considerations: we are putting our data out for people to use and re-use, and we are not too sure what they want to do with it yet, so, as you suggest, some of them may not want, or be able, to go and fetch data from id.loc.gov or any other data sets we link to. A related issue is the time and resources needed to produce these files. At the moment we are concentrating on the BNB, but the intention is to work on other data sets. We are currently working on two versions of the file, a "non-URI" version and a "with added-URI" version of the data. Ideally we would maintain and produce only one, the "with added-URI" version, if it meets the needs of all or most people.
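One way a single "with added-URI" file could still serve consumers who never dereference anything is to always carry the label and add the LCSH URI only where the heading was matched, falling back to a blank node otherwise (as in the Görner example quoted in this thread). A rough Python sketch emitting N-Triples; the function names and the record URI are hypothetical placeholders, not our actual conversion.

```python
DCT_SUBJECT = "<http://purl.org/dc/terms/subject>"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SKOS = "http://www.w3.org/2004/02/skos/core#"
LCSH_SCHEME = "<http://id.loc.gov/authorities#conceptScheme>"


def quote(text):
    """Minimal N-Triples string literal (no language tag)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def subject_triples(record_uri, label, concept_uri=None, bnode="subj1"):
    """Hypothetical helper: describe one dcterms:subject of a record.

    The label is always emitted, so consumers need not fetch anything;
    the LCSH URI is used as the node only when the heading was matched,
    otherwise a blank node stands in for the concept.
    """
    node = f"<{concept_uri}>" if concept_uri else f"_:{bnode}"
    return [
        f"<{record_uri}> {DCT_SUBJECT} {node} .",
        f"{node} {RDF_TYPE} <{SKOS}Concept> .",
        f"{node} <{SKOS}inScheme> {LCSH_SCHEME} .",
        f"{node} <{SKOS}prefLabel> {quote(label)} .",
    ]


record = "http://example.org/bnb/record/1"  # hypothetical record URI
matched = subject_triples(record, "Literary landmarks--England--London",
                          "http://id.loc.gov/authorities/sh2008107012#concept")
unmatched = subject_triples(record, "Görner, Rüdiger--Travel--England--London")
print("\n".join(matched + unmatched))
```

Each unmatched subject on a record would need its own blank node label, hence the bnode parameter.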

Now it's my turn for a question ;-)

In your feedback, you highlight the risk "that your data is less complete than the one of other services"[1], e.g. if you don't have the skos:broader links that id.loc.gov has for LCSH concepts.

So, to take the example of LCSH at id.loc.gov, how much of the data included there should I replicate in my instance data? Aren't the <skos:prefLabel> and the resource URI sufficient? If you need other info, like <skos:altLabel> or <skos:broader>, won't you be able to fetch it via the resource URI?

That's it for now ;-)

I would also like to say that from later today I shall be offline for the next two weeks, so if there are no posts from me, please don't think we don't want to engage. I really appreciate the feedback.

Cheers

Corine

-----Original Message-----
From: open-bibliography-bounces@lists.okfn.org [mailto:open-bibliography-bounces@lists.okfn.org] On Behalf Of Antoine Isaac
Sent: 03 February 2011 21:39
To: Houghton,Andrew
Cc: open-bibliography@lists.okfn.org; public-lld
Subject: Re: [open-bibliography] New BNB sample data available

Hi Andrew,


>> From: public-lld-request@w3.org [mailto:public-lld-request@w3.org] On
>> Behalf Of Antoine Isaac
>> Sent: Thursday, February 03, 2011 13:54
>> To: open-bibliography@lists.okfn.org; public-lld
>> Subject: Re: New BNB sample data available
>>
>>       <dcterms:subject>
>>         <rdf:Description>
>>           <skos:inScheme rdf:resource="http://id.loc.gov/authorities#conceptScheme" />
>>           <skos:prefLabel>Görner, Rüdiger--Travel--England--London.</skos:prefLabel>
>>           <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept" />
>>         </rdf:Description>
>>       </dcterms:subject>
>>
>>       <dcterms:subject>
>>         <rdf:Description rdf:about="http://id.loc.gov/authorities/sh2008107012#concept">
>>           <skos:inScheme rdf:resource="http://id.loc.gov/authorities#conceptScheme" />
>>           <skos:prefLabel>Literary landmarks--England--London.</skos:prefLabel>
>>           <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept" />
>>         </rdf:Description>
>>       </dcterms:subject>
>>
>> So I understand why you define "on-the-fly" (and "in-the-data") the
>> concepts that you can't find in the LCSH linked data. And I think this
>> is a reasonable solution.
>
> There is an issue with the way the RDF is specified IMHO and you have to
> read the SKOS specification to understand the implications of the above
> RDF:
>
>    <http://www.w3.org/TR/skos-reference/#L1567>
>
> S13 skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties.
> S14 A resource has no more than one value of skos:prefLabel per language tag.
>
> If LC declares:
>
> @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
> @base <http://id.loc.gov/authorities/> .
> <sh2008107012#concept> skos:prefLabel "Literary landmarks--England--London." .
>
> and the above RDF declares the same, then integrity constraints S13 and S14 are
> violated because there now exist two triples in the combined graph of resources
> that say the same thing: S13 is violated because of the pairwise disjoint
> constraint, and S14 is violated because there is more than one skos:prefLabel per
> language.
>
> If you want to do something like this then IMHO use rdfs:label instead of
> skos:prefLabel to get around integrity constraints S13 and S14.
>


A quick note, related to my comment about the data served by BL potentially differing from the data at id.loc.gov. The data above has:
<skos:prefLabel>Literary landmarks--England--London.</skos:prefLabel>
And id.loc.gov has
<skos:prefLabel xml:lang="en">Literary landmarks--England--London</skos:prefLabel>
Because BL does not commit to a specific language tag on the literal, S14 is in fact not violated: an interesting side effect ;-)
But I agree, the best would be that BL publishes exactly the same prefLabel as id.loc.gov. That would ensure that there's never any issue!
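This can be checked mechanically. A minimal Python sketch, where the function name and the (lexical form, language) tuple representation of literals are assumptions for illustration: it groups skos:prefLabel values by subject and language tag across the merged triples, treating an untagged literal as distinct from an "en"-tagged one, which is the reading above.

```python
from collections import defaultdict

PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def s14_violations(triples):
    """Return {(subject, lang): labels} for every subject that has more
    than one distinct skos:prefLabel under the same language tag.

    Triples are (subject, predicate, object); for prefLabel triples the
    object is a (lexical_form, lang_or_None) pair. Labels are collected
    in a set, so identical statements from two sources count once, as in
    a merged RDF graph.
    """
    labels = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == PREF_LABEL:
            text, lang = obj
            # Language tags compare case-insensitively.
            labels[(subject, lang.lower() if lang else None)].add(text)
    return {key: vals for key, vals in labels.items() if len(vals) > 1}


concept = "http://id.loc.gov/authorities/sh2008107012#concept"
bl = (concept, PREF_LABEL, ("Literary landmarks--England--London.", None))
lc = (concept, PREF_LABEL, ("Literary landmarks--England--London", "en"))
print(s14_violations([bl, lc]))  # {}
```

If BL emitted its bib-record form with an "en" tag, the same check would report the concept; emitting exactly the id.loc.gov label would not.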

As I said, I think that the "option" to serve concept data like this is reasonable only when these concepts are not already present on id.loc.gov. Otherwise it could raise quite a few problems...

Best,

Antoine

_______________________________________________
open-bibliography mailing list
open-bibliography@lists.okfn.org
http://lists.okfn.org/mailman/listinfo/open-bibliography

Received on Friday, 4 February 2011 11:14:49 UTC