Re: Records in (L)LD (was: New BNB sample data available) from gordon@gordondunsire.com on 2011-02-08 (public-xg-lld@w3.org from February 2011)

From: <gordon@gordondunsire.com>
Date: Tue, 8 Feb 2011 18:54:38 +0000 (GMT)
To: public-xg-lld <public-xg-lld@w3.org>, Antoine Isaac <aisaac@few.vu.nl>
Message-ID: <1840855024.76969.1297191278688.JavaMail.open-xchange@oxltgw02.schlund.de>
Antoine


I was intending to cover some of this in the under-construction Migrating Library
Legacy Data use case. Then it would feed into the BibData cluster, especially
the Problems and limitations section, and from there into the Problems and
limitations section of the final report. I have a self-imposed deadline of the
end of this week to complete the use case and cluster, and this (as you say)
good synthesis will help, not hinder.
 
The thinking behind this self-contributed use case on legacy records was
prompted by the BL's work; I did urge them to send a use case themselves, but I
guess the work itself has kept them busy.
 
Would this approach be satisfactory to you and others?
 
Cheers
 
Gordon
 

On 08 February 2011 at 17:46 Antoine Isaac <aisaac@few.vu.nl> wrote:

> Hi everyone (scoping it back to the XG list),
>
> This discussion is reaching an interesting point where a small note could be
> written on this "LLD record" topic. I think Kevin's last mail comes close to
> a good synthesis of the issues at stake... What I find really relevant is that
> all this started from conversion of traditional (MARC) records from the BL.
> Maybe that could make a suitable part of our report, or at least something the
> report could point to.
>   
> @people who are interested in the "traditional library side" of records: do
> you agree that both aspects could be combined here?
>
> Cheers,
>
> Antoine
>
>
> > I think issue 3 comes very close - if, in fact, it isn't a fine example of
> > the problem - to the "record" versus triples debate; the
> > wholeness/completeness issue; the concise bounded description (CBD)
> > discussion.
> >
> > Corinne decided to include information - rdf:type, skos:prefLabel,
> > skos:ConceptScheme - in addition to giving URI to the resource at ID. 
> > Furnishing the additional information can reduce network look-ups and is
> > available for immediate indexing.  She also astutely pointed out that
> > including the added info in the BL output provided some safeguard should the
> > data no longer be accessible from the given HTTP URI.  On this last point,
> > here's an example:
> >
> > <dcterms:subject
> > rdf:resource="http://id.loc.gov/authorities/sh85026362#concept" />
> >
> > versus
> >
> > <dcterms:subject>
> >         <skos:Concept rdf:about="http://id.loc.gov/authorities/sh85026362#concept">
> >                 <skos:prefLabel>Civil procedure (International law)</skos:prefLabel>
> >                 <skos:inScheme rdf:resource="http://id.loc.gov/authorities#conceptScheme" />
> >         </skos:Concept>
> > </dcterms:subject>
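
To make the difference between the two forms concrete, here is a minimal sketch (not the BL's or LC's tooling) of how a consumer might read either one using only Python's standard library. The wrapping record URI http://example.org/bnb/record-1 and the function name `subjects` are assumptions made for illustration, and the sketch only recognises an embedded skos:Concept node, not the rdf:Description variant.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
}
RDF = "{%s}" % NS["rdf"]

# Shared wrapper; the record URI is hypothetical.
WRAPPER = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/bnb/record-1">
    %s
  </rdf:Description>
</rdf:RDF>"""

# Form 1: the subject is only a link.
URI_ONLY = WRAPPER % (
    '<dcterms:subject '
    'rdf:resource="http://id.loc.gov/authorities/sh85026362#concept" />'
)

# Form 2: the subject carries its type, label and scheme inline.
EMBEDDED = WRAPPER % """<dcterms:subject>
      <skos:Concept rdf:about="http://id.loc.gov/authorities/sh85026362#concept">
        <skos:prefLabel>Civil procedure (International law)</skos:prefLabel>
        <skos:inScheme rdf:resource="http://id.loc.gov/authorities#conceptScheme" />
      </skos:Concept>
    </dcterms:subject>"""


def subjects(rdf_xml):
    """Return a list of (concept_uri, embedded_label_or_None) pairs."""
    root = ET.fromstring(rdf_xml)
    found = []
    for subj in root.iter("{%s}subject" % NS["dcterms"]):
        ref = subj.get(RDF + "resource")
        if ref is not None:
            # Link only: the label needs a further look-up.
            found.append((ref, None))
            continue
        concept = subj.find("skos:Concept", NS)
        if concept is not None:
            label = concept.findtext("skos:prefLabel", namespaces=NS)
            found.append((concept.get(RDF + "about"), label))
    return found
```

With the first form the consumer gets the URI and nothing to index until it dereferences the link; with the second, the label is available immediately.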
> >
> > "Civil procedure (International law)" was canceled in December 2010 and
> > replaced by two new concepts.  So, if you follow the link, you no longer get
> > the resource.  I'll be the first to acknowledge that this isn't ideal
> > behavior from ID.  We're working on changing this, but we're not there yet.
> >
> > Yes, the BL's approach introduces duplication and a synchronization issue. 
> >  Personally, I have no problem with this however and, in fact, endorse it
> > with the caveat that the data be kept up-to-date as far as possible (I do
> > like the idea of a CBD).  And, while the decision to include this added
> > information makes the BL's data independently understandable without further
> > look-ups, it does not inhibit a BL data consumer from following the HTTP URI
> > to ID if the user would like to.
> >
> > I suspect that until systems are sophisticated enough to follow the URIs in
> > RDF data without significant human intervention, and gather the data about
> > _that_ resource, then this type of wholeness will be more beneficial than
> > the alternative.  I feel it makes the data more immediately accessible.  I
> > recognize that my position is one from a very practical standpoint. 
> > Ideally, the whole system would be wonderfully interconnected with links and
> > all the software designed to deal with this data would naturally and without
> > prompting fetch the data from those links.
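
A rough sketch of the practical middle ground described above: a consumer prefers the label obtained by following the URI, and falls back to the embedded copy when the look-up fails (as with the cancelled heading). The function `label_for` and the injected `fetch_label` callable are hypothetical names; how the remote label is actually retrieved is deliberately left out.

```python
def label_for(uri, embedded_label, fetch_label):
    """Prefer the remote label; fall back to the embedded one.

    fetch_label is a caller-supplied function (hypothetical) that takes a
    URI and returns a label string, returns None, or raises OSError or
    LookupError when the resource cannot be retrieved.
    """
    try:
        remote = fetch_label(uri)
    except (OSError, LookupError):
        remote = None
    return remote if remote else embedded_label
```

For example, `label_for(uri, "Civil procedure (International law)", fetch)` still yields the embedded label when `fetch` raises because the resource is no longer served.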
> >
> > Now, as to defining the information to be included - should the
> > altLabels also have been included?  They can be very valuable when indexed -
> > that's part of the wholeness/completeness debate.  Perhaps it is sufficient
> > to include the skos:prefLabel; further information can be had by following
> > the HTTP URI.
> >
> > Warmly,
> >
> > Kevin
> >
> > ________________________________________
> > From: public-lld-request@w3.org [public-lld-request@w3.org] On Behalf Of
> > Antoine Isaac [aisaac@few.vu.nl]
> > Sent: Friday, February 04, 2011 09:35
> > To: Deliot, Corine
> > Cc: List for Working Group on Open Bibliographic Data; public-lld
> > Subject: Re: [open-bibliography] New BNB sample data available
> >
> > Hello Corine,
> >
> > Re. 1 and 2, in fact your decision not to put the language tags is what
> > saves you from the inconsistency Andrew has warned about. If you were using
> > the same language tag as id.loc.gov, but a different literal (and adding one
> > dot to a literal makes it an entirely different literal), then your data
> > would be inconsistent with the id.loc.gov one.
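
The point about language tags can be sketched in a few lines. SKOS asks that a resource have no more than one skos:prefLabel per language tag, so two different literals only clash when they share a tag. This is an illustration rather than a validator: labels are modelled as plain tuples, the function name `pref_label_conflicts` is made up, and the LC literal shown (no final dot, tagged "en") is assumed from the discussion in this thread.

```python
from collections import defaultdict


def pref_label_conflicts(labels):
    """Find concepts with more than one prefLabel for the same language tag.

    labels: iterable of (concept_uri, lexical_form, language_tag_or_None).
    Returns {(concept_uri, tag): [distinct lexical forms]} for each clash.
    """
    seen = defaultdict(set)
    for uri, text, lang in labels:
        # Language tags are case-insensitive; None means "no tag".
        tag = lang.lower() if lang else None
        seen[(uri, tag)].add(text)
    return {key: sorted(forms) for key, forms in seen.items() if len(forms) > 1}


URI = "http://id.loc.gov/authorities/sh2008107012#concept"
LC = (URI, "Literary landmarks--England--London", "en")         # assumed LC form
BL_UNTAGGED = (URI, "Literary landmarks--England--London.", None)
BL_TAGGED = (URI, "Literary landmarks--England--London.", "en")  # hypothetical
```

Merging `LC` with `BL_UNTAGGED` reports no clash, whereas merging `LC` with `BL_TAGGED` reports two different "en" prefLabels for the same concept.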
> >
> > Now, on having a language tag or not, I see your issue, but personally I'm
> > ok with originally Spanish labels being considered as English ones, if
> > there's no English translation for them.
> > Anyway, the core issue to me here is that this language tag dilemma also
> > applies for LoC, which made the opposite choice. Ideally if you publish data
> > on LC concepts, it should be compatible with what LC has--"compatible" in
> > the formal but also informal way: whether there is an inconsistency or not,
> > a data consumer may still be extremely puzzled why LC and BL can't agree on
> > their concepts' prefLabels!
> >
> > Re. 3, getting data for indexing is a very valid concern. But it also could
> > be done just before the indexing step, not in the data you publish. But
> > well, you are perhaps in the best position to judge: as you have put it,
> > this is about what you feel you should provide to your typical data
> > consumers. Note, however, that putting the labels re-introduces the risk of
> > being out-of-synch with a central repository, which you correctly identified
> > in your first move.
> >
> > About the danger of a target source being put offline, that is also a valid
> > point. But for id.loc.gov I wouldn't be so worried. In fact, BL starting to
> > rely on it for its data would be a key motivation for LC not to put it
> > offline :-)
> >
> >
> > Re. your last question, I guess I can only repeat what I've written above.
> > My gut feeling would be to replicate as little as possible: ideally, the URI
> > should be the only thing present in your data! But if you have clear ideas
> > about the amount of effort your data consumers would be willing to undergo,
> > you should adapt your data to make their life easier.
> > Note that the data consumers who'd be interested in such caching might be
> > the ones interested in accessing large dumps of data at once. So the "true
> > linked data version" (what you get when following your nose over HTTP) could
> > include only the URIs, but a fit-for-purpose dump of your entire catalogue
> > may include a bit more.
> >
> > Best,
> >
> > Antoine
> >
> >
> >
> >> Hi Antoine and all,
> >>
> >> Many thanks for the feedback and apologies for the length of this email.
> >>
> >> In answer to the questions about
> >> <dcterms:subject>
> >>>>           <rdf:Description
> >>>> rdf:about="http://id.loc.gov/authorities/sh2008107012#concept">
> >>>>             <skos:inScheme
> >>>> rdf:resource="http://id.loc.gov/authorities#conceptScheme" />
> >>>>             <skos:prefLabel>Literary landmarks--England--
> >>>> London.</skos:prefLabel>
> >>>>             <rdf:type
> >>>> rdf:resource="http://www.w3.org/2004/02/skos/core#Concept" />
> >>>>           </rdf:Description>
> >>>>         </dcterms:subject>
> >>
> >> And
> >>
> >> 1. Why does the literal value contained in <skos:prefLabel>Literary
> >> landmarks--England--
> >> London.</skos:prefLabel> not exactly match the one served by LC at
> >> id.loc.gov for http://id.loc.gov/authorities/sh2008107012#concept?
> >>
> >> The answer is that it should. We've matched the LCSH heading contained in
> >> the bibliographic record to the LCSH heading in the authority file. The
> >> issue is to do with punctuation (which is input at the end of the heading
> >> in the bib record but is not part of the heading in the authority file).
> >> We'll address this in the conversion - this is an issue in the LCSH
> >> headings and I believe in other parts of our output. [So no, we "are *not*
> >> essentially trying to say which of the SKOS preflabels the BL prefers" as
> >> one post tried to second-guess]
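
One way the punctuation fix might look in a conversion step, as a sketch only and not the BL's actual process: try the bibliographic heading as-is against the authority headings, and only then retry without a final full stop, so that headings which legitimately end in a full stop are not damaged. The function name `match_heading` and the in-memory set of authority headings are assumptions for illustration.

```python
def match_heading(bib_heading, authority_headings):
    """Return the matching authority heading, or None if there is none.

    bib_heading: heading string as found in the bibliographic record.
    authority_headings: set of heading strings from the authority file.
    """
    # Collapse runs of whitespace introduced by line wrapping.
    candidate = " ".join(bib_heading.split())
    if candidate in authority_headings:
        return candidate
    # Retry without the terminal full stop added in the bib record.
    if candidate.endswith("."):
        stripped = candidate[:-1].rstrip()
        if stripped in authority_headings:
            return stripped
    return None
```

Checking the exact form first means a heading whose authorised form itself ends in a full stop still matches unchanged.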
> >>
> >> 2. Why does our output not include the xml:lang="en"
> >> in <skos:prefLabel>?
> >> This is because in some cases this xml:lang="en" whilst true to the data
> >> served up by id.loc.gov is actually not correct. For example, if you look
> >> at
> >> <http://id.loc.gov/authorities/sh94003128#concept>   for Parque Nacional
> >> Torotoro (Bolivia), we have
> >> <skos:prefLabel xml:lang="en">Parque Nacional Torotoro
> >> (Bolivia)</skos:prefLabel>
> >>
> >> instead of Spanish.
> >>
> >> I assume the reason for that is that there isn't the granularity in MARC 21
> >> - where these headings originate from - to code the language of each data
> >> element. So when LC expresses LCSH in SKOS, they couldn't specify and went
> >> for the language of the majority of the headings, which is English.
> >>
> >> So we - ok, I ;-) thought we could do "without" the xml:lang attribute
> >> since it wasn't "correct" in all cases. I didn't realise the implications.
> >>
> >> 3. Why are we outputting both the literal value and the resource URI?
> >> In a very first attempt, we'd only included the resource URI as you
> >> suggest. There were concerns about the two being out of sync, e.g. when a
> >> LCSH is updated. In fact, this is one of the uses of those URIs - enabling
> >> easier updating of bibliographic data.
> >>
> >> But we got some advice to the contrary. Some linked data platforms index
> >> the literal values to improve searching; it was also pointed out that there
> >> may be a risk of the linked dataset we link to "disappearing".
> >>
> >> There are other considerations: we are putting our data out for people to
> >> use and re-use; and we are not too sure what they want to do with it yet -
> >> so as you suggest, some of them may not want or be able to go and fetch
> >> data from id.loc.gov or any other data sets we link to. A related question
> >> is to do with the time and resources to produce these files. At the moment,
> >> we are concentrating on the BNB but the intention is to work on other data
> >> sets. We are currently working on two versions of the file, a "non-URI" and
> >> a "with added-URI" version of the data and ideally, it would be good to
> >> have only one version - the "with added-URI" one - to maintain/produce if
> >> it meets the needs of all/most people.
> >>
> >> Now it's my turn for a question ;-)
> >>
> >> In your feedback, you highlight the risk of "that your data is less
> >> complete than the one of other services"[1] e.g., if you don't have
> >> skos:broader that id.loc.gov has for LCSH concepts.
> >>
> >> So to take the example of LCSH at id.loc.gov, how much of the data included
> >> there should I replicate in my instance data? Aren't the <skos:prefLabel>
> >> and the resource URI sufficient? If you need other info,
> >> like <skos:altLabel> or <skos:broader>, won't you be able to fetch it via
> >> the resource URI?
> >>
> >> That's it for now ;-)
> >>
> >> I would also like to say that from later today I shall be offline for the
> >> next two weeks. So that people don't think we don't want to engage or
> >> anything like that if there is no post. I really appreciate feedback.
> >>
> >> Cheers
> >>
> >> Corine
> >
>
>
Received on Tuesday, 8 February 2011 18:55:13 UTC