Re: Fwd: [open-bibliography] Library of Congress subject headings & RDF

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Fri, 21 Jan 2011 08:45:31 +0100
Message-ID: <4D39399B.9070302@few.vu.nl>
To: Simon Spero <ses@unc.edu>
CC: Ross Singer <rossfsinger@gmail.com>, Alistair Miles <alimanfoo@googlemail.com>, SKOS <public-esw-thes@w3.org>
Ross, Simon,

Coming back to this old thread...
@Ross: I'm looking forward to seeing your attempt at a fully fledged schema! But I'd really advise you to look at the MADS/RDF stuff while doing it. There are not many possible patterns, after all.
Btw it's interesting that you went for containers because you did not know if the list of coordinated concepts was closed. MADS/RDF went exactly the opposite way, using RDF collections (rdf:List), because they knew they had all the coordinated concepts for coordinations in their data, and wanted to close the lists.
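
To make the container/collection distinction concrete, here is a minimal sketch in plain Python (no RDF library) that writes out the triples each pattern produces for the same three components. The helper names as_seq and as_list, the ex:coordinates subject, and the _:b0-style blank node labels are illustrative assumptions, not anything taken from lcsubjects.org or MADS/RDF. The only point is that an rdf:Seq is a set of numbered membership triples that never says it is complete, whereas an rdf:List ends in rdf:nil and so states where the list stops.

```python
# Sketch: the same ordered components as an rdf:Seq (container) and as an
# rdf:List (collection). Triples are plain (subject, predicate, object) tuples.

COMPONENTS = [
    "<http://lcsubjects.org/subjects/sh85088865#concept>",
    "<http://lcsubjects.org/subjects/sh99005024#concept>",
    "<http://lcsubjects.org/subjects/sh2002012476#concept>",
]


def as_seq(subject, members):
    """Container pattern: numbered rdf:_n properties, no end marker."""
    triples = [(subject, "rdf:type", "rdf:Seq")]
    for position, member in enumerate(members, start=1):
        triples.append((subject, f"rdf:_{position}", member))
    return triples


def as_list(members):
    """Collection pattern: rdf:first/rdf:rest cells, closed by rdf:nil."""
    if not members:
        return "rdf:nil", []
    # Hypothetical blank node labels, one per list cell.
    nodes = [f"_:b{i}" for i in range(len(members))]
    triples = []
    for i, member in enumerate(members):
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        triples.append((nodes[i], "rdf:first", member))
        triples.append((nodes[i], "rdf:rest", rest))
    return nodes[0], triples


seq_triples = as_seq("ex:coordinates", COMPONENTS)
head, list_triples = as_list(COMPONENTS)

for s, p, o in seq_triples + list_triples:
    print(s, p, o, ".")
```

A consumer of the rdf:Seq cannot tell from these triples whether an rdf:_4 is asserted somewhere else; a consumer of the rdf:List can, because the last cell points to rdf:nil.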

@Simon: I think I agree with you: the ontologies on the table fail to capture entirely the rules for creating coordinations--even for the small part of those rules I suspect I know :-)
But I wonder whether changing the representational pattern will help much. These constraints are either of the kind that "only" require more OWL work (which you can do once you have an appropriate structure for the data, which is the highest priority), or of the kind that just cannot be managed properly with the current RDF(S)/OWL stack, aren't they?
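
As an illustration of the second kind of constraint, the placement rule Simon mentions below (geographic subdivisions come after the right-most geographically subdividable element) is about the order of components, which is awkward to state in OWL but simple to check procedurally. The sketch here is a toy under stated assumptions: the Component type, the may_subdivide_geographically flag, and the flags on the sample headings are all hypothetical, and the check is a simplified reading of the rule as worded in this thread, not the LC specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    label: str
    kind: str  # hypothetical labels: "topical", "geographic", "chronological"
    may_subdivide_geographically: bool = False  # hypothetical flag


def geographic_placement_ok(components):
    """Simplified check: every geographic subdivision must appear after the
    right-most component flagged as geographically subdividable."""
    subdividable = [i for i, c in enumerate(components)
                    if c.may_subdivide_geographically]
    # Skip position 0: the main heading is not a subdivision.
    geographic = [i for i, c in enumerate(components[1:], start=1)
                  if c.kind == "geographic"]
    if not geographic:
        return True   # nothing to place
    if not subdividable:
        return False  # a geographic subdivision with nothing to attach to
    return min(geographic) > max(subdividable)


# Flags below are invented for illustration, not taken from LC authority data.
well_placed = [
    Component("Popular music", "topical", may_subdivide_geographically=True),
    Component("Austria", "geographic"),
    Component("History", "topical"),
]
misplaced = [
    Component("Popular music", "topical", may_subdivide_geographically=True),
    Component("Austria", "geographic"),
    Component("History", "topical", may_subdivide_geographically=True),
]

print(geographic_placement_ok(well_placed))
print(geographic_placement_ok(misplaced))
```

The check needs positional information, so it depends on whatever ordered structure (container or collection) the data already provides.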

Antoine

  
> Ross -
> [ I need to run another harvest, at least for all the newly created names. I need to check with LC to find the right time windows, as I recall that the system goes offline at some point during the night, presumably for backups. There was also a system upgrade, which hopefully didn't reset all the 001 values. ]
>
> I've spent a fair amount of time working on this issue (the formal, lexical, and cognitive semantics of pre-coordinated headings and subdivisions are the core of my dissertation prospectus, especially as it relates to inferring the underlying ontology for an intentional KOS). A lot of the official semantics are at best not well understood or, due to canonical orderings, ambiguous.
>
> My general belief, which is amplified by the problems that show up in the current MADS/RDF draft, is that using containers is not the best way to capture all of the relationships that are contained within a co-ordinated subject heading.
>
> It also makes it hard to express some of the constraints that a heading must meet to be well formed under the LC rules. Some of these rules are closer to being syntactic than semantic (for example, that geographical subdivisions be placed after the right-most geographically subdividable heading).
>
> My belief is that a subdivided heading concept entails the existence of a number of other, probably anonymous, concepts which conceptually subsume the original concept. For example, some intervening concepts have components deleted; some components can be reordered, but all are necessarily broader than the initial subdivided heading concept.
>
> Also, to Thad: I had a remarkably high hit rate at inferring $x or $v on subject headings using the Stanford POS tagger trained on the LC corpus. I expect the results would be even better with parts of the fixed field available, but if you've got the fixed field, you've probably got the subfield codes (sigh).
>
>
> Simon // I still think that $v was a really bad idea.
> p.s.
> This message is firmly based on the Airlie House report, because my macbook pro is a little toasty right now.
>
> On Fri, Jan 7, 2011 at 1:49 PM, Ross Singer <rossfsinger@gmail.com> wrote:
>
>     Alistair, these are good pointers and I recall some of this (and did
>     look at Antoine's summary prior to modeling this). This is also
>     exactly why I forwarded from open-bibliography to here, for this kind
>     of advice.
>
>     Here is my rationale for the design choices I made (and, nothing is
>     set in stone, so I'm persuadable on any of this, really):
>
>     1) The decision to use an rdfs:Container (in this case, rdf:Seq) over
>     rdf:List was merely pragmatic, since, as I mentioned a bit in my
>     original email, I couldn't be certain that I had all of the resources
>     being coordinated available (like the name authorities, for example).
>
>     I could (easily!) be misinterpreting the definition of rdf
>     collections, but my understanding was that they were closed lists -
>     which I thought might be problematic given the fact that I might not
>     be able to model the entire collection.
>
>     2) Since there's nothing in SKOS to explain what kind of a heading it
>     is (and I'm not arguing that there should be), it's not semantically
>     obvious what the relationships would be between the resources being
>     coordinated.
>
>     That is, as a human, I can figure out that "20th century" is a
>     chronological heading and if that heading resource is in a
>     ConceptScheme for chronological headings, it's possible for me to convey
>     that to my agent with some prior knowledge.
>
>     The goal with lcsh:geographicSubdivision, etc. was, assuming there's
>     some adoption, that relationship wouldn't require prior knowledge and
>     human intervention for my agent to see explicitly how these resources
>     were related.
>
>     Another approach would be to subclass skos:Concept with
>     lcsh:GeneralSubdivision or lcsh:FormSubdivision, etc. but my gut
>     feeling was that was even worse since I don't think most linked data
>     agents will have any reasoning capability, at least, not in the near
>     term.
>
>     Anyway, I look forward to any comments/suggestions. Meanwhile, I'll
>     get working on getting a schema up.
>
>     Thanks!
>     -Ross.
>
>     On Fri, Jan 7, 2011 at 12:47 PM, Alistair Miles
>     <alimanfoo@googlemail.com> wrote:
>      > Hi Ross,
>      >
>      > I'm a bit behind the times here, and you've probably seen all these already,
>      > but for reference, the page at [1] has good links to previous discussions
>      > of coordination. In particular, Antoine's summary of coordination in the
>      > SKOS primer [2] is a good summary of where we got to in the SWDWG.
>      >
>      > Cheers,
>      >
>      > Alistair
>      >
>      > [1] http://www.w3.org/2001/sw/wiki/SKOS/Issues/Coordination
>      > [2] http://www.w3.org/TR/skos-primer/#secconceptcoordination
>      >
>      > On Fri, Jan 07, 2011 at 12:16:48PM -0500, Ross Singer wrote:
>      >> Hi all, forwarding a thread from the open-bibliography
>      >> (http://lists.okfn.org/mailman/listinfo/open-bibliography) list here.
>      >> It started with a question from Owen Stephens about a topic that's
>      >> come up here before (subdivisions, coordination, etc.).
>      >>
>      >> I'm bringing it here because Owen's question prompted me to explain
>      >> some of the ideas I've been playing around with in this regard in
>      >> http://lcsubjects.org/ which might be of interest here, as well.
>      >>
>      >> First Owen's original post:
>      >>
>      >> "Can anyone point me at (or advise me on) examples of representing
>      >> subject heading fields from a library catalogue record as RDF.
>      >> Specifically I'm interested in how chained sets of subject headings
>      >> are represented.
>      >>
>      >> E.g. a library catalogue record might have a heading:
>      >>
>      >> 650$$aPopular Music$$xHistory$$y20th Century
>      >>
>      >> Each one of these headings:
>      >>
>      >> Popular Music
>      >> History
>      >> 20th Century
>      >>
>      >> will have a SKOS representation on id.loc.gov, but to represent each
>      >> heading separately as a dc:subject (or similar) would lose the context
>      >> of chaining them together.
>      >>
>      >> There are some entries on id.loc.gov that represent some 'chains'
>      >> (those that have been 'authorised') - e.g.
>      >> http://id.loc.gov/authorities/sh2008109787#concept is 'Popular
>      >> Music--History and Criticism' - but for me this doesn't feel quite
>      >> right - doesn't this lose some of the flexibility of the faceted
>      >> scheme?
>      >>
>      >> I'm wondering about something similar to the way BIBO handles author
>      >> lists (you can both represent each author, and the list of authors,
>      >> including order)"
>      >>
>      >> and then my reply:
>      >>
>      >> ---------- Forwarded message ----------
>      >> From: Ross Singer <ross.singer@talis.com>
>      >> Date: Fri, Jan 7, 2011 at 11:12 AM
>      >> Subject: Re: [open-bibliography] Library of Congress subject headings & RDF
>      >> To: List for Working Group on Open Bibliographic Data
>      >> <open-bibliography@lists.okfn.org>
>      >>
>      >>
>      >> Hi Owen,
>      >>
>      >> I agree that the status quo at id.loc.gov is pretty unsatisfying (on
>      >> several levels, including this one) and this is one of the things that I
>      >> changed for lcsubjects.org in the last redesign (although it's
>      >> certainly not "fixed" or even remotely standard - but it was intended
>      >> to get the conversation started in this direction).
>      >>
>      >> Thankfully, though, your specific example works :)
>      >>
>      >> http://lcsubjects.org/subjects/sh2008109787#concept
>      >>
>      >> For subdivided subject headings like this, I've added a few
>      >> properties: lcsh:coordinates, lcsh:generalSubdivision,
>      >> lcsh:chronologicalSubdivision, lcsh:primaryConcept, etc.
>      >>
>      >> The RDF out of lcsubjects.org is pretty brutally verbose, but directly
>      >> out of the Platform it looks like:
>      >> http://api.talis.com/stores/lcsh-info/meta?about=http%3A%2F%2Flcsubjects.org%2Fsubjects%2Fsh2008109787%23concept&output=xml
>      >>
>      >> and the coordinates resource is an rdf:Seq (to preserve order):
>      >>
>      >> http://api.talis.com/stores/lcsh-info/meta?about=http%3A%2F%2Flcsubjects.org%2Fsubjects%2Fsh2008109787%23coordinates&output=xml
>      >>
>      >> This is still totally a work in progress (and incredibly incomplete),
>      >> but is intended to begin to provide the sort of semantics that you're
>      >> looking for (I think). It also (I hope) begins to lay out a
>      >> foundation for how LCSH is actually intended to be used (which is a
>      >> set of building blocks). So to take your original example,
>      >> "650$$aPopular Music$$xHistory$$y20th Century"
>      >>
>      >> This could be created like:
>      >>
>      >> <http://example.org/book/1>
>      >> dcterms:subject
>      >> <http://example.org/subjects/popular-music--history--20th-century#concept>.
>      >>
>      >> <http://example.org/subjects/popular-music--history--20th-century#concept>
>      >> lcsh:generalSubdivision
>      >> <http://lcsubjects.org/subjects/sh99005024#concept> ;
>      >> lcsh:chronologicalSubdivision
>      >> <http://lcsubjects.org/subjects/sh2002012476#concept>;
>      >> lcsh:primaryConcept <http://lcsubjects.org/subjects/sh85088865#concept> ;
>      >> a skos:Concept ;
>      >> skos:prefLabel "Popular Music--History--20th Century" ;
>      >> lcsh:coordinates
>      >> <http://example.org/subjects/popular-music--history--20th-century#coordinates>
>      >> .
>      >>
>      >> <http://example.org/subjects/popular-music--history--20th-century#coordinates>
>      >> a rdf:Seq ;
>      >> rdf:_1 <http://lcsubjects.org/subjects/sh85088865#concept> ;
>      >> rdf:_2 <http://lcsubjects.org/subjects/sh99005024#concept> ;
>      >> rdf:_3 <http://lcsubjects.org/subjects/sh2002012476#concept> .
>      >>
>      >> (the lcsubjects.org URIs could just as easily be id.loc.gov URIs -- it
>      >> was just easier to cut and paste from existing data).
>      >>
>      >> With this, it's much easier to make our uncontrolled subject headings
>      >> that are composites of a bunch of controlled headings.
>      >>
>      >> Like I said, this is pretty incomplete on lcsubjects.org, currently,
>      >> mainly because there's a lot missing (namely the corporate names and
>      >> random chronological subdivisions, but there are also subdivision
>      >> terms that don't appear to be derived from an authorized heading).
>      >> See: http://lcsubjects.org/subjects/sh2010007497 or
>      >> http://lcsubjects.org/subjects/sh85045754 as somewhat different
>      >> examples.
>      >>
>      >> The first one has a URI for Austria, but that URI returns a 404 (I
>      >> built this from the Fred 2.0 data, so I have the NAF, I just haven't
>      >> figured out how to incorporate it into lcsubjects.org, yet). The
>      >> second one shows an unauthorized chronological subdivision -- so,
>      >> currently, it just drops it.
>      >>
>      >> Here's another example: http://lcsubjects.org/subjects/sh85134593#concept
>      >>
>      >> this should use: http://lcsubjects.org/subjects/sh99005746#concept as
>      >> the general subdivision -- but that's an altLabel, so it's currently
>      >> failing (as you can see, this is fraught with frustrations!).
>      >>
>      >> Another mind bender: http://lcsubjects.org/subjects/sh2010106574#concept
>      >>
>      >> This one chokes, because "Polyglot" isn't an authorized term (instead
>      >> it should be using http://lcsubjects.org/subjects/sh85037700#concept
>      >> -- "Dictionaries, Polyglot") and was created after Fred 2.0 (3.5 years
>      >> after!), so I don't have access to the MARC authority record to
>      >> properly look things up (not that it would help me in this case,
>      >> anyway [1]).
>      >>
>      >> So, to try to bring this on home..., I think there are solutions (and
>      >> linked data solutions) to this, but LC is doing very little to enable
>      >> it. If they'd provide the original MARC as a format for the concepts,
>      >> that would be a start -- but, honestly, without all of the data
>      >> available (including the NAF), this is going to be half-baked.
>      >>
>      >> So, anyway, thanks for prompting me to write a bit about this :)
>      >> Probably worth forwarding to the SKOS list, as well.
>      >>
>      >> -Ross.
>      >>
>      >> [1] Here's the MARC record for Plastics--Dictionaries--Polyglot:
>      >> 000 00476cz a2200169n 450
>      >> 001 8244985
>      >> 005 20100420002715.0
>      >> 008 100413|| anannbabn |n ana
>      >> 035 __ |a (DLC)464428
>      >> 035 __ |a (DLC)sh2010106574
>      >> 906 __ |t 8888 |u tc00 |v 0
>      >> 010 __ |a sh2010106574
>      >> 040 __ |a DLC |b eng |c DLC
>      >> 150 __ |a Plastics |v Dictionaries |x Polyglot
>      >> 667 __ |a Record generated for validation purposes.
>      >> 670 __ |a Work cat.: Fachwörterbuch Kunststofftechnik, c1992
>      >> 953 __ |a tc00
>      >>
>      >> so there's still not an obvious way to know that one should be looking
>      >> for Dictionaries, Polyglot.
>      >>
>      >>
>      >>
>      >> On Fri, Jan 7, 2011 at 7:18 AM, Owen Stephens <owen@ostephens.com> wrote:
>      >> > Can anyone point me at (or advise me on) examples of representing subject
>      >> > heading fields from a library catalogue record as RDF. Specifically I'm
>      >> > interested in how chained sets of subject headings are represented.
>      >> > E.g. a library catalogue record might have a heading:
>      >> > 650$$aPopular Music$$xHistory$$y20th Century
>      >> > Each one of these headings:
>      >> > Popular Music
>      >> > History
>      >> > 20th Century
>      >> > will have a SKOS representation on id.loc.gov, but to represent each heading
>      >> > separately as a dc:subject (or similar) would lose the context of chaining
>      >> > them together.
>      >> > There are some entries on id.loc.gov that represent some 'chains' (those
>      >> > that have been 'authorised') -
>      >> > e.g. http://id.loc.gov/authorities/sh2008109787#concept is 'Popular
>      >> > Music--History and Criticism' - but for me this doesn't feel quite right -
>      >> > doesn't this lose some of the flexibility of the faceted scheme?
>      >> > I'm wondering about something similar to the way BIBO handles author lists
>      >> > (you can both represent each author, and the list of authors, including
>      >> > order)
>      >> > Thanks,
>      >> > Owen
>      >> > --
>      >> > Owen Stephens
>      >> > Owen Stephens Consulting
>      >> > Web: http://www.ostephens.com
>      >> > Email: owen@ostephens.com
>      >> >
>      >>
>      >>
>      >> -Ross.
>      >>
>      >
>      > --
>      > Alistair Miles
>      > Head of Epidemiological Informatics
>      > Centre for Genomics and Global Health <http://cggh.org>
>      > The Wellcome Trust Centre for Human Genetics
>      > Roosevelt Drive
>      > Oxford
>      > OX3 7BN
>      > United Kingdom
>      > Web: http://purl.org/net/aliman
>      > Email: alimanfoo@gmail.com
>      > Tel: +44 (0)1865 287669
>      >
>      >
>
>
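
For readers who want to experiment with the pattern in Ross's forwarded message, the mapping from a subfielded heading such as 650$$aPopular Music$$xHistory$$y20th Century to the lcsh: properties he lists can be sketched as below. Assumptions, all mine: the subfield-to-property table is inferred from the property names used in the thread ($a as lcsh:primaryConcept, $x as lcsh:generalSubdivision, $y as lcsh:chronologicalSubdivision, $z as lcsh:geographicSubdivision); parse_heading, to_turtle, and the LOOKUP table are hypothetical names; and resolving a label to a URI is reduced to a dictionary seeded with the three URIs from Ross's example. Prefix declarations are left out, as in the original example.

```python
import re

# Assumed mapping from MARC subfield codes to properties named in the thread.
SUBFIELD_PROPERTY = {
    "a": "lcsh:primaryConcept",
    "x": "lcsh:generalSubdivision",
    "y": "lcsh:chronologicalSubdivision",
    "z": "lcsh:geographicSubdivision",
}

# Stand-in for a real authority lookup; URIs copied from Ross's example.
LOOKUP = {
    "Popular Music": "http://lcsubjects.org/subjects/sh85088865#concept",
    "History": "http://lcsubjects.org/subjects/sh99005024#concept",
    "20th Century": "http://lcsubjects.org/subjects/sh2002012476#concept",
}


def parse_heading(field):
    """Split '650$$aPopular Music$$xHistory' into [('a', 'Popular Music'), ...]."""
    return re.findall(r"\$\$(\w)([^$]+)", field)


def to_turtle(field, base="http://example.org/subjects/"):
    """Render one subfielded heading as a concept plus an rdf:Seq of its parts."""
    parts = parse_heading(field)
    labels = [label for _, label in parts]
    pref_label = "--".join(labels)
    slug = "--".join(label.lower().replace(" ", "-") for label in labels)
    concept = f"<{base}{slug}#concept>"
    coords = f"<{base}{slug}#coordinates>"

    lines = [f"{concept} a skos:Concept ;",
             f'    skos:prefLabel "{pref_label}" ;']
    for code, label in parts:
        # A KeyError here means an unmapped subfield or an unresolved label.
        lines.append(f"    {SUBFIELD_PROPERTY[code]} <{LOOKUP[label]}> ;")
    lines.append(f"    lcsh:coordinates {coords} .")
    lines.append("")
    lines.append(f"{coords} a rdf:Seq ;")
    members = [f"    rdf:_{n} <{LOOKUP[label]}>"
               for n, label in enumerate(labels, start=1)]
    lines.append(" ;\n".join(members) + " .")
    return "\n".join(lines)


turtle = to_turtle("650$$aPopular Music$$xHistory$$y20th Century")
print(turtle)
```

In this sketch a label with no entry in LOOKUP simply raises KeyError, which is a crude stand-in for the unresolved cases Ross describes (the missing name authority for Austria, or "Polyglot" not being an authorized term).
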
Received on Friday, 21 January 2011 08:13:43 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 19:46:06 UTC