Re: [open-bibliography] MARC and other standard formats -> RDF from William Waites on 2010-06-16 (public-xg-lld@w3.org from June 2010)

From: William Waites <william.waites@okfn.org>
Date: Wed, 16 Jun 2010 22:53:04 +0100
To: Karen Coyle <kcoyle@kcoyle.net>
CC: Ross Singer <ross.singer@talis.com>, List for Working Group on Open Bibliographic Data <open-bibliography@lists.okfn.org>, "rufus.pollock@okfn.org" <rufus.pollock@okfn.org>, Ben O'Steen <bosteen@gmail.com>, public-xg-lld@w3.org
Message-ID: <4C1947C0.5070102@okfn.org>

On 10-06-16 16:09, Karen Coyle wrote:
> Quoting William Waites <william.waites@okfn.org>:
> 
>>
>> As it is, it looks like a MARC record at a minimum consists in:
>>
>>     * one xyz:Manifestation
>>           o one dc:publisher
>>     * one (implied) xyz:Work
>>           o one or more dc:contributor (or sub-properties like author,
>>             translator, etc)
>>           o one or more identifiers, bibo:isbn bibo:issn etc
>>           o one or more dc:subject from a controlled vocabulary
> 
> 
> William, I'm not quite sure what your "at a minimum" represents, so this
> answer may or may not fit your use case.... 

I should probably have written "typically".

> I also want to note that some of the more useful data comes out of the
> Leader and the 008 fields (resource type, date of publication, language).

This leads to an interesting question. I'm not
aware of a field specifying the language of text
*in the MARC record itself*. Most of the text looks
neutral at first glance, but people have a habit
of putting things like "[by] Foo Bar" or "electronic
text" in some free-form text fields. Any heuristic
for normalising this will be helped by knowledge of
the language of the metadata as opposed to the book.
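
To make the heuristic idea concrete, here is a
minimal sketch in Python. The pattern table and the
function name are hypothetical, and the entries are
illustrative guesses rather than anything taken from
real cataloguing rules; the only point is that the
right pattern set depends on the metadata language.

```python
import re

# Hypothetical per-language patterns for cataloguer interpolations.
# These entries are illustrative assumptions, not from any standard.
INTERPOLATIONS = {
    "en": [r"\[by\]\s*"],
    "de": [r"\[von\]\s*"],
}


def strip_interpolations(text, metadata_lang):
    """Remove bracketed filler words known for the given metadata language."""
    for pattern in INTERPOLATIONS.get(metadata_lang, []):
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return text.strip()
```

Without a known metadata language the function has
no patterns to apply and leaves the text untouched,
which is the safe default.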

Is there existing non-English language data in MARC
format? If so, pointers would be appreciated. If not
what source formats are we looking at for non-
anglophone places? I know the Germans have something
called MAB.

A lot of the practical work that is likely to follow
on from the W3C LLD WG might involve libraries taking
data in these source formats and transforming them
to RDF. To what extent does the shape of the data
in the source records inform the shape of the
resulting triples? Is there anything we can learn from
the (salient) differences between MARC, MAB and
others?
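
As a rough illustration of how the record shape can
carry over into the triples, here is a sketch that
follows the "typical" structure listed earlier. The
input dictionary layout, the function name and the
example.org base URI are assumptions made for the
example, xyz: is still a placeholder prefix, and no
real MARC parsing is attempted.

```python
def record_to_triples(record, base="http://example.org/"):
    """Turn an already-parsed record (a plain dict) into a list of triples."""
    work = base + "work/" + record["id"]
    manifestation = base + "manifestation/" + record["id"]

    triples = [
        (work, "rdf:type", "xyz:Work"),
        (manifestation, "rdf:type", "xyz:Manifestation"),
        (manifestation, "dc:publisher", record["publisher"]),
    ]
    # Repeatable fields in the source become repeated properties on the Work.
    for name in record.get("contributors", []):
        triples.append((work, "dc:contributor", name))
    for isbn in record.get("isbns", []):
        triples.append((work, "bibo:isbn", isbn))
    for subject in record.get("subjects", []):
        triples.append((work, "dc:subject", subject))
    return triples
```

One flat source record ends up split across two
resources here, and the property linking them is
deliberately left out because the list above does
not name one.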

Is it within the scope of the working group to
enumerate these source data formats and provide
recommended mappings to RDF?

Cheers,
-w

-- 
William Waites           <william.waites@okfn.org>
Mob: +44 789 798 9965    Open Knowledge Foundation
Fax: +44 131 464 4948                Edinburgh, UK

Received on Wednesday, 16 June 2010 21:54:27 UTC