- From: Karen Coyle <kcoyle@kcoyle.net>
- Date: Wed, 16 Jun 2010 16:55:42 -0700
- To: William Waites <ww-keyword-okfn.193365@styx.org>, William Waites <william.waites@okfn.org>
- Cc: Ross Singer <ross.singer@talis.com>, List for Working Group on Open Bibliographic Data <open-bibliography@lists.okfn.org>, "rufus.pollock@okfn.org" <rufus.pollock@okfn.org>, Ben O'Steen <bosteen@gmail.com>, public-xg-lld@w3.org
Quoting William Waites <william.waites@okfn.org>:

> This leads to an interesting question. I'm not
> aware of a field specifying the language of text
> *in the MARC record itself*.

No, there isn't a way to do that, nor to indicate the language of an individual field.

> Most of the text looks
> neutral at first glance, but people have a habit
> of putting things like "[by] Foo Bar" or "electronic
> text" in some free-form text fields. Any heuristic
> for normalising this will be helped by knowledge of
> the language of the metadata as opposed to the book.

Language of the metadata gets tricky. There is a concept of "language of the catalog" that is used in the creation of catalog records. This is used for notes and for a few areas like "356 pages." It should be possible to indicate the language of the catalog when transforming data from a library catalog *before* it loses that context. That will NOT tell you the language of each field, and some fields (author name, book title) are very hard to characterize in terms of a language.

>
> Is there existing non-English language data in MARC
> format? If so, pointers would be appreciated. If not
> what source formats are we looking at for non-
> anglophone places? I know the Germans have something
> called MAB.

Try the Canadian libraries, who work in MARC in both English and French: http://www.collectionscanada.gc.ca

>
> A lot of the practical work that is likely to follow
> onto the W3C LLD WG might involve libraries taking
> data in these source formats and transforming them
> to RDF. To what extent does the shape of the data
> in the source records inform the shape of the
> resulting triples? Is there anything we can learn from
> the (salient) differences between MARC, MAB and
> others?

Yes, I'm sure there is. Some of the differences will be because of different cataloging rules, others will be differences in how the data is encoded. Teasing those apart won't be easy, but in a sense it has begun as the German libraries attempt to move from MAB to MARC. They are asking for numerous changes in MARC so that their data will fit.

>
> Is it within the scope of the working group to
> enumerate these source data formats and provide
> recommended mappings to RDF?

More like: recommend that this task be undertaken. The LLD W3C group is only live for one year, but it is expected to "incubate" a number of follow-on tasks.

kc

>
> Cheers,
> -w
>
> --
> William Waites <william.waites@okfn.org>
> Mob: +44 789 798 9965 Open Knowledge Foundation
> Fax: +44 131 464 4948 Edinburgh, UK
>

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Wednesday, 16 June 2010 23:56:31 UTC