Re: Non- and Partial-FRBR Metadata from Karen Coyle on 2010-09-22 (public-lld@w3.org from September 2010)

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Tue, 21 Sep 2010 17:15:50 -0700
To: William Waites <ww-keyword-okfn.193365@styx.org>, William Waites <william.waites@okfn.org>
Cc: Asaf Bartov <asaf.bartov@gmail.com>, public-lld@w3.org
Message-ID: <20100921171550.7n7hhnbgg48g88gg@kcoyle.net>
Quoting William Waites <william.waites@okfn.org>:

>
> For my part, after having a few discussions and thinking it through
> a bit, I'm not against WEMI. Where my problem lay (still lies) is
> not in the structure but in the rules about when to create a new
> Work. To me, for example, translation -> new work *not* just new
> expression.

You would need to explain WHY that is the case; in other words, how  
does your definition of "work" result in translations being a new  
work? It doesn't matter if you call it "work" or "x" -- you need a  
coherent definition. FRBR's Work is pretty clearly defined for some  
formats (monographic texts, for example), and what you have here  
("William's work") is different in definition from the FRBR Work. So  
define your rules for William's work, but I'm afraid you cannot  
redefine FRBR Work -- that's someone else's domain.


> So suppose you had,
>
> :foo a :BiblioThing ;
>     :author :somebody ;
>     :title "lalala" ;
>     :language "sw" ;
>     :isbn "1234567890123" ;
>     :shelf 3 .
>
> In order to later create a more structured view, wouldn't you have
> inference rules that said something like,
>
> { ?x a :BiblioThing .
>   ?x ?p ?o .
>   ?p rdfs:subPropertyOf :WorkishProperty } =>
> { _:work a :Work . _:work ?p ?o } .


If you look at where the RDA elements have been defined  
(http://metadataregistry.org/rdabrowse.htm), each Group 1 property is  
associated with Class W, E, M, or I. So the WEMI definition is in the  
Class/Property definitions. Whether or not you can easily turn that  
into a set of inference rules, I don't know. I think that for metadata  
sets as large and complex as RDA and other library metadata, we may  
be relying on logic provided by programs, not solely the RDF  
structuring, but I leave that to the code writers to figure out.
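
A minimal sketch, in Python, of what such program logic might look like
for the :foo example above. The property-to-class table is a
hypothetical placeholder: it is not copied from the RDA element
definitions, and "title" is deliberately left out of it, because a flat
record does not say whether it holds the title of the Work or the title
on the Manifestation. The function name and the "Unassigned" bucket are
likewise assumptions made for illustration.

```python
# Hypothetical property-to-class table. The real assignments would come
# from the RDA class/property definitions; these are illustrative only.
WEMI_CLASS_OF = {
    "author": "Work",
    "language": "Expression",
    "isbn": "Manifestation",
    "shelf": "Item",
}


def partition_by_wemi(record):
    """Split a flat record into one bucket per WEMI class.

    Properties with no entry in the table are not guessed at; they are
    kept apart in an "Unassigned" bucket.
    """
    buckets = {
        "Work": {},
        "Expression": {},
        "Manifestation": {},
        "Item": {},
        "Unassigned": {},
    }
    for prop, value in record.items():
        buckets[WEMI_CLASS_OF.get(prop, "Unassigned")][prop] = value
    return buckets


# The flat :foo description from the quoted example, as a plain dict.
foo = {
    "author": "somebody",
    "title": "lalala",
    "language": "sw",
    "isbn": "1234567890123",
    "shelf": 3,
}
parts = partition_by_wemi(foo)
```

Anything that lands in "Unassigned" is exactly the kind of property that
would need a rule or a human decision before it could be attached to a
frbr entity.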


> I like the proposal earlier in this thread to represent the data
> in WEMI structure but to use blank nodes to avoid having to worry
> too much in advance about URIs (that will almost always have to be
> deduped anyways, so why not dedup the blank nodes?) and let a
> standard entailment supply the missing bits.

OK, let me try this again :-). It's not just a question of "missing  
bits." The bits are all there, but they are not separated out into  
WEMI. And it's possible that they cannot be separated out into WEMI  
without human intervention (esp. for Expressions). So any use of  
frbr:Expression or frbr:Manifestation will be erroneous in terms of  
the properties that you associate with that frbr entity. In other  
words, they will be wrong, but in unpredictable ways. I am arguing  
that if you call something a frbr:Manifestation it should really *be*  
a frbr:Manifestation, not just something sort of like a  
frbr:Manifestation but also kind of like a frbr:Expression and maybe  
with some bits of frbr:Work. I think that mis-coding data is going to  
lead to problems in the future, not unlike the mis-use of owl:sameAs  
that is being discussed.



>
> What I did with bibliographica was make an entity, called MarcRecord
> that had all the fields that a MARC record might have. Then run a
> process on it (described as an opmv:Process) that generated a WEMI
> structure (actually I left out the E).

Which unfortunately means that probably either the W or the M is  
mis-coded. BTW, leaving out the E seems to be a fairly common decision  
because no one can really figure out what properties should go there.  
Until that gets clarified, we can't expect any two parties to create  
inter-relating frbr-ized descriptions.

> The process is a bit
> idiosyncratic

Idiosyncrasy is exactly what I fear we will end up with.


> So building blocks,
>
> MARC21 Record -> MarcRecord as RDF (transliteration) -> W(E)MI Thing
>
> The first "->" is easy to specify (relatively) and could be the
> subject of some LLD vocabulary and guidance, and likewise for
> other source formats.


I have started looking into that first "->" and it isn't turning out  
to be as easy as I would like. I'm right now working on the fixed  
fields in MARC because those are relatively easy. The variable fields  
and the indicators are going to require some decision-making, things  
like: where do you divide author and title in an author/title field?  
and how do you keep them together as a unit? does an indicator "trace"  
result in a new property compared to "do not trace"? What to do with  
linkage subfields or materials specified subfields? etc. etc.
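
For contrast, the fixed fields are the easier case because they are
positional. A minimal sketch in Python, assuming a 40-character MARC 21
008 field and covering only a few of its positions. The property names
are hypothetical placeholders, not a proposed vocabulary, and the sample
value is made up for illustration.

```python
# Character positions of a few 008 elements, as end-exclusive slices.
# The names on the left are hypothetical, for illustration only.
FIXED_008 = {
    "dateEnteredOnFile": (0, 6),
    "typeOfDate": (6, 7),
    "date1": (7, 11),
    "date2": (11, 15),
    "placeOfPublication": (15, 18),
    "language": (35, 38),
}


def decode_008(field):
    """Cut an 008 field into named code values by position."""
    if len(field) != 40:
        raise ValueError("008 must be 40 characters, got %d" % len(field))
    return {name: field[start:end] for name, (start, end) in FIXED_008.items()}


# Made-up sample, built piece by piece so the positions are easy to check.
# The 17 material-specific positions (18-34) are filled with "|".
sample = "100921" + "s" + "2010" + " " * 4 + "xxu" + "|" * 17 + "eng" + " " + "d"
decoded = decode_008(sample)
```

Each code value would still need a name and a display form, but nothing
has to be decided about where one value ends and the next begins, which
is the hard part in the variable fields.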

I have a database with all of the MARC21 fields and subfields and  
indicators, all of the fixed field values (both as codes and terms).  
[Actually, I may be one update behind, but will fix that.] I don't  
think I have all of the indicator values, but I need to look at that.  
I am currently filling in the fixed fields with names and display  
forms (tentative, of course). I've got an idea for linking them back  
to MARC21 from RDF, and a few options for URIs. I'll see if I can't  
find time to get it into good enough shape to make it available for  
comment. Oh, and I grabbed the "marc21.info" domain for the next five  
years -- I hope it doesn't take me that long to make something out of  
it! :-)

kc

>
> The second "->" is much harder, more controversial, subject to
> choices and cataloguing rules.
>
> If you were to have an intermediate step,
>
> MarcRecord as RDF -> GenericFlatBiblioRecord -> W(E)MI Thing
>
> I think this is more or less what you are suggesting. In this case,
> the first "->" is probably pretty easy. The second "->" is still
> hard.
>
> I don't think we can skip the "MarcRecord as RDF" step without
> destroying provenance information.
>
>> class affiliation rather than actual structure does not mean that
>> applications could not take advantage of efficiencies such as allowing
>> catalogers to copy Work or Expression information from other
>> bibliographic descriptions to a new bibliographic entry. The proof of
>> this is that systems (WorldCat; Open Library) have been able to create a
>> Work "view" while maintaining the traditional bibliographic records in
>> their databases. I can imagine WEMI being abstracted from complete or
>> incomplete bibliographic descriptions and used as linked data. I am less
>> able to imagine WEMI as our data structure for library and other
>> bibliographic systems, at least at this moment in time.
>
> I think the "class affiliation" is the second "->". I fear
> it will be very hard to pick apart exactly what this operation
> does without going into things that RDF (i.e. FOPL, DL, N3,
> stratified datalog) cannot express. WorldCat, OpenLibrary,
> have custom code (not written in RDF!) that does this. Maybe
> that is ok, but if that is our conclusion we should be clear
> about it.
>
> Cheers,
> -w
>
> --
> William Waites           <william.waites@okfn.org>
> Mob: +44 789 798 9965    Open Knowledge Foundation
> Fax: +44 131 464 4948                Edinburgh, UK
>
> RDF Indexing, Clustering and Inferencing in Python
> 		http://ordf.org/
>
>



-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Wednesday, 22 September 2010 00:16:34 UTC