Re: Question about MARCXML to Models transformation

I share Ross and Stu's hard-to-articulate concerns. Rather than trying  
to force our data into FRBR, I'd love to see us toss our data into a  
pot/cloud and see what structures arise "naturally" from the data. If  
we take a big bunch of current library records and give them pretty good,
descriptive predicates, we could see what comes out. My guess is that
if we just take some fairly simple books (because bibliographic  
reality can be waaaaay complicated) we might be able to see some  
patterns emerge.

We'd have to decide on a set of predicates for the experiment, but I  
don't think that's too difficult. Using Ross's example, we would  
probably see an odd link that didn't quite make sense to us. That  
would let us know what may need to be adjusted.

Anyone looking for a grant proposal?

kc

Quoting stu <stuart.weibel@gmail.com>:

> I think Ross makes an important point.
>
> The reality is the collection of information assets 'out there'.
>
> The data is the collection of representational assertions about those
> assets.
>
> The model is an abstraction that brings the reality and the data together in
> ways that meet a set of functional requirements.
>
> When the characteristics of the model start driving unnatural,
> bending-over-backwards sort of actions or data structures, it makes me
> uneasy about what's going on, and one of the side effects is what Ross
> alludes to - hidden or unanticipated costs in maintaining data structures
> that flow from the model, rather than the data.
>
> I wish I could be more specific. It's a feeling of unease that is hard to
> articulate clearly.
>
> stu
>
>
> On Wed, Mar 9, 2011 at 8:55 AM, Ross Singer <ross.singer@talis.com> wrote:
>
>> On Wed, Mar 9, 2011 at 11:20 AM, Young,Jeff (OR) <jyoung@oclc.org> wrote:
>>
>>> One way to punt on this problem would be to treat the relationship between
>>> W&M as 1-to-1 for now (80/20 rule). This would create some alias URIs for
>>> Expressions and possibly conflate a few, but we could always come in later
>>> and use owl:sameAs to reconcile the aliases and improve the data mining to
>>> split those we conflate.
>>>
>>
>> I'll probably be outnumbered on this, but I begin to feel somewhat
>> uncomfortable assigning massive numbers of URIs to things in the absence
>> of knowing what they are.  This is further compounded by the fact that
>> they're being created because we have so little data to work with.
>>
>> I can't help but feel there are lots of hidden costs here (persistence of
>> the deprecated "stub" URIs, being one, but even just the general fact that
>> you need to dereference -- and store -- an extra, not-terribly-valuable, URI
>> simply to get a CBD of the Manifestation), but I also, personally, feel it's
>> significantly easier to add data later, when we know with some more
>> confidence what it is we're describing, than it is to edit.  Especially at
>> scale.
>>
>> Do others perceive this to be an issue?
>>
>> -Ross.
>>
>



-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet

Received on Wednesday, 9 March 2011 17:53:13 UTC