Re: is FRBR relevant? from Jodi Schneider on 2010-08-10 (public-xg-lld@w3.org from August 2010)

From: Jodi Schneider <jodi.schneider@deri.org>
Date: Tue, 10 Aug 2010 14:14:54 +0100
To: "Young,Jeff (OR)" <jyoung@OCLC.ORG>
Cc: "Karen Coyle" <kcoyle@kcoyle.net>, <public-xg-lld@w3.org>
Message-Id: <93AAFA01-5E30-4338-A4C9-BE0D7A9C3007@deri.org>
By the end here, I start to understand that you want to distinguish #concept from #heading. That seems useful. But I still have some questions (below, inline) from when I was puzzling out your argument. :) -Jodi

On 10 Aug 2010, at 03:19, Young,Jeff (OR) wrote:

> Karen Coyle wrote:
>> I guess where Jodi and I got lost was in your use of quoted strings in
>> Google, which, as far as Google is concerned, is a literal.
> 
> Google doesn't need to know whether "has as subject" is a literal or the
> name of a key relationship in the FRBR model. Google isn't magic. They
> are constrained by set theory and precision/recall just like the search
> engines of yore. The set of documents Google has indexed that contain
> the exact phrase "has as subject" are relatively few. The set of
> documents that contain the exact phrase "World War, 1939-1945" are
> relatively few. The intersection of these two sets is sadly and happily
> infinitesimal. It's sad in the sense that libraries don't take advantage
> of set theory. It's happy in the sense that my example proves Google
> (and presumably reality in general) are still constrained by set theory.

I don't think that set theory is the point here. It sounds like you want Google to index library catalogs, and for library catalogs to present items with the phrases

"has as subject" and "World War, 1939-1945".

But the phrases only go so far -- for two reasons:

(1) Someone has to provide human entry points (because most humans don't think, "WWII, oh, I'll search for 'World War, 1939-1945'").
(2) URI identifiers would be more satisfying than a phrase. They allow disambiguation, in case there are multiple uses of "World War, 1939-1945" (i.e. more precision). They also make it easier to refer to the same thing with multiple terms (e.g. World War II, WWII, ...). We can use
http://id.loc.gov/authorities/sh85148273
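
A minimal sketch of point (2), in Python: several human entry points resolving to one identifier. The lookup table and the `resolve` helper are hypothetical and exist only for illustration; the labels are the ones mentioned above, not a claim about which labels id.loc.gov actually records.

```python
# Hypothetical lookup table: many human entry points, one identifier.
WWII = "http://id.loc.gov/authorities/sh85148273"

ENTRY_POINTS = {
    "world war, 1939-1945": WWII,
    "world war ii": WWII,
    "wwii": WWII,
}


def resolve(term):
    """Return the identifier for a search term, or None if it is unknown."""
    return ENTRY_POINTS.get(term.strip().lower())


# Different words, same thing: both terms resolve to the same URI.
same_thing = resolve("WWII") == resolve("World War, 1939-1945")
print(same_thing)
```

The point of the sketch is only that the comparison happens on the identifier, so the choice of search term stops mattering once it has been resolved.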

I don't think you disagree with this -- so I'm still puzzled -- I feel I must be missing some part of what you are trying so hard to communicate!

I thought there were catalogs using this identifier; I looked at LIBRIS, but the N3 for
http://libris.kb.se/bib/9771391 seems to use the literal string:
<http://libris.kb.se/resource/bib/9771391>    dc:subject    "World War, 1939-1945" .

> 
>> Anyone
>> could create sets of literals for searching that would increase
>> precision. (Ask me some day about "dilcue" :-)).
> 
> Ad hoc literals aren't very convincing. What matters are the names we
> collectively assign in our conceptual models and our adaptation of Web
> architecture to discover, share, understand, and use/reuse them. (So now
> is the time to ask, what is a "dilcue"? :-))
> 
>> In fact, "World War,
>> 1939-1945" retrieves a large number of hits on Google,
> 
> 95,100,000 to be exact. You're not impressed that adding the phrase "has
> as subject" reduces this to 3? There's no doubt that Google plays games
> with "this exact wording or phrase" and we love them for second guessing
> us most of the time. In certain conditions, though, they still seem to
> realize that set theory is useful to us.
> 
>> and I suspect
>> we'd be hard-pressed to find an instance in which "World War,
>> 1939-1945" was not somehow the subject of the retrieved resource. In
>> other words, you could possibly add "has as subject" to all of those
>> pages.
> 
> "has as subject" and "World War, 1939-1945" are not random associations
> of words. These are concepts in models that can be coordinated in Google
> queries to add meaning to patron's lives and give librarians purpose. If
> "we" made a concerted effort to include these schematic names in the
> documents we produce, Google would help us sort wheat from chaff by
> applying page ranking.

One problem is that Google would also collate this listserv conversation with those items, because the literals
"has as subject" and "World War, 1939-1945" together DON'T mean, "this object has as subject 'World War, 1939-1945'".

But we *can* write such a sentence:
<http://libris.kb.se/resource/bib/9771391>    dc:subject    "World War, 1939-1945" .
or even better,
<http://libris.kb.se/resource/bib/9771391>    dc:subject    <http://id.loc.gov/authorities/sh85148273> .

(Insert your favorite object in the nominative).

To me, this is better, because we can say not only "this document has both the phrases of interest" but that someone asserts that this document has *that* as subject.
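
A rough Python sketch of that difference, under stated assumptions: the two "pages" and their names are invented toy data, `phrase_hits` and `about` are hypothetical helpers, and the triple uses the "even better" URI form written above rather than what LIBRIS was observed to publish.

```python
# Toy "pages" (invented for illustration): one catalog record, one email.
pages = {
    "libris-record": 'has as subject "World War, 1939-1945"',
    "listserv-message": 'The literals "has as subject" and '
                        '"World War, 1939-1945" together DON\'T mean ...',
}


def phrase_hits(pages, phrases):
    """Names of pages whose text contains every one of the phrases."""
    return {name for name, text in pages.items()
            if all(phrase in text for phrase in phrases)}


# Statements as (subject, predicate, object) tuples.
# Assumption: the URI-valued form of the statement, as written above.
triples = {
    ("http://libris.kb.se/resource/bib/9771391",
     "dc:subject",
     "http://id.loc.gov/authorities/sh85148273"),
}


def about(triples, concept):
    """Subjects that someone has asserted to have `concept` as subject."""
    return {s for s, p, o in triples if p == "dc:subject" and o == concept}


by_phrase = phrase_hits(pages, ["has as subject", "World War, 1939-1945"])
by_assertion = about(triples, "http://id.loc.gov/authorities/sh85148273")
print(sorted(by_phrase))
print(sorted(by_assertion))
```

The phrase search returns the email as well as the record, since both merely contain the words; the statement lookup returns only the resource about which the assertion was made.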

That, at least, is why I'm here discussing LLD. :)

> 
>> I presume you meant your search to be more than a search of strings.
> 
> The question is, will "we" produce Web documents that contain formalized
> literals so that Google will index those terms and thus allow us to
> avoid using SPARQL most of the time. :-)

I start to see that you're concerned about scalability and implementation. It sounds like you're arguing that formalized literals will help more than URI identifiers in this regard. Is that what you're saying?

If so, could you explain?

<snip>

> 
>> I still don't get how skos-xl would "fix" LCSH.
> 
> LCSH doesn't need "fixed" exactly. The only problem is that too many
> people believe the following URI identifies "the name of the thing"
> (i.e. the literal "World War, 1939-1945") rather than "the thing" (i.e.
> the concept of WWII):
> 
> http://id.loc.gov/authorities/sh85148273#concept 
> 
> Switching from skos:prefLabel to skosxl:prefLabel and coining a new URI
> for the skosxl:Label would help clarify the difference (IMO):
> 
> http://id.loc.gov/authorities/sh85148273#heading

Ok--this seems like the heart of what I have to understand.

I think it's useful to identify the concept. I'm not sure why the literal is helpful, except insofar as LCSH uses this literal to refer to the concept.

But it sounds like you ascribe some importance to the literal (enough to mint it a hash identifier). Could you explain? What do you want to do with the name of the thing, rather than with the concept?
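
For orientation, a small Python sketch of the two modelings being contrasted. It is one reading of the quoted proposal, not a description of what id.loc.gov publishes: the #heading URI is the one suggested in the quoted message, the triples are hand-written tuples with prefixed names as plain strings, and `pref_literal` and `subjects` are hypothetical helpers.

```python
CONCEPT = "http://id.loc.gov/authorities/sh85148273#concept"
# Assumption: the #heading URI is the one proposed in the quoted message.
HEADING = "http://id.loc.gov/authorities/sh85148273#heading"
LITERAL = "World War, 1939-1945"

# Plain SKOS: the concept points straight at a literal string.
skos = {
    (CONCEPT, "skos:prefLabel", LITERAL),
}

# SKOS-XL: the label is a resource with its own URI and a literal form.
skosxl = {
    (CONCEPT, "skosxl:prefLabel", HEADING),
    (HEADING, "rdf:type", "skosxl:Label"),
    (HEADING, "skosxl:literalForm", LITERAL),
}


def pref_literal(triples, concept):
    """Follow skosxl:prefLabel, then skosxl:literalForm, to the string(s)."""
    labels = {o for s, p, o in triples
              if s == concept and p == "skosxl:prefLabel"}
    return {o for s, p, o in triples
            if s in labels and p == "skosxl:literalForm"}


def subjects(triples):
    """Everything that appears as the subject of some statement."""
    return {s for s, p, o in triples}


print(pref_literal(skosxl, CONCEPT))
print(HEADING in subjects(skosxl), LITERAL in subjects(skos))
```

In the second modeling the heading can itself be the subject of further statements, which a bare literal cannot be; whether that is worth the extra URI is exactly the question being asked here.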

> 
> 
>> To begin with, I'm not
>> sure that the use of #concept in LCSH in RDF refers to the subject
>> heading.
> 
> Exactly. #concept should identify "the thing"; #heading should identify
> "the name of the thing".

Can you give an example of where you'd want to refer to the heading?

I guess this discussion is using the heading as an example. So perhaps this discussion would be relevant to someone searching for #heading but not #concept. Is that what you're thinking?

> 
>> I suspect that you could argue that the authority entry
>> represents a concept, and that the "heading" is simply a prefLabel. Do
>> you see it differently?
> 
> That's my argument exactly. The #concept and the #heading should be two
> different things. The current URI identifies the #concept. If this issue
> with LCSH was clarified by adding a #heading URI (skosxl:Label), I
> suspect more people would be comfortable with this fragment of FRBR OWL
> and using LCSH #concepts as fitting examples:
> 
> frbr:hasAsSubject rdfs:range owl:Thing.
> 
> Jeff

<snip>
Received on Tuesday, 10 August 2010 13:15:29 UTC