RE: SemWeb terminology page from Karen Coyle on 2010-12-04 (public-lld@w3.org from December 2010)

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Sat, 04 Dec 2010 14:25:45 -0800
To: "Young,Jeff (OR)" <jyoung@oclc.org>
Cc: public-lld <public-lld@w3.org>
Message-ID: <20101204142545.599229fcofq85fxl@kcoyle.net>

Quoting "Young,Jeff (OR)" <jyoung@oclc.org>:


>
> Is the "text string" or the "authority record" the surrogate? Here's my
> guess:
>
> Early library models used text strings as "controlled access points". As
> Karen implies, we probably believed the "text string" was "the
> surrogate". Authority records came along with opaque identifiers and
> before we knew it the "text strings" in them started changing over time.
> We reluctantly resigned ourselves to thinking of the authority record as
> "the surrogate". (Idealized immutable "controlled access points" still
> seem to haunt our thinking.)

I don't recall any early library thinking that treats headings as  
"surrogates." I think the main concept is that of "collocation":  
bringing together same or like things. Svenonius again:

"Organizing information if it means nothing else means bringing all  
the same information together." (p.10)

Basically, headings are about retrieval. It's the "access" part of  
"description and access." I can see them as surrogates or as  
identifiers -- in either case the text string stands for a thing or a  
concept.

What authority data in a machine-readable form gave us was the ability  
to do better collocation. In pre-MARC card catalogs, you never went  
back and changed a heading, you created cross references from old  
terms to new ones (with special cards that were interfiled with the  
bibliographic entries). With machine processing it became possible to  
go back and change the headings rather than create a daisy-chain of  
references.
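
To make the contrast concrete, here is a minimal sketch in Python. It
assumes headings are plain strings and that cross references are a
simple old-to-new mapping; the sample headings and the function names
are invented for illustration and do not describe any actual catalog
system.

```python
# Illustrative "see" references: superseded heading -> newer heading.
# In a card catalog these were separate cards interfiled with the entries.
see_refs = {
    "Flying machines": "Aeroplanes",
    "Aeroplanes": "Airplanes",
}

def follow_refs(heading, refs):
    """Follow the chain of references until no onward reference exists."""
    seen = set()
    while heading in refs and heading not in seen:
        seen.add(heading)  # guard against a circular chain of references
        heading = refs[heading]
    return heading

def update_headings(record_headings, refs):
    """Machine processing: rewrite each heading to its current form."""
    return [follow_refs(h, refs) for h in record_headings]

record_headings = ["Flying machines", "Airplanes", "Aeroplanes"]
updated = update_headings(record_headings, see_refs)
```

With references alone, the user walks the chain by hand on every
lookup; with the batch update, all three records end up under a single
heading and collocate directly.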

>
> In a Linked Data context, the idea of surrogate appears to be outmoded.
> Distributed agents are able to identify "the thing" directly without
> reference through a surrogate. Like Karen, I assume librarians will be
> concerned that the ideal of immutable "controlled access point" is being
> pushed even further away by further demotion from "established heading"
> to mere "preferred label".

I agree with this, although I would probably word it more in terms of  
identification and access: we are no longer limited to a  
human-readable identifier that ALSO serves as the access vocabulary.  
It's that dual role of the string (identifier and access) that I think  
is the source of difficulty. Well, it's one source.
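
A rough sketch of separating the two roles, in Python. The class, its
field names, and the example.org URI are hypothetical; the only point
is that the identifier stays fixed while the labels used for access
are free to change.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    uri: str          # opaque identifier; never changes
    pref_label: str   # current preferred label; may change over time
    alt_labels: set = field(default_factory=set)

    def relabel(self, new_label):
        """Change the preferred label, keeping the old one for access."""
        self.alt_labels.add(self.pref_label)
        self.alt_labels.discard(new_label)
        self.pref_label = new_label

    def matches(self, query):
        """Access: match a query against any label, ignoring case."""
        q = query.casefold()
        labels = {self.pref_label} | self.alt_labels
        return any(q == label.casefold() for label in labels)

# Hypothetical URI and labels, for illustration only.
thing = Entity("http://example.org/auth/123", "Aeroplanes")
thing.relabel("Airplanes")
```

After the relabel, thing.uri is unchanged, and both the old and the
new string still lead to the same entity.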


> I wouldn't call it a "controlled vocabulary" unless skos:inScheme was
> also involved.

That is why I mentioned the W3C provenance work, which would allow one  
to state the rule and agent that would make the prefLabel a formal and  
controlled choice of terms. I think that provenance is needed for  
instance data, while skos:inScheme is for defining ontologies. That is  
because controlled access points may only exist as instance data --  
think of this as the difference between LCSH as a scheme (at  
id.loc.gov) and the actual headings in records, most of which are not  
represented in the LCSH scheme.

>
> Does the additional requirement of using skos:inScheme for "controlled
> vocabularies" that use skos/xl:prefLabel help resolve this concern for
> you?

Almost, but take a look at the FRAD diagram and see if you think skos
can provide all of that info.

[see p. 7 of this pdf for the diagram I mean:  
http://www.ifla.org/files/hq/papers/ifla75/215-patton-en.pdf]

>
> The concern may be that anyone can create a skos:ConceptScheme, but
> libraries may want to create a subclass they consider to be "authority".
> The new VIAF ontology does this with its viaf:AuthorityScheme class.

The madsrdf does something like this with MADSScheme. Perhaps based on VIAF?

> (Unfortunately, the VIAF ontology is virtually unusable by humans
> because of stylesheet problems. Sorry.)

There seems to be a fairly common problem with displaying ontologies  
in human-friendly forms. That's definitely something that has to get  
ironed out if we want people to actually create instance data. :-)


>
> I think there are still important use cases for treating terms as
> first-class objects (e.g. attaching pronunciation). Nevertheless, we
> need to acknowledge the mutability of naming things (e.g. people and
> concepts) by modeling names/labels/terms as properties of a relatively
> immutable "primary entity" (as the UNIMARC Authority format calls them).

Agreed.
kc


-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Saturday, 4 December 2010 22:26:19 UTC