RE: SemWeb terminology page from Tillett, Barbara on 2010-12-04 (public-lld@w3.org from December 2010)

From: Tillett, Barbara <btil@loc.gov>
Date: Sat, 4 Dec 2010 18:48:09 -0500
To: Thomas Baker <tbaker@tbaker.de>, Karen Coyle <kcoyle@kcoyle.net>
CC: public-lld <public-lld@w3.org>
Message-ID: <1D525027B29706438707F336D75A279F152C3D7216@LCXCLMB03.LCDS.LOC.GOV>
This ties in closely with the ideas that helped VIAF evolve.  IFLA had thought that one string, established by a national bibliographic agency (NBA), would be used by everyone in the world for the "authors" of each country, that the NBA would provide one bibliographic description for everything published in that country, and that those bib records would be used everywhere.  What's wrong with this picture?  We don't all use the same languages or scripts.  So the idea of linked clusters of authority records evolved for VIAF: all the names used for a person, corporate body/conference, or uniform title (that is, all the text strings), plus the set of other attributes associated with each of those entities, would together represent that entity (be its surrogate), and we could display the context-appropriate form to an end user based on their preference, profile, etc.
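
To make the cluster idea concrete, here is a minimal sketch in Python. It is not VIAF's actual data model or API: the class names, the identifier, and the fallback to the first form are all assumptions made for illustration, and the name strings are simplified examples.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class NameForm:
    """One text string used for an entity (hypothetical structure)."""
    text: str
    language: str  # e.g. "en"
    script: str    # e.g. "Latn"


@dataclass
class Cluster:
    """All name forms that together stand in for one entity."""
    identifier: str
    forms: List[NameForm] = field(default_factory=list)

    def display_form(self, language: str, script: Optional[str] = None) -> Optional[str]:
        """Return the form matching the user's language (and script, if given)."""
        for form in self.forms:
            if form.language == language and (script is None or form.script == script):
                return form.text
        # Assumption of this sketch: with no match, fall back to the first form.
        return self.forms[0].text if self.forms else None


# Illustrative data only; the identifier is made up.
cluster = Cluster(
    identifier="example:cluster/1",
    forms=[
        NameForm("Tolstoy, Leo", "en", "Latn"),
        NameForm("Толстой, Лев Николаевич", "ru", "Cyrl"),
    ],
)

print(cluster.display_form("ru"))
print(cluster.display_form("ja"))
```

The point of the sketch is only that no single string is "the" name: the cluster is the surrogate, and the displayed string is chosen at display time.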

However, since not all use cases have an end user with a preference/profile, we still have "default" values: a particular name/text string that libraries display as their authorized access point.  RDA is trying eventually to get us out of the authorized-access-point mindset by first giving all the identifying elements needed for each entity (person, corporate body, etc.), so that, again, the context-appropriate set of elements could be displayed as needed.  For now, though, with the limitations of MARC and no new systems yet developed to make the visions real, we have to continue with authorized access points, aka headings. - Barbara
________________________________________
From: public-lld-request@w3.org [public-lld-request@w3.org] On Behalf Of Thomas Baker [tbaker@tbaker.de]
Sent: Saturday, December 04, 2010 10:23 AM
To: Karen Coyle
Cc: public-lld
Subject: Re: SemWeb terminology page

Karen,

On Fri, Dec 03, 2010 at 03:15:23PM -0800, Karen Coyle wrote:
> In her book "The intellectual foundation of information organization"
> Svenonius has a section on controlled and uncontrolled vocabularies.
> Her statement about controlled vocabularies says:
>
> "[Controlled vocabularies] are constructs in an artificial language;
> their purpose is to map users' vocabulary to a standardized vocabulary
> and to bring like information together." (p.88) [1]
>
> Do we agree that this is the role of our #1 group? I ask because I
> perceive this to be different from the original proposed definition:
>
> "These describe concepts that are used in actual metadata."
>
> If you look at FRAD [2] you see that the assignment of terminology to
> the concept is of equal or greater importance than any description of
> the concept itself. In fact, that's what I would emphasize as the role
> of a controlled vocabulary: that it is a method to *control* *language
> terms*. Many controlled vocabularies have minimal information about
> the concepts, but all exist to make a selection of particular terms of
> use.

This introduces an interesting angle!

My first thought was along the lines of Antoine's: Linked
Data is about using URIs when possible, and since this group
is specifically about Linked Data, we should explain that
values are not just string literals.

But to reuse the metaphor I suggested in [1], URIs are also the
"words" of RDF's "language of data".  If that is so, then I
would argue that the goal "MAP" [2], which is essentially about
mapping URIs, is analogous to the mapping of natural-language
words (string literals) that Svenonius has in mind.

In natural language, people coin different words, or
variants on the same word, to talk about the same thing, and
"controlled vocabularies" as described above are for mapping
those diverse words to an artificial set of "controlled" words.

In the Linked Data context, people are coining URIs for the
things they need to talk about, and the "MAP" goal is about
creating links among those URI-words.  The only thing missing
from the "MAP" goal, as defined, is the notion of mapping to
one particular "authoritative" URI.
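
As a rough illustration of what mapping to one "authoritative" URI could add, the sketch below groups linked URIs and resolves each to a designated one. Everything here is an assumption for illustration: the example.org URIs are made up, the links are treated as plain equivalences (which real mapping links may not be), and the function name is hypothetical.

```python
def build_resolver(links, authoritative):
    """Group URIs joined by equivalence links; map each to its authoritative URI.

    links: iterable of (uri_a, uri_b) pairs asserting "same thing".
    authoritative: URIs designated as the preferred "word" for their group.
    """
    parent = {}

    def find(uri):
        # Union-find lookup with path halving.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # If a group has several authoritative URIs, the first one listed wins.
    chosen = {}
    for uri in authoritative:
        chosen.setdefault(find(uri), uri)

    def resolve(uri):
        # URIs with no authoritative equivalent resolve to themselves.
        return chosen.get(find(uri), uri)

    return resolve


# Hypothetical URIs coined by three different sources for the same person.
links = [
    ("http://example.org/a/twain", "http://example.org/b/clemens"),
    ("http://example.org/b/clemens", "http://example.org/c/mark-twain"),
]
resolve = build_resolver(links, authoritative=["http://example.org/c/mark-twain"])

print(resolve("http://example.org/a/twain"))
```

Without the `authoritative` argument this is just the "MAP" goal as stated (links among URI-words); with it, the links also select a controlled term, which is closer to what Svenonius describes.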

I would argue that since we are viewing these things from
a linked data perspective, we should maintain the emphasis
on URIs.  However, it does make me wonder whether there are
potential uses of linked data in leveraging literal values
that are not addressed in our LLD use cases.  Two possibilities:

-- Google Squared uses EAV (entity-attribute-value) "triples"
   in their internal index -- triples composed not of URIs but
   of strings extracted from Web searches.  That's all I know
   about it, but to me it suggests interesting possibilities
   for getting from the analysis of unstructured text data
   to URIs with triples.

-- The other notion is that Linked Data could be used to pull
   together a set of (natural-language) words and (string literal)
   names -- a constellation of information which, taken together,
   could be used to infer more information about the things
   described, in support of the sort of disambiguation that
   librarians engage in when they use birth and death dates,
   occupations, and locations to disambiguate between people
   with the same name.
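
The second possibility can be sketched roughly as follows. This is a toy, not a description of any existing system: the records, identifiers, and attribute names are invented, and "compatible" here just means that no attribute known on both sides conflicts.

```python
def compatible(record, evidence):
    """True if no attribute present in both record and evidence conflicts."""
    for key, value in evidence.items():
        known = record.get(key)
        if known is None:
            continue  # the record says nothing about this attribute
        if isinstance(known, (set, frozenset)):
            if value not in known:
                return False
        elif known != value:
            return False
    return True


def disambiguate(name, evidence, records):
    """Return ids of records that carry the name and fit the evidence."""
    return [
        r["id"]
        for r in records
        if name in r["names"] and compatible(r, evidence)
    ]


# Hypothetical records for two people who share a name string.
records = [
    {"id": "example:person/1", "names": {"Smith, John"},
     "birth_year": 1820, "occupation": {"printer"}},
    {"id": "example:person/2", "names": {"Smith, John", "Smith, J."},
     "birth_year": 1901, "occupation": {"painter", "teacher"}},
]

print(disambiguate("Smith, John", {}, records))
print(disambiguate("Smith, John", {"birth_year": 1901}, records))
print(disambiguate("Smith, John", {"occupation": "printer"}, records))
```

The name string alone matches both records; adding a birth year or an occupation narrows the match to one.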

Tom

[1] http://lists.w3.org/Archives/Public/public-lld/2010Oct/0088.html
[2] http://www.w3.org/2005/Incubator/lld/wiki/Goals
[3] http://bit.ly/hN76wK

--
Tom Baker <tbaker@tbaker.de>
Received on Saturday, 4 December 2010 23:49:40 UTC