Re: SemWeb terminology page

On Sat, Dec 04, 2010 at 06:48:09PM -0500, Barbara Tillett wrote:
> This very much ties into the notions that helped VIAF
> to evolve - IFLA had thought one string established by a
> national bibliographic agency (NBA) would be used by everyone
> in the world for the "authors" in each country and that
> one bibliographic description for everything published in a
> country would be provided by the NBA and that those bib records
> would be used everywhere.  What's wrong with this picture?
> We don't all use the same languages or scripts.  So the idea
> of the linked clusters of authority records evolved for VIAF,
> where all names used for a person, corporate body/conference,
> uniform title - that is all the text strings, plus the set
> of other attributes associated with each of those entities,
> would together represent that entity (be the surrogate) and
> we could display the context appropriate form to an end user
> based on their preference, profile, etc.
> 
> However, since not all use cases have an end user with a
> preference/profile, we still have "default" values for a
> particular name/text string that libraries will display as
> their authorized access point.  RDA is trying to eventually
> get us out of that mind set of the authorized access point,
> by first giving all the identifying elements needed for each
> entity (person, corporate body, etc.) so again the context
> appropriate set of elements could be displayed as needed,
> but for now with limitations of MARC and no new systems yet
> developed to make the visions real, we have to continue with
> authorized access points (aka headings). - Barbara

This is really a helpful summary!  

I do not see this idea discussed in the Use Case for VIAF [1]:
that a set of identifying elements, such as text strings, can
together represent an entity.  This reinforces my sense that
there is a gap in our use-case coverage on this issue.
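
To make the idea concrete, here is a rough sketch of how such a
cluster might be modeled.  It is only an illustration under my own
assumptions: the class names, the example.org URI, and the sample
name forms are hypothetical and are not taken from VIAF, RDA, or
any real authority record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NameForm:
    """One text string used for an entity (hypothetical structure)."""
    text: str
    language: str  # language tag such as "en" or "ru"


@dataclass
class EntityCluster:
    """All name forms plus other attributes, together standing for one entity."""
    uri: str
    forms: List[NameForm] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)
    default_language: str = "en"  # stands in for the "default" access point

    def display_form(self, preferred_language: Optional[str] = None) -> str:
        """Pick the form matching the user's preference, else the default."""
        wanted = preferred_language or self.default_language
        for lang in (wanted, self.default_language):
            for form in self.forms:
                if form.language == lang:
                    return form.text
        # No usable name form at all: fall back to the identifier itself.
        return self.forms[0].text if self.forms else self.uri


# Illustrative data only; not copied from any authority file.
cluster = EntityCluster(
    uri="http://example.org/entity/tolstoy",
    forms=[
        NameForm("Tolstoy, Leo", "en"),
        NameForm("Толстой, Лев Николаевич", "ru"),
    ],
    attributes={"birth": "1828", "death": "1910"},
)

print(cluster.display_form("ru"))  # user profile prefers Russian
print(cluster.display_form())      # no profile: the default form is used
```

The point of the sketch is that no single string is "the" name:
the URI identifies the cluster, and the default form is used only
when there is no end-user preference to go on.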

Tom

[1] http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Virtual_International_Authority_File_%28VIAF%29


> ________________________________________
> From: public-lld-request@w3.org [public-lld-request@w3.org] On Behalf Of Thomas Baker [tbaker@tbaker.de]
> Sent: Saturday, December 04, 2010 10:23 AM
> To: Karen Coyle
> Cc: public-lld
> Subject: Re: SemWeb terminology page
> 
> Karen,
> 
> On Fri, Dec 03, 2010 at 03:15:23PM -0800, Karen Coyle wrote:
> > In her book "The intellectual foundation of information organization"
> > Svenonius has a section on controlled and uncontrolled vocabularies.
> > Her statement about controlled vocabularies says:
> >
> > "[Controlled vocabularies] are constructs in an artificial language;
> > their purpose is to map users' vocabulary to a standardized vocabulary
> > and to bring like information together." (p.88) [1]
> >
> > Do we agree that this is the role of our #1 group? I ask because I
> > perceive this to be different from the original proposed definition:
> >
> > "These describe concepts that are used in actual metadata."
> >
> > If you look at FRAD [2] you see that the assignment of terminology to
> > the concept is of equal or greater importance than any description of
> > the concept itself. In fact, that's what I would emphasize as the role
> > of a controlled vocabulary: that it is a method to *control* *language
> > terms*. Many controlled vocabularies have minimal information about
> > the concepts, but all exist to make a selection of particular terms of
> > use.
> 
> This introduces an interesting angle!
> 
> My first thought was along the lines of Antoine's: Linked
> Data is about using URIs when possible, and since this group
> is specifically about Linked Data, we should explain that
> values are not just string literals.
> 
> But to reuse the metaphor I suggested in [1], URIs are also the
> "words" of RDF's "language of data".  If that is so, then I
> would argue that the goal "MAP" [2], which is essentially about
> mapping URIs, is analogous to the mapping of natural-language
> words (string literals) that Svenonius has in mind.
> 
> In natural language, people coin different words, or
> variants on the same word, to talk about the same thing, and
> "controlled vocabularies" as described above are for mapping
> those diverse words to an artificial set of "controlled" words.
> 
> In the Linked Data context, people are coining URIs for the
> things they need to talk about, and the "MAP" goal is about
> creating links among those URI-words.  The only thing missing
> from the "MAP" goal, as defined, is the notion of mapping to
> one particular "authoritative" URI.
> 
> I would argue that since we are viewing these things from
> a linked data perspective, we should maintain the emphasis
> on URIs.  However, it does make me wonder whether there are
> potential uses of linked data in leveraging literal values
> that are not addressed in our LLD use cases.  Two possibilities:
> 
> -- Google Squared uses EAV (entity-attribute-value) "triples"
>    in their internal index -- triples composed not of URIs but
>    of strings extracted from Web searches.  That's all I know
>    about it, but to me it suggests interesting possibilities
>    for getting from the analysis of unstructured text data
>    to URIs with triples.
> 
> -- The other notion is that Linked Data could be used to pull
>    together a set of (natural-language) words and (string literal)
>    names -- a constellation of information which, taken together,
>    could be used to infer more information about the things
>    described, in support of the sort of disambiguation that
>    librarians engage in when they use birth and death dates,
>    occupations, and locations to disambiguate between people
>    with the same name.
> 
> Tom
> 
> [1] http://lists.w3.org/Archives/Public/public-lld/2010Oct/0088.html
> [2] http://www.w3.org/2005/Incubator/lld/wiki/Goals
> [3] http://bit.ly/hN76wK
> 
> --
> Tom Baker <tbaker@tbaker.de>
> 
> 

-- 
Tom Baker <tbaker@tbaker.de>

Received on Sunday, 5 December 2010 00:38:31 UTC