Re: SemWeb terminology page from Thomas Baker on 2010-12-04 (public-lld@w3.org from December 2010)

From: Thomas Baker <tbaker@tbaker.de>
Date: Sat, 4 Dec 2010 10:23:31 -0500
To: Karen Coyle <kcoyle@kcoyle.net>
Cc: public-lld <public-lld@w3.org>
Message-ID: <20101204152331.GB2696@octavius>

Karen,

On Fri, Dec 03, 2010 at 03:15:23PM -0800, Karen Coyle wrote:
> In her book "The intellectual foundation of information organization"  
> Svenonius has a section on controlled and uncontrolled vocabularies.  
> Her statement about controlled vocabularies says:
> 
> "[Controlled vocabularies] are constructs in an artificial language;  
> their purpose is to map users' vocabulary to a standardized vocabulary  
> and to bring like information together." (p.88) [1]
> 
> Do we agree that this is the role of our #1 group? I ask because I  
> perceive this to be different from the original proposed definition:
> 
> "These describe concepts that are used in actual metadata."
> 
> If you look at FRAD [2] you see that the assignment of terminology to  
> the concept is of equal or greater importance than any description of  
> the concept itself. In fact, that's what I would emphasize as the role  
> of a controlled vocabulary: that it is a method to *control* *language  
> terms*. Many controlled vocabularies have minimal information about  
> the concepts, but all exist to make a selection of particular terms of  
> use.

This introduces an interesting angle!

My first thought was along the lines of Antoine's: Linked
Data is about using URIs when possible, and since this group
is specifically about Linked Data, we should explain that
values are not just string literals.

But to reuse the metaphor I suggested in [1], URIs are also the
"words" of RDF's "language of data".  If that is so, then I
would argue that the goal "MAP" [2], which is essentially about
mapping URIs, is analogous to the mapping of natural-language
words (string literals) that Svenonius has in mind.

In natural language, people coin different words, or
variants on the same word, to talk about the same thing, and
"controlled vocabularies" as described above are for mapping
those diverse words to an artificial set of "controlled" words.

In the Linked Data context, people are coining URIs for the
things they need to talk about, and the "MAP" goal is about
creating links among those URI-words.  The only thing missing
from the "MAP" goal, as defined, is the notion of mapping to
one particular "authoritative" URI.
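
As a rough illustration of that missing piece, here is a minimal
Python sketch of how mapping links among URIs could be collapsed
onto one "authoritative" URI per group.  Everything in it is an
assumption made for illustration: the example.org URIs, the link
list, and the rule that URIs under one preferred namespace win are
all hypothetical and are not taken from the "MAP" goal text.

```python
from collections import defaultdict

# Hypothetical mapping links between URIs coined by different parties.
LINKS = [
    ("http://example.org/a/cats", "http://example.org/b/felis-catus"),
    ("http://example.org/b/felis-catus", "http://example.org/c/cat"),
    ("http://example.org/a/dogs", "http://example.org/c/dog"),
]

# Assumption: URIs in this namespace are treated as "authoritative".
AUTHORITY_PREFIX = "http://example.org/c/"


def build_clusters(links):
    """Group URIs connected by mapping links (simple union-find)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return list(clusters.values())


def authoritative_map(links, prefix):
    """Map every URI to one chosen URI in its cluster."""
    mapping = {}
    for cluster in build_clusters(links):
        preferred = sorted(u for u in cluster if u.startswith(prefix))
        # Fall back to the alphabetically first URI if none is preferred.
        target = preferred[0] if preferred else sorted(cluster)[0]
        for uri in cluster:
            mapping[uri] = target
    return mapping


CANONICAL = authoritative_map(LINKS, AUTHORITY_PREFIX)
```

Looking up any URI in CANONICAL then plays the role that looking
up a variant word in a controlled vocabulary plays for natural
language: many coined forms, one selected form.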

I would argue that since we are viewing these things from
a Linked Data perspective, we should maintain the emphasis
on URIs.  However, it does make me wonder whether there are
potential uses of Linked Data for leveraging literal values
that are not addressed in our LLD use cases.  Two possibilities:

-- Google Squared uses EAV (entity-attribute-value) "triples"
   in its internal index -- triples composed not of URIs but
   of strings extracted from Web searches.  That's all I know
   about it, but to me it suggests interesting possibilities
   for getting from the analysis of unstructured text data
   to triples that use URIs.

-- The other notion is that Linked Data could be used to pull
   together a set of (natural-language) words and (string literal) 
   names -- a constellation of information which, taken together,
   could be used to infer more information about the things 
   described, in support of the sort of disambiguation that
   librarians engage in when they use birth and death dates,
   occupations, and locations to distinguish between people
   with the same name.
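
The second possibility can be sketched in a few lines of Python.
This is only a simplified illustration under stated assumptions:
the PersonDescription fields, the scoring rule, and the sample
records (invented people, invented dates) are all hypothetical,
and real authority work weighs far more evidence than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonDescription:
    """A bundle of literal values about one person (hypothetical)."""
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    occupation: Optional[str] = None
    location: Optional[str] = None


def match_score(a, b):
    """Return None if the descriptions conflict, else a count of
    agreeing attributes.  Assumption: differing names or dates rule
    a match out; occupation and location only add evidence."""
    if a.name != b.name:
        return None
    score = 0
    for field in ("birth_year", "death_year"):
        va, vb = getattr(a, field), getattr(b, field)
        if va is not None and vb is not None:
            if va != vb:
                return None  # conflicting dates: different people
            score += 1
    for field in ("occupation", "location"):
        va, vb = getattr(a, field), getattr(b, field)
        if va is not None and va == vb:
            score += 1
    return score


def best_match(query, candidates):
    """Pick the non-conflicting candidate with the highest score."""
    scored = [(match_score(query, c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s is not None]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]


# Invented sample data: two people sharing one name.
CANDIDATES = [
    PersonDescription("Jane Example", 1820, 1890, "printer", "Boston"),
    PersonDescription("Jane Example", 1901, 1975, "chemist", "Leeds"),
]
QUERY = PersonDescription("Jane Example", death_year=1890)
RESULT = best_match(QUERY, CANDIDATES)
```

Here the name alone is ambiguous, but the death year rules out
the second candidate, which is the kind of inference from a
constellation of literals described above.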

Tom

[1] http://lists.w3.org/Archives/Public/public-lld/2010Oct/0088.html
[2] http://www.w3.org/2005/Incubator/lld/wiki/Goals
[3] http://bit.ly/hN76wK

-- 
Tom Baker <tbaker@tbaker.de>
Received on Saturday, 4 December 2010 15:24:11 UTC