Re: SemWeb terminology page from Karen Coyle on 2010-12-05 (public-lld@w3.org from December 2010)

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Sun, 05 Dec 2010 08:13:40 -0800
To: Thomas Baker <tbaker@tbaker.de>
Cc: "Tillett, Barbara" <btil@loc.gov>, public-lld <public-lld@w3.org>
Message-ID: <20101205081340.18274qxbwm2mvl10@kcoyle.net>
Quoting Thomas Baker <tbaker@tbaker.de>:

> On Sat, Dec 04, 2010 at 06:48:09PM -0500, Barbara Tillett wrote:
>> SO the idea
>> of the linked clusters of authority records evolved for VIAF,
>> where all names used for a person, corporate body/conference,
>> uniform title - that is all the text strings, plus the set
>> of other attributes associated with each of those entities,
>> would together represent that entity (be the surrogate) and
>> we could display the context appropriate form to an end user
>> based on their preference/profile/ etc.

>
> I do not see the use of identifying elements, such as text
> strings, which together represent an entity, discussed in the
> Use Case for VIAF [1].  This reinforces my sense that there
> is a gap in our use-case coverage on this issue.

Barbara and Tom, are you saying that the text strings taken as an  
aggregation are the *identifier* for the entity? If so, I'm not sure  
how that would work in practice. VIAF as structured assigns a VIAF  
identifier that I thought was used to identify the entity. If I have  
misunderstood and the text strings are to be considered a surrogate,  
then I wonder what functions that surrogate plays in the use of VIAF  
in applications.

The other option is that each text string is a 'surrogate' or label  
for the entity in the context in which it is used.
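
To make the two readings concrete, here is a minimal Python sketch of the second one. Everything in it is an assumption for illustration: the `Cluster` class, the `label_for` and `same_entity` helpers, the context keys, and the identifier values are hypothetical and do not reflect VIAF's actual data model or API. It only shows an identifier deciding identity while each text string serves as a label in one context.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cluster:
    """Hypothetical cluster: one identifier plus context-specific labels."""
    identifier: str  # made-up value; this alone identifies the entity
    labels: Dict[str, str] = field(default_factory=dict)  # context -> text string

    def label_for(self, context: str, fallback: str = "") -> str:
        # Each text string acts as a label for the entity in one context.
        return self.labels.get(context, fallback)


def same_entity(a: Cluster, b: Cluster) -> bool:
    # Identity is decided by the identifier, not by comparing text strings.
    return a.identifier == b.identifier


# Illustrative data only; the contexts and identifier are invented.
example = Cluster(
    identifier="example:0001",
    labels={
        "catalog-a": "Twain, Mark",
        "catalog-b": "Clemens, Samuel Langhorne",
    },
)
```

Under the first reading, `same_entity` would instead have to compare the whole set of strings and attributes, which is the part I am not sure how to make work in practice.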

kc

>
> Tom
>
> [1]  
> http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Virtual_International_Authority_File_%28VIAF%29
>
>
>> ________________________________________
>> From: public-lld-request@w3.org [public-lld-request@w3.org] On Behalf Of
>> Thomas Baker [tbaker@tbaker.de]
>> Sent: Saturday, December 04, 2010 10:23 AM
>> To: Karen Coyle
>> Cc: public-lld
>> Subject: Re: SemWeb terminology page
>>
>> Karen,
>>
>> On Fri, Dec 03, 2010 at 03:15:23PM -0800, Karen Coyle wrote:
>> > In her book "The intellectual foundation of information organization"
>> > Svenonius has a section on controlled and uncontrolled vocabularies.
>> > Her statement about controlled vocabularies says:
>> >
>> > "[Controlled vocabularies] are constructs in an artificial language;
>> > their purpose is to map users' vocabulary to a standardized vocabulary
>> > and to bring like information together." (p.88) [1]
>> >
>> > Do we agree that this is the role of our #1 group? I ask because I
>> > perceive this to be different from the original proposed definition:
>> >
>> > "These describe concepts that are used in actual metadata."
>> >
>> > If you look at FRAD [2] you see that the assignment of terminology to
>> > the concept is of equal or greater importance than any description of
>> > the concept itself. In fact, that's what I would emphasize as the role
>> > of a controlled vocabulary: that it is a method to *control* *language
>> > terms*. Many controlled vocabularies have minimal information about
>> > the concepts, but all exist to make a selection of particular terms of
>> > use.
>>
>> This introduces an interesting angle!
>>
>> My first thought was along the lines of Antoine's: Linked
>> Data is about using URIs when possible, and since this group
>> is specifically about Linked Data, we should explain that
>> values are not just string literals.
>>
>> But to reuse the metaphor I suggested in [1], URIs are also the
>> "words" of RDF's "language of data".  If that is so, then I
>> would argue that the goal "MAP" [2], which is essentially about
>> mapping URIs, is analogous to the mapping of natural-language
>> words (string literals) that Svenonius has in mind.
>>
>> In natural language, people coin different words, or
>> variants on the same word, to talk about the same thing, and
>> "controlled vocabularies" as described above are for mapping
>> those diverse words to an artificial set of "controlled" words.
>>
>> In the Linked Data context, people are coining URIs for the
>> things they need to talk about, and the "MAP" goal is about
>> creating links among those URI-words.  The only thing missing
>> from the "MAP" goal, as defined, is the notion of mapping to
>> one particular "authoritative" URI.
>>
>> I would argue that since we are viewing these things from
>> a linked data perspective, we should maintain the emphasis
>> on URIs.  However, it does make me wonder whether there are
>> potential uses of linked data in leveraging literal values
>> that are not addressed in our LLD use cases.  Two possibilities:
>>
>> -- Google Squared uses EAV (entity-attribute-value) "triples"
>>    in their internal index -- triples composed not of URIs but
>>    of strings extracted from Web searches.  That's all I know
>>    about it, but to me it suggests interesting possibilities
>>    for getting from the analysis of unstructured text data
>>    to URIs with triples.
>>
>> -- The other notion is that Linked Data could be used to pull
>>    together a set of (natural-language) words and (string literal)
>>    names -- a constellation of information which, taken together,
>>    could be used to infer more information about the things
>>    described, in support of the sort of disambiguation that
>>    librarians engage in when they use birth and death dates,
>>    occupations, and locations to disambiguate between people
>>    with the same name.
>>
>> Tom
>>
>> [1] http://lists.w3.org/Archives/Public/public-lld/2010Oct/0088.html
>> [2] http://www.w3.org/2005/Incubator/lld/wiki/Goals
>> [3] http://bit.ly/hN76wK
>>
>> --
>> Tom Baker <tbaker@tbaker.de>
>>
>>
>
> --
> Tom Baker <tbaker@tbaker.de>
>



-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Sunday, 5 December 2010 16:14:16 UTC