RE: SemWeb terminology page

I still see there being an "identifier" for the entity (a URI or URL or id number unique to a particular system) apart from text strings meant for displays to humans. - bt

-----Original Message-----
From: Karen Coyle [mailto:kcoyle@kcoyle.net] 
Sent: Sunday, December 05, 2010 11:14 AM
To: Thomas Baker
Cc: Tillett, Barbara; public-lld
Subject: Re: SemWeb terminology page

Quoting Thomas Baker <tbaker@tbaker.de>:

> On Sat, Dec 04, 2010 at 06:48:09PM -0500, Barbara Tillett wrote:
>> So the idea
>> of the linked clusters of authority records evolved for VIAF, where 
>> all names used for a person, corporate body/conference, uniform title 
>> - that is all the text strings, plus the set of other attributes 
>> associated with each of those entities, would together represent that 
>> entity (be the surrogate) and we could display the context 
>> appropriate form to an end user based on their preference/profile/ 
>> etc.

>
> I do not see the use of identifying elements, such as text strings, 
> which together represent an entity, discussed in the Use Case for VIAF 
> [1].  This reinforces my sense that there is a gap in our use-case 
> coverage on this issue.

Barbara and Tom, are you saying that the text strings, taken as an aggregation, are the *identifier* for the entity? If so, I'm not sure how that would work in practice. VIAF, as structured, assigns a VIAF identifier, which I thought was what identifies the entity. If I have misunderstood and the text strings are to be considered a surrogate, then I wonder what function that surrogate plays in the use of VIAF in applications.

The other option is that each text string is a 'surrogate' or label for the entity in the context in which it is used.
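
To make the two readings concrete, here is a minimal sketch, not a description of how VIAF is actually implemented. It assumes an entity carries one opaque identifier plus a set of context-specific labels; the `Entity` class, the `same_entity` helper, the `example.org` URIs, and the label strings are all hypothetical and for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entity:
    # Opaque identifier (a URI here); hypothetical, not a real VIAF identifier.
    identifier: str
    # Context key (here a language code) -> text string for display to humans.
    labels: Dict[str, str] = field(default_factory=dict)

    def display(self, context: str, fallback: str = "en") -> str:
        """Return the label for a context, else the fallback label, else the identifier."""
        return self.labels.get(context) or self.labels.get(fallback) or self.identifier


def same_entity(a: Entity, b: Entity) -> bool:
    # Identity is decided by the identifier alone, never by matching strings.
    return a.identifier == b.identifier


# Illustrative data only: two distinct entities that share one display string.
author = Entity(
    identifier="http://example.org/entity/123",
    labels={"en": "Tolstoy, Leo", "ru": "Толстой, Лев"},
)
namesake = Entity(
    identifier="http://example.org/entity/456",
    labels={"en": "Tolstoy, Leo"},
)

print(author.display("ru"))           # label chosen for the requested context
print(author.display("fr"))           # no "fr" label, so the "en" fallback is used
print(same_entity(author, namesake))  # same string, different identifiers
```

In this sketch identity rests on the identifier alone, and each text string acts as a label chosen for a given context, which corresponds to the second option above. Two entities that share a string remain distinct because their identifiers differ.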

kc

>
> Tom
>
> [1]
> http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Virtual_International_Authority_File_%28VIAF%29
>
>
>> ________________________________________
>> From: public-lld-request@w3.org [public-lld-request@w3.org] On Behalf 
>> Of Thomas Baker [tbaker@tbaker.de]
>> Sent: Saturday, December 04, 2010 10:23 AM
>> To: Karen Coyle
>> Cc: public-lld
>> Subject: Re: SemWeb terminology page
>>
>> Karen,
>>
>> On Fri, Dec 03, 2010 at 03:15:23PM -0800, Karen Coyle wrote:
>> > In her book "The intellectual foundation of information organization"
>> > Svenonius has a section on controlled and uncontrolled vocabularies.
>> > Her statement about controlled vocabularies says:
>> >
>> > "[Controlled vocabularies] are constructs in an artificial 
>> > language; their purpose is to map users' vocabulary to a 
>> > standardized vocabulary and to bring like information together." 
>> > (p.88) [1]
>> >
>> > Do we agree that this is the role of our #1 group? I ask because I 
>> > perceive this to be different from the original proposed definition:
>> >
>> > "These describe concepts that are used in actual metadata."
>> >
>> > If you look at FRAD [2] you see that the assignment of terminology 
>> > to the concept is of equal or greater importance than any 
>> > description of the concept itself. In fact, that's what I would 
>> > emphasize as the role of a controlled vocabulary: that it is a 
>> > method to *control* *language terms*. Many controlled vocabularies 
>> > have minimal information about the concepts, but all exist to make 
>> > a selection of particular terms of use.
>>
>> This introduces an interesting angle!
>>
>> My first thought was along the lines of Antoine's: Linked Data is 
>> about using URIs when possible, and since this group is specifically 
>> about Linked Data, we should explain that values are not just string 
>> literals.
>>
>> But to reuse the metaphor I suggested in [1], URIs are also the 
>> "words" of RDF's "language of data".  If that is so, then I would 
>> argue that the goal "MAP" [2], which is essentially about mapping 
>> URIs, is analogous to the mapping of natural-language words (string 
>> literals) that Svenonius has in mind.
>>
>> In natural language, people coin different words, or variants on the 
>> same word, to talk about the same thing, and "controlled 
>> vocabularies" as described above are for mapping those diverse words 
>> to an artificial set of "controlled" words.
>>
>> In the Linked Data context, people are coining URIs for the things 
>> they need to talk about, and the "MAP" goal is about creating links 
>> among those URI-words.  The only thing missing from the "MAP" goal, 
>> as defined, is the notion of mapping to one particular 
>> "authoritative" URI.
>>
>> I would argue that since we are viewing these things from a linked 
>> data perspective, we should maintain the emphasis on URIs.  However, 
>> it does make me wonder whether there are potential uses of linked 
>> data in leveraging literal values that are not addressed in our LLD 
>> use cases.  Two possibilities:
>>
>> -- Google Squared uses EAV (entity-attribute-value) "triples"
>>    in their internal index -- triples composed not of URIs but
>>    of strings extracted from Web searches.  That's all I know
>>    about it, but to me it suggests interesting possibilities
>>    for getting from the analysis of unstructured text data
>>    to URIs with triples.
>>
>> -- The other notion is that Linked Data could be used to pull
>>    together a set of (natural-language) words and (string literal)
>>    names -- a constellation of information which, taken together,
>>    could be used to infer more information about the things
>>    described, in support of the sort of disambiguation that
>>    librarians engage in when they use birth and death dates,
>>    occupations, and locations to disambiguate between people
>>    with the same name.
>>
>> Tom
>>
>> [1] http://lists.w3.org/Archives/Public/public-lld/2010Oct/0088.html
>> [2] http://www.w3.org/2005/Incubator/lld/wiki/Goals
>> [3] http://bit.ly/hN76wK
>>
>> --
>> Tom Baker <tbaker@tbaker.de>
>>
>>
>
> --
> Tom Baker <tbaker@tbaker.de>
>



--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet

Received on Monday, 6 December 2010 12:52:56 UTC