An ontology of resources and realization [was: RE: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues]]

Hi Pat, David, Dan,

I read through this thread only yesterday, and I find it very 
entertaining: we are talking about substantial stuff here ...

In my opinion, the discussion would be easier if we could negotiate 
our meaning by using ontologies, which are not only an infrastructure 
for the Semantic Web :)

The key notions here are:

- resource
- information resource
- represents
- abstraction

As far as I understand, the point made by David and Frank (and the 
TAG) is that "information resources" are not data, while 
"representations" are. Information resources are things that are 
"represented" by a representation, and each of them is said to be an 
"abstraction".
I agree with Pat on two basic aspects:

- we are talking mainly of relations between entities;
- some terminological choices could lead to confusion, although I am 
not a fan of terminological disputes.

Regarding the first aspect, which is the most relevant one, we need 
to specify the kinds of relations holding at least between:

a) a bunch of data available somehow on the web
b) an information entity that is called an "abstraction" of those data
c) the abstract symbols used for data
d) possible entities that are referred to (implicitly or explicitly) 
in the information entity
e) a URI (or IRI)
f) a resolution method

With Valentina Presutti, we have written a paper on a design pattern 
for describing web resources, to be presented at IRW:

http://www.ibiblio.org/hhalpin/irw2006/vpresutti.pdf

The pattern is a specialization of a more general ontology of 
Information Objects 
(http://www.loa-cnr.it/ontologies/InformationObjects.owl, with 
automatic imports from other reused ontologies) that has been used to 
create annotation systems for multimedia content, an ontology of 
cultural heritage, an ontology of gesture, etc.

According to the pattern, we may be able to distinguish precisely a) 
through f). The pattern consists of classes and properties for:

1) _entities_ whatsoever, including physical and social objects and 
events, information objects, data, and abstracts
2) _resources_, intended as computational objects, and in particular 
as data for which a resolution method exists (unfortunately this 
terminological choice we have made clashes with the "abstract" sense 
of TAG, and we'll change it; as I said, I'm not a fan of harsh 
terminological disputes)
3) (abstract) _web locations_, intended as abstract regions in a space
4) _abstract symbols_ used in data
5) _URIs_, intended as data used as identifiers for web locations
6) _resolution methods_, intended as the specification of procedures 
to access data on the web
7) _information objects_, intended as social entities that are 
created by agents and have a lifecycle; information objects can have 
a fixed state or can be defined as the closure of multiple related 
information objects across e.g. a versioning history
8) the _realization_ of information objects by means of data; in the 
case of a document on the web, intended as a fixed state of data, the 
document is a resource (as data, in our sense) that realizes at least 
one information object: text, image, etc.
9) the use of a resource as a _proxy for_ an entity; e.g. a text 
document on the web, besides realizing a text, can be "about" 
something, e.g. the life of lions; moreover, more rigid documents, 
like RDF files, OWL files, etc., are also resources realizing e.g. 
OWL axioms as information objects created by some agent for some 
purpose (and encoded by using abstract symbols allowed by a logical 
language). An OWL file then turns out to be similar to an HTML 
document, since it realizes an information object, and can be a proxy 
for the entity that is referred to by e.g. a class or an individual. 
The same holds for an RDF file encoding the WordNet database.
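
To make the distinctions in 1) through 9) a bit more tangible, here 
is a minimal sketch in Python. It is only an illustration, not the 
OWL axiomatization from the paper: the class and field names 
(Entity, Resource, realizes, proxy_for, etc.) are my own shorthand 
for the notions above, and the example.org URI and the "lions" 
document are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """1) Anything whatsoever: objects, events, information objects, data."""
    name: str


@dataclass(frozen=True)
class InformationObject(Entity):
    """7) A social entity created by some agent (a text, OWL axioms, ...)."""
    creator: str = "unknown"


@dataclass(frozen=True)
class ResolutionMethod(Entity):
    """6) A specification of a procedure to access data on the web."""


@dataclass(frozen=True)
class WebLocation(Entity):
    """3) An abstract region in a space; 5) the URI is data identifying it."""
    uri: str = ""


@dataclass(frozen=True)
class Resource(Entity):
    """2) Data (a computational object) for which a resolution method exists."""
    location: Optional[WebLocation] = None
    method: Optional[ResolutionMethod] = None
    realizes: Tuple[InformationObject, ...] = ()  # 8) realization
    proxy_for: Tuple[Entity, ...] = ()            # 9) what the data are "about"


# Hypothetical example: a text document on the web about the life of lions.
lions = Entity("the life of lions")
text = InformationObject("essay on lions", creator="some author")
http_get = ResolutionMethod("HTTP GET")
loc = WebLocation("where the essay is served",
                  uri="http://example.org/lions.html")
doc = Resource("bytes of lions.html", location=loc, method=http_get,
               realizes=(text,), proxy_for=(lions,))
```

Note that the document (as data) is kept distinct both from the text 
it realizes and from the lions it is about; only the Resource carries 
a location and a resolution method.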

There is more in the paper, but those distinctions can be used to 
make sense of the discussion in this thread, if I understand 
correctly your points:

- TAG's "represents" maps to "realizes" in the pattern
- TAG's "representation" maps to "resource" (data, computational 
object) in the pattern
- TAG's "resource" maps to "entity" in the pattern
- TAG's "information resource" maps to "information object" in the pattern
- TAG's "abstraction" can map either to "information object", or to 
"abstract symbols" in the pattern, depending on context
- TAG's mechanisms for resolution can map to "resolution method" in the pattern
- Pat's "token-type" relation maps to "realizes" in the pattern
- Pat's (truly?) "represents" relation maps to "about", and, more 
specifically, to "proxy-for" in the pattern
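
The mappings above can also be written down as a small lookup table, 
which makes the one real terminological clash easy to spot. This is 
just a sketch: the speaker labels and the names TERM_MAP and clashes 
are invented for the illustration, and the table contains nothing 
beyond the list above.

```python
# (speaker, term) -> pattern term(s) it maps to, as listed above.
TERM_MAP = {
    ("TAG", "represents"): ("realizes",),
    ("TAG", "representation"): ("resource",),
    ("TAG", "resource"): ("entity",),
    ("TAG", "information resource"): ("information object",),
    ("TAG", "abstraction"): ("information object", "abstract symbols"),
    ("TAG", "resolution mechanism"): ("resolution method",),
    ("Pat", "token-type"): ("realizes",),
    ("Pat", "represents"): ("about", "proxy-for"),
}


def clashes():
    """Return the terms that different speakers map to different notions."""
    by_term = {}
    for (speaker, term), targets in TERM_MAP.items():
        by_term.setdefault(term, {})[speaker] = targets
    return {term: uses for term, uses in by_term.items()
            if len(set(uses.values())) > 1}


for term, uses in clashes().items():
    print(term, uses)
```

With the table as given, the only term used in two different senses 
is "represents": "realizes" for TAG, "about"/"proxy-for" for Pat.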

The pattern does not exclude the possibility of describing resolution 
methods. Moreover, only data turn out to be dependent on the method, 
and only when they are classified as "resources" (in the pattern) or 
"representations" (in TAG).
As a matter of fact, a file as such is not dependent on a resolution 
method, but *as web data*, it is.
All other entities are of course not dependent on a resolution 
method; therefore I don't see any point in leaving WordNet users with 
the indeterminacy of resolution: a word, sense, or synset should 
resolve to its position in an RDF file, possibly visualized by a 
Semantic Web browser that shows the related information, e.g. 
glosses, additional links to related resources like a wiktionary, etc.
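
As a rough illustration of what a determinate resolution could look 
like, here is a sketch in Python that splits a synset URI into the 
RDF document to fetch and the node to look up inside it. The URI 
shape (a hash URI on example.org) and the function name resolve are 
assumptions made for the example; the actual WordNet URI scheme is 
exactly what is under discussion in this thread.

```python
from urllib.parse import urldefrag


def resolve(uri):
    """Split a hash URI into (RDF document URL, node id within the file)."""
    document, fragment = urldefrag(uri)
    if not fragment:
        # Without a fragment we cannot tell which word, sense, or synset
        # inside the file is meant.
        raise ValueError("no position in the RDF file given: " + uri)
    return document, fragment


# Hypothetical synset URI, for illustration only.
document, node = resolve("http://example.org/wordnet/synsets.rdf#lion-noun-1")
print(document)
print(node)
```

A browser could then fetch the document with whatever resolution 
method applies and show the glosses and links attached to that node.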

This is just an initial contribution, and is *not* intended as a 
terminological proposal. On the contrary, I'd like to suggest a way 
to formalize the conceptual dependencies among those notions. A 
reusable ontology like that of information objects can be a good 
starting point to do that, because it contains an advanced 
axiomatization of the notions with reference to other notions as 
well, which creates a rich descriptive context.

Best
Aldo

At 13:54 -0500 1-05-2006, Pat Hayes wrote:
>
>>  > . . . The definition of "Information Resource" that W3C
>>>  endorses[10] is:
>>>  . . .
>>>
>>http://www.w3.org/TR/2004/REC-webarch-20041215/#def-information-resource
>>>
>>>  I don't think that means that words are not information resources.
>>
>>I think it may depend on what you mean by "words". 
>>
>>If http://example.org/doc.html identifies a single resource, and the
>>associated document is updated to correct typos, then clearly
>>http://example.org/doc.html is identifying more than just the words that
>>are *currently* served from that URI: it is identifying a document
>>*abstraction*, rather than a particular document instance or a
>>particular set of words.  I don't see how "all of [the] essential
>>characteristics"[10] of that document *abstraction* can be "conveyed in
>>a message"[10].
>>
>>Similarly, if http://weather.example.com/oaxaca identifies a single
>>resource that is "a periodically updated report on the weather in
>>Oaxaca"[10], then I don't see how "all of [the] essential
>>characteristics"[10] of that periodically updated report can be
>>"conveyed in a message"[10].
>>
>>Because "information resources" can return different "representations"
>>at different times (even if some happen to return the same
>>representation every time), it seems to me that "information resources"
>>are by their very nature abstract.
>
>Why do you say they are abstract? I think I see what you mean (and I 
>think I agree), but 'abstract' seems like entirely the wrong word to 
>use to characterize it.
>
>>Clearly the notion of an "information resource" is modeled after the
>>real life notion of the contents of a (logical) disk region, on a Web
>>server, that is associated with a URI "racine".  (The "racine" is all of
>>the URI except the fragment identifier.[11])  The server is configured
>>to return those contents, whatever they are, when the URI racine is
>>dereferenced.  And those contents may change over time!  Thus, the URI
>>racine is not identifying any *particular* contents, it is identifying
>>the logical *location* where those contents are stored, and the server
>>provides whatever contents happen to be stored there at the moment they
>>are requested.
>
>OK, great. That all makes wonderful sense. What does not make nearly 
>so much sense, however, is to go on to say that the contents 
>that happen to be stored there are a "representation" of the logical 
>location.
>
>>In fact, it is not even possible on the Web to create a URI that is
>>permanently bound to a single document instance that can never change:
>>it is *always* possible to change the server configuration or domain IP
>>mapping to cause a different document instance to be served.  In other
>>words, an http URI on the real Web identifies a logical *location* whose
>>content *always* has the potential of changing.  Similarly (I argue), an
>>"information resource" is *necessarily* abstract.  Thus, if something is
>>not abstract, then it cannot be an "information resource".
>>
>>So returning to your comment about whether a word could be an
>>"information resource", it depends on what you mean by "word".  If an
>>alternate spelling of "color" is "colour", then we are referring to an
>>abstract notion of a word, whose spelling may vary.
>
>But that sense of 'abstract' is not the one you have been using, 
>right? Nothing here about time, for example.
>
>Pat
>
>
>--
>---------------------------------------------------------------------
>IHMC		(850)434 8903 or (650)494 3973   home
>40 South Alcaniz St.	(850)202 4416   office
>Pensacola			(850)202 4440   fax
>FL 32502			(850)291 0667    cell
>phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes


-- 



Aldo Gangemi
Research Scientist
Laboratory for Applied Ontology
Institute for Cognitive Sciences and Technology
National Research Council (ISTC-CNR)
Via Nomentana 56, 00161, Roma, Italy
Tel: +390644161535
Fax: +390644161513
aldo.gangemi@istc.cnr.it
http://www.istc.cnr.it/createhtml.php?nbr=71

Received on Tuesday, 2 May 2006 15:36:08 UTC