- From: Aldo Gangemi <aldo.gangemi@istc.cnr.it>
- Date: Tue, 2 May 2006 17:36:00 +0200
- To: Pat Hayes <phayes@ihmc.us>, "Booth, David (HP Software - Boston)" <dbooth@hp.com>
- Cc: "Dan Connolly" <connolly@w3.org>, <public-swbp-wg@w3.org>, presutti@cs.unibo.it
Hi Pat, David, Dan,

I only read through this thread yesterday, and I find it very entertaining; we're talking about substantial stuff here ...

In my opinion, the discussion would be easier if we could negotiate our meaning by using ontologies, which are not only an infrastructure for the Semantic Web :)

The key notions here are:
- resource
- information resource
- represents
- abstraction

As far as I understand, the point made by David and Frank (and the TAG) is that "information resources" are not data, while "representations" are. Information resources are the kind of thing that a representation "represents", and they are said to be an "abstraction".

I agree with Pat on the two basic aspects:
- we are talking mainly of relations between entities;
- some terminological choices could lead to confusion, although I am not a fan of terminological disputes.

Re: the first aspect, which is the most relevant one, we need to specify the kind of relations holding at least between:
a) a bunch of data available somehow on the web
b) an information entity that is called an "abstraction" of those data
c) the abstract symbols used for data
d) possible entities that are referred to (implicitly or explicitly) in the information entity
e) a URI (or IRI)
f) a resolution method

With Valentina Presutti, we have written a paper on a design pattern for describing web resources, to be presented at IRW:
http://www.ibiblio.org/hhalpin/irw2006/vpresutti.pdf

The pattern is a specialization of a more general ontology of Information Objects (http://www.loa-cnr.it/ontologies/InformationObjects.owl, with automatic imports from other reused ontologies) that has been used to create annotation systems for multimedia content, an ontology of cultural heritage, an ontology of gesture, etc.

According to the pattern, we may be able to distinguish precisely between a) through f).

The pattern consists of classes and properties for:
1) _entities_ whatsoever, including physical and social objects and events, information objects, data, and abstracts
2) _resources_, intended as computational objects, and in particular as data for which a resolution method exists (unfortunately this terminological choice of ours clashes with the "abstract" sense used by the TAG, and we'll change it; as I said, I'm not a fan of harsh terminological disputes)
3) (abstract) _web locations_, intended as abstract regions in a space
4) _abstract symbols_ used in data
5) _URIs_, intended as data used as identifiers for web locations
6) _resolution methods_, intended as the specification of procedures to access data on the web
7) _information objects_, intended as social entities that are created by agents and have a lifecycle; information objects can have a fixed state or can be defined as the closure of multiple related information objects across e.g. a versioning history
8) the _realization_ of information objects by means of data; in the case of a document on the web, intended as a fixed state of data, the document is a resource (as data, in our sense) that realizes at least one information object: text, image, etc.
9) the use of a resource as a _proxy for_ an entity; e.g. a text document on the web, besides realizing a text, can be "about" something, e.g. the life of lions; moreover, more rigid documents, like RDF files, OWL files, etc., are also resources realizing e.g. OWL axioms as information objects created by some agent for some purpose (and encoded by using abstract symbols allowed by a logical language).

An OWL file thus turns out to be similar to an HTML document, since it realizes an information object, and can be a proxy for the entity that is referred to by e.g. a class or an individual. The same holds for an RDF file encoding the WordNet database.

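To make the distinctions in 1) through 9) a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class and field names (Entity, InformationObject, Resource, realizes, proxy_for, etc.) are hypothetical labels chosen for this example, not the identifiers used in InformationObjects.owl or in the paper, and the example URI is made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """Anything whatsoever: physical or social object, event, information object, data."""
    name: str


@dataclass
class InformationObject(Entity):
    """Social entity created by an agent (a text, an image, a set of OWL axioms)."""
    created_by: str = "unknown agent"


@dataclass
class ResolutionMethod:
    """Specification of a procedure to access data on the web."""
    name: str


@dataclass
class WebLocation:
    """Abstract region in a space; the URI is the data that identifies it."""
    uri: str


@dataclass
class Resource(Entity):
    """Data for which a resolution method exists (the pattern's sense, not the TAG's)."""
    location: Optional[WebLocation] = None
    resolved_by: Optional[ResolutionMethod] = None
    realizes: List[InformationObject] = field(default_factory=list)
    proxy_for: List[Entity] = field(default_factory=list)


# Hypothetical example: a web document about the life of lions.
lions = Entity("life of lions")
essay = InformationObject("essay on lions", created_by="some author")
doc = Resource(
    "lions.html",
    location=WebLocation("http://example.org/lions.html"),  # made-up URI
    resolved_by=ResolutionMethod("HTTP GET"),
    realizes=[essay],    # the data realize the text
    proxy_for=[lions],   # the data stand in for what the text is about
)
```

The point of keeping realizes and proxy_for as two separate fields is that the same piece of data can realize one thing (a text) while being a proxy for another (the lions).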
There is more in the paper, but those distinctions can be used to make sense of the discussion in this thread, if I understand your points correctly:
- TAG's "represents" maps to "realizes" in the pattern
- TAG's "representation" maps to "resource" (data, computational object) in the pattern
- TAG's "resource" maps to "entity" in the pattern
- TAG's "information resource" maps to "information object" in the pattern
- TAG's "abstraction" can map either to "information object", or to "abstract symbols" in the pattern, depending on context
- TAG's mechanisms for resolution can map to "resolution method" in the pattern
- Pat's "token-type" relation maps to "realizes" in the pattern
- Pat's (truly?) "represents" relation maps to "about", and, more specifically, to "proxy-for" in the pattern

The pattern does not exclude the possibility of describing resolution methods. Moreover, only data turn out to depend on the method, and only when they are classified as "resources" (in the pattern) or "representations" (in TAG). As a matter of fact, a file is not dependent on a resolution method, but *as web data*, it is.

No other entities, of course, depend on a resolution method; therefore I don't see any point in leaving WordNet users with the indeterminacy of resolution: a word, sense, or synset should resolve to its position in an RDF file, possibly visualized by a Semantic Web browser that shows the related information, e.g. glosses, additional links to related resources like a wiktionary, etc.

This is just an initial contribution, and is *not* intended as a terminological proposal. On the contrary, I'd like to suggest a way to formalize the conceptual dependencies among those notions. A reusable ontology like that of information objects can be a good starting point for that, because it contains an advanced axiomatization of the notions with reference to other notions as well, which creates a rich descriptive context.

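The term correspondences listed above can also be written down as a small lookup table. This is just a sketch to make the mapping easy to check, not part of the pattern itself; the dictionary names and the translate helper are hypothetical and exist only for this illustration.

```python
# TAG vocabulary -> candidate terms in the pattern (more than one when context decides).
TAG_TO_PATTERN = {
    "represents": ["realizes"],
    "representation": ["resource"],
    "resource": ["entity"],
    "information resource": ["information object"],
    "abstraction": ["information object", "abstract symbols"],
    "resolution mechanisms": ["resolution method"],
}

# Pat's vocabulary -> candidate terms in the pattern.
PAT_TO_PATTERN = {
    "token-type": ["realizes"],
    "represents": ["about", "proxy-for"],
}


def translate(term, vocabulary):
    """Return the pattern terms that a source term maps to, or [] if it is not listed."""
    return vocabulary.get(term, [])


# The same word, "represents", lands on different pattern relations
# depending on whose vocabulary it comes from.
tag_sense = translate("represents", TAG_TO_PATTERN)
pat_sense = translate("represents", PAT_TO_PATTERN)
```

Written this way, it is easy to see where part of the confusion in the thread comes from: the word "represents" maps to "realizes" in one vocabulary and to "about"/"proxy-for" in the other.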
Best
Aldo

At 13:54 -0500 1-05-2006, Pat Hayes wrote:
>
>>> . . . The definition of "Information Resource" that W3C
>>> endorses[10] is:
>>> . . .
>>>
>>http://www.w3.org/TR/2004/REC-webarch-20041215/#def-information-resource
>>>
>>> I don't think that means that words are not information resources.
>>
>>I think it may depend on what you mean by "words".
>>
>>If http://example.org/doc.html identifies a single resource, and the
>>associated document is updated to correct typos, then clearly
>>http://example.org/doc.html is identifying more than just the words that
>>are *currently* served from that URI: it is identifying a document
>>*abstraction*, rather than a particular document instance or a
>>particular set of words. I don't see how "all of [the] essential
>>characteristics"[10] of that document *abstraction* can be "conveyed in
>>a message"[10].
>>
>>Similarly, if http://weather.example.com/oaxaca identifies a single
>>resource that is "a periodically updated report on the weather in
>>Oaxaca"[10], then I don't see how "all of [the] essential
>>characteristics"[10] of that periodically updated report can be
>>"conveyed in a message"[10].
>>
>>Because "information resources" can return different "representations"
>>at different times (even if some happen to return the same
>>representation every time), it seems to me that "information resources"
>>are by their very nature abstract.
>
>Why do you say they are abstract? I think I see what you mean (and I
>think I agree), but 'abstract' seems like entirely the wrong word to
>use to characterize it.
>
>>Clearly the notion of an "information resource" is modeled after the
>>real life notion of the contents of a (logical) disk region, on a Web
>>server, that is associated with a URI "racine". (The "racine" is all of
>>the URI except the fragment identifier.[11]) The server is configured
>>to return those contents, whatever they are, when the URI racine is
>>dereferenced. And those contents may change over time! Thus, the URI
>>racine is not identifying any *particular* contents, it is identifying
>>the logical *location* where those contents are stored, and the server
>>provides whatever contents happen to be stored there at the moment they
>>are requested.
>
>OK, great. That all makes wonderful sense. What does not make nearly
>so much sense, however, is to go on to say that the contents
>that happen to be stored there are a "representation" of the logical
>location.
>
>>In fact, it is not even possible on the Web to create a URI that is
>>permanently bound to a single document instance that can never change:
>>it is *always* possible to change the server configuration or domain IP
>>mapping to cause a different document instance to be served. In other
>>words, an http URI on the real Web identifies a logical *location* whose
>>content *always* has the potential of changing. Similarly (I argue), an
>>"information resource" is *necessarily* abstract. Thus, if something is
>>not abstract, then it cannot be an "information resource".
>>
>>So returning to your comment about whether a word could be an
>>"information resource", it depends on what you mean by "word". If an
>>alternate spelling of "color" is "colour", then we are referring to an
>>abstract notion of a word, whose spelling may vary.
>
>But that sense of 'abstract' is not the one you have been using,
>right? Nothing here about time, for example.
>
>Pat
>
>
>--
>---------------------------------------------------------------------
>IHMC (850)434 8903 or (650)494 3973 home
>40 South Alcaniz St. (850)202 4416 office
>Pensacola (850)202 4440 fax
>FL 32502 (850)291 0667 cell
>phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes

--
Aldo Gangemi
Research Scientist
Laboratory for Applied Ontology
Institute for Cognitive Sciences and Technology
National Research Council (ISTC-CNR)
Via Nomentana 56, 00161, Roma, Italy
Tel: +390644161535
Fax: +390644161513
aldo.gangemi@istc.cnr.it
http://www.istc.cnr.it/createhtml.php?nbr=71
Received on Tuesday, 2 May 2006 15:36:08 UTC