Re: owl:sameAs use/misuse/abuse Re: homonym URIs from Ioachim Drugus on 2007-07-01 (semantic-web@w3.org from July 2007)

From: Ioachim Drugus <sw@semanticsoft.net>
Date: Sun, 01 Jul 2007 15:48:05 -0700
To: Renato Golin <renato@ebi.ac.uk>
CC: John Black <JohnBlack@kashori.com>, Tim Berners-Lee <timbl@w3.org>, Richard Cyganiak <richard@cyganiak.de>, Jacek Kopecky <jacek.kopecky@deri.org>, Bernard Vatant <bernard.vatant@mondeca.com>, semantic-web@w3.org
Message-ID: <46882F25.7050200@semanticsoft.net>
In my "personal thesaurus",  *Entity*, *Object*, *Resource*, 
*Information Resource* are distinct terms arranged along a path of 
strictly decreasing generality from left to right.
From the discussions on this list I noticed that people interpret these 
terms in different ways. Nor do the standards seem to give a complete 
clarification, which is natural - standards must leave as much freedom 
as possible to (1) developers of the conceptuality and (2) implementors 
of the conceptuality in software.

Probably, a correct approach would be to set out my understanding of 
these terms and check how far it is in sync with the general 
understanding. Each person is biased by his background and views. I am 
biased by my background in mathematical logic (Markov's school, Moscow 
University) and by a piece of conceptuality which I informally set out 
below between the rows of asterisks.

*************
A *representation* *represents* something, which I call the *source* of 
the representation, *as* something else, which I call the *target* or 
*result* of the representation. I call both the sources and the targets 
of representations *presentations*. So, I distinguish between two terms: 
*re-presentation* and *presentation*. I formalize this view by using 
"category theory" in mathematics, considering re-presentations as 
morphisms and presentations as objects of the category. Presentations 
and re-presentations are ubiquitous in real life. But you do not always 
know whether something is a representation of something else, and 
therefore it is usually hard to select the correct word - 
*representation* or *presentation*. Also, the two words differ only by 
the prefix re-, which creates difficulties in reading. Therefore, 
instead of *presentation*, I often use *phenomenon*. In my view, 
Category Theory in mathematics is the formalism for the *phenomena and 
their representations*.
I regard
- Cognition as an iterative representation activity
- Knowledge as the result of this activity - presentations in mind
In this manner, I reduce cognition and knowledge to something with which 
mathematicians have lots of practice - representations. This might help 
formalize the very complex phenomena of cognition and knowledge.
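
The category-theoretic reading above (presentations as objects, 
re-presentations as morphisms) can be illustrated with a minimal Python 
sketch. The names Representation, identity and compose, and the 
"draft", "xml" and "page" presentations, are hypothetical and chosen 
only for illustration; the sketch shows just the two things a category 
requires, identity morphisms and composition.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Representation:
    """A morphism: re-presents a source presentation as a target presentation."""
    source: str                        # name of the source presentation (an object)
    target: str                        # name of the target presentation (an object)
    apply: Callable[[object], object]  # how the source is turned into the target


def identity(presentation: str) -> Representation:
    """The identity morphism: a presentation represented as itself."""
    return Representation(presentation, presentation, lambda x: x)


def compose(second: Representation, first: Representation) -> Representation:
    """Apply `first`, then `second`; defined only when the ends match."""
    if first.target != second.source:
        raise ValueError("these representations do not compose")
    return Representation(first.source, second.target,
                          lambda x: second.apply(first.apply(x)))


# Hypothetical chain of re-presentations: draft -> xml -> page.
draft_to_xml = Representation("draft", "xml", lambda text: f"<doc>{text}</doc>")
xml_to_page = Representation("xml", "page", lambda xml: xml.upper())
draft_to_page = compose(xml_to_page, draft_to_xml)
```

A reader who only sees the "page" sees a presentation; the chain that 
produced it stays hidden unless the morphisms are kept.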

Suppose I read "XML *presentation* of...". The authors are involved in 
an iterative representation activity which is a process of cognition - 
they write and re-write documents and publish certain versions. To me 
the result is a *presentation*, because the source of the 
re-presentation is hidden. But the authors wrote it for me; therefore, 
they used the word which is correct according to my view :-)

Why is it so important to distinguish between the two terms? I believe 
the lack of discrimination between them creates many problems of 
interpretation, which can result in wrong implementations in software. 
What is worse, this lack of discrimination is due to one term *missing*. 
The problems are of the following type - if something A can be 
represented as something B, then people sometimes implicitly and 
unconsciously assume that "A is B", even though "ontologically" A and B 
are distinct. The introduction of the notion of "presentation" as 
separate from "representation" helps to avoid this problem.

This language also helps me talk about many things in generic terms. 
Say, a necessary component of an "intelligent agent" (of which there 
should be plenty in the semantic web) must be a *presentation system*, 
which plays the role of "mind as storage" in humans. A presentation 
system has a *presentation medium*, which puts constraints on 
presentations.

To be able to make reference to this conceptuality, for lack of a good 
or final name, I would call it MyPhenomenology. The name is not as bad 
as it looks, because the "My" component is "indexical" and can apply to 
any person who shares this view.

****************
Now I proceed to discriminate between the notions above.

1. *Entity*

I call *entity* anything which can potentially exist (including the 
legendary bird Phoenix). The word *entity* comes from a Latin word, 
which in English would sound like "existent" with plural "existents" - 
sure, English does not have such a word. "Entity" has the same meaning 
in Latin as the word "ontos" in Greek. Therefore, "entity" looks like 
the best candidate for a name for the most general notion in Ontology as 
a discipline. I say that an entity is a "piece of existence". The word 
Thing in OWL has the same meaning and usage as "entity".

2. *Object*

Very often people do not discriminate between *entity* and *object*, but 
in language you sometimes feel "conceptual discomfort" when "entity" is 
used instead of "object".
I believe that an *object* is a particular kind of *entity* which *has* 
content and *is* represented in mind (here, *has* and *is* are two 
ontological relationships of fundamental importance, which is why I 
emphasized them). Also, I think that by a *resource* we mean exactly an 
*object*. But I found that a "theory of objects" must be very complex - 
I will set out the beginnings of it below in terms of MyPhenomenology.

If an entity exists "somewhere outside" and the mind has no 
representation of it ("no idea" of it), then this is an *entity*, but 
not an *object*. I call *object* only an entity which has a 
representation in mind - a label, a mental picture, anything which 
stands in mind for the entity. The same applies to software - if a 
"visual processor" creates an identifier for a fragment of a picture, 
then this fragment becomes an *object* for this software agent and 
becomes part of this agent's *reality*. There is a huge number of 
fragments of a picture, an infinite number (if you admit infinity), and 
the human mind or a device cannot have a label for each of them, to say 
nothing of more complex presentations.

The first attempt to formalize the notion of *object* is to regard it as 
an ordered pair (*presentation*, *content*), where the *presentation* is 
in our mind and the *content* is outside (this view complies with the 
"form-content" dichotomy). The ordered pair above is a "rudiment" of the 
object and we need a special name for it - I call it the object's 
*manifestation*. "Manifestation" is a synonym for "phenomenon", but it 
is more frequently used as "manifestation of", which makes it a good 
candidate name for the "rudiment" *of* an object.
The *presentation* (as the first member of the ordered pair 
*manifestation*) of an object plays different roles in different 
activities - "it gives an idea", it is the target in the reification 
process, it is the source in identification as referencing 
(intentionality), and it is the initial state in the identification 
process (identification has many meanings). These are different 
activities, and we cannot expect "presentation" to be called 
"presentation" in all contexts. In the four roles I mentioned, I call 
the *presentation*, respectively, a *notion* or *concept*, an 
*identity*, an *identifier*, and a *presentation* or *re-presentation*.

Now, let us see what happens when one member of the ordered pair above 
varies and the other member remains the same.

A cloud changes in shape but we somehow know that it is the same cloud. 
Here the cloud is the content and this content is changing. In order for 
the mind to know that the cloud is the same, it must keep the 
presentation unchanged. Now, suppose the content is an elephant, which 
is a solid thing and does not change - so the content does not vary. But 
you look at it from different angles and you get different 
presentations. What property of the manifestations is responsible for 
sameness in both cases?

I define sameness of manifestations this way:

*(A, B) sameAs (A', B') if and only if ((A=A') OR (B=B'))*

And so, two manifestations are the same iff at least one of their two 
members, presentation or content, coincides. The *defining* property of 
an ordered pair in set theory differs from my defining property - set 
theory uses the conjunction AND and I am using the disjunction OR. So, 
in the case of objects this is not quite an "ordered pair". I call it a 
*reification pair* for reasons which will become clearer below. 
"Reification pair" is a synonym for "manifestation", but we need yet 
another term to disclose the type of structure a manifestation is.

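The relationship of sameness as defined above is reflexive and 
symmetric, but taken pair by pair it is not transitive: (A, B) is the 
same as (A, B'), and (A, B') is the same as (A', B'), yet (A, B) and 
(A', B') may share neither member. Extended along such chains (its 
transitive closure), it is a relationship of equivalence. Therefore, it 
induces a partition over the set of object manifestations as ordered 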
pairs. I call each set of this partition an *identification class*. For 
an agent to be able to keep track of "sameness", it has to select 
exactly one reification pair (i, c) within each identification class 
against which to check the others for sameness. I call the presentation 
*i* of the selected reification pair the *identity* of an object. I 
think the process of identification is like this - given a presentation, 
the agent looks for the identification class in which this presentation 
resides and produces the identity residing in this class. This completes 
the formalisation of the notion of object like this - we call an object 
a reification pair *(ID, content)*, where ID is an identity and 
*content* is an open world (I regard a closed world as a special case of 
an open world). One last thing to notice is that presentations in object 
manifestations are better regarded as *re-presentations* when the 
content exists, and as presentations when the content is yet to be found 
or created.
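
The identification process described above can be sketched in Python 
under some stated assumptions. The sketch groups manifestations that are 
linked by a chain of shared presentations or shared contents, using a 
small union-find structure, and then picks one identity per class. The 
function names, the example data and the rule "the identity is the 
presentation of the first pair seen in the class" are all hypothetical; 
the text only requires that exactly one pair be selected.

```python
from collections import defaultdict


def identification_classes(manifestations):
    """Partition (presentation, content) pairs into identification classes."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag each member so that a presentation and a content with equal
    # values are never confused; each pair links its two members.
    for presentation, content in manifestations:
        union(("presentation", presentation), ("content", content))

    classes = defaultdict(list)
    for presentation, content in manifestations:
        root = find(("presentation", presentation))
        classes[root].append((presentation, content))
    return list(classes.values())


def choose_identities(classes):
    """Map every presentation to the identity chosen for its class.

    Assumption: the identity is the presentation of the first pair in
    the class; any other stable choice would serve as well.
    """
    identity_of = {}
    for members in classes:
        identity = members[0][0]
        for presentation, _content in members:
            identity_of[presentation] = identity
    return identity_of


# Hypothetical data: three views of one elephant, one cloud in two shapes.
pairs = [
    ("side view", "elephant"),
    ("front view", "elephant"),
    ("that cloud", "round shape"),
    ("that cloud", "stretched shape"),
    ("trunk only", "elephant"),
]
classes = identification_classes(pairs)
identity_of = choose_identities(classes)
```

Given a presentation such as "trunk only", looking it up in identity_of 
plays the role of identification: it returns the identity of the class 
in which that presentation resides.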

Now, the content of an object might be empty (void). This does not mean 
that such an object "does not have content". Just as set theory gave an 
identity to emptiness by introducing the notion of the empty set, I 
regard empty content as void content, which *is* content. Therefore, 
even for agnostics who doubt the existence of things outside their mind, 
objects have content - void content. In terms of "intentionality", the 
association of void content with a presentation in mind signifies the 
"intendedness" of the presentation. For example, in the case of Phoenix, 
even if we know that such a bird does not exist, we admit that there is 
something which we "intend" by Phoenix - this is because we have 
associated void content with an identity called by its name. We can 
later discover that such a bird exists - then the content changes, but 
the identity remains the same.
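
A small Python sketch of an object with void content, following the 
empty-set analogy above. The class name ReifiedObject, the constant 
VOID and the modelling of content as a frozenset of facts are 
assumptions made for illustration only.

```python
from dataclasses import dataclass

# Assumption: void content is modelled, like the empty set, as an empty
# frozenset. It is a value in its own right, not a missing one.
VOID = frozenset()


@dataclass
class ReifiedObject:
    """An object as a reification pair (identity, content)."""
    identity: str
    content: frozenset = VOID

    def is_void(self) -> bool:
        """True while nothing has been found behind the identity."""
        return len(self.content) == 0

    def update_content(self, facts) -> None:
        """Replace the content; the identity is deliberately left alone."""
        self.content = frozenset(facts)


# Phoenix is intended before anything is known to exist behind the name.
phoenix = ReifiedObject("Phoenix")
void_at_first = phoenix.is_void()

# Hypothetical later discovery: the content changes, the identity does not.
phoenix.update_content({"a bird observed in the wild"})
```

After the update the object is still reached through the identity 
"Phoenix"; only the second member of the pair has changed.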

The etymology and morphology of "rei-fication" suggest the meaning of 
"creating things". I treat "things" as objects (not entities) and the 
creation of things as an activity of running through various 
re-presentations until the agent chooses one which is invariant enough 
while the content is changing, and assigns it the role of *identity* for 
an object. Therefore, I regard "reification" as a process of creating 
object *identities*.

I regard the relationship of *intentionality* ("referencing") as 
*inverse* to the relationship of *reification*. Due to the defining 
property of the "reification pair", all the manifestations within one 
*identification class* are manifestations of the same object. Therefore, 
the first members of the reification pairs within one identification 
class are *identifiers* of the same object. Blind people might get 
incomplete ideas of an elephant, but even their representations are 
identifiers of the elephant. And so, in the activity of referencing, 
presentations play the role of "identifiers".

3. *Resource*

What is a *resource* in the Web architecture? I treat it as an 
*object* with its specific *presentations*, called URI references, and 
its *content* - any piece of "content of the Universe" (which can be a 
piece of software, a presentation in mind, or a physical body). I 
believe this treatment of a resource as an *object* can help to better 
understand the notion of *resource*.

The Web is an agent, the presentation system of which has two main 
subsystems:
-  A Web page presentation system as an interface with human agents
-  A data and information presentation system as an interface with 
applications
Probably the second system needs some "semantic servers" and "semantic 
browsers".

The name *Universal* Resource Identifiers is a prescription to all 
agents to reify by using these schemata. In order for the agents to be 
able to do this, each needs a central authority for its domain, which 
would maintain a uniform "scheme" within the URI schemata. But there is 
a huge number of domains and they all need authorities. And these 
authorities need guidelines. Also, I am sure that not only people will 
have to be involved in reification, but also software.

Now, in order for URIs to be really universal, I believe a standard on 
reification is needed. Currently, there is only one standard on 
reification, and it shows how to reify only the RDF "reality" - the 
statements.

I did not share my understanding of *Information Resource*, because it 
is already time to call this a message.

Ioachim
(In my first presentation of myself, I used both the short "Joe" and the 
original long "Ioachim". But this created an "identity crisis" - I am 
now called by different names. So, I have now selected the original 
"Ioachim" as the "identity" for this object here :-)



Renato Golin wrote:
> Hi Ioachim,
>
> Ioachim Drugus wrote:
>   
>> I think, content-type is the type that the *author* of the content
>> *intended*  the content to be. Content-type helps the interpreter
>> (interpreting agent) to select the right approach to interpretation, but
>> does not  guarantee that it will interpret the content as it was
>> intended by the author.
>>     
>
> Exactly, it's only the author's intention, nothing more.
>
>
>   
>> Availability of content-type is necessary but
>> not sufficient for a piece of data to become information.
>> What I wrote previously refers only to discrimination between data and
>> information, but it does not explain how things go further.
>>     
>
> I wouldn't say not even necessary, but optional. You definitely don't
> need content-type to know an HTML when you look at it. Programs aren't
> that different, just a bit dumber.
>
> Of course it's *much* simpler to have context type, even for us. ;)
>
>
>   
>> Now, since the interpreter is confined by the knowledge {content,
>> content-type}, the only other thing which is given to start the
>> interpretation process is *context*.
>>     
>
> As content-type is a kind of context this is a bit redundant.
>
> Data + context = Information
>
> SYN-SUM(Information) = Knowledge
>
> ie. all contexts (known) about the same data, in synergy, so:
>
> SYN-SUM[N](Information) != SYN-SUM[N-1](Information) + Information N
>
> Of course things can get much more complicated, data can be a subset of
> other data in a different context and things like that but that's
> further than the discussion about the same data's contexts.
>
>
>
>   
>> There is yet another aspect - the difference between *information* and
>> *information resource* on which I which I will not write here  to keep
>> to the point of this discussion - discrimination between data and
>> information. This difference is clearly stated in how Tim defined the
>> information resource, but I think,  after I work here a little, I will
>> come back with a " formalized" manner to put it, which might also help.
>>     
>
> Yes, good thread going on about it, I couldn't help much with that,
> though... ;)
>
> cheers,
> --renato
>
>
Received on Sunday, 1 July 2007 22:48:14 UTC