Re: owl:sameAs use/misuse/abuse Re: homonym URIs from Ioachim Drugus on 2007-07-05 (semantic-web@w3.org from July 2007)

From: Ioachim Drugus <sw@semanticsoft.net>
Date: Wed, 04 Jul 2007 20:00:05 -0700
To: Renato Golin <renato@ebi.ac.uk>
CC: John Black <JohnBlack@kashori.com>, Tim Berners-Lee <timbl@w3.org>, Richard Cyganiak <richard@cyganiak.de>, Jacek Kopecky <jacek.kopecky@deri.org>, Bernard Vatant <bernard.vatant@mondeca.com>, semantic-web@w3.org
Message-ID: <468C5EB5.8050703@semanticsoft.net>
Coming back to the notion of an *information resource*, it is probably 
good to analyze what Tim said:

"An Information Resource conveys information, and in the web 
architecture it can have several representations, but any one of them 
must have a content-type (and possibly other metadata) as well as a 
string of  bits"

I will use *presentation* below instead of "re-presentation": it is 
shorter, and I will use this word often.

The first thing to understand is why an information resource can have 
*several* presentations and not just one.  To be able to express 
relativity theory in the semantic web, it seems we cannot but accept an 
infinite number of presentations. Indeed, physicists say that the shape 
of an object depends on the reference system. Here is a concrete example 
which struck me back when I was a student. Suppose a spacecraft is moving 
with the speed of light and it is emitting light (well, they usually 
emit radio signals, but it is easier to talk about light than about 
something invisible).  Now, if we say that the light emitted by the 
spacecraft is a *sphere* which expands with the speed of light, then 
this might be wrong if we are not "in the right place" - it can be a 
sphere or hemisphere, depending on whether we are in the spacecraft or 
outside. Also, physicists say that all reference systems are equivalent. 
Therefore, the two shapes and an infinite number of other ones (due to 
different speeds) are different presentations of  *the same* object and 
we cannot regard any one of them as primary.

There must be other examples of "many presentations" objects in the 
low-speed world, but they can give rise to polemics about what a 
presentation is and what a state of the changing object is. The example 
from relativity theory is good because it leaves no room for polemics 
unless you reject the main theory of nature of the 20th century. So, to 
define an object, including an information resource, we have to refer to 
the multiplicity of its presentations in the definition.

To become a resource of the semantic web, the many-presentations object 
must have a URI.  Now, what does this URI identify? I believe that it 
identifies the object itself and not the presentations of the object, and 
I would place the URI on the same level as the presentations: given a 
context, these presentations can also *identify* the object. What 
distinguishes the URI from the other presentations of the object is that 
it identifies independently of context, which accounts for its universal 
applicability. All these presentations, including the URI, make up an 
*identification class*, and the URI plays the role of the *identity* for 
one object (true, the role of the object identity can be assigned to many 
URIs, which raises other issues discussed in this thread).
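
To make the idea of an identification class concrete, here is a minimal 
sketch in Python. It assumes that a manifestation is a (presentation, 
content) pair, as in my earlier message quoted below, and that a class 
collects every pair linked to another one by a shared presentation or a 
shared content, following such links through chains. The function names, 
the example.org URIs and the rule "a URI is whatever starts with http" 
are assumptions made only for this illustration.

```python
from collections import defaultdict


def identification_classes(manifestations):
    """Group (presentation, content) pairs that are linked, directly or
    through a chain, by a shared presentation or a shared content."""
    parent = {m: m for m in manifestations}

    def find(m):
        # Follow parent links to the representative of m's class.
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    def union(a, b):
        parent[find(a)] = find(b)

    first_with_presentation = {}
    first_with_content = {}
    for m in manifestations:
        presentation, content = m
        # Link m to the first pair seen with the same presentation,
        # and to the first pair seen with the same content.
        union(m, first_with_presentation.setdefault(presentation, m))
        union(m, first_with_content.setdefault(content, m))

    classes = defaultdict(set)
    for m in manifestations:
        classes[find(m)].add(m)
    return list(classes.values())


def identity(identification_class):
    """Pick the context-free presentation (here: a URI) as the identity.
    Assumption: URIs start with 'http'; ties are broken alphabetically."""
    uris = sorted(p for p, _ in identification_class if p.startswith("http"))
    return uris[0] if uris else None


manifestations = [
    ("http://example.org/cloud", "cloud at noon"),
    ("http://example.org/cloud", "cloud at dusk"),
    ("fluffy shape", "cloud at dusk"),
    ("http://example.org/elephant", "elephant"),
    ("trunk", "elephant"),
]

classes = identification_classes(manifestations)
for c in classes:
    print(identity(c), sorted(c))
```

Note that ("fluffy shape", "cloud at dusk") and 
("http://example.org/cloud", "cloud at noon") share neither member, yet 
they land in the same class because a third pair links them. The sketch 
therefore groups by chains of sameness, not only by direct sameness.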

My understanding is that all the above applies to any resource. As to an 
"information resource" which I regard as something more specific than 
"resource", there must be something which determines that it *conveys 
information* as the definition says. It does not say "it *is* 
information". Therefore, alongside data (a string of bits), there should 
be something which an agent can use to interpret this data and obtain 
information. This is "content-type (and possibly other metadata)". Why 
not say that this is metadata, when a content-type is treated as 
metadata? Moreover, why regard the content type as mandatory and other 
metadata as optional? I see the following reasons for this.

The content of any resource is a "multitude", and each element of this 
"multitude" is a piece of content with its own presentation, which 
might look completely different from the others. The metadata associated 
with one element of this multitude can contradict the metadata 
associated with another element, and there is no information by which an 
agent could tell apart the pieces of metadata related to different 
presentations.  The only way to enable an agent to discriminate between 
them is to associate a type with each such separate 
"piece of content".

Now suppose that the object has only one presentation and, from the 
context or metadata, we can infer the type of the content, which comes in 
one piece. Then we can interpret the data and, thus, the information is 
"conveyed" to us. But we might have misinterpreted what this information 
resource was meant to *convey*. And if there are different possibilities 
of interpretation then there is *uncertainty*, and information is 
defined as "elimination of uncertainty". So, without a content-type, 
this cannot be an "information" resource.
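
A small illustration of this point, as a sketch only: the same string 
of bits yields different text under two different content-types. The 
helper `interpret` and its simplified parsing of the charset parameter 
are assumptions made for this example, not part of any standard API.

```python
def interpret(data: bytes, content_type: str) -> str:
    """Decode data using the charset parameter of a content-type string.
    Simplified parsing: assumes the form 'type/subtype; charset=NAME'."""
    _, _, charset = content_type.partition("charset=")
    return data.decode(charset.strip() or "ascii")


# The same two bytes are used for both readings.
data = bytes([0xC3, 0xA9])

as_utf8 = interpret(data, "text/plain; charset=utf-8")
as_latin1 = interpret(data, "text/plain; charset=iso-8859-1")

print(len(as_utf8), len(as_latin1))  # one character versus two
```

Without the content-type, an agent has no ground for preferring one 
reading over the other, which is exactly the uncertainty mentioned above.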

But also, we "violate uniformity" if we define a one-presentation object 
differently from a many-presentations object. In software, we would have 
to write a separate patch of software for an agent to behave 
specifically for this artificially created "special case".

So, I believe the definition above cannot be simplified.




Ioachim Drugus wrote:
>
> In my "personal thesaurus",  *Entity*, *Object*, *Resource*, 
> *Information Resource* are distinct terms arranged along a path of 
> strictly decreasing generality from left to right.
> From the discussions in this list I noticed that people interpret 
> these terms in different ways. Nor do the standards seem to give a 
> complete clarification, which is natural - standards must leave as 
> much freedom as possible to (1) developers of the conceptuality and 
> (2) implementors of the conceptuality in software.
>
> Probably, a correct approach would be to expose my understanding of 
> these terms and check on how much it is in sync with general 
> understanding. Each person is biased by his background and views. I am 
> biased by my background in mathematical logic (Markov's school, 
> Moscow University) and a piece of conceptuality which I informally 
> expose below between ****.
>
> *************
> A *representation* *represents* something which I call *source* of 
> representation  *as* something else,  which I call  *target* or 
> *result* of representation. I say both sources and targets of 
> representations to be  *presentations*. So, I distinguish between two 
> terms:  *re-presentation* and *presentation*.  I formalize this view 
> by using "category theory" in mathematics, considering 
> re-presentations as morphisms and presentations as objects of the 
> category. Presentations and re-presentations are ubiquitous in real 
> life. But you do not know if something is a representation of 
> something else, and therefore, usually,  it is hard to select the 
> correct word  - *representation* or *presentation*. Also, the two 
> words differ only by the prefix re-, which creates difficulties in 
> reading. Therefore, instead of *presentation*, I often use  
> *phenomenon*. In my view, Category Theory in mathematics is the 
> formalism for the *phenomena and their representations*.
> I regard
> - Cognition as an iterative representation activity
> - Knowledge as result of this activity - presentations in mind
> In this manner, I reduce cognition and knowledge to something with which 
> mathematicians have lots of practice - representations. This  might 
> help formalize the very complex phenomena of cognition and knowledge.
>
> Suppose I read "XML *presentation* of...".  The authors are involved 
> in iterative representation activity which is a process of cognition - 
> they write and re-write documents and publish certain versions. To me 
> this is *presentation* because the source of re-presentation is 
> hidden. But the authors wrote it for me, therefore, they used the word 
> which is correct according to my view :-)
>
> Why is it so important to distinguish between the two terms? I 
> believe, the lack of discrimination between them creates many problems 
> in interpretation which can result in wrong implementation in software. 
> What is worse, this lack of discrimination is due to one term 
> *missing*.  The problems are of the following type - if something A 
> can be represented as something B, then they sometimes implicitly and 
> unconsciously assume that "A is B", despite that "ontologically" A and 
> B are distinct.  The introduction of the notion of "presentation" as 
> separate from "representation"  helps to avoid this problem.
>
> This language also helps me talk about many things in generic terms. 
> Say, a necessary component for an "intelligent agent" (of which there 
> should be plenty in semantic web), must be a *presentation system* 
> which plays the role of "mind as storage" in humans. A presentation 
> system has a *presentation media* which put constraints onto 
> presentations.
>
> To be able to make reference to this conceptuality, for lack of a good 
> or final name, I would call it MyPhenomenology. The name is not as 
> bad as it looks, because the "My" component is "indexical" and can 
> apply to any person who shares this view.
>
> ****************
> Now I proceed to discriminate between the notions above.
>
> 1. *Entity*
>
> I call *entity* anything which can potentially exist (including the 
> legendary bird Phoenix). The word *entity* comes from a Latin word, 
> which in English would sound like "existent" with plural "existents" - 
> sure, English does not have such a word. "Entity" has same meaning in 
> Latin as the word "ontos" in  Greek. Therefore, "entity" looks like 
> the best candidate for a name for the most general notion in Ontology 
> as a discipline. I say that an entity is a "piece of existence". The 
> word Thing has same meaning and usage as "entity" in OWL.
>
> 2. *Object*
>
> Very often they don't discriminate between *entity* and *object*, but 
> in language sometimes you feel "conceptual discomfort" when "entity" 
> is used instead of "object".
> I believe, that *object* is a particular kind of *entity* which *has* 
> content and *is*  represented in mind (here, *has* and *is* are two 
> ontological relationships of fundamental importance and I emphasized 
> them). Also, I think, that by a *resource* we mean exactly an 
> *object*. But, I found that a "theory of objects" must be very complex 
> - I will expose below the beginnings of it in terms of MyPhenomenology.
>
> If an entity exists "somewhere outside" and the mind has no 
> representation of it ("no idea" of it), then this is an *entity*, but 
> not an *object*. I call *object* only an entity which has a 
> representation  in mind - a label, a mental picture,  anything which 
> stands in mind for the entity. Same applies to software - if a "visual 
> processor" creates an identifier for a fragment of a picture, then 
> this fragment becomes an *object* for this software agent and starts 
> to form part of this agent's *reality*.  There is a huge number of 
> fragments of a picture, an infinite number (if you admit infinity), 
> and the human mind or device cannot have a label for each of them, say 
> nothing of more complex presentations.
>
> The first attempt to formalize the notion of *object* is to regard it 
> as an ordered pair  (*presentation* , *content*), where  
> *presentation* is in our mind and *content* is outside (this view 
> complies with "form-content" dichotomy). The ordered pair above is a 
> "rudiment" of the object and we need a special name for it - I call it 
> object's *manifestation*. "Manifestation" is a synonym for 
> "phenomenon" , but it is more frequently used as "manifestation of", 
> which makes it good candidate name for the "rudiment" *of* an object.
> The *presentation* (as first member of the ordered pair 
> *manifestation*) of an object plays different roles in different 
> activities - "it gives an idea", it is the target in reification 
> process, it is the source in identification as referencing 
> (intentionality), and it is initial state in identification process 
> (identification has many meanings)  These are different activities, 
> and we cannot expect "presentation" to be called "presentation" in all 
> contexts. In the domains I mentioned, I say *presentation* to be - 
> *notion* or *concept*, *identity*, *identifier*, *presentation* or 
> *re-presentation*, respectively.
>
> Now, let us see what happens when one member of the ordered pair above 
> varies and the other member remains the same.
>
> A cloud changes in shape but we somehow know that it is the same 
> cloud. Here the cloud is content and this content is changing. In 
> order for the mind to know that cloud is the same, it must keep the 
> presentation unchanged. Now, suppose the content is an elephant which 
> is a solid thing and does not change - so the content does not vary. 
> But you look at it under different angles and you get different 
> presentations.  What property of the manifestations is responsible for 
> sameness in both cases?
>
> I define sameness of manifestations this way
>
> *(A, B) sameAs (A', B') if and only if ((A=A') OR (B=B'))*
>
> And so, two manifestations are the same iff at least one of their two 
> members, presentation or content, coincides. The *defining* property of an 
> ordered pair in set theory differs from my defining property - set 
> theory uses the conjunction AND and I am using the disjunction OR. So, 
> in case of objects this is not quite an "ordered pair". I call it 
> *reification pair* for reasons which will become clearer below. 
> "Reification pair" is a synonym for "manifestation", but we need yet 
> another term to disclose the type of structure a manifestation is.
>
> The relationship of sameness as defined above is reflexive and 
> symmetric but, with OR, not transitive by itself; its transitive 
> closure is a relationship of equivalence. Therefore, it 
> induces a partition over the set of object manifestations as ordered 
> pairs. I call each set of this partition an *identification 
> class*. For an agent to be able to keep track of "sameness", it has 
> to select exactly one reification pair (i, c) within each 
> identification class against which to check the others for sameness. I 
> say the presentation *i* of the selected reification pair to be 
> *identity* of an object. I think the process of identification is like 
> this - given a presentation, the agent looks for the identification 
> class where this presentation resides and produces the identity 
> residing in this class. This completes the formalisation of the notion 
> of object like this - we call an object a reification pair *(ID, 
> content)*, where ID is an identity and *content* is an open world (I 
> regard closed world as partial case of open world). One last thing to 
> notice is that presentations in object manifestations should be better 
> regarded as *re-presentations* when the content exists, and 
> presentations when the content is yet to be found or created.
>
> Now, the content of an object might be empty (void). This does not 
> mean that such an object "does not have content". Same as in set 
> theory they gave an identity to the emptiness by introducing the 
> notion of empty set,  I regard empty content as void content, which 
> *is* content. Therefore, even for agnostics who doubt existence of 
> things outside their mind, the objects have content - void content. In 
> terms of "intentionality" the association of the void content with a 
> presentation in mind signifies "intendedness" of the presentation. For 
> example, in case of Phoenix, even if we know  that such a bird does 
> not exist, we admit that there is something which we "intend" by 
> Phoenix - this is because we have associated  void content to an 
> identity called by its name. We can later discover that such a bird 
> exists - then the content changes, but the identity remains the same.
>
> The etymology and morphology of "rei-fication"  suggest the meaning of 
> "creating things". I treat "things" as objects (not entities) and 
> creation of things as an activity of running through various  
> re-presentations, until the agent chooses one which is invariant 
> enough while content is changing, and assigning it the role of 
> *identity* for an object. Therefore, I regard "reification" as a 
> process of creating object *identities*.
>
> I regard the relationship of *intentionality* ("referencing") as 
> *inverse* to the relationship of *reification*. Due to the defining 
> property  of "reification pair", all the manifestations within one 
> *identification class* are manifestations of the same object. 
> Therefore, the first members of reification pairs within one 
> identification class are *identifiers* of the same object. The blind 
> people might get incomplete ideas of an elephant, but even their 
> representations are identifiers of the elephant. And so, in the 
> activity of referencing the presentations play the role of "identifiers".
>
> 3. *Resource*
>
> What is a *resource* in  the  web Architecture? I treat it as an 
> *object* with its specific *presentations* called URI references and 
> the *content* - any piece of "content of the Universe" (which can be a 
> piece of software, a presentation in mind,  or a physical body).  I 
> believe, this treatment of resource as an *object* can help better 
> understand the notion of *resource*
>
> Web is an agent, the presentation system of which has two main subsystems
> -  Web pages presentation system as an interface with human agents
> -  A data and information presentation system as an interface with 
> applications
>  Probably the second system needs some "semantic servers" and 
> "semantic browsers"
>
> The name *Universal* Resource Identifiers is a prescription to all  
> agents to reify by using this schemata. In order for the agents to be 
> able to do this, each needs a central authority for their domain,  
> which would maintain a uniform "scheme" within URI schemata. But there 
> is a huge number of domains and they all need authorities. And these 
> authorities need guidelines. Also, I am sure that not only people will 
> have to be involved in reification, but also software.
>
> Now in order for the URI to be really universal, I believe a standard 
> on reification is needed. Currently, there is only one standard on 
> reification which shows how to reify only RDF "reality" - the statements.
>
> I did not share my understanding on *Information Resource*, because it 
> is already time to call this a message.
>
> Ioachim
> (In my first presentation of myself, I used both the short "Joe" and 
> the original long "Ioachim". But this created an "identity crisis" - I 
> am now called different names. So, now I selected  the original 
> "Ioachim" as "identity" for this object here :-)
>
>
>
> Renato Golin wrote:
>> Hi Ioachim,
>>
>> Ioachim Drugus wrote:
>>  
>>> I think, content-type is the type that the *author* of the content
>>> *intended*  the content to be. Content-type helps the interpreter
>>> (interpreting agent) to select the right approach to interpretation, 
>>> but
>>> does not  guarantee that it will interpret the content as it was
>>> intended by the author.
>>>     
>>
>> Exactly, it's only the author's intention, nothing more.
>>
>>
>>  
>>> Availability of content-type is necessary but
>>> not sufficient for a piece of data to become information.
>>> What I wrote previously refers only to discrimination between data and
>>> information, but it does not explain how things go further.
>>>     
>>
>> I wouldn't even say necessary, but optional. You definitely don't
>> need content-type to know an HTML when you look at it. Programs aren't
>> that different, just a bit dumber.
>>
>> Of course it's *much* simpler to have context type, even for us. ;)
>>
>>
>>  
>>> Now, since the interpreter is confined by the knowledge {content,
>>> content-type}, the only other thing which is given to start the
>>> interpretation process is *context*.
>>>     
>>
>> As content-type is a kind of context this is a bit redundant.
>>
>> Data + context = Information
>>
>> SYN-SUM(Information) = Knowledge
>>
>> ie. all contexts (known) about the same data, in synergy, so:
>>
>> SYN-SUM[N](Information) != SYN-SUM[N-1](Information) + Information N
>>
>> Of course things can get much more complicated, data can be a subset of
>> other data in a different context and things like that but that's
>> further than the discussion about the same data's contexts.
>>
>>
>>
>>  
>>> There is yet another aspect - the difference between *information* and
>>> *information resource* on which I will not write here  to keep
>>> to the point of this discussion - discrimination between data and
>>> information. This difference is clearly stated in how Tim defined the
>>> information resource, but I think,  after I work here a little, I will
>>> come back with a " formalized" manner to put it, which might also help.
>>>     
>>
>> Yes, good thread going on about it, I couldn't help much with that,
>> though... ;)
>>
>> cheers,
>> --renato
>>
>>   
>
>
Received on Thursday, 5 July 2007 03:00:12 UTC