Re: owl:sameAs use/misuse/abuse Re: homonym URIs from Ioachim Drugus on 2007-06-27 (semantic-web@w3.org from June 2007)

From: Ioachim Drugus <sw@semanticsoft.net>
Date: Wed, 27 Jun 2007 11:24:24 -0700
To: John Black <JohnBlack@kashori.com>
CC: Tim Berners-Lee <timbl@w3.org>, Richard Cyganiak <richard@cyganiak.de>, Jacek Kopecky <jacek.kopecky@deri.org>, Bernard Vatant <bernard.vatant@mondeca.com>, semantic-web@w3.org
Message-ID: <4682AB58.9080606@semanticsoft.net>
I am new to this list, but have been working on these notions, including 
as architect at www.semanicsoft.net, and I hope my thoughts will be useful.

1. To distinguish information from data, I follow the principle: 
Information = Data + Interpretation.
Without a content-type I cannot interpret the data; therefore, what 
comes without a content-type is not information. I believe that, by 
means of the content type, the web Architecture draws a clean 
distinction between data and information.
2. When I call non-interpreted data "information", I am referring to 
the *potentiality* of the data to be interpreted, or to the "intention" 
of an agent that the data be information. But we cannot regularly call 
something by the name of what it can *potentially* be, or name it after 
the "intention" of an agent - a better name comes from what *it is*. 
So, non-interpreted data is just this - data.
3. Whether a piece of data is information is relative to an agent - 
software or human. If you, as an agent, can interpret a piece of data, 
then you have a content-type (which might be written in your own 
format). Another agent, like a program, without an appropriate 
content-type will not be able to interpret the data. I might find the 
data format coinciding with a system of music notation and play a 
melody, which somebody will treat as a cacophony and others as a new 
style in music. All this amounts to the statement that a piece of data 
can serve as different pieces of information for different agents, 
because they use different content-types to interpret the data.
4. A resource must necessarily have a URI. Resources and their URIs are 
in the relationship of "intentionality" as understood in philosophy and 
informally treated as "aboutness" 
(http://en.wikipedia.org/wiki/Intentionality). I believe the semantic 
web architects were aware of this when they used the term "about" to 
make the connection between a resource and its URI.
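
Points 1 to 3 can be illustrated with a small Python sketch. It is only an illustration under my own assumptions: the helper name interpret, the sample bytes and the choice of the two content types are hypothetical, not part of any Web Architecture document. It shows the same string of bits yielding no information without a content-type, and two different pieces of information under two different content-types.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the character data of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def interpret(data, content_type=None):
    """Hypothetical helper: Information = Data + Interpretation."""
    if content_type is None:
        # No interpretation is available, so the bytes remain mere data.
        return None
    if content_type == "text/plain":
        # Read the bytes as the literal characters they encode.
        return data.decode("utf-8")
    if content_type == "text/html":
        # Read the bytes as markup and keep only the text of the document.
        parser = _TextExtractor()
        parser.feed(data.decode("utf-8"))
        parser.close()
        return "".join(parser.chunks)
    raise ValueError("no interpretation known for " + content_type)


raw = b"<p>Hello, <b>web</b>!</p>"           # one and the same string of bits
uninterpreted = interpret(raw)               # data only, no information
as_source = interpret(raw, "text/plain")     # the markup itself
as_document = interpret(raw, "text/html")    # the text the markup stands for
```

Here as_source and as_document differ although raw is identical in both calls, which is the sense in which information is relative to the agent doing the interpreting.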

Now, according to 4, a URI is *not* an information resource. Moreover, a 
URI is *not* a resource. To become a resource, the URI should have its 
own URI ("URI of URI"). To become an information resource, the "inner 
URI" should also come with one or several content types. If my 
understanding in point 4 is of interest, I can share it in more detail.
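
The "URI of URI" idea can be sketched in Python as follows. This is a toy model under my own assumptions, not an implementation of RDF or of the Web Architecture: the class Registry and its two methods are hypothetical names. The two URIs and the text/plain content type are taken from John's message quoted below.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical model: which URI is about what, and with which content types."""
    denotes: dict = field(default_factory=dict)        # URI -> the thing it is about
    content_types: dict = field(default_factory=dict)  # URI -> list of content types

    def is_resource(self, thing):
        # A thing is a resource only if at least one URI is about it.
        return any(referent == thing for referent in self.denotes.values())

    def is_information_resource(self, thing):
        # In addition, the URI that names it must come with a content type.
        return any(
            referent == thing and self.content_types.get(uri)
            for uri, referent in self.denotes.items()
        )


FOAF_URI = "http://kashori.com/JohnBlack/foaf.rdf#jpb"  # the URI being talked about
NAMING_URI = "http://kashori.com/ontology/MyURI"         # the "URI of URI"

registry = Registry()
registry.denotes[FOAF_URI] = object()   # stands in for the person, not for a string

# The string FOAF_URI is used as a name here, but nothing is yet about it.
before_naming = registry.is_resource(FOAF_URI)

# Give the URI its own URI: now the string itself is what NAMING_URI is about.
registry.denotes[NAMING_URI] = FOAF_URI
after_naming = registry.is_resource(FOAF_URI)
info_without_type = registry.is_information_resource(FOAF_URI)

# Attach a content type to the naming URI.
registry.content_types[NAMING_URI] = ["text/plain"]
info_with_type = registry.is_information_resource(FOAF_URI)
```

In this model the string becomes a resource only after the second URI is registered, and an information resource only after a content type is attached to that second URI.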

Joe
Ioachim Drugus, Ph.D.
Architect
Semantic Soft, Inc.



John Black wrote:
> Tim,
> Ok. Now I am officially freaked out. I thought I was illustrating 
> another difficulty with eliminating ambiguity. But after your response 
> below, wherein you say a text string, in a text file, on my server, 
> representing a URI, is NOT a representation of an "information 
> resource", I am thrown back again to just trying to understand. If 
> your response is accurate then the idea of an "information resource" 
> has become incomprehensible to me.
> On 2007-06-26, at 19:25, Tim Berners-Lee wrote:
>
>     On 2007-06-25, at 11:00, John Black wrote:
>>     [...] But surely a URI is an information resource in the same way
>>     that a blog post is and so it can be represented by a web page
>>     the same way a blog post is represented by the web page you get
>>     through HTTP.
>>
>>     Now my FOAF URI is this
>>     http://kashori.com/JohnBlack/foaf.rdf#jpb. As a URI, it is an
>>     information resource, namely a string of characters conforming to
>>     rfc3986.
>     Well, that is not how Information Resource is used in the web
>     Architecture. An Information Resource conveys information, and in
>     the web architecture it can have several representations, but any one of
>     them must have a content-type (and possibly other metadata) as
>     well as a string of bits.
>
> I am going by something like this: """We do not limit the scope of 
> what might be a resource. The term "resource" is used in a general 
> sense for whatever might be identified by a URI. It is conventional on 
> the hypertext Web to describe Web pages, images, product catalogs, 
> etc. as “resources”. The distinguishing characteristic of these 
> resources is that all of their essential characteristics can be 
> conveyed in a message. We identify this set as “information 
> resources.”""" from http://www.w3.org/TR/webarch/#id-resources.
> Please tell me which of the essential characteristics of a URI cannot 
> be conveyed in a message. I don't see any. How is a URI less of an 
> information resource than a web page, image, product catalog, or that 
> document itself?
>
>     In other words, the architecture is not that strings of bits are
>     self-describing. It is not that you can guess what a string of
>     bits is intended to convey when you meet it on the street. It is
>     that the content-type tells you how to interpret it. So, the same
>     string of bits may signify the source markup of an HTML page when
>     paired with text/plain and the document as represented in HTML (the
>     normal browser case) when paired with text/html.
>
>     So, strictly, you can say that an IR has a representation which is
>     48 bytes long, but not that the IR is 45 bytes long.
>
> When I access a representation of that information resource identified 
> by http://kashori.com/ontology/MyURI and capture the full HTTP return 
> with Paros, I do in fact get a Content-Type:
> HTTP/1.1 200 OK
> Date: Wed, 27 Jun 2007 03:14:43 GMT
> Server: Apache/2.0.51 (Fedora)
> Last-Modified: Mon, 25 Jun 2007 12:08:07 GMT
> ETag: "aff01a2-2a-dd9f17c0"
> Accept-Ranges: bytes
> Content-Length: 42
> Connection: close
> Content-Type: text/plain; charset=UTF-8
> As you can see, that representation has a Content-Type of 
> "text/plain". How is that different from "...the source markup of an 
> HTML page..."? And if I embed it in HTML, and return that 
> representation, as a URI as represented in HTML, how is that different 
> from a "...document as represented in HTML"? Why is a URI less of an 
> information resource than a document?
>
>>
>>     I have created a web page representation of this information
>>     resource at http://kashori.com/ontology/MyURI according to
>>     standard REST web architecture principles. As the owner of and
>>     therefore the authority about the referent of that URI, I hereby
>>     proclaim that this web URI denotes my RDF FOAF URI,
>>     http://kashori.com/JohnBlack/foaf.rdf#jpb.
>
>     In other words we would say <http://kashori.com/ontology/MyURI>
>     owl:sameAs "http://kashori.com/JohnBlack/foaf.rdf#jpb".
>
>     The thing denoted by the MyURI is the string "..#jpb".
>
> You mean without the base file? Why is that?
>
>
>     Well, yes, but is this useful?
>
> You mean useful to anyone, ever? Well, I wasn't yet at the point of 
> deciding the utility of this method for everyone for all time. But if 
> you think, as I do, that most of the semantics in RDF to date is 
> accomplished by the incorporation of natural language words inside of 
> URI identifiers, I should think it may be helpful to be able to parse 
> them and use those embedded components at the level of RDF statements.
>
>
>>     This uses web technologies to identify that FOAF URI by another
>>     URI. In particular, as an information resource, something that
>>     can be completely characterized by a message, I can identify it
>>     directly with a 'slash' URI. I don't need a 303 or a 'hash' URI.
>
>     Oh, Yes you do, as a literal string is not an information resource.
>
> As I said, this is incomprehensible to me. Many 'documents' can be 
> represented as literal strings. Why can't a URI be represented that 
> way also?
>
>
>>     Now I can talk directly about, or mention, that FOAF URI in RDF.
>>
>>     <http://kashori.com/ontology/MyURI> str:numOfCharacters 41.
>>
>>     In this case, the RDF statement is about the identifier. This
>>     contradicts your statement that "...RDF statements always are
>>     about the referents, and never about the identifier." Here the
>>     referent is the identifier.
>
>     No, not THE identifier, a different identifier.
>
> Yes, that's what I meant: the URI used in the RDF statement denotes an 
> identifier that is mentioned in the RDF statement.
>
>
>>     I am talking as directly about my FOAF URI as I am talking
>>     directly about any other information resource as represented by a
>>     web page by stating in RDF:
>>
>>     1. <http://kashori.com/ontology/MyURI> owl:sameAs
>>     "http://kashori.com/JohnBlack/foaf.rdf#jpb"^^xsd:anyURI.
>>     2. <http://kashori.com/ontology/MyURI> dc:creator
>>     <http://kashori.com/JohnBlack/foaf.rdf#jpb>.
>>
>>     In natural language, 1. that FOAF URI is the same as that literal
>>     URI. and 2. that FOAF URI has a creator that is John Black.
>>
>>     Finally, consider this URI:
>>     http://kashori.com/ontology/self-referential. This URI
>>     identifies/denotes itself. So we can say
>>
>>     <http://kashori.com/ontology/self-referential> owl:sameAs
>>     "http://kashori.com/ontology/self-referential"^^xsd:anyURI.
>>
>>     Only problem is, these URI are ambiguous, we can't tell if they
>>     identify the identifiers or the web pages representing the
>>     identifiers.
>
>     No, they are not ambiguous, you said they represent the
>     identifiers and so they must NOT return 200.
>
> Ok. Here is where I must draw a line in the sand with my toe. Here I 
> will not cross. I interpret this to mean that you classify a URI along 
> with cars and people and other non-information resources, and claim 
> that best practices require that I set up a 303 redirect for it. I 
> can't comprehend that. For if that is required because I called it an 
> 'identifier' then why would it not be true if I call a document a 
> 'contract', for example? But it also brings up another problem for me.
> For years I have been under the impression that an HTTP URI 
> identifies/denotes the content that is returned when a GET is 
> performed using that URI. But lately I have learned that is not the 
> case. The URI identifies an "information resource" that is represented 
> by the content that is returned. As a result, doesn't it now become 
> impossible to distinguish between a URI that identifies a 
> representation of an information resource from one that identifies the 
> information resource? Which does this URI identify, 
> http://www.w3.org/TR/webarch/, the document or the content that is 
> returned with GET? If the former, how do I identify the latter? And if 
> the W3C asserts that the "information resource" identified is a 
> 'recommendation', does that mean it must NOT return 200? If not, then 
> how can you say that because I call a text string an 'identifier', it 
> must NOT return a 200?
>
>
>     As far as I can see, the semantic web has a consistent
>     architecture which works.
>
>     (I am not sure whether you are trying to understand it or to
>     suggest an alternative or
>     try to show it doesn't work, or just check the seals. :-)
>
> Once again thrown back to just trying to understand it, as I said. But 
> in general, for several years now, I have been investigating 
> alternative ways to establish and convey the reference 
> (denotation/interpretation) of an RDF URI using HTTP technology. I 
> believe there must be something more powerful than just to 'return 
> useful information'. However, many of my ideas are apparently outlawed 
> (or strongly discouraged) by the Architecture. So I have tried to show 
> where the Architecture that outlaws these alternatives may not be 
> optimal - or at least show that it has leaks.
> John
>
>
>     Tim
>
>>
>>     John Black
>>     www.kashori.com
>
Received on Thursday, 28 June 2007 12:17:14 UTC