Re: owl:sameAs use/misuse/abuse Re: homonym URIs from John Black on 2007-06-27 (semantic-web@w3.org from June 2007)

From: John Black <JohnBlack@kashori.com>
Date: Wed, 27 Jun 2007 12:15:39 -0400
To: "Tim Berners-Lee" <timbl@w3.org>
Cc: "Richard Cyganiak" <richard@cyganiak.de>, "Jacek Kopecky" <jacek.kopecky@deri.org>, "Bernard Vatant" <bernard.vatant@mondeca.com>, <semantic-web@w3.org>
Message-ID: <0d7801c7b8d6$68b65380$6601a8c0@KASHORI001>
Tim,

Ok. Now I am officially freaked out. I thought I was illustrating another difficulty with eliminating ambiguity. But after your response below, wherein you say a text string, in a text file, on my server, representing a URI, is NOT a representation of an "information resource", I am thrown back again to just trying to understand. If your response is accurate then the idea of an "information resource" has become incomprehensible to me. 

On 2007-06-26, at 19:25, Tim Berners-Lee wrote:
  On 2007-06-25, at 11:00, John Black wrote:
    [...] But surely a URI is an information resource in the same way that a blog post is and so it can be represented by a web page the same way a blog post is represented by the web page you get through HTTP.


    Now my FOAF URI is this http://kashori.com/JohnBlack/foaf.rdf#jpb. As a URI, it is an information resource, namely a string of characters conforming to rfc3986.
  Well, that is not how Information Resource is used in the Web Architecture.  An Information Resource conveys information, and in the web architecture it can have several representations, but any one of them must have a content-type (and possibly other metadata) as well as a string of bits.

I am going by something like this: """We do not limit the scope of what might be a resource. The term "resource" is used in a general sense for whatever might be identified by a URI. It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as "resources". The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as "information resources."""" from http://www.w3.org/TR/webarch/#id-resources. 

Please tell me which of the essential characteristics of a URI cannot be conveyed in a message. I don't see any. How is a URI less of an information resource than a web page, image, product catalog, or that document itself? 

  In other words, the architecture is not that strings of bits are self-describing.  It is not that you can guess what a string of bits is intended to convey when you meet it on the street.  It is that the content-type tells you how to interpret it.  So, the same string of bits may signify the source markup of an HTML page when paired with text/plain, and the document as represented in HTML (the normal browser case) when paired with text/html.


  So, strictly, you can say that an IR has a representation which is 48 bytes long, but not that the IR is 45 bytes long.


When I access a representation of that information resource identified by  http://kashori.com/ontology/MyURI  and capture the full HTTP return with Paros, I do in fact get a Content-Type:
HTTP/1.1 200 OK
Date: Wed, 27 Jun 2007 03:14:43 GMT
Server: Apache/2.0.51 (Fedora)
Last-Modified: Mon, 25 Jun 2007 12:08:07 GMT
ETag: "aff01a2-2a-dd9f17c0"
Accept-Ranges: bytes
Content-Length: 42
Connection: close
Content-Type: text/plain; charset=UTF-8

As you can see, that representation has a Content-Type of "text/plain". How is that different from "...the source markup of an HTML page..."? And if I embed it in HTML and return that representation, as a URI represented in HTML, how is that different from a "...document as represented in HTML"? Why is a URI less of an information resource than a document?
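
To make the content-type point concrete, here is a minimal sketch of how the same bytes read differently depending on the declared media type. The helper names (`media_type`, `interpret`), the sample body, and the hard-coded UTF-8 decoding are assumptions made for illustration; nothing here is taken from the actual server configuration.

```python
from email.message import Message
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the character data of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def media_type(content_type_header):
    """Return the bare media type (e.g. 'text/plain') from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def interpret(body, content_type_header):
    """Interpret the same bytes according to the declared media type."""
    text = body.decode("utf-8")  # assumption: the charset is always UTF-8 here
    if media_type(content_type_header) == "text/html":
        parser = _TextExtractor()
        parser.feed(text)
        parser.close()
        return "".join(parser.chunks)
    # text/plain (and anything else in this sketch): the characters are the content
    return text


# Hypothetical body: the FOAF URI wrapped in a paragraph element.
body = b"<p>http://kashori.com/JohnBlack/foaf.rdf#jpb</p>"
as_plain = interpret(body, "text/plain; charset=UTF-8")
as_html = interpret(body, "text/html; charset=UTF-8")
```

Paired with text/plain the markup itself is the content; paired with text/html only the URI string remains. In both cases what comes back is still a string of characters plus a media type.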




    I have created a web page representation of this information resource at http://kashori.com/ontology/MyURI according to standard REST web architecture principles. As the owner of and therefore the authority about the referent of that URI, I hereby proclaim that this web URI denotes my RDF FOAF URI, http://kashori.com/JohnBlack/foaf.rdf#jpb. 



  In other words we would say <http://kashori.com/ontology/MyURI> owl:sameAs "http://kashori.com/JohnBlack/foaf.rdf#jpb".


  The thing denoted by the MyURI is the string "..#jpb".

You mean without the base file? Why is that?
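
For reference, the split between the base document and the fragment can be sketched with Python's standard library. This only shows what "with" and "without the base file" look like as strings; it takes no position on which of them the URI denotes.

```python
from urllib.parse import urldefrag

foaf_uri = "http://kashori.com/JohnBlack/foaf.rdf#jpb"

# urldefrag separates the part before '#' from the fragment identifier.
base, fragment = urldefrag(foaf_uri)
# base     is the RDF document URI
# fragment is the identifier within that document, without the '#'
```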


  Well, yes, but is this useful?

You mean useful to anyone, ever? Well, I wasn't yet at the point of deciding the utility of this method for everyone for all time. But if you think, as I do, that most of the semantics in RDF to date is accomplished by incorporating natural language words inside URI identifiers, then I should think it may be helpful to be able to parse them and use those embedded components at the level of RDF statements.


    This uses web technologies to identify that FOAF URI by another URI. In particular, as an information resource, something that can be completely characterized by a message, I can identify it directly with a 'slash' URI. I don't need a 303 or a 'hash' URI.


  Oh, Yes you do, as a literal string is not an information resource.

As I said, this is incomprehensible to me. Many 'documents' can be represented as literal strings. Why can't a URI be represented that way also?


    Now I can talk directly about, or mention, that FOAF URI in RDF.


    <http://kashori.com/ontology/MyURI> str:numOfCharacters 41.


    In this case, the RDF statement is about the identifier. This contradicts your statement that "...RDF statements always are about the referents, and never about the identifier." Here the referent is the identifier.


  No, not THE identifier, a different identifier.   

Yes, that's what I meant: the URI used in the RDF statement denotes an identifier that is mentioned in the RDF statement.
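
The figure of 41 in the quoted str:numOfCharacters statement can be checked directly. The sketch below treats the FOAF URI purely as a string, which is how it is mentioned (not used) in that statement; the variable names are illustrative.

```python
# The identifier being talked about, held as an ordinary string.
foaf_uri = "http://kashori.com/JohnBlack/foaf.rdf#jpb"

# Count its characters, as the str:numOfCharacters statement does.
num_chars = len(foaf_uri)  # 41
```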


    I am talking as directly about my FOAF URI as I am talking directly about any other information resource as represented by a web page by stating in RDF:


    1. <http://kashori.com/ontology/MyURI> owl:sameAs "http://kashori.com/JohnBlack/foaf.rdf#jpb"^^xsd:anyURI.
    2. <http://kashori.com/ontology/MyURI> dc:creator <http://kashori.com/JohnBlack/foaf.rdf#jpb>.


    In natural language: 1. that FOAF URI is the same as that literal URI, and 2. that FOAF URI has a creator that is John Black.


    Finally, consider this URI: http://kashori.com/ontology/self-referential. This URI identifies/denotes itself. So we can say


    <http://kashori.com/ontology/self-referential> owl:sameAs "http://kashori.com/ontology/self-referential"^^xsd:anyURI.


    Only problem is, these URIs are ambiguous: we can't tell if they identify the identifiers or the web pages representing the identifiers.


  No, they are not ambiguous, you said they represent the identifiers and so they must NOT return 200.

Ok. Here is where I must draw a line in the sand with my toe. Here I will not cross. I interpret this to mean that you classify a URI along with cars and people and other non-information resources, and claim that best practices require that I set up a 303 redirect for it. I can't comprehend that. For if that is required because I called it an 'identifier' then why would it not be true if I call a document a 'contract', for example? But it also brings up another problem for me. 

For years I have been under the impression that an HTTP URI identifies/denotes the content that is returned when a GET is performed using that URI. But lately I have learned that is not the case. The URI identifies an "information resource" that is represented by the content that is returned. As a result, doesn't it now become impossible to distinguish between a URI that identifies a representation of an information resource and one that identifies the information resource itself? Which does this URI identify, http://www.w3.org/TR/webarch/: the document, or the content that is returned with GET? If the former, how do I identify the latter? And if the W3C asserts that the "information resource" identified is a 'recommendation', does that mean it must NOT return 200? If not, then how can you say that because I call a text string an 'identifier', it must NOT return a 200?
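
To make the 200-versus-303 question concrete, here is a minimal sketch of the practice as I understand it from the replies above: a URI said to denote something other than a document answers 303 and points at a describing document, which in turn answers 200. The registry dictionaries, the `respond` function, and the `/ontology/MyURI.txt` path are hypothetical names invented for this illustration; they do not describe how kashori.com is actually set up.

```python
# Hypothetical registry of documents that can be returned directly.
DOCUMENTS = {
    "/ontology/MyURI.txt": b"http://kashori.com/JohnBlack/foaf.rdf#jpb\n",
}

# Hypothetical registry of paths declared to denote non-documents,
# each mapped to the document that describes the thing.
NON_DOCUMENTS = {
    "/ontology/MyURI": "/ontology/MyURI.txt",
}


def respond(path):
    """Return (status, headers, body) for a GET on the given path."""
    if path in NON_DOCUMENTS:
        # 303 See Other: the thing itself is not sent, only a pointer to a description.
        return 303, {"Location": NON_DOCUMENTS[path]}, b""
    if path in DOCUMENTS:
        # 200 OK: the body is a representation of the resource named by the path.
        return 200, {"Content-Type": "text/plain; charset=UTF-8"}, DOCUMENTS[path]
    return 404, {}, b""


status, headers, body = respond("/ontology/MyURI")
```

The sketch shows where the classification has to happen: the server owner decides which registry a path belongs in, and that decision is exactly what is being disputed for a URI whose referent is a string.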


  As far as I can see, the semantic web has a consistent architecture which works.


  (I am not sure whether you are trying to understand it or to suggest an alternative or
  try to show it doesn't work, or just check the seals. :-)

Once again thrown back to just trying to understand it, as I said. But in general, for several years now, I have been investigating alternative ways to establish and convey the reference (denotation/interpretation) of an RDF URI using HTTP technology. I believe there must be something more powerful than just to 'return useful information'. However, many of my ideas are apparently outlawed (or strongly discouraged) by the Architecture. So I have tried to show where the Architecture that outlaws these alternatives may not be optimal - or at least show that it has leaks.

John



  Tim




    John Black
    www.kashori.com
Received on Wednesday, 27 June 2007 16:16:27 UTC