Re: owl:sameAs use/misuse/abuse Re: homonym URIs from Richard Cyganiak on 2007-06-26 (semantic-web@w3.org from June 2007)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Tue, 26 Jun 2007 12:05:07 +0200
To: Jacek Kopecky <jacek.kopecky@deri.org>
Cc: Bernard Vatant <bernard.vatant@mondeca.com>, Tim Berners-Lee <timbl@w3.org>, semantic-web@w3.org
Message-Id: <84235ADE-A22A-4611-ADF4-A7F4FA3ED5B4@cyganiak.de>

Jacek,

Thanks for the well-argued response.

On 25 Jun 2007, at 16:39, Jacek Kopecky wrote:
>> Because RDF statements always are about the
>> referents, and never about the identifier. The redirection is a
>> property of the identifier system (the URI), and not of the
>> identified thing. If I say:
>>
>> <http://dbpedia.org/resource/Berlin> http:redirectsTo <http://dbpedia.org/page/Berlin> .
>>
>> Then I have said “the city of Berlin redirects to a web page about
>> the city of Berlin.” Which is nonsense.
>>
>> Same with things like:
>>
>> <http://dbpedia.org/resource/Berlin> str:numOfCharacters 33 .
>
> It's not the same. The redirection is not a property of the URI (you
> can't tell if a URI will redirect just by looking at the URI), it's a
> property of the dereferencing mechanism and the server setting.

You are right. But still, RDF talks about the resource identified by  
the URI, and not about the dereferencing mechanism and server  
settings. Thus, I believe my point still stands.

> I expect we can say for an information resource something like this:
>
> <http://example.com/>  http:representation "<HTML>...</HTML>" .
>
> Because information resources do have representations. Let's assume
> that http:representation means "at one point in time had this
> representation", or it could be timestamped and conneg-qualified etc.

Yes, that seems reasonable to me. If I have, say, an information  
resource “the paper I wrote for the so-and-so workshop”, or “the  
database record #123 in the ‘users’ table on my server”, then it's  
reasonable to say that the paper or the database record has a certain  
representation, that is, a certain sequence of bytes with a  
corresponding MIME type.

> But IMO the representation is as worthy of being had by an information
> resource as are the other HTTP properties, e.g. the status code when
> GET is done on the resource:
>
> <http://example.com/> http:getStatusCode "200"^^xs:int .

Here I disagree. The representation is a property of the resource.  
But the status code is a part of the transfer mechanism that was used  
to deliver the representation to the client. Thus, the 200 is not a  
property of the resource itself. Same for other HTTP headers.

<snip>
> Especially if this triple is asserted by an automated crawler that
> tries to dereference URIs and records the status codes *returned by
> the resources*.

Watch your words: The status codes are not, strictly speaking,  
“returned by the resources”. They are returned by a server that  
generates or stores a representation of the resource.

If I were building a web crawler with an RDF store in the backend, I'd
probably not care about the fine points here and would simply assert
the triple as you did. I can do this if my crawler is a closed world,
where I can introduce additional assumptions. In an open world, I
wouldn't do it, but would instead lift the request into its own
resource:

[ a http:Request;
     http:requestResource <http://example.com/>;
     http:requestURI "http://example.com/"^^xsd:anyURI;
     http:responseRepresentation "<html>...";
     http:responseStatusCode 200;
     ...
]
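
To make the difference between the two modelling choices concrete, here
is a minimal Python sketch that uses plain tuples as triples. The http:
property names are the made-up ones from this thread, not a published
vocabulary, and the helper names (direct_assertion, reified_request,
new_blank_node) are hypothetical, chosen only for illustration.

```python
from itertools import count

_blank_ids = count(1)


def new_blank_node():
    """Return a fresh blank-node label such as '_:req1' (hypothetical helper)."""
    return f"_:req{next(_blank_ids)}"


def direct_assertion(uri, status):
    """Closed-world shortcut: attach the status code to the resource itself."""
    return [(f"<{uri}>", "http:getStatusCode", status)]


def reified_request(uri, status, body):
    """Open-world variant: describe the request as a resource of its own."""
    req = new_blank_node()
    return [
        (req, "rdf:type", "http:Request"),
        (req, "http:requestResource", f"<{uri}>"),
        (req, "http:requestURI", f'"{uri}"^^xsd:anyURI'),
        (req, "http:responseRepresentation", body),
        (req, "http:responseStatusCode", status),
    ]


shortcut = direct_assertion("http://example.com/", 200)
reified = reified_request("http://example.com/", 200, "<html>...")

# Collect every subject used in the reified description.
subjects = {s for s, _, _ in reified}
```

In the second form the resource only ever appears as an object, so the
status code ends up as a property of the request and nothing is claimed
about the resource itself.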

> And my http:redirects303To is IMO on par with http:getStatusCode.
>
> You see, I'm not trying to talk about the URI (e.g. being 33 chars
> long) but about the resource. An HTTP information resource is
> available for dereferencing (communication) over HTTP, so it should
> have HTTP properties. And if so, any resource identified by a URI
> starting with http:// with no fragID gives me the license to talk to
> it over HTTP, so it should also have HTTP properties.

I understand the sentiment. The claim that it's not OK to make RDF  
statements about HTTP interactions in the obvious fashion (by using  
the URI as in your example) certainly seems to be against good sense.  
But consider the actual impact. Who needs to make statements of this  
kind? Implementers of HTTP clients and servers. In return, everyone  
else gets a more consistent architecture.

>>> I've always been uneasy about the 303 approach to having http: URIs
>>> denote non-information resources; I guess I'd be in the 'hash' camp.
>>> Basically, my feeling is that 303 does not fully solve the issue, so
>>> it should be a softer recommendation than a W3C Recommendation MUST.

Apologies, I misunderstood your remarks as saying that the W3C  
doesn't allow hash URIs as identifiers for non-information resources,  
but instead forces the 303 upon us.

In my eyes, the 303 compromise is far from perfect and leaves a lot  
of questions open, but it is better than the earlier state (where we  
couldn't confidently use RDF to talk about web pages at all, because  
we never could be sure that the URI doesn't identify a person or  
country or whatnot) and has allowed us to move on and build useful  
systems.

Cheers,
Richard


>>
>> It isn't a MUST, and I've never seen anyone suggest that it should be.
>>
>> Hash URIs and 303 URIs are both perfectly fine as identifiers for
>> non-information resources, both with their pros and cons (discussed
>> at length in e.g. [1], [2] and [3]).
>
> Well, the HttpRange draft [3] says:
>
>         According to the HTTP specification, when a code of 200 is
>         received in response to an HTTP GET request, it indicates that
>         "an entity corresponding to the requested resource" has been
>         returned in the response. The contents of this entity is what
>         we understand as a representation of the resource. This
>         correspondence between a resource and a representation is
>         defined in [AWWW] as characterising an information resource.
>         Consequently, we can assume that if we receive this particular
>         response code in response to an HTTP GET request, we have also
>         received a representation and that the URI references an
>         information resource.
>
> This is a chain of statements not qualified to be less than true (e.g.
> SHOULD-level recommendations). I interpret MUST as "it just is so",
> same as factual statements. MUST is used in specese to make sure the
> reader understands it, but in my reading, "the client sends a message"
> is the same as "the client MUST send a message".
>
> So this is where I get the "if you get 200, the URI MUST identify an
> information resource", and this is what I'm not comfy with.
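
For reference, the inference described in the quoted draft text can be
written down as a small decision function. This is only a sketch of one
reading of the draft: the function name and the returned labels are made
up for illustration, and the treatment of 303 and of hash URIs reflects
the usual reading of this discussion, not wording from the excerpt
quoted above.

```python
from urllib.parse import urldefrag


def what_can_be_inferred(uri, status):
    """Say what a client may conclude about `uri` after an HTTP GET.

    The labels 'information-resource' and 'no-conclusion' are hypothetical.
    """
    _base, fragment = urldefrag(uri)
    if fragment:
        # The fragment is stripped before the request is sent, so the
        # status code describes the base document, not the hash URI.
        return "no-conclusion"
    if status == 200:
        # The draft's chain: a 200 means a representation was returned,
        # hence the URI references an information resource.
        return "information-resource"
    # 303 (and anything else) leaves the nature of the resource open.
    return "no-conclusion"
```

Only the 200 branch commits the client to anything, which is exactly the
step Jacek is uneasy about.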


>
> Best regards,
> Jacek
>
>>
>> Richard
>>
>> [1] http://www.w3.org/TR/swbp-vocab-pub/
>> [2] http://www.dfki.uni-kl.de/~sauermann/2006/11/cooluris/
>> [3] http://www.w3.org/2001/tag/doc/httpRange-14/2007-05-31/HttpRange-14
Received on Tuesday, 26 June 2007 10:07:02 UTC