Re: xsd:anyURI, rdf URIs, information resources from Alan Ruttenberg on 2008-07-03 (public-awwsw@w3.org from July 2008)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Thu, 3 Jul 2008 02:30:59 -0400
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
Cc: "public-awwsw@w3.org" <public-awwsw@w3.org>, Stasinos Konstantopoulos <konstant@iit.demokritos.gr>, Ivan Herman <ivan@w3.org>, Dan Connolly <connolly@w3.org>, Phil Archer <parcher@icra.org>, W3C SW Coordination Group <w3c-semweb-cg@w3.org>, Matt Womer <mdw@w3.org>, "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
Message-Id: <501B5C48-EC67-4B08-B1E9-5820FA8B4A6B@gmail.com>
On Jul 2, 2008, at 2:05 PM, Booth, David (HP Software - Boston) wrote:

> Hi Alan,
>
> This analysis looks rather tricky, and I'm not sure I've properly  
> understood it all, but here are some comments.
>>
>> Suppose we have:
>>
>> <http://purl.org/obo/obi.owl>  rdf:has_name "http://purl.org/obo/obi.owl"^^xsd:anyURI
>
> One thing you didn't mention: where would such an assertion come from?

It wasn't germane to the discussion at that point.

>   In the n3 ontology and rules that I drafted to describe the  
> semantics of HTTP, I've assumed that the *parser* of an RDF  
> document would implicitly add such an assertion to the triples that  
> were explicitly asserted by the document, because it relates the  
> syntax of the URI to the semantics of the resource it denotes:
> http://esw.w3.org/topic/AwwswDboothsRules
> [[
> # We assume that the parser has automatically asserted:
>
> <http://example/people#dan> uri:hasURI
>         "http://example/people#dan"^^xsd:anyURI .
> ]]

That's good, I was wondering about that. My point is that it isn't  
complete, however.

>> In XML Schema terms, the mapping of lexical space to value space of
>> rdf:anyURI (if one was defined) would be identity.
>
> Is that really true?  Later in your message you seem to point out  
> that RDF-MT assumes that there could be some kind of URI  
> normalization applied before the semantics are examined:
> http://www.w3.org/TR/rdf-mt/#urisandlit
> [[
> This document does not take any position on the way that URI  
> references may be composed from other expressions, e.g. from  
> relative URIs or QNames; the semantics simply assumes that such  
> lexical issues have been resolved in some way that is globally  
> coherent, so that a single URI reference can be taken to have the  
> same meaning wherever it occurs.
> ]]

This is all about what happens before the URI appears in RDF.  
However, the consequence of your interpretation could be that we only  
allow canonicalized URIs as RDF URI-REFS. That certainly isn't said  
anywhere, and not checked by any parser, but I suppose we  could  
consider it.

>
>> For xsd:anyURI the lexical to value mapping must take into  
>> account the (scheme dependent) unescaping of percent encoded  
>> characters. (This would seem to me to make the pattern facet of  
>> xsd:anyURI rather difficult to implement in practice, as the  
>> pattern matching happens in value space).
>
> Why would that make it difficult?  Wouldn't it just mean that URIs  
> would be normalized (to value space) before a pattern is applied?   
> Or are you saying that this normalization would be difficult?  It's  
> true that normalization isn't free, as the URI spec points out:
> http://tools.ietf.org/html/rfc3986#section-6

The latter. The normalization depends on the URI scheme. There are  
lots of schemes and there will be more. So doing so would mean that  
developers would have to update their RDF software each time a new  
scheme was approved.
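
To make the cost concrete, here is a minimal Python sketch of only the generic, scheme-independent part of normalization (decoding escapes that encode unreserved characters, in the spirit of RFC 3986 section 6.2.2). The function name decode_unreserved is mine, not from any spec or library. Anything scheme-specific (default ports, empty paths, and so on) would need extra per-scheme rules on top of this, which is the maintenance burden I mean.

```python
import re

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri: str) -> str:
    """Decode %XX only when it encodes an unreserved character.

    Every other escape is kept, with its hex digits uppercased.
    Scheme-specific rules are deliberately not handled here.
    """
    def fix(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch
        return "%" + match.group(1).upper()

    return PCT.sub(fix, uri)


escaped = "http://purl.org/obo/obi%2Eowl"
plain = "http://purl.org/obo/obi.owl"

# Character-by-character comparison sees two different names.
print(escaped == plain)
# Comparison after the generic normalization step sees one.
print(decode_unreserved(escaped) == plain)
```
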

> But it sounds like RDF-MT admits the possibility of normalization  
> before the semantics is applied, i.e., before the character-by- 
> character comparison of URIs:
> http://www.w3.org/TR/rdf-mt/#urisandlit
> [[
> This document does not take any position on the way that URI  
> references may be composed from other expressions, e.g. from  
> relative URIs or QNames; the semantics simply assumes that such  
> lexical issues have been resolved in some way that is globally  
> coherent, so that a single URI reference can be taken to have the  
> same meaning wherever it occurs.
> ]]
> However I'm not certain that I'm reading that correctly.

I don't think so. Even if it admits the possibility, it doesn't  
constrain you to doing it.

> I can see that if you write
> "http://purl.org/obo/obi%2Eowl" as an untyped string literal then  
> its mapping to value space would be direct, i.e., not normalized,  
> whereas if you write
> "http://purl.org/obo/obi%2Eowl"^^xsd:anyURI the URI normalization  
> would be applied in going to the value space for type xsd:anyURI.
>
> But I don't see why the type of <http://purl.org/obo/obi.owl> has  
> much bearing on whether any of the above are true.  Can you explain?

It depends on what an IR is. If we determine that two IRs are the  
same if asking the WEB about them always (would) return the same  
representation, then because you can't ask the web about these other  
than by http (that's my assumption) and http treats them the same,  
they could be inferred to be the same (by reading the http protocol).  
OTOH, there is no such suggestion relating response to 303s to the  
identity of the asked about resource.

>> -------------------------
>>
>> Suppose I have a resource <http://neurocommons.org/page/Main_Page>
>> and I make a request for
>> GET http://neurocommons.org/page/Main%5FPage (implicitly xsd:anyURI)
>>
>> Should or should not the response be the same as if I did
>> GET http://neurocommons.org/page/Main_Page (implicitly xsd:anyURI)
>>
>> In fact, I will get the same responses (always, and by definition of
>> the http protocol)
>>
>> If <http://purl.org/obo/obi.owl> is an IR, and it is strictly defined
>> by the function that maps to representations, then we would conclude
>> that <http://neurocommons.org/page/Main_Page> owl:sameAs
>> <http://neurocommons.org/page/Main%5FPage>
>
> Not quite.  You can conclude that the IR *aspects* of those two  
> resources are identical.  But a resource can have characteristics  
> of an IR (i.e., it can satisfy all of the assertions required for it  
> to qualify as being an IR) *and* it can have characteristics of  
> other things (i.e., other assertions can also be true of it).

I understand this to be your view of IRs, however I haven't seen any  
consensus at all around that interpretation.

>> However, what should happen if <http://purl.org/obo/obi.owl> is not
>> an IR?
>>
>> According to RDF, <http://neurocommons.org/page/Main%5FPage> a priori
>> could have *absolutely nothing* to do with
>> <http://neurocommons.org/page/Main_Page>.  The above owl:sameAs is
>> concluded not based on anything in RDF, but by analysis of HTTP.
>> However, we have no separate way to ask for these two resources
>> using HTTP.
>
> The fact that the URI declarations of those two URIs turn out to  
> be the exact same page (via a 303 redirect) doesn't matter.  If the  
> page makes assertions involving one of those URIs and not the  
> other, then the other is unconstrained: you don't know what it  
> denotes in RDF, though you might guess.

Again, this makes sense in the framework of your proposal about URI  
declarations, but this isn't accepted by everyone. Even so, it seems  
problematic that if one allows that they could be different  
resources, only one of them is actually accessible via the http  
protocol, despite them being both http URIs.
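
A toy Python sketch of the mismatch, under loud assumptions: PAGES and http_get are hypothetical stand-ins for an origin server that percent-decodes the request path before lookup (real servers vary), and rdf_same_name is just the character-by-character comparison RDF uses for URI references.

```python
from urllib.parse import urlsplit, unquote

# Hypothetical toy "server": a single stored page, keyed by decoded path.
PAGES = {"/page/Main_Page": "<html>Main page</html>"}


def http_get(uri):
    """Toy stand-in for GET: decode the path, then look the page up."""
    path = unquote(urlsplit(uri).path)
    return PAGES.get(path)


def rdf_same_name(u1, u2):
    """RDF URI references are the same only if the strings are identical."""
    return u1 == u2


a = "http://neurocommons.org/page/Main_Page"
b = "http://neurocommons.org/page/Main%5FPage"

# The toy server cannot tell the two requests apart ...
print(http_get(a) == http_get(b))
# ... while RDF treats them as two unrelated names.
print(rdf_same_name(a, b))
```

So if the two names did denote different resources, there would be no request I could make over http that reaches one and not the other.
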

>
>> One might
>> argue that since the 303 response is just "see other" or "you might
>> be interested in this too", there is no harm done. (Using "#" doesn't
>> fix this, btw). But if people put RDF there, and we believe the RDF,
>> then there could be mistakes easily made.
>
> I don't follow what you mean.  What potential harm?  What mistakes?

You think you are asking about <http://purl.org/obo/obi%2Eowl> but  
you are actually getting a response to a question that is interpreted  
to be about <http://purl.org/obo/obi.owl>

Think phishing.

>> So I think we should be worried about the RDF/Web connection if my
>> analysis is right. a) This might be turned into an argument why HTTP
>> isn't appropriate for SemWeb use.
>
> I don't follow that.  Can you explain?

If we can't get access to all the resources that are named by http  
URIs, then this undermines the universality of http URIs as a gateway  
for documentation (at least) about all resources.

>> b) It points to a possible *actual* difference between IRs and  
>> non IRs that ought to be
>> measurable  in some sense (first that I know of, other than the  
>> tautological 200 response).
>
> The difference I see is this.  If you get a 200 response when you  
> dereference
> http://neurocommons.org/page/Main%5FPage
> then you learn *both* that
> http://neurocommons.org/page/Main%5FPage
> denotes an IR *and* you learn that
> http://neurocommons.org/page/Main_Page
> denotes an IR, and the IR aspects of the resource are the same.
> Whereas if you get a 303 response that redirects to a URI  
> declaration page, then you might only learn what *one* of those  
> URIs denotes.
>
>> c) It makes life difficult for those poor
>> POWDER folks trying to figure out how to use OWL to do their bidding.
>
> Hmm.
>
>> d) Means we have to look a little more carefully at dbooth's hasURI
>> relation.
>
> At present the range of uri:hasURI is defined as xsd:anyURI:
> http://esw.w3.org/topic/AwwswDboothsRules
> [[
> uri:hasURI a rdf:Property ;
>    rdf:label "hasURI" ;
>    rdf:comment ". . . " ;
>    rdfs:subPropertyOf log:uri ;
>    # rdfs:domain rdfs:Resource ;
>    rdfs:range xsd:anyURI .
> ]]
> which, if your analysis of the xsd:anyURI type is correct, uses URI  
> normalization in going to value space.  So if
> <http://neurocommons.org/page/Main%5FPage> and
> <http://neurocommons.org/page/Main_Page> really are intended to be  
> treated as different URIs in the RDF semantics, then I guess I  
> should change the range of uri:hasURI to be an untyped literal string.

You would need to, I think, if you wanted <HTTP://purl.org> to have a  
URI whose value wasn't "http://purl.org".
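
For the case-folding example, a small Python sketch using the standard urllib.parse module. The helper name lower_scheme_and_host is mine. It lowercases the whole authority component, which is a simplification (any userinfo part would be lowercased too), so treat it as an illustration rather than a full normalizer.

```python
from urllib.parse import urlsplit, urlunsplit


def lower_scheme_and_host(uri):
    """Lowercase the scheme and authority, leaving the rest untouched."""
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # simplification: also lowercases userinfo
        parts.path,
        parts.query,
        parts.fragment,
    ))


as_written = "HTTP://purl.org"

# As an untyped string the name keeps its original spelling ...
print(as_written == "http://purl.org")
# ... but after case normalization it collapses to the lowercase form.
print(lower_scheme_and_host(as_written) == "http://purl.org")
```
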

>> If you wanted to repair this in a quick hacky way, one could amend
>> either the RDF or RDF/XML specifications so that they take into
>> account the http escaping rules for names.
>
> Are you saying that the RDF/XML spec does not specify URI  
> normalization, but RDF-MT admits the possibility of URI normalization,

No. I don't see that it admits the possibility that all RDF URI Refs  
are normalized.

> and hence there is an ambiguity in determining which URI(s) denote  
> a particular resource?  So for example, if the following n3  
> assertion is parsed:
>
>   <http://neurocommons.org/page/Main%5FPage> _:a _:b .
>
> we will not know whether the parser will assert
>
>   <http://neurocommons.org/page/Main%5FPage> uri:hasURI
>        "http://neurocommons.org/page/Main%5FPage"^^xsd:anyURI .
>
> or
>
>   <http://neurocommons.org/page/Main%5FPage> uri:hasURI
>        "http://neurocommons.org/page/Main_Page"^^xsd:anyURI .
>
> or both, and we will not know whether

Or quite a few others.
>
>   <http://neurocommons.org/page/Main_Page> _:a _:b .
>
> has been asserted.  Is that what you mean?

Mostly. I don't understand this last bit.

-Alan
Received on Thursday, 3 July 2008 06:31:49 UTC