RE: xsd:anyURI, rdf URIs, information resources from Booth, David (HP Software - Boston) on 2008-07-07 (public-awwsw@w3.org from July 2008)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Mon, 7 Jul 2008 16:59:54 +0000
To: Alan Ruttenberg <alanruttenberg@gmail.com>
CC: "public-awwsw@w3.org" <public-awwsw@w3.org>, Stasinos Konstantopoulos <konstant@iit.demokritos.gr>, Ivan Herman <ivan@w3.org>, Dan Connolly <connolly@w3.org>, Phil Archer <parcher@icra.org>, W3C SW Coordination Group <w3c-semweb-cg@w3.org>, Matt Womer <mdw@w3.org>, "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
Message-ID: <184112FE564ADF4F8F9C3FA01AE50009FCF7E7E72C@G1W0486.americas.hpqcorp.net>
> From: Alan Ruttenberg [mailto:alanruttenberg@gmail.com]
> [ . . . ]
> >> In XML Schema terms, the mapping of lexical space to value space of
> >> rdf:anyURI (if one was defined) would be identity.
> >
> > Is that really true?  Later in your message you seem to point out
> > that RDF-MT assumes that there could be some kind of URI
> > normalization applied before the semantics are examined:
> > http://www.w3.org/TR/rdf-mt/#urisandlit
> > [[
> > This document does not take any position on the way that URI
> > references may be composed from other expressions, e.g. from
> > relative URIs or QNames; the semantics simply assumes that such
> > lexical issues have been resolved in some way that is globally
> > coherent, so that a single URI reference can be taken to have the
> > same meaning wherever it occurs.
> > ]]
>
> This is all about what happens before the URI appears in RDF.
> However, the consequence of your interpretation could be that we only
> allow canonicalized URIs as RDF URI-REFS. That certainly isn't said
> anywhere, and not checked by any parser, but I suppose we  could
> consider it.

I'm not particularly advocating that interpretation, and having looked further at the definitions of "RDF/XML Document"
http://www.w3.org/TR/rdf-syntax-grammar/#dfn-rdf-xml-document
"RDF Document"
http://www.w3.org/TR/rdf-syntax-grammar/#dfn-rdf-document
and "RDF Graph"
http://www.w3.org/TR/rdf-concepts/#dfn-rdf-graph
I'm even more skeptical about it.

> [ . . . ]
> >> Suppose I have a resource <http://neurocommons.org/page/Main_Page>
> >> and I make a request for
> >> GET http://neurocommons.org/page/Main%5FPage (implicitly
> >> xsd:anyURI)
> >>
> >> Should or should not the response be the same as if I did
> >> GET http://neurocommons.org/page/Main_Page (implicitly xsd:anyURI)
> >>
> >> In fact, I will get the same responses (always, and by
> >> definition of the http protocol)
> >>
> >> If <http://purl.org/obo/obi.owl> is an IR, and it is
> >> strictly defined by the function that maps to representations,
> >> then we would conclude
> >> that <http://neurocommons.org/page/Main_Page> owl:sameAs
> >> <http://neurocommons.org/page/Main%5FPage>
> >
> > Not quite.  You can conclude that the IR *aspects* of those two
> > resources are identical.  But a resource can have characteristics
> > of an IR (i.e., it can satisfy all of the assertions required for it
> > to qualify as being an IR) *and* it can have characteristics of
> > other things (i.e., other assertions can also be true of it).
>
> I understand this to be your view of IRs, however I haven't seen any
> consensus at all around that interpretation.

Let me phrase it differently.  Surely a resource can have properties that are not directly implied by the fact that the resource is an IR, right?  In other words, just because every IR *must* have certain properties, that does not prohibit a resource that is an IR from having *additional* properties, right?  Certainly this is normally true in RDF: just because a resource is known to be a member of class X, that does not necessarily mean that it cannot *also* be a member of some other class Y unless X and Y are somehow known to be disjoint.  And the question of whether class IR is disjoint from some other class Y depends on the definitions that are chosen for classes IR and Y, right?

Therefore, unless we assume an extremely restrictive definition of class IR that forces it to be disjoint from everything else in the universe, it should be safe to assume that a resource could indeed have characteristics of being an IR *and* have characteristics of being something else.  Make sense?  Does this still sound controversial?

>
> >> However, what should happen if <http://purl.org/obo/obi.owl> is not
> >> an IR?
> >>
> >> According to RDF, <http://neurocommons.org/page/Main%5FPage> a priori
> >> could have *absolutely nothing* to do with
> >> <http://neurocommons.org/page/Main_Page>.  The above owl:sameAs is
> >> concluded not based on anything in RDF, but by analysis of HTTP.
> >> However, we have no separate way to ask for these two resources
> >> using HTTP.
> >
> > The fact that the URI declarations of those two URIs turns out to
> > be the exact same page (via a 303 redirect) doesn't matter.  If the
> > page makes assertions involving one of those URIs and not the
> > other, then the other is unconstrained: you don't know what it
> > denotes in RDF, though you might guess.
>
> Again, this makes sense in the framework of your proposal about URI
> declarations, but this isn't accepted by everyone. Even so, it seems
> problematic that if one allows that they could be different
> resources, only one of them is actually accessible via the http
> protocol, despite them being both http URIs.

Why is it problematic to allow http URIs that cannot be dereferenced?  The URI http://example/foo#bar also cannot be dereferenced until the fragment identifier #bar is stripped off.
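
As a small illustration of the fragment case, here is a sketch using Python's standard urllib.parse.urldefrag to split the URI into the part that is actually sent in an HTTP request and the fragment that stays on the client side.  The variable names are only for illustration.

```python
from urllib.parse import urldefrag

uri = "http://example/foo#bar"

# urldefrag returns (url_without_fragment, fragment).
request_target, fragment = urldefrag(uri)

print(request_target)  # the URI that can be dereferenced over HTTP
print(fragment)        # the part that is never sent to the server
```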

Also, I notice that the RDF Concepts document section 6.4 already warns against unnecessary %-escaping:
http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref
[[
Note: Because of the risk of confusion between RDF URI references that would be equivalent if derefenced, the use of %-escaped characters in RDF URI references is strongly discouraged. See also the URI equivalence issue of the Technical Architecture Group [TAG].
]]
and the URI equivalence issue is here:
http://www.w3.org/2001/tag/issues.html#URIEquivalence-15

>
> >
> >> One might
> >> argue that since the 303 response is just "see other" or "you might
> >> be interested in this too", there is no harm done. (Using "#" doesn't
> >> fix this, btw). But if people put RDF there, and we
> >> believe the RDF, then there could be mistakes easily made.
> >
> > I don't follow what you mean.  What potential harm?  What mistakes?
>
> You think you are asking about <http://purl.org/obo/obi%2Eowl> but
> you are actually getting a response to a question that is interpreted to
> be about <http://purl.org/obo/obi.owl>
>
> Think phishing.

Okay, so it's possible that you'll get information from a different location than you thought -- a classic phishing scenario -- and harm may occur if you believe that information.   Yes, this is a security risk, but I don't see anything unique to RDF or the semantic web about it.  In one case misleading HTML might be served, in the other case misleading RDF might be served.  Would there be some unique risk in the RDF case?

>
> >> So I think we should be worried about the RDF/Web connection if my
> >> analysis is right. a) This might be turned into an argument why HTTP
> >> isn't appropriate for SemWeb use.
> >
> > I don't follow that.  Can you explain?
>
> If we can't get access to all the resources that are named by http
> URIs, then this undermines the universality of http URIs as a gateway
> for documentation (at least) about all resources.

I don't think that follows at all.  The fact that a URI is not dereferenceable because its owner failed to follow good practice in minting it (using unnecessary %-escaping, for example) does not undermine the general utility of http URIs being dereferenceable.

> [ . . . ]
> >> If you wanted to repair this in a quick hacky way, one could amend
> >> both the RDF and RDF/XML specifications so that they take into
> >> account the http escaping rules for names.

I am still trying to understand what needs to be fixed.  To summarize what I've understood so far:

 - RDF doesn't normalize URIs, hence in RDF
<http://neurocommons.org/page/Main%5FPage> and
<http://neurocommons.org/page/Main_Page> may denote different things.

 - HTTP does normalize URIs, hence
<http://neurocommons.org/page/Main%5FPage> and
<http://neurocommons.org/page/Main_Page> *must* access the same IR.

 - Therefore, if dereferencing http://neurocommons.org/page/Main_Page yields a 200 response, then we can safely conclude that *both*
<http://neurocommons.org/page/Main%5FPage> and
<http://neurocommons.org/page/Main_Page> are IRs in RDF, and that (at least) their IR aspects are the same.  (However, there still seems to be disagreement about whether they necessarily denote the same *resource*.)

 - On the other hand, if it yields a 303 response, we cannot automatically infer any particular relationship between
<http://neurocommons.org/page/Main%5FPage> and
<http://neurocommons.org/page/Main_Page> .

 - Therefore, in the 200 case, we get a bit more information than we get in the 303 case.
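
The first two points of the summary can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the "RDF-style" test is plain character-by-character string comparison, and the "HTTP-style" test applies the percent-encoding normalization described in RFC 3986 section 6.2.2.2 (decode escapes of unreserved characters, uppercase the hex digits of the rest).  The helper normalize_escapes is hypothetical, not part of any library, and real servers are merely assumed to behave this way.

```python
import re

# Unreserved characters per RFC 3986 section 2.3.
_UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(uri: str) -> str:
    """Decode %-escapes of unreserved characters; uppercase the others."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in _UNRESERVED else match.group(0).upper()
    return _PCT.sub(fix, uri)


escaped = "http://neurocommons.org/page/Main%5FPage"
plain = "http://neurocommons.org/page/Main_Page"

# RDF-style comparison: the two URI references are different names.
rdf_same_name = escaped == plain

# HTTP-style comparison: after normalization they identify the same target.
http_same_target = normalize_escapes(escaped) == normalize_escapes(plain)

print(rdf_same_name, http_same_target)
```

The sketch only handles %-escapes; case normalization of scheme and host, and dot-segment removal, are left out.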

It seems as though something like POWDER cannot avoid talking about canonicalized URIs, because access decisions that involve URI patterns must be made on canonicalized URIs.  So I would think that POWDER applications would want to use a predicate like :hasCanonicalURI, and say things like:

  <http://neurocommons.org/page/Main%5FPage> :hasCanonicalURI
        "http://neurocommons.org/page/Main_Page"^^xsd:anyURI .

which could just as well be written as:

  <http://neurocommons.org/page/Main%5FPage> :hasCanonicalURI
        "http://neurocommons.org/page/Main%5FPage"^^xsd:anyURI .

So if a user is not supposed to access pages whose URIs match http://example/restricted/.* , then an application might use a rule something like this to deny access:

  {   ?user :wantsToAccess ?page .
      ?page :hasCanonicalURI ?u .
      ?u :matchesRegex "http://example/restricted/.*" .
  } =>
          { ?user :isDeniedAccessTo ?page . }

Of course, the same page may have multiple canonical URIs associated with it, and there is nothing to prevent access from being granted through one URI path while being denied through another.
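
For readers who prefer procedural code to rules, the deny rule can be approximated in Python as follows.  This is a sketch, not a POWDER implementation: the canonical_uris table stands in for :hasCanonicalURI assertions, its contents are made up for illustration, and treating :matchesRegex as a whole-string match is an assumption.

```python
import re

# Hypothetical data: page URI as written in RDF -> canonical URI literals.
canonical_uris = {
    "http://example/restricted/report": ["http://example/restricted/report"],
    "http://example/%72estricted/report": ["http://example/restricted/report"],
    "http://example/public/index": ["http://example/public/index"],
}

# The pattern from the rule; whole-string matching is assumed.
RESTRICTED = re.compile(r"http://example/restricted/.*")


def is_denied_access(page: str) -> bool:
    """True if any canonical URI recorded for the page matches the pattern."""
    return any(RESTRICTED.fullmatch(u) for u in canonical_uris.get(page, []))


for page in canonical_uris:
    print(page, is_denied_access(page))
```

Note that a page with no recorded canonical URI is not denied in this sketch, which mirrors the rule: without a :hasCanonicalURI assertion the rule simply does not fire.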

I'm having trouble seeing why this needs to be fixed.  Why does this need to be fixed?



David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Statements made herein represent the views of the author and do not necessarily represent the official views of HP unless explicitly so stated.
Received on Monday, 7 July 2008 17:01:24 UTC