- From: Alan Ruttenberg <alanruttenberg@gmail.com>
- Date: Sun, 29 Jun 2008 05:08:06 -0400
- To: public-awwsw@w3.org
- Cc: Stasinos Konstantopoulos <konstant@iit.demokritos.gr>, Ivan Herman <ivan@w3.org>, Dan Connolly <connolly@w3.org>, Phil Archer <parcher@icra.org>, W3C SW Coordination Group <w3c-semweb-cg@w3.org>, Matt Womer <mdw@w3.org>, "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
This note is triggered by a discussion on the SWCG group about POWDER and its desire to discuss, in OWL, the relation of a URI to the thing it denotes. Specifically they want to have a regular expression on the URI define a class of resources. I think this is a bit tricky, and it raises an interesting, and possibly problematic, interaction between http and rdf. It is also prompted by a comment in the OWL working group suggesting xsd:anyURI is a subclass of xsd:string.

Suppose we have:

<http://purl.org/obo/obi.owl> rdf:has_name "http://purl.org/obo/obi.owl"^^xsd:anyURI

Now. As I understand it, URIs in RDF are compared character by character, as lexically written.

http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-URI-reference

> Two RDF URI references are equal if and only if they compare as
> equal, character by character, as Unicode strings.

In XML Schema terms, the mapping of lexical space to value space of rdf:anyURI (if one was defined) would be identity. For xsd:anyURI the lexical to value mapping must take into account the (scheme dependent) unescaping of percent encoded characters. (This would seem to me to make the pattern facet of xsd:anyURI rather difficult to implement in practice, as the pattern matching happens in value space).

http://www.w3.org/TR/xmlschema-2/#rf-pattern

> ·pattern· provides for:
>
> Constraining a ·value space· to values that are denoted by literals
> which match a specific ·regular expression·.

So compare:

1. http://neurocommons.org/page/Main%5FPage
2. http://neurocommons.org/page/Main_Page
3. http://neurocommons.org/page%2FMainPage

All three are different URIs, but 1 and 2, though not 3, will get you to the same web page. Specifically, the value space of xsd:anyURI is the canonicalized URI, that is, the unescaped version, as far as I can tell, and this is dependent on the scheme, as escaping and unescaping is scheme dependent. By my read, 1 and 2 are equal in value space and the value is string-equal to #2.
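The comparison of the three URIs can be sketched in Python. This is only an illustration under the reading given in this note: the helper names (rdf_equal, unescaped_by_segment) are hypothetical, and unescaping each path segment separately stands in for whatever the http scheme's canonicalization actually prescribes.

```python
from urllib.parse import urlsplit, unquote


def rdf_equal(a: str, b: str) -> bool:
    """RDF Concepts rule: compare character by character as strings."""
    return a == b


def unescaped_by_segment(uri: str) -> tuple:
    """Assumed stand-in for the anyURI 'value': split the path into
    segments first, then percent-decode each segment, so that an
    escaped '/' (%2F) never turns into a path separator."""
    parts = urlsplit(uri)
    segments = tuple(unquote(s) for s in parts.path.split("/"))
    return (parts.scheme, parts.netloc, segments)


uris = [
    "http://neurocommons.org/page/Main%5FPage",  # 1
    "http://neurocommons.org/page/Main_Page",    # 2
    "http://neurocommons.org/page%2FMainPage",   # 3
]

# Lexical comparison: 1 and 2 are different RDF URI references.
print(rdf_equal(uris[0], uris[1]))

# Comparison after per-segment unescaping: 1 and 2 coincide, 3 does not.
print(unescaped_by_segment(uris[0]) == unescaped_by_segment(uris[1]))
print(unescaped_by_segment(uris[2]) == unescaped_by_segment(uris[1]))
```

Decoding after splitting follows the RFC 2396 advice that a URI must be separated into its components before escaped characters are decoded; decoding the whole string first would turn %2F in URI 3 into a real path separator.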
However, RDF URIs don't work like this. Effectively, equality (a value comparison) is checked in what corresponds to the *lexical* space of anyURI. So these are three different URIs.

These all say the same thing:

<http://purl.org/obo/obi.owl> rdf:has_name "http://purl.org/obo/obi.owl"^^xsd:anyURI
<http://purl.org/obo/obi.owl> rdf:has_name "http://purl.org/obo/obi%2Eowl"^^xsd:anyURI
<http://purl.org/obo/obi.owl> rdf:has_name "http://purl.org/obo/obi.%6Fwl"^^xsd:anyURI
<http://purl.org/obo/obi.owl> rdf:has_name "http://purl.org/obo/obi.ow%77"^^xsd:anyURI
...

IF has_name's range is xsd:anyURI. But depending on perhaps the type of <http://purl.org/obo/obi.owl>, either one or all of these might be true.

<http://purl.org/obo/obi.owl> rdf:has_name "http://purl.org/obo/obi.owl"
<http://purl.org/obo/obi.owl> rdf:has_name "http://purl.org/obo/obi%2Eowl"
<http://purl.org/obo/obi.owl> rdf:has_name "http://purl.org/obo/obi.%6Fwl"
<http://purl.org/obo/obi.owl> rdf:has_name "http://purl.org/obo/obi.ow%77"

--------------------------

Suppose I have a resource <http://neurocommons.org/page/Main_Page> and I make a request for

GET http://neurocommons.org/page/Main%5FPage (implicitly xsd:anyURI)

Should or should not the response be the same as if I did

GET http://neurocommons.org/page/Main_Page (implicitly xsd:anyURI)?

In fact, I will get the same responses (always, and by definition of the http protocol).

If <http://purl.org/obo/obi.owl> is an IR, and it is strictly defined by the function that maps to representations, then we would conclude that

<http://neurocommons.org/page/Main_Page> owl:sameAs <http://neurocommons.org/page/Main%5FPage>

However, what should happen if <http://purl.org/obo/obi.owl> is not an IR? According to RDF, <http://neurocommons.org/page/Main%5FPage> a priori could have *absolutely nothing* to do with <http://neurocommons.org/page/Main_Page>. The above owl:sameAs is concluded not based on anything in RDF, but by analysis of HTTP.
However, we have no separate way to ask for these two resources using HTTP. One might argue that since the 303 response is just "see other" or "you might be interested in this too", there is no harm done. (Using "#" doesn't fix this, btw). But if people put RDF there, and we believe the RDF, then mistakes could easily be made.

So I think we should be worried about the RDF/Web connection if my analysis is right.

a) This might be turned into an argument why HTTP isn't appropriate for SemWeb use.
b) It points to a possible *actual* difference between IRs and non IRs that ought to be measurable in some sense (first that I know of, other than the tautological 200 response).
c) It makes life difficult for those poor POWDER folks trying to figure out how to use OWL to do their bidding.
d) Means we have to look a little more carefully at dbooth's hasURI relation.

I have assumed in the above that, in the absence of a crystal clear stance on the issue of URI references in RDF-MT

> This document does not take any position on the way that URI
> references may be composed from other expressions, e.g. from
> relative URIs or QNames; the semantics simply assumes that such
> lexical issues have been resolved in some way that is globally
> coherent, so that a single URI reference can be taken to have the
> same meaning wherever it occurs.

the RDF/XML equality conditions on RDF URI references are normative.

If you wanted to repair this in a quick hacky way, one could amend both the RDF and RDF/XML specifications so that they take into account the http escaping rules for names.

Best,
Alan

http://www.w3.org/TR/xmlschema-2/#anyURI

> 3.2.17 anyURI
>
> [Definition:] anyURI represents a Uniform Resource Identifier
> Reference (URI). An anyURI value can be absolute or relative, and
> may have an optional fragment identifier (i.e., it may be a URI
> Reference).
> This type should be used to specify the intention that
> the value fulfills the role of a URI as defined by [RFC 2396], as
> amended by [RFC 2732].
>
> The mapping from anyURI values to URIs is as defined by the URI
> reference escaping procedure defined in Section 5.4 Locator
> Attribute of [XML Linking Language] (see also Section 8 Character
> Encoding in URI References of [Character Model]). This means that a
> wide range of internationalized resource identifiers can be
> specified when an anyURI is called for, and still be understood as
> URIs per [RFC 2396], as amended by [RFC 2732], where appropriate to
> identify resources.
>
> Note: Section 5.4 Locator Attribute of [XML Linking Language]
> requires that relative URI references be absolutized as defined in
> [XML Base] before use. This is an XLink-specific requirement and is
> not appropriate for XML Schema, since neither the ·lexical space·
> nor the ·value space· of the anyURI type are restricted to absolute
> URIs. Accordingly absolutization must not be performed by schema
> processors as part of schema validation.
>
> Note: Each URI scheme imposes specialized syntax rules for URIs in
> that scheme, including restrictions on the syntax of allowed
> fragment identifiers. Because it is impractical for processors to
> check that a value is a context-appropriate URI reference, this
> specification follows the lead of [RFC 2396] (as amended by [RFC
> 2732]) in this matter: such rules and restrictions are not part of
> type validity and are not checked by ·minimally conforming·
> processors. Thus in practice the above definition imposes only very
> modest obligations on ·minimally conforming· processors.

http://www.cs.tut.fi/~jkorpela/rfc/2396/full.html#2.4.2

> 2.4.2. When to Escape and Unescape
>
> A URI is always in an "escaped" form, since escaping or unescaping a
> completed URI might change its semantics.
> Normally, the only time
> escape encodings can safely be made is when the URI is being created
> from its component parts; each component may have its own set of
> characters that are reserved, so only the mechanism responsible for
> generating or interpreting that component can determine whether or
> not escaping a character will change its semantics. Likewise, a URI
> must be separated into its components before the escaped characters
> within those components can be safely decoded.
>
> In some cases, data that could be represented by an unreserved
> character may appear escaped; for example, some of the unreserved
> "mark" characters are automatically escaped by some systems. If the
> given URI scheme defines a canonicalization algorithm, then
> unreserved characters may be unescaped according to that algorithm.
> For example, "%7e" is sometimes used instead of "~" in an http URL
> path, but the two are equivalent for an http URL.
Received on Sunday, 29 June 2008 09:08:50 UTC