Absolute IRIs (Was: Re: IRI guidance) from Eric Prud'hommeaux on 2011-04-29 (public-rdf-wg@w3.org from April 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Fri, 29 Apr 2011 13:50:04 -0400
To: Alex Hall <alexhall@revelytix.com>
Cc: Ivan Herman <ivan@w3.org>, nathan@webr3.org, RDF WG <public-rdf-wg@w3.org>
Message-ID: <20110429175003.GF2737@w3.org>
* Alex Hall <alexhall@revelytix.com> [2011-04-29 09:42-0400]
> On Fri, Apr 29, 2011 at 8:14 AM, Eric Prud'hommeaux <eric@w3.org> wrote:
> 
> > * Ivan Herman <ivan@w3.org> [2011-04-29 08:24+0200]
> > >
> > > On Apr 28, 2011, at 23:59 , Eric Prud'hommeaux wrote:
> > > <snip/>
> > > >>
> > > >> Unfortunately this can lead to unexpected consequences, such as an
> > > >> application dereferencing the IRI http://xn--rsum-bpad.example.org (not
> > > >> sure how GMail will escape that -- that's the punycode version) and
> > > >> getting a document with a description of some resource with IRI
> > > >> http://résumé.example.org (Unicode version).  To help prevent this, we
> > > >> could discourage the use of IRIs with encoded IDNs in RDF, similar to
> > > >> how the existing spec discourages the use of URI Refs with
> > > >> percent-escaped characters.
> > > >
> > > > I think this leads down the path of not using IRIs. When dereferencing
> > > > an HTTP IRI, one has to punyify the domain name and percentulate the
> > > > path, mapping http://伝言.example/?user=أكرم to
> > > > http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85 . Any IRI
> > > > with characters outside of the legal URI characters will map to a
> > > > differently spelled URI, necessitating some typing of these respective
> > > > strings. If we're taking away the sharp knives, we'll have to take
> > > > away non-ascii characters and díäcrìtïcâl markç.
> > >
> > > Eric, I am not sure I understand that. The proposal is to say that, in
> > > RDF, there should be a preference for the UTF version of the URI-s, ie, I
> > > should, if possible, opt for http://伝言.example/?user=أكرم rather than
> > > the other version. What happens underneath if I dereference that URI and
> > > send it to tools for an HTTP get or anything similar is a separate issue.
> > > Indeed, on an English keyboard typing something even as simple as
> > > http://iván.herman.net is a pain for a user, but that is a practical
> > > problem which is again outside the realm of RDF.
> >
> > Ahh, I interpreted "discourage … encoded IDNs" as discouraging
> > UTF-8-encoded IRIs while the intent was discouraging punycode-encoded.
> > Sorry.
> >
> >
> No worries -- "encoded" is too vague a term, I should have been more
> specific.
> 
> 
> >
> > > Ie: saying that we keep to the current version of RDF, ie, equality of
> > > IRI-s is based on a character-by-character comparison (like now) but
> > > giving advice to use, if possible, the IRI without the punycode seems to
> > > be a reasonable way of handling this... What else would you propose
> > > instead?
> >
> > I'm all for character-by-character comparison. I think the emphasis should
> > be on keeping track of the type. Here's a draft of a minimal change to the
> > Concepts document:
> > [[
> > 6.2 RDF Graph
> > An RDF triple contains three components:
> >
> >    * the subject, which is an IRI or a blank node
> >    * the predicate, which is an IRI
> >    * the object, which is an IRI, a literal or a blank node
> > …
> > 6.4 IRI
> >
> > An IRI within an RDF graph (an RDF URI reference) is a Unicode string
                               ^^^^^^^^^^^^^^^^^^^^^^
> > [UNICODE] that conforms to the definition of an IRI in RFC3987 [IRI].
> > Implementations may issue warnings concerning the use of RDF terms
> > designated to be IRIs but which are not conformant to the IRI
> > definition.
> >
> 
> I wonder if it's too confusing to mention IRI and RDF URI reference in the
> same breath, in the very first sentence no less?  I'd prefer to keep URIs
> out of the discussion as much as possible.

oops, pasto. intended just "An IRI within an RDF graph is a Unicode
string".


> > Note: RFC3987 Section 3.1. "Mapping of IRIs to URIs" specifies the
> > mapping to URIs, which must be done, for instance, when constructing
> > an HTTP GET request. This specification does not define a relationship
> > between an IRI and the URI to which it is mapped.
> >
> > Note: RFC3987 Section 5.3.1. "Simple String Comparison" specifies
> > equivalence for IRIs used as identity tokens, as they are in RDF
> > graphs.
> >
> > Note: IRIs are compatible with the anyURI datatype as defined by XML
> > schema datatypes [XML-SCHEMA2], constrained to be an absolute rather
> > than a relative URI reference.
> >
> > Note: IRIs are compatible with Internationalized Resource Identifiers as
> > defined by [XML Namespaces 1.1].
> >
> > Note: The restriction to absolute IRIs is found in this abstract
> > syntax. When there is a well-defined base, concrete syntaxes, such as
> > RDF/XML, may permit relative IRIs as a shorthand for such absolute IRIs.
> > ]]
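
For reference, the Section 3.1 mapping mentioned in the first note can be
sketched in a few lines of Python. This is a minimal sketch under stated
assumptions, not a complete RFC 3987 implementation: iri_to_uri and KEEP are
hypothetical names, the IRI is assumed to be hierarchical with a host, userinfo
is ignored, and Python's built-in "idna" codec (which implements IDNA 2003) is
used to punycode the domain name.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# ASCII characters left untouched: the URI reserved characters plus "%",
# so that existing percent-escapes are not escaped a second time.
# (Assumption: this set is good enough for illustration.)
KEEP = "/?#[]@!$&'()*+,;=:%"


def iri_to_uri(iri: str) -> str:
    """Map an http-style IRI to a URI (hypothetical helper, simplified)."""
    parts = urlsplit(iri)
    # Punycode each non-ASCII label of the domain name.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Percent-encode the UTF-8 bytes of non-ASCII characters elsewhere.
    return urlunsplit((
        parts.scheme,
        host,
        quote(parts.path, safe=KEEP),
        quote(parts.query, safe=KEEP),
        quote(parts.fragment, safe=KEEP),
    ))


IRI = "http://伝言.example/?user=أكرم"
URI = iri_to_uri(IRI)
print(URI)

# The already-mapped form maps to itself, so two differently spelled IRIs
# share one URI while remaining distinct under simple string comparison.
print(iri_to_uri(URI) == URI, IRI == URI)
```

The last line illustrates the point of the thread: the mapping is needed to
build an HTTP GET request, but the IRI and the URI it maps to are different
strings, and so different terms in an RDF graph.
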
> >
> 
> I think this part could use some clarification.  An IRI is, by definition,
> absolute per section 2.2 of RFC3987.  IRI references may be absolute or
> relative, but resolve to an absolute IRI (as described in section 1.3).
> 
> To muddy the waters even further, the "absolute-IRI" grammar construct in
> section 2.2 omits the fragment identifier, but I cannot find any references
> to this either internal or external to the RFC.
> 
> So I think we should (a) specifically call out the definition in section
> 2.2; and (b) avoid any mention of the terms "IRI reference" or "absolute
> IRI" except in an informative context.

I'm not personally keen on this absolute IRI restriction. I included
it in this proposal in order to minimize the permutations being
examined at once ("minimal change"). For usability, I find
  Data:
    <s> <p> <o> .
  Query:
    ASK { ?s <p> ?o }

very intuitive when you don't have to specifically call out a base
URI. Using IRI references instead of IRIs would permit the above query
to work in e.g. Jena (which currently presumes absolute IRIs).
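
To make the base dependence concrete, here is a minimal Python sketch of
what resolving those IRI references looks like. The base IRIs and the
resolve helper are hypothetical, chosen only for illustration; urljoin is the
standard-library reference resolver.

```python
from urllib.parse import urljoin

# Hypothetical base IRI, assumed to be shared by the data and the query.
BASE = "http://example.org/base/"


def resolve(ref: str, base: str = BASE) -> str:
    """Resolve an IRI reference such as "p" against a base IRI."""
    return urljoin(base, ref)


# Data: <s> <p> <o> .
triple = tuple(resolve(term) for term in ("s", "p", "o"))

# Query: ASK { ?s <p> ?o }
same_base = triple[1] == resolve("p")
other_base = triple[1] == resolve("p", base="http://other.example/")

print(triple)
print(same_base, other_base)
```

With a shared base the predicate in the query matches the one in the data
character for character; with a different base it does not, which is the
trade-off behind restricting the abstract syntax to absolute IRIs.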


> -Alex
> 
> 
> 
> >
> > Note, I changed "RDF URI reference" to "IRI" instead of "RDF IRI" as I'm
> > not convinced that an IRI which appears in an RDF document is of a different
> > type than an IRI which appears in an email or in the location bar of my
> > browser.
> >
> > Here I proposed saying that IRIs and their URIs are simply different
> > things, eliding the syntactic hint
> > x [[
> > x Note: Because of the risk of confusion between RDF URI references that
> > x would be equivalent if dereferenced, the use of %-escaped characters in
> > x RDF URI references is strongly discouraged. See also the URI
> > x equivalence issue of the Technical Architecture Group [TAG].
> > x ]]
> >
> > I agree with Alex that punycoded domain names and %-escaped characters
> > should be mentioned in the same breath. From a human-engineering
> > perspective, I think any text specifying syntactic hints to help observers
> > visually discriminate them discourages programmers from being conscientious
> > about the distinction. However, if we want to encourage the world to mint
> > IRIs which we can procedurally calculate from URIs (motivated perhaps by
> > associating HTTP traffic with assertions about resources), we could add some
> > text encouraging an unambiguous transformation:
> >
> > [[
> > Note: RFC3987's mapping of IRIs to URIs does not alter "%25" or
> > punycoded domain names, which means that the IRIs
> > <http://伝言.example/R&D> and <http://xn--9oqp94l.example/R%25D> will
> > both be transformed to the URI <http://xn--9oqp94l.example/R%25D>.
> > RFC3987 section 3.2. "Converting URIs to IRIs" defines a function
> > which produces a single IRI for any URI. When minting IRIs for RDF,
> > it is encouraged to mint forms which can round trip to a URI form
> > and back.
> > ]]
> >
> >
> > > Cheers
> > >
> > > Ivan
> > >
> > >
> > > >
> > > >
> > > >> -Alex
> > > >
> > > > --
> > > > -ericP
> > > >
> > >
> > >
> > > ----
> > > Ivan Herman, W3C Semantic Web Activity Lead
> > > Home: http://www.w3.org/People/Ivan/
> > > mobile: +31-641044153
> > > PGP Key: http://www.ivan-herman.net/pgpkey.html
> > > FOAF: http://www.ivan-herman.net/foaf.rdf
> > >
> > >
> > >
> > >
> > >
> >
> > --
> > -ericP
> >

-- 
-ericP
Received on Friday, 29 April 2011 17:50:38 UTC