- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 29 Apr 2011 13:50:04 -0400
- To: Alex Hall <alexhall@revelytix.com>
- Cc: Ivan Herman <ivan@w3.org>, nathan@webr3.org, RDF WG <public-rdf-wg@w3.org>
* Alex Hall <alexhall@revelytix.com> [2011-04-29 09:42-0400]
> On Fri, Apr 29, 2011 at 8:14 AM, Eric Prud'hommeaux <eric@w3.org> wrote:
>
> > * Ivan Herman <ivan@w3.org> [2011-04-29 08:24+0200]
> > >
> > > On Apr 28, 2011, at 23:59 , Eric Prud'hommeaux wrote:
> > > <snip/>
> > > >>
> > > >> Unfortunately this can lead to unexpected consequences, such as an
> > > >> application dereferencing the IRI http://xn--rsum-bpad.example.org (not sure
> > > >> how GMail will escape that -- that's the punycode version) and getting a
> > > >> document with a description of some resource with IRI
> > > >> http://résumé.example.org (Unicode
> > > >> version). To help prevent this, we could discourage the use of IRIs with
> > > >> encoded IDNs in RDF, similar to how the existing spec discourages the use of
> > > >> URI Refs with percent-escaped characters.
> > > >
> > > > I think this leads down the path of not using IRIs. When dereferencing
> > > > an HTTP IRI, one has to punyify the domain name and percentulate the
> > > > path, mapping http://伝言.example/?user=أكرم to
> > > > http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85 . Any IRI
> > > > with characters outside of the legal URI characters will map to a
> > > > differently spelled URI, necessitating some typing of these respective
> > > > strings. If we're taking away the sharp knives, we'll have to take
> > > > away non-ascii characters and díäcrìtïcâl markç.
> > >
> > > Eric, I am not sure I understand that. The proposal is to say that, in
> > > RDF, there should be a preference for the UTF version of the URI-s, ie, I
> > > should, if possible, opt for http://伝言.example/?user=أكرم rather than
> > > the other version.
> > > What happens underneath if I dereference
> > > that URI and send it to tools for an HTTP get or anything similar is a
> > > separate issue. Indeed, on an English keyboard typing something even as
> > > simple as http://iván.herman.net is a pain
> > > for a user, but that is a practical problem which is again outside the realm
> > > of RDF.
> >
> > Ahh, I interpreted "discourage … encoded IDNs" as discouraging
> > UTF-8-encoded IRIs while the intent was discouraging punycode-encoded.
> > Sorry.
> >
>
> No worries -- "encoded" is too vague a term, I should have been more
> specific.
>
> > > Ie: saying that we keep to the current version of RDF, ie, equality of
> > > IRI-s is based on a character-by-character comparison (like now) but giving
> > > an advice to, if possible, use the IRI without the punycode seems to be a
> > > reasonable way of handling this... What else would you propose instead?
> >
> > I'm all for character-by-character comparison. I think the emphasis should
> > be on keeping track of the type. Here's a draft of a minimal change to the
> > Concepts document:
> > [[
> > 6.2 RDF Graph
> > An RDF triple contains three components:
> >
> >   * the subject, which is an IRI or a blank node
> >   * the predicate, which is an IRI
> >   * the object, which is an IRI, a literal or a blank node
> > …
> > 6.4 IRI
> >
> > An IRI within an RDF graph (an RDF URI reference) is a Unicode string
                               ^^^^^^^^^^^^^^^^^^^^^^
> > [UNICODE] that conforms to the definition of an IRI in RFC3987 [IRI].
> > Implementations may issue warnings concerning the use of RDF terms
> > designated to be IRIs but which are not conformant to the IRI
> > definition.
> >
>
> I wonder if it's too confusing to mention IRI and RDF URI reference in the
> same breath, in the very first sentence no less? I'd prefer to keep URIs
> out of the discussion as much as possible.

oops, pasto. intended just "An IRI within an RDF graph is a Unicode string".

> > Note: RFC3987 Section 3.1. "Mapping of IRIs to URIs" specifies the
> > mapping to URIs, which must be done, for instance, when constructing
> > an HTTP GET request. This specification does not define a relationship
> > between an IRI and the URI to which it is mapped.
> >
> > Note: RFC3987 Section 5.3.1. "Simple String Comparison" specifies
> > equivalence for IRIs used as identity tokens, as they are in RDF
> > graphs.
> >
> > Note: IRIs are compatible with the anyURI datatype as defined by XML
> > schema datatypes [XML-SCHEMA2], constrained to be an absolute rather
> > than a relative URI reference.
> >
> > Note: IRIs are compatible with International Resource Identifiers as
> > defined by [XML Namespaces 1.1].
> >
> > Note: The restriction to absolute IRIs is found in this abstract
> > syntax. When there is a well-defined base, concrete syntaxes, such as
> > RDF/XML, may permit relative IRIs as a shorthand for such absolute IRIs.
> > ]]
> >
>
> I think this part could use some clarification. An IRI is, by definition,
> absolute per section 2.2 of RFC3987. IRI references may be absolute or
> relative, but resolve to an absolute IRI (as described in section 1.3).
>
> To muddy the waters even further, the "absolute-IRI" grammar construct in
> section 2.2 omits the fragment identifier, but I cannot find any references
> to this either internal or external to the RFC.
>
> So I think we should (a) specifically call out the definition in section
> 2.2; and (b) avoid any mention of the terms "IRI reference" or "absolute
> IRI" except in an informative context.

I'm not personally keen on this absolute IRI restriction. I included
it in this proposal in order to minimize the permutations being
examined at once ("minimal change"). For usability, I find
  Data: <s> <p> <o> .
  Query: ASK { ?s <p> ?o }
very intuitive when you don't have to specifically call out a base
URI. Using IRI references instead of IRIs would permit the above query
to work in e.g. Jena (which currently presumes absolute IRIs).
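
A minimal sketch of the usability point above, assuming Python's standard
urllib.parse.urljoin as the resolver and two hypothetical base IRIs
(DATA_BASE and QUERY_BASE, which do not appear in the thread). Compared as
unresolved IRI references, the <p> in the data and the <p> in the query are
the same string; once resolved, they match only if both sides happened to
use the same base.

```python
from urllib.parse import urljoin

# Hypothetical base IRIs, chosen only for illustration.
DATA_BASE = "http://example.org/data/"
QUERY_BASE = "http://example.org/query/"


def resolve(reference: str, base: str) -> str:
    """Resolve an IRI reference against a base; absolute input is returned unchanged."""
    return urljoin(base, reference)


data_predicate = "p"   # from  Data:  <s> <p> <o> .
query_predicate = "p"  # from  Query: ASK { ?s <p> ?o }

# Compared as unresolved references, the two predicates are identical strings.
same_as_references = data_predicate == query_predicate

# Resolved against different bases, they become different absolute IRIs.
same_after_resolution = (
    resolve(data_predicate, DATA_BASE) == resolve(query_predicate, QUERY_BASE)
)

print(same_as_references, same_after_resolution)
```

This only illustrates string-level matching; it says nothing about how Jena
or any other store actually handles relative references.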
> -Alex
>
> > Note, I changed "RDF URI reference" to "IRI" instead of "RDF IRI" as I'm
> > not convinced that an IRI which appears in an RDF document is of a different
> > type than an IRI which appears in an email or in the location bar of my
> > browser.
> >
> > Here I proposed saying that IRIs and their URIs are simply different
> > things, eliding the syntactic hint
> > x [[
> > x Note: Because of the risk of confusion between RDF URI references that
> > x would be equivalent if dereferenced, the use of %-escaped characters in
> > x RDF URI references is strongly discouraged. See also the URI
> > x equivalence issue of the Technical Architecture Group [TAG].
> > x ]]
> >
> > I agree with Alex that punycoded domain names and %-escaped characters
> > should be mentioned in the same breath. From a human-engineering
> > perspective, I think any text specifying syntactic hints to help observers
> > visually discriminate them discourages programmers from being conscientious
> > about the distinction. However, if we want to encourage the world to mint
> > IRIs which we can procedurally calculate from URIs (motivated perhaps by
> > associating HTTP traffic with assertions about resources), we could add some
> > text encouraging an unambiguous transformation:
> >
> > [[
> > Note: RFC3987's mapping of IRIs to URIs does not alter "%25" or
> > punycoded domain names, which means that the IRIs
> > <http://伝言.example/R&D> and
> > <http://xn--9oqp94l.example/R%25D> will
> > both be transformed to the URI <http://xn--9oqp94l.example/R%25D>.
> > RFC3987 section 3.2. "Converting URIs to IRIs" defines a function
> > which produces a single IRI for any URI. When minting IRIs for RDF,
> > it is encouraged to mint forms which can round trip to a URI form
> > and back.
> > ]]
> >
> > > Cheers
> > >
> > > Ivan
> > >
> > > >> -Alex
> > > >
> > > > --
> > > > -ericP
> > >
> > > ----
> > > Ivan Herman, W3C Semantic Web Activity Lead
> > > Home: http://www.w3.org/People/Ivan/
> > > mobile: +31-641044153
> > > PGP Key: http://www.ivan-herman.net/pgpkey.html
> > > FOAF: http://www.ivan-herman.net/foaf.rdf
> >
> > --
> > -ericP

--
-ericP
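
The IRI-to-URI mapping discussed in this thread (punycode for the domain
name, percent-encoding of UTF-8 for everything else) can be sketched roughly
as follows. This is an illustration under stated assumptions, not the
RFC 3987 algorithm in full: iri_to_uri is a hypothetical helper name, the
host is converted with Python's built-in "idna" codec (which follows the
older IDNA 2003 rules), and userinfo is dropped for brevity. Existing
"%" escapes are left alone so that they are not encoded a second time.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when percent-encoding. "%" is included so that
# existing escapes such as "%25" are not encoded again.
_SAFE = "/:@!$&'()*+,;=?%~-._"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to an ASCII-only URI (simplified)."""
    parts = urlsplit(iri)
    # Punycode each non-ASCII label of the host; ASCII labels pass through.
    netloc = (parts.hostname or "").encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Percent-encode the UTF-8 bytes of any non-ASCII character elsewhere.
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=_SAFE),
        quote(parts.query, safe=_SAFE),
        quote(parts.fragment, safe=_SAFE),
    ))


unicode_iri = "http://伝言.example/?user=أكرم"
ascii_iri = "http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85"

# Simple string comparison, as RDF uses for IRIs, treats these as different.
equal_as_iris = unicode_iri == ascii_iri

# After mapping, both spellings lead to the same URI.
equal_as_uris = iri_to_uri(unicode_iri) == iri_to_uri(ascii_iri)

print(equal_as_iris, equal_as_uris)
```

The two comparisons at the end restate the concern raised above: the
Unicode and punycode spellings are distinct IRIs in a graph, yet a client
dereferencing either one ends up requesting the same URI.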
Received on Friday, 29 April 2011 17:50:38 UTC