- From: Nathan <nathan@webr3.org>
- Date: Fri, 29 Apr 2011 18:57:36 +0100
- To: Eric Prud'hommeaux <eric@w3.org>
- CC: Alex Hall <alexhall@revelytix.com>, Ivan Herman <ivan@w3.org>, RDF WG <public-rdf-wg@w3.org>
Eric Prud'hommeaux wrote:
> * Alex Hall <alexhall@revelytix.com> [2011-04-29 09:42-0400]
>> On Fri, Apr 29, 2011 at 8:14 AM, Eric Prud'hommeaux <eric@w3.org> wrote:
>>
>>> * Ivan Herman <ivan@w3.org> [2011-04-29 08:24+0200]
>>>> On Apr 28, 2011, at 23:59 , Eric Prud'hommeaux wrote:
>>>> <snip/>
>>>>>> Unfortunately this can lead to unexpected consequences, such as an
>>>>>> application dereferencing the IRI http://xn--rsum-bpad.example.org (not sure
>>>>>> how GMail will escape that -- that's the punycode version) and getting a
>>>>>> document with a description of some resource with IRI
>>>>>> http://résumé.example.org (Unicode version). To help prevent this, we
>>>>>> could discourage the use of IRIs with encoded IDNs in RDF, similar to
>>>>>> how the existing spec discourages the use of URI Refs with
>>>>>> percent-escaped characters.
>>>>> I think this leads down the path of not using IRIs. When dereferencing
>>>>> an HTTP IRI, one has to punyify the domain name and percentulate the
>>>>> path, mapping http://伝言.example/?user=أكرم to
>>>>> http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85 . Any IRI
>>>>> with characters outside of the legal URI characters will map to a
>>>>> differently spelled URI, necessitating some typing of these respective
>>>>> strings. If we're taking away the sharp knives, we'll have to take
>>>>> away non-ascii characters and díäcrìtïcâl markç.
>>>> Eric, I am not sure I understand that. The proposal is to say that, in
>>>> RDF, there should be a preference for the UTF version of the URI-s, ie, I
>>>> should, if possible, opt for http://伝言.example/?user=أكرم rather than
>>>> the other version. What happens underneath if I dereference
>>>> that URI and send it to tools for an HTTP get or anything similar is a
>>>> separate issue.
>>>> Indeed, on an English keyboard typing something even as
>>>> simple as http://iván.herman.net is a pain
>>>> for a user, but that is a practical problem which is again outside the realm
>>>> of RDF.
>>>
>>> Ahh, I interpreted "discourage … encoded IDNs" as discouraging
>>> UTF-8-encoded IRIs while the intent was discouraging punycode-encoded.
>>> Sorry.
>>>
>>
>> No worries -- "encoded" is too vague a term, I should have been more
>> specific.
>>
>>>> Ie: saying that we keep to the current version of RDF, ie, equality of
>>>> IRI-s is based on a character-by-character comparison (like now) but giving
>>>> advice to, if possible, use the IRI without the punycode seems to be a
>>>> reasonable way of handling this... What else would you propose instead?
>>>
>>> I'm all for character-by-character comparison. I think the emphasis should
>>> be on keeping track of the type. Here's a draft of a minimal change to the
>>> Concepts document:
>>> [[
>>> 6.2 RDF Graph
>>> An RDF triple contains three components:
>>>
>>> * the subject, which is an IRI or a blank node
>>> * the predicate, which is an IRI
>>> * the object, which is an IRI, a literal or a blank node
>>> …
>>> 6.4 IRI
>>>
>>> An IRI within an RDF graph (an RDF URI reference) is a Unicode string
>                              ^^^^^^^^^^^^^^^^^^^^^^
>>> [UNICODE] that conforms to the definition of an IRI in RFC3987 [IRI].
>>> Implementations may issue warnings concerning the use of RDF terms
>>> designated to be IRIs but which are not conformant to the IRI
>>> definition.
>>>
>> I wonder if it's too confusing to mention IRI and RDF URI reference in the
>> same breath, in the very first sentence no less? I'd prefer to keep URIs
>> out of the discussion as much as possible.
>
> oops, pasto. intended just "An IRI within an RDF graph is a Unicode
> string".
>
>>> Note: RFC3987 Section 3.1. "Mapping of IRIs to URIs" specifies the
>>> mapping to URIs, which must be done, for instance, when constructing
>>> an HTTP GET request. This specification does not define a relationship
>>> between an IRI and the URI to which it is mapped.
>>>
>>> Note: RFC3987 Section 5.3.1. "Simple String Comparison" specifies
>>> equivalence for IRIs used as identity tokens, as they are in RDF
>>> graphs.
>>>
>>> Note: IRIs are compatible with the anyURI datatype as defined by XML
>>> schema datatypes [XML-SCHEMA2], constrained to be an absolute rather
>>> than a relative URI reference.
>>>
>>> Note: IRIs are compatible with International Resource Identifiers as
>>> defined by [XML Namespaces 1.1].
>>>
>>> Note: The restriction to absolute IRIs is found in this abstract
>>> syntax. When there is a well-defined base, concrete syntaxes, such as
>>> RDF/XML, may permit relative IRIs as a shorthand for such absolute IRIs.
>>> ]]
>>>
>> I think this part could use some clarification. An IRI is, by definition,
>> absolute per section 2.2 of RFC3987. IRI references may be absolute or
>> relative, but resolve to an absolute IRI (as described in section 1.3).
>>
>> To muddy the waters even further, the "absolute-IRI" grammar construct in
>> section 2.2 omits the fragment identifier, but I cannot find any references
>> to this either internal or external to the RFC.
>>
>> So I think we should (a) specifically call out the definition in section
>> 2.2; and (b) avoid any mention of the terms "IRI reference" or "absolute
>> IRI" except in an informative context.
>
> I'm not personally keen on this absolute IRI restriction. I included
> it in this proposal in order to minimize the permutations being
> examined at once ("minimal change"). For usability, I find
>   Data:
>     <s> <p> <o> .
>   Query:
>     ASK { ?s <p> ?o }
>
> very intuitive when you don't have to specifically call out a base
> URI. Using IRI references instead of IRIs would permit the above query
> to work in e.g.
> Jena (which currently presumes absolute IRIs).

Ahh my favourite topic, it's "IRI" that we need (not absolute-IRI, since that has no fragment).

  IRI = scheme ":" ihier-part [ "?" iquery ] [ "#" ifragment ]

So we just say the value space is "IRI", and the lexical space can be "IRI-reference" (when coupled to a known base via serialization or a base pre-known to the environment you're currently working in).

Best,

Nathan
Received on Friday, 29 April 2011 17:58:30 UTC
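
Illustrative note (not part of the original message): the three operations discussed in this thread can be sketched in Python using only the standard library. This is a rough sketch under stated assumptions, not a conformant RFC 3987 implementation. The function names iri_to_uri, same_rdf_iri and resolve are hypothetical, the host is converted with Python's built-in "idna" codec (IDNA 2003), userinfo and IPv6 literals are ignored, and the base IRI http://example.org/base/ is made up for the example.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Characters left untouched when percent-encoding: URI reserved characters
# plus "%" so that escapes already present are not double-encoded.
_SAFE = "/:@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI for dereferencing (hypothetical helper).

    The host is converted to punycode and every other non-ASCII character
    is percent-encoded as UTF-8. Userinfo and IPv6 hosts are not handled.
    """
    parts = urlsplit(iri)
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE + "?")
    fragment = quote(parts.fragment, safe=_SAFE + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


def same_rdf_iri(a: str, b: str) -> bool:
    """Character-by-character comparison: no normalization, no mapping."""
    return a == b


def resolve(base: str, reference: str) -> str:
    """Resolve an IRI reference (lexical space) against a known base."""
    return urljoin(base, reference)


if __name__ == "__main__":
    unicode_iri = "http://résumé.example.org"
    punycode_iri = "http://xn--rsum-bpad.example.org"
    # Both spellings dereference to the same URI ...
    print(iri_to_uri(unicode_iri))
    # ... yet they are different identifiers inside an RDF graph.
    print(same_rdf_iri(unicode_iri, punycode_iri))
    # <p> in the data and <p> in the query resolve to the same IRI
    # only when both are read against the same base.
    print(resolve("http://example.org/base/", "p"))
```

The point of keeping iri_to_uri separate from same_rdf_iri is the one made in the thread: the mapping is applied only when a request is constructed, while graph equality stays a plain string comparison on the IRI as written.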