Re: IRI guidance

From: Ivan Herman <ivan@w3.org>
Date: Fri, 29 Apr 2011 17:03:55 +0200
Cc: Alex Hall <alexhall@revelytix.com>, Eric Prud'hommeaux <eric@w3.org>, Nathan Rixham <nathan@webr3.org>, RDF WG <public-rdf-wg@w3.org>, Dan Brickley <danbri@danbri.org>
Message-Id: <01119706-6D93-458B-B447-D86E4ED71695@w3.org>
To: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>

On Apr 29, 2011, at 16:37 , Pierre-Antoine Champin wrote:

> I agree to drop the round-tripping. I nevertheless suggest making
> Ivan's proposal a little more explicit (though I assume this was not
> intended to be the complete text):
> 
>  Note: according to RFC3987, different IRIs can be transformed into
>  the same URI. For example, <http://伝言.example/R&D> and
>  <http://xn--9oqp94l.example/R%25D> will both be transformed to the
>  URI <http://xn--9oqp94l.example/R%25D>. However, those two IRIs are
>  *not* equivalent in RDF.
> 
>  When minting *new* IRIs from scratch or from URIs, the use of
>  %-escaped characters or punycode encoded IDN-s is strongly
>  discouraged (except of course for the %-escaping imposed by the IRI
>  syntax).
> 
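
To make the collision described in the note above concrete, here is a
minimal Python sketch of the IRI-to-URI direction. It only approximates
RFC 3987 section 3.1 and is not a complete implementation: the helper
name iri_to_uri is hypothetical, userinfo is ignored, the host is
converted with Python's built-in IDNA 2003 codec, and both example IRIs
use the path /R%25D (an assumption made here so that only the host
differs).

```python
from urllib.parse import urlsplit, urlunsplit, quote

# Characters left untouched: URI reserved and unreserved punctuation plus "%",
# so an existing escape such as "%25" is not escaped a second time.
_SAFE = "%:/?#[]@!$&'()*+,;=-._~"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper approximating RFC 3987 section 3.1."""
    parts = urlsplit(iri)
    # Host: convert every label to its ASCII (punycode) form.
    # Assumes the IRI has a host; userinfo is dropped in this sketch.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Other components: percent-encode non-ASCII characters as UTF-8 octets.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE)
    fragment = quote(parts.fragment, safe=_SAFE)
    return urlunsplit((parts.scheme, host, path, query, fragment))


unicode_iri = "http://伝言.example/R%25D"
punycode_iri = "http://xn--9oqp94l.example/R%25D"

# Two different strings, hence two different IRIs in RDF ...
different_as_iris = unicode_iri != punycode_iri
# ... that nevertheless map to one and the same URI.
same_as_uris = iri_to_uri(unicode_iri) == iri_to_uri(punycode_iri)
```

An RDF store that compares IRIs as plain strings would treat unicode_iri
and punycode_iri as two distinct nodes, even though a client
dereferencing them would end up at the same URI.
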
> It would probably be good to add one or two sentences giving the
> rationale of this recommendation. For IRIs minted from scratch, my main
> argument would be readability.

Same here, but I am not sure it is worth getting into a rationale... 

Ivan

> 
> For converting URIs to IRIs, this is trickier to argue, as one has to
> guess which IRI was actually conveyed by the URI, and that guess
> cannot be 100% reliable...
> 
>  pa
> 
> 
> On 04/29/2011 03:04 PM, Alex Hall wrote:
>> On Fri, Apr 29, 2011 at 9:53 AM, Ivan Herman <ivan@w3.org> wrote:
>> 
>> 
>>    On Apr 29, 2011, at 15:42 , Pierre-Antoine Champin wrote:
>> 
>>> On 04/29/2011 02:29 PM, Ivan Herman wrote:
>>>> 
>>>> On Apr 29, 2011, at 15:17 , Pierre-Antoine Champin wrote:
>>>> <snip/>
>>>>>>> 
>>>>>>> [[
>>>>>>> Note: RFC3987's mapping of IRIs to URIs does not alter "%25" or
>>>>>>> punycoded domain names, which means that the IRIs
>>>>>>> <http://伝言.example/R&D> and <http://xn--9oqp94l.example/R%25D> will
>>>>>>> both be transformed to the URI <http://xn--9oqp94l.example/R%25D>.
>>>>>>> RFC3987 section 3.2, "Converting URIs to IRIs", defines a function
>>>>>>> which produces a single IRI for any URI. When minting IRIs for RDF,
>>>>>>> it is encouraged to mint forms which can round trip to a URI form
>>>>>>> and back.
>>>>>>> ]]
>>>>>> 
>>>>>> I think that the round-trip issue may not be clear (it is not
>>>>>> 100% clear to me either:-).
>>>>> 
>>>>> I, on the other hand, think the round-trip is a nice way to put it, and
>>>>> quite well defined (although, see my concern #1 below).
>>>>> An example of which IRI is produced from the URI above would help, though.
>>>>> 
>>>> 
>>>> My understanding is that, concentrating on the IDN case, the
>>>> IRI->punycode does not work in 100% of cases, although the punycode->IRI
>>>> does. So round-trip would then mean using the punycode.
>>> 
>>> This is also my reading of RFC3987. Hence my concern #2 below.
>>> 
>>>> Is this what we want?
>>> 
>>> I would sure prefer <http://伝言.example/R&D> to be the encouraged URI
>>> rather than <http://xn--9oqp94l.example/R&D>.
>> 
>>    Absolutely. But that means referring to round-tripping is _not_ what
>>    we want there!
>> 
>> 
>> I think the notion of round-tripping only works if you stick with
>> percent-encoding and avoid punycode -- note that section 3.2 (URI to
>> IRI) doesn't decode the punycode-encoded IDN, it is only concerned with
>> unescaping %-encoded octets.  The only mapping guaranteed to produce a
>> valid URI in 3.1 is percent-encoding (although such a mapped URI may not
>> meet scheme-specific restrictions); punycoding is optional.  I don't
>> think we should ignore punycoding altogether, so maybe we should discard
>> the sentence about round-tripping and just go with Ivan's suggestion:
>> 
>> 'The use of %-escaped characters or punycode encoded IDN-s is strongly
>> discouraged.'
>> 
>> -Alex
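
Alex's point about section 3.2 can be illustrated with a minimal Python
sketch of the URI-to-IRI direction. It is a simplification under stated
assumptions, not the full algorithm of RFC 3987: the helper name
uri_to_iri is hypothetical, only runs of escapes for non-ASCII octets
are decoded, a run that is not legal UTF-8 is left escaped as a whole,
and the checks on which characters are appropriate in an IRI are
skipped. What it does show is that "%25" and a punycoded host pass
through unchanged.

```python
import re

# Runs of %XX escapes whose octets are all >= 0x80, i.e. non-ASCII octets.
_NON_ASCII_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def uri_to_iri(uri: str) -> str:
    """Hypothetical helper: simplified take on RFC 3987 section 3.2."""

    def decode(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            # Legal UTF-8: replace the escapes with the characters they encode.
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not legal UTF-8: keep the percent-escapes as they are.
            return match.group(0)

    return _NON_ASCII_ESCAPES.sub(decode, uri)


# Percent-encoded UTF-8 in the path is turned back into characters ...
decoded_path = uri_to_iri("http://example.org/%E4%BC%9D%E8%A8%80")
# ... but "%25" and the punycoded host are left exactly as they were.
unchanged = uri_to_iri("http://xn--9oqp94l.example/R%25D")
```

Because the punycoded host survives this step, an IRI minted with the
Unicode host does not come back from a trip through its URI form, while
the punycoded form does.
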
>> 
>> 
>> 
>> 
>>    Ivan
>> 
>> 
>>> 
>>> pa
>>> 
>>> 
>>>> 
>>>> Ivan
>>>> 
>>>> 
>>>> 
>>>>>> Why not add something like
>>>>>> 
>>>>>> 'In other words, the use of %-escaped characters or punycode
>>>>>> encoded IDN-s is strongly discouraged.'
>>>>> 
>>>>> It definitely would not hurt.
>>>>> 
>>>>> I have three concerns, though:
>>>>> 
>>>>> 1/ from what I read in RFC3987, section 3.2, the mapping from URI to IRI
>>>>> is not completely specified (referring to section 6.1 of that same RFC)
>>>>> 
>>>>> 2/ the URI-to-IRI mapping described in section 3.2 does not eliminate
>>>>> punycode. So <http://伝言.example/R&D> is *not* round-trip-safe, but
>>>>> <http://xn--9oqp94l.example/R&D> is.
>>>>> 
>>>>> 3/ it should be made very clear that this is about minting IRIs from
>>>>> scratch or from URIs, but *not* about converting IRIs (as IRIs that
>>>>> would convert to the same URI are not considered equivalent).
>>>>> 
>>>>> pa
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Ivan
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> Cheers
>>>>>>>> 
>>>>>>>> Ivan
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> -Alex
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> -ericP
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ----
>>>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>>> mobile: +31-641044153
>>>>>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>>>>>>> FOAF: http://www.ivan-herman.net/foaf.rdf


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 29 April 2011 15:02:52 GMT
