
Re: IRI guidance

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Fri, 29 Apr 2011 15:37:59 +0100
Message-ID: <4DBACD47.1020500@liris.cnrs.fr>
To: Alex Hall <alexhall@revelytix.com>
CC: Ivan Herman <ivan@w3.org>, Eric Prud'hommeaux <eric@w3.org>, Nathan Rixham <nathan@webr3.org>, RDF WG <public-rdf-wg@w3.org>, Dan Brickley <danbri@danbri.org>
I agree to drop the round-tripping. I nevertheless suggest making
Ivan's proposal a little more explicit (though I assume it was not
intended to be the complete text):

  Note: according to RFC3987, distinct IRIs may be transformed into the
  same URI. For example, <http://伝言.example/R&D> and
  <http://xn--9oqp94l.example/R%25D> will both be transformed to the URI
  <http://xn--9oqp94l.example/R%25D>. However, those two IRIs are *not*
  equivalent in RDF.

  When minting *new* IRIs from scratch or from URIs, the use of
  %-escaped characters or punycode-encoded IDNs is strongly
  discouraged (except, of course, for the %-escaping imposed by the IRI
  syntax).
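
To make the first paragraph of the note concrete, here is a rough Python
sketch of a simplified IRI-to-URI mapping. The helper name iri_to_uri, the
"café" path, and the use of Python's built-in "idna" codec for the host are
my own assumptions for illustration; this is not a complete implementation
of RFC3987 section 3.1 (ports and userinfo are ignored, for instance). It
only shows that an IRI with a raw non-ASCII character and an IRI with the
corresponding %-escape end up as the same URI, while comparing unequal as
plain strings.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched in path, query and fragment. "%" is included on
# purpose: existing %-escapes must not be escaped a second time.
_SAFE = "/?:@!$&'()*+,;=-._~%"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: simplified IRI-to-URI mapping (no port, no userinfo)."""
    parts = urlsplit(iri)
    # Host: convert an internationalized domain name to its ASCII (punycode) form.
    host = (parts.hostname or "").encode("idna").decode("ascii")
    # Other components: percent-encode the UTF-8 bytes of non-ASCII characters.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE)
    fragment = quote(parts.fragment, safe=_SAFE)
    return urlunsplit((parts.scheme, host, path, query, fragment))


# Two distinct IRIs (assumed examples, not taken from the thread).
iri_unicode = "http://伝言.example/café"
iri_escaped = "http://伝言.example/caf%C3%A9"

uri = iri_to_uri(iri_unicode)
print(uri)
print(iri_to_uri(iri_escaped) == uri)  # True: both IRIs map to the same URI
print(iri_unicode == iri_escaped)      # False: different strings, so different IRIs
```

The last comparison models IRI equality as plain string comparison, which
is how I read "not equivalent in RDF" above.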

It would probably be good to add one or two sentences giving the
rationale for this recommendation. For IRIs minted from scratch, my main
argument would be readability.

For converting URIs to IRIs, this is trickier to argue, as one has to
guess which IRI was actually conveyed by the URI, and that guess
cannot be 100% correct...
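
The guessing problem can be sketched as follows. The function name
uri_to_iri_guess and the example.org URIs are hypothetical, and the
unescaping rule is a deliberate simplification of RFC3987 section 3.2: it
only unescapes runs of %-escapes that decode to valid UTF-8 consisting
entirely of non-ASCII characters, and it ignores the characters that the
RFC says should stay escaped. Both candidate IRIs below map to the same
URI, so whatever the function returns can match at most one of them.

```python
import re

# One or more consecutive %XX escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _unescape_run(match: re.Match) -> str:
    run = match.group(0)
    raw = bytes.fromhex(run.replace("%", ""))
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return run  # not valid UTF-8: keep the escapes
    if any(ord(ch) < 0x80 for ch in text):
        return run  # conservative: keep runs containing ASCII such as %25 or %2F
    return text


def uri_to_iri_guess(uri: str) -> str:
    """Hypothetical helper: guess an IRI for a URI by unescaping non-ASCII UTF-8."""
    return _ESCAPE_RUN.sub(_unescape_run, uri)


uri = "http://example.org/caf%C3%A9"
# Two distinct IRIs that both map to the URI above.
candidates = ["http://example.org/café", "http://example.org/caf%C3%A9"]

guess = uri_to_iri_guess(uri)
print(guess)
print([c == guess for c in candidates])  # [True, False]
```

If the URI was in fact produced from the second candidate, the guess is
wrong, and nothing in the URI itself allows us to tell.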

  pa


On 04/29/2011 03:04 PM, Alex Hall wrote:
> On Fri, Apr 29, 2011 at 9:53 AM, Ivan Herman <ivan@w3.org> wrote:
> 
> 
>     On Apr 29, 2011, at 15:42 , Pierre-Antoine Champin wrote:
> 
>     > On 04/29/2011 02:29 PM, Ivan Herman wrote:
>     >>
>     >> On Apr 29, 2011, at 15:17 , Pierre-Antoine Champin wrote:
>     >> <snip/>
>     >>>>>
>     >>>>> [[
>     >>>>> Note: RFC3987's mapping of IRIs to URIs does not alter "%25" or
>     >>>>> punycoded domain names, which means that the IRIs
>     >>>>> <http://伝言.example/R&D> and <http://xn--9oqp94l.example/R%25D>
>     >>>>> will both be transformed to the URI
>     >>>>> <http://xn--9oqp94l.example/R%25D>.
>     >>>>> RFC3987 section 3.2, "Converting URIs to IRIs", defines a function
>     >>>>> which produces a single IRI for any URI. When minting IRIs for RDF,
>     >>>>> it is encouraged to mint forms which can round-trip to a URI form
>     >>>>> and back.
>     >>>>> ]]
>     >>>>
>     >>>> I think that the round-trip issue may not be clear (it is not
>     >>>> 100% clear to me either :-).
>     >>>
>     >>> I, on the other hand, think the round-trip is a nice way to put it,
>     >>> and quite well defined (although, see my concern #1 below).
>     >>> An example of which IRI is produced from the URI above would help,
>     >>> though.
>     >>>
>     >>
>     >> My understanding is that, concentrating on the IDN case, the
>     >> IRI->punycode does not work in 100% of cases, although the
>     >> punycode->IRI does. So round-trip would then mean using the punycode.
>     >
>     > This is also my reading of RFC3987. Hence my concern #2 below.
>     >
>     >> Is this what we want?
>     >
>     > I would sure prefer <http://伝言.example/R&D> to be the encouraged
>     > IRI rather than <http://xn--9oqp94l.example/R&D>.
> 
>     Absolutely. But that means referring to round-tripping is _not_ what
>     we want there!
> 
> 
> I think the notion of round-tripping only works if you stick with
> percent-encoding and avoid punycode -- note that section 3.2 (URI to
> IRI) doesn't decode the punycode-encoded IDN, it is only concerned with
> unescaping %-encoded octets.  The only mapping guaranteed to produce a
> valid URI in 3.1 is percent-encoding (although such a mapped URI may not
> meet scheme-specific restrictions); punycoding is optional.  I don't
> think we should ignore punycoding altogether, so maybe we should discard
> the sentence about round-tripping and just go with Ivan's suggestion:
> 
> 'The use of %-escaped characters or punycode-encoded IDNs is strongly
> discouraged.'
> 
> -Alex
> 
>  
> 
> 
>     Ivan
> 
> 
>     >
>     >  pa
>     >
>     >
>     >>
>     >> Ivan
>     >>
>     >>
>     >>
>     >>>> Why not add something like
>     >>>>
>     >>>> 'In other words, the use of %-escaped characters or punycode-encoded
>     >>>> IDNs is strongly discouraged.'
>     >>>
>     >>> It definitely would not hurt.
>     >>>
>     >>> I have three concerns, though:
>     >>>
>     >>> 1/ from what I read in RFC3987, section 3.2, the mapping from URI to
>     >>> IRI is not completely specified (referring to section 6.1 of that
>     >>> same RFC)
>     >>>
>     >>> 2/ the URI-to-IRI conversion described in section 3.2 does not
>     >>> eliminate punycode.
>     >>> So <http://伝言.example/R&D> is *not* round-trip-safe, but
>     >>> <http://xn--9oqp94l.example/R&D> is.
>     >>>
>     >>> 3/ it should be made very clear that this is about minting IRIs from
>     >>> scratch or from URIs, but *not* about converting IRIs (as IRIs that
>     >>> would convert to the same URI are not considered equivalent).
>     >>>
>     >>> pa
>     >>>
>     >>>
>     >>>>
>     >>>> Ivan
>     >>>>
>     >>>>
>     >>>>
>     >>>>>
>     >>>>>
>     >>>>>> Cheers
>     >>>>>>
>     >>>>>> Ivan
>     >>>>>>
>     >>>>>>
>     >>>>>>>
>     >>>>>>>
>     >>>>>>>> -Alex
>     >>>>>>>
>     >>>>>>> --
>     >>>>>>> -ericP
>     >>>>>>>
>     >>>>>>
>     >>>>>>
>     >>>>>> ----
>     >>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>     >>>>>> Home: http://www.w3.org/People/Ivan/
>     >>>>>> mobile: +31-641044153
>     >>>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>     >>>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>     >>>>>>
>     >>>>>>
>     >>>>>>
>     >>>>>>
>     >>>>>>
>     >>>>>
>     >>>>> --
>     >>>>> -ericP
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>
>     >>>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >
> 
> 
> 
> 
> 
> 
> 
> 
Received on Friday, 29 April 2011 14:38:32 GMT
