W3C home > Mailing lists > Public > public-iri@w3.org > July 2012

RE: Query parts in IRIs in plain text mail

From: Dave Thaler <dthaler@microsoft.com>
Date: Wed, 11 Jul 2012 19:45:14 +0000
To: Dave Thaler <dthaler@microsoft.com>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, "public-iri@w3.org" <public-iri@w3.org>
Message-ID: <9B57C850BB53634CACEC56EF4853FF653B6B01A8@TK5EX14MBXW601.wingroup.windeploy.ntdev.microsoft.com>
Let me take a slightly different example:
http://www.sw.it.aoyama.ac.jp/non-existent?é
(I don't know what charset my email is in)

If the charset were iso-8859-1 then under RFC 3987 as I understand it,
this would become
http://www.sw.it.aoyama.ac.jp/non-existent?%C3%83%C2%A9
In other words, you have to convert iso-8859-1 to UTF-8 and then pct-encode
the UTF-8.

But as I understand 3987bis it would become
http://www.sw.it.aoyama.ac.jp/non-existent?%C3%A9
which would then be passed around via various APIs and protocols
that would not pass the charset along with it.
As such it would be interpreted by the receiving code as pct-encoded UTF-8:
http://www.sw.it.aoyama.ac.jp/non-existent?é
which of course it isn't.

Am I missing something?

-Dave

> -----Original Message-----
> From: Dave Thaler [mailto:dthaler@microsoft.com]
> Sent: Wednesday, July 11, 2012 9:35 AM
> To: "Martin J. Dürst"; public-iri@w3.org
> Subject: RE: Query parts in IRIs in plain text mail
> 
> Outlook 2010 + IE10:
> http://www.sw.it.aoyama.ac.jp/non-existent?résumé
> shows in the address bar (which will show IRIs).   I suspect that means UTF-8,
> but a
> better test would be one that actually has 2 pages of content at the two
> different URIs where the content tells you which one it was (so the display of
> the address won't matter).
> 
> > -----Original Message-----
> > From: "Martin J. Dürst" [mailto:duerst@it.aoyama.ac.jp]
> > Sent: Wednesday, July 11, 2012 3:47 AM
> > To: public-iri@w3.org
> > Subject: Query parts in IRIs in plain text mail
> >
> > This is a test based on the comment by Dave Thaler that IRIs may also
> > appear in (plain text) email.
> >
> > If my MUA (Eudora/Penelope/Thunderbird) does what I told it, this mail
> > should be in iso-8859-1 (Latin-1). If you have any way to check that
> > it's still
> > Latin-1 at your end, please do so.
> >
> > The IRI to test,
> >     http://www.sw.it.aoyama.ac.jp/non-existent?résumé
> > is the same as the one I used in the SVG test.
> >
> > It won't resolve (and doesn't need to), but should show you where your
> > browser wants to go
> > (http://www.sw.it.aoyama.ac.jp/non-existent?r%C3%A9sum%C3%A9 if it
> > uses
> > UTF-8 for the IRI->URI conversion,
> > http://www.sw.it.aoyama.ac.jp/non-existent?r%E9sum%E9 if it uses iso-
> > 8859-1).
> >
> > Please report any results on the list.
> >
> > Regards,   Martin.
> >
> 
> 
> 
Received on Wednesday, 11 July 2012 19:45:46 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Wednesday, 11 July 2012 19:45:47 GMT