- From: olivier Thereaux <ot@w3.org>
- Date: Fri, 18 Jan 2008 15:27:22 +0900
- To: Martin Duerst <duerst@it.aoyama.ac.jp>
- Cc: Tools dev list <public-qa-dev@w3.org>
Hi Martin,
I was doing some tests with IRIs in perl and your name kept cropping
up in documentation, so I was wondering if you could answer some of my
doubts. Do you know what the state of adoption of IRIs (and in
particular IDNs) is in Perl?
I have seen some IDN-related modules (e.g. [1]) being released, but it
seems that the top obstacle to nicely handling IRIs in Perl is that
the URI module [2] is not IRI-friendly. As my little test script
(attached, but not worth much) showed, the URI constructor ignores and
trashes all non-ASCII characters in the host [3].
[1] Net::IDN::Encode
[2] http://search.cpan.org/~gaas/URI-1.35/URI.pm
[3] http://search.cpan.org/~gaas/URI-1.35/URI.pm#CONSTRUCTORS
I was hoping I'd be able to 1) construct the URI object and THEN 2)
apply nameprep to the hostname and encode it to punycode with
something like:
$uri->host( domain_to_ascii($uri->host) );
but that won't work because by that time all the non-ASCII characters
in the hostname have already been trashed by URI::Escape. The other
solution would be to first encode into punycode, then construct the
URI object, but that means reinventing the wheel and parsing the URI
by hand (to get the host part) first.
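
To make that second option concrete, here is a minimal sketch of
"convert the host first, then build the URI". It is written in Python
rather than Perl, purely as an illustration, and uses only the standard
library (urllib.parse and the built-in "idna" codec, which applies
nameprep and punycode per IDNA 2003). The function name
iri_host_to_ascii is made up for this example, and the sketch assumes a
plain hostname: IPv6 literals and percent-encoding of the path or query
are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def iri_host_to_ascii(iri: str) -> str:
    """Return `iri` with its host converted to ASCII (punycode) form.

    Hypothetical helper for illustration only: it touches the host and
    leaves scheme, userinfo, port, path, query and fragment as they are.
    """
    parts = urlsplit(iri)
    host = parts.hostname  # lower-cased host, or None if there is none
    if host is None:
        return iri

    # The "idna" codec runs nameprep on each label, then punycode.
    ascii_host = host.encode("idna").decode("ascii")

    # Rebuild the authority component around the converted host.
    netloc = ascii_host
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo = f"{userinfo}:{parts.password}"
        netloc = f"{userinfo}@{netloc}"

    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # Prints: http://xn--bcher-kva.example/path?q=1
    print(iri_host_to_ascii("http://bücher.example/path?q=1"))
```

The resulting ASCII-only string could then be handed to a URI
constructor that does not understand IRIs. The catch is the same one
described above: something still has to split the IRI to find the host
before the URI object exists.
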
So, that's not satisfactory. What surprises me is that apparently
there is nothing in the tracker for this module mentioning IDNs or
punycode. Maybe no one has yet suggested to the module maintainers that
instead of trashing all non-ASCII chars, they should be attempting a
punycode conversion?
Do you recall any such discussion? Have you already experimented in
this area?
Thanks.
--
olivier
Received on Friday, 18 January 2008 06:27:41 UTC