Re: Urdu IDNs: TLDs

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Tue, 31 Jul 2007 12:29:06 +0100
Message-ID: <46AF1D02.8070500@hpl.hp.com>
To: Richard Ishida <ishida@w3.org>
CC: www-international@w3.org, public-iri@w3.org, "'Sarmad Hussain'" <sarmad.hussain@nu.edu.pk>


In principle, I'm strongly opposed to 'plug-ins' for IRI processing (and 
hence IDN processing).

However, I can see the argument for TLDs more easily than for characters.

===

A plug-in approach is likely to address one or two leading Web browsers, 
but would be very unlikely to also address less central applications that 
also use IRIs (e.g. any Semantic Web software).

I do not wish to be close-minded, and if a plug-in is an appropriate 
medium-term measure then so be it, but alarm bells start ringing when 
this is billed as a language-specific process, rather than one that 
generalises.

===

On TLDs - I think this looks like a legitimate user-need, and any 
solution involves a look-up table that maps non-English TLDs to English 
ones, e.g. from column C to A on the spreadsheet, but potentially 
multiplied manyfold. In some sense, the English one ends up as the 
canonical form.

The sort of application-end solution with which I would be happiest is 
one that is easy for me to support in my software too, e.g.:

A) There is a specified Web site which has the mappings, which can be 
accessed both one at a time, and downloaded all at once. Ideally there 
should be some process for adding new mappings and new languages.

B) The 'plug-in' would then have its own copy of the mapping table which 
would be refreshed from the Web from time to time, and do a very simple 
replacement on the TLD.

C) This is easy to code up, so that other pieces of software that wish to 
support these mappings can do so.
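
As a rough illustration of B) and C), here is a minimal sketch in Python. 
The table format (one tab-separated "localized TLD, ASCII TLD" pair per 
line), the function names and the single sample entry are all my own 
assumptions for illustration; no such published table or format exists 
yet. Fetching the table from the Web is left out; the sketch starts from 
the downloaded text.

```python
# Hypothetical downloaded mapping table: "localized<TAB>ascii" per line.
# The entry below is an illustrative assumption, not an agreed mapping.
SAMPLE_TABLE_TEXT = "# localized TLD\tASCII TLD\nپاکستان\tpk\n"


def parse_mapping_table(text):
    """Parse the assumed tab-separated format into a dict.

    Blank lines and lines starting with '#' are skipped.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        localized, ascii_tld = line.split("\t")
        table[localized] = ascii_tld
    return table


def replace_tld(hostname, table):
    """Replace the last label of hostname if it appears in the table.

    Hostnames whose TLD is not in the table are returned unchanged.
    """
    head, dot, tld = hostname.rpartition(".")
    return head + dot + table.get(tld, tld)


TABLE = parse_mapping_table(SAMPLE_TABLE_TEXT)
print(replace_tld("example.پاکستان", TABLE))
print(replace_tld("example.org", TABLE))
```

The replacement is deliberately a plain dictionary lookup on the last 
label, so any other piece of software could reimplement it from the same 
table in a few lines.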

To be generally usable, an RFC or similar would be needed. I would have a 
strong preference for this to be seen as part of the ToASCII operation 
in IDN processing (e.g. strip off the TLD and, if it is in the lookup 
table, replace it), so that it is clear that it is done as late as 
possible, i.e. during retrieval.
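
To make the ordering concrete, here is a minimal sketch of the TLD 
replacement as a step taken just before ToASCII. It uses Python's built-in 
"idna" codec (which implements the IDNA 2003 ToASCII operation) as a 
stand-in; the function name, the table and its single entry are 
hypothetical, and details such as trailing dots are ignored.

```python
# Hypothetical lookup table; the entry is an illustrative assumption.
TLD_TABLE = {"پاکستان": "pk"}


def to_ascii_with_tld_mapping(hostname, table):
    """Map the TLD via the lookup table, then apply ToASCII to the result.

    The mapping happens only here, at the point where the ASCII form is
    needed for retrieval; the original IRI is left untouched elsewhere.
    """
    head, dot, tld = hostname.rpartition(".")
    mapped = head + dot + table.get(tld, tld)
    # Python's "idna" codec converts each label to its ASCII (xn--) form.
    return mapped.encode("idna").decode("ascii")


print(to_ascii_with_tld_mapping("bücher.پاکستان", TLD_TABLE))
print(to_ascii_with_tld_mapping("example.org", TLD_TABLE))
```

Doing the lookup inside this one function keeps the mapped form out of 
any stored or compared identifiers, which is the point of doing it late.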

The canonical form is then the one used during retrieval - e.g. the URL, 
which for better or worse is ASCII.

Jeremy





-- 
Hewlett-Packard Limited
registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Tuesday, 31 July 2007 11:29:42 GMT