IURIs with bidi text

From: Yves Savourel <ysavourel@translate.com>
Date: Sun, 15 Oct 2000 17:09:03 -0600
Message-ID: <002501c036fc$ec3c30c0$67b6fea9@sykes.com>
To: <uri@w3.org>
Hello,

I would like to ask for confirmation on how to represent URIs that include bidi text.

For example, the following URL: http‎:‎/‎/www‎.الأمان‎.com‬ contains the Arabic word for "security". Looking at the various draft documents about IURI, I assume it should go through the following steps:

1-- Represent the URI in UCS with the LRE and PDF characters at each end and an LRM prefixing each reserved character. That would give the following (with the non-ASCII characters shown as code points between angle brackets):

<202A>http<200E>:<200E>/<200E>/www<200E>.<0627><0644><0623><0645><0627><0646><200E>.com<202C>

2-- Then convert it into UTF-8.

3-- Then escape any escapable octet. This would give the following:

%E2%80%AAhttp%E2%80%8E:%E2%80%8E/%E2%80%8E/www%E2%80%8E.%D8%A7%D9%84%D8%A3%D9%85%D8%A7%D9%86%E2%80%8E.com%E2%80%AC
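
To make the three steps easier to check mechanically, here is a minimal Python sketch of the process as I understand it. The function names `mark_bidi` and `escape_uri` are my own, and the choice to prefix ":", "/" and "." with LRM simply mirrors the example above; it is an assumption, not something taken from the drafts.

```python
from urllib.parse import quote

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
LRM = "\u200E"  # LEFT-TO-RIGHT MARK


def mark_bidi(uri: str, delimiters: str = ":/.") -> str:
    """Step 1: wrap the URI in LRE...PDF and prefix each delimiter with LRM.

    The delimiter set is an assumption copied from the example in this
    message, not a list defined by any specification.
    """
    body = "".join(LRM + ch if ch in delimiters else ch for ch in uri)
    return LRE + body + PDF


def escape_uri(marked: str) -> str:
    """Steps 2 and 3: encode as UTF-8, then %-escape every non-ASCII octet."""
    # ":" and "/" are kept literal; letters, digits and "." are never escaped.
    return quote(marked.encode("utf-8"), safe=":/")


# The Arabic label is written with escapes so the code points are unambiguous.
plain = "http://www.\u0627\u0644\u0623\u0645\u0627\u0646.com"
marked = mark_bidi(plain)
escaped = escape_uri(marked)
print(escaped)
```

Running it prints the escaped form of the example URL, so the result of step 3 can be compared octet by octet with a hand-built string.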

Is this a correct example of implementing an Internationalized URI with bidi text, or have I got something wrong in the process?

In addition, in an XML document, as per section 4.2.2 of the specs, is it correct that I should use this last form directly in the document, and not rely on the XML processor to do the transformation?

Thanks
-yves
Received on Sunday, 15 October 2000 19:09:29 UTC
