IURIs with bidi text

Hello,

I would like to ask for confirmation on how to represent URIs that include bidi text.

For example, the following URL: http‎:‎/‎/www‎.الأمان‎.com‬ contains the Arabic word for "security". Looking at the various draft documents about IURI, I assume it should go through the following steps:

1-- Represent the URI in UCS, with the LRE and PDF characters at each end and an LRM prefixing each reserved character. That would give the following (with the non-ASCII characters shown as code points between angle brackets):

<202A>http<200E>:<200E>/<200E>/www<200E>.<0627><0644><0623><0645><0627><0646><200E>.com<202C>

2-- Then convert it into UTF-8.

3-- Then escape any escapable octet. This would give the following:

%E2%80%AAhttp%E2%80%8E:%E2%80%8E/%E2%80%8E/www%E2%80%8E.%D8%A7%D9%84%D8%A3%D9%85%D8%A7%D9%86%E2%80%8E.com%E2%80%AC
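
To make the three steps above concrete, here is a minimal Python sketch of how I understand them. The function name to_escaped_iuri and the DELIMS set are my own assumptions for illustration: DELIMS simply lists the characters that carry an LRM prefix in the example, and is not taken from any of the drafts. It only mechanizes the steps as described; it does not show that the drafts require them.

```python
from urllib.parse import quote

LRE, PDF, LRM = "\u202a", "\u202c", "\u200e"

# Assumption: the characters that get an LRM prefix in the example above.
DELIMS = ":/."

def to_escaped_iuri(uri: str) -> str:
    """Hypothetical helper: apply steps 1 to 3 to a URI in logical order."""
    # Step 1: put an LRM before each delimiter, then wrap in LRE ... PDF.
    marked = LRE + "".join(LRM + ch if ch in DELIMS else ch for ch in uri) + PDF
    # Steps 2 and 3: quote() encodes the string as UTF-8 and %-escapes every
    # octet that is not an ASCII letter, a digit, one of "_.-~", or in `safe`.
    return quote(marked, safe=DELIMS, encoding="utf-8")

# The example URL, with the Arabic word given as code points.
plain = "http://www.\u0627\u0644\u0623\u0645\u0627\u0646.com"
print(to_escaped_iuri(plain))
```

The input is the URL in logical order without any formatting characters; the function adds the LRE, PDF and LRM marks itself before encoding and escaping.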

Is this a correct example of implementing an Internationalized URI with bidi text? Or have I got something wrong in the process?

In addition, in an XML document, as per section 4.2.2 of the specs, is it correct that I should use this last form directly in the document, and not rely on the XML processor to do the transformation?

Thanks
-yves

Received on Sunday, 15 October 2000 19:09:29 UTC