
[i18n+P&C] IRI/URI normalization

From: Marcin Hanclik <Marcin.Hanclik@access-company.com>
Date: Fri, 14 Aug 2009 16:13:16 +0200
To: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
CC: "public-webapps@w3.org" <public-webapps@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
Message-ID: <FAA1D89C5BAF1142A74AF116630A9F2C2890C6584A@OBEEX01.obe.access-company.com>
Hi i18n,

Within the "Widgets 1.0: Packaging & Configuration" (P&C) specification [1], we have identified a potential issue related to feature names being IRIs and to the corresponding feature discovery algorithm.

The details of the problem are summarized in the thread starting with [2].
A few excerpts from the P&C specification are listed below as [3].

In short, the problem is as follows:

1. A widget configuration document may consist solely of US-ASCII characters and still conform to P&C.

2. The feature name may match the IRI grammar but not the URI grammar, i.e. it may contain non-US-ASCII characters, e.g.
http://example.com/鏚zki如iewnik查嬌這wy.

3. Currently, IRI matching is based on a direct character-by-character comparison between the IRI supported by the WUA (widget user agent) and the IRI specified in the configuration document.

4. To use the non-US-ASCII feature name, I would percent-encode it, e.g. as in [2]. (This seems to be the core of the problem: a feature name defined in one language is used within a configuration document that is written in a text editor using another language/encoding.)

5. The currently specified IRI-matching algorithm would then fail, since the percent-encoded IRI in the configuration document (which conforms to the URI grammar at both the octet and the character level) would not match the actual IRI supported by the WUA.
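The mismatch described in points 2 to 5 can be reproduced in a few lines of Python. This is a minimal sketch: the feature name `http://example.com/żółw` is a hypothetical stand-in for the example in point 2, and the standard-library `urllib.parse.quote` is used to produce the UTF-8 percent-encoded form an author might write in a US-ASCII-only configuration document.

```python
from urllib.parse import quote

# Hypothetical feature name supported by the WUA (stand-in for the example above).
supported = "http://example.com/żółw"

# What an author might write in a US-ASCII-only configuration document:
# the same IRI with its non-ASCII characters percent-encoded as UTF-8.
declared = quote(supported, safe=":/")

print(declared)               # http://example.com/%C5%BC%C3%B3%C5%82w
print(declared == supported)  # False: direct character matching fails
```

Both strings identify the same IRI, yet a plain string comparison treats them as different feature names.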


Proposed solutions (any one or more of the following):

a. Define a rule similar to "10.1.4 Rule for Getting a Single Attribute Value" (or add a statement to that rule) that specifies IRI/URI normalization according to RFC 3987 (section 5.3.2.3, Percent-Encoding Normalization).

b. Modify section 10.1.18 ("A feature element:") to include a statement about IRI/URI normalization per RFC 3987.

c. Allow only UTF-8 encoded configuration documents and disallow other encodings (such as Shift-JIS, ISO-XY, etc.).

d. Allow only US-ASCII feature names (probably undesirable, as it works against internationalization).

e. Define another normalization algorithm (not recommended).
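Options a and b amount to comparing feature names after normalization instead of by direct character matching. The following is a minimal Python sketch, assuming the approach RFC 3987, section 5.3.2.3, outlines for testing equivalence: convert both IRIs to URIs, decode escapes of unreserved characters, and upper-case the hexadecimal digits of the remaining escapes. The function names are hypothetical, and `iri_to_uri` is simplified: it percent-encodes every non-ASCII character as UTF-8 and ignores optional refinements such as IDNA conversion of host names.

```python
import re

# RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def iri_to_uri(iri: str) -> str:
    """Simplified IRI-to-URI mapping: percent-encode every non-ASCII
    character as its UTF-8 octets, leaving ASCII characters unchanged."""
    return "".join(
        ch if ord(ch) < 0x80
        else "".join("%{:02X}".format(b) for b in ch.encode("utf-8"))
        for ch in iri
    )


def _fix_escape(match: re.Match) -> str:
    ch = chr(int(match.group(1), 16))
    # Decode escapes of unreserved characters; upper-case all other escapes.
    return ch if ch in UNRESERVED else "%" + match.group(1).upper()


def comparison_form(iri: str) -> str:
    """Form used only for local comparison, never as a replacement IRI."""
    return PCT.sub(_fix_escape, iri_to_uri(iri))


def features_match(declared: str, supported: str) -> bool:
    return comparison_form(declared) == comparison_form(supported)


print(features_match("http://example.com/%C5%BC%C3%B3%C5%82w",
                     "http://example.com/żółw"))  # True
```

Escapes of reserved characters are left alone, so `a%2Fb` and `a/b` still compare as different. RFC 3987 also stresses that such a conversion is only for local comparison; the original IRI should be preserved for any other use.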


The WebApps WG awaits your kind guidance on resolving the above problem.
At present we assume that IRI/URI normalization based on RFC 3987 is the most convenient solution, and we need your advice to make sure it follows current best practices.

Thanks.

Kind regards,
Marcin

[1] http://www.w3.org/TR/2009/CR-widgets-20090723/
[2] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0365.html
[3] Relevant excerpts from P&C specification:

a)
IRI attribute
An attribute defined as containing a valid IRI. A valid IRI is one that matches the IRI token of the [RFC3987] specification.
A user agent must retrieve the value for a IRI attribute using the rule for getting a single attribute value.

b)
10.1.4 Rule for Getting a Single Attribute Value
The rule for getting a single attribute value is given in the following algorithm. The algorithm always returns a string, which may be empty.
1. Let result be the value of the attribute.
2. In result, replace any sequences of space characters (in any order) with a single U+0020 SPACE character.
3. In result, remove any leading or trailing U+0020 SPACE character.
4. Return result.
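The rule quoted in b) is short enough to sketch directly in Python. This is an illustration under one stated assumption: that "space characters" means U+0020, TAB, LF, FF and CR (the HTML5 set); P&C's own definition should be checked. The function name is hypothetical.

```python
import re

# Assumption: the "space characters" are U+0020 SPACE, U+0009 TAB, U+000A LF,
# U+000C FF and U+000D CR (the HTML5 set); check P&C's own definition.
SPACE_RUN = re.compile("[ \t\n\f\r]+")


def get_single_attribute_value(value: str) -> str:
    # Step 1: start from the raw attribute value.
    result = value
    # Step 2: collapse every run of space characters into one U+0020.
    result = SPACE_RUN.sub(" ", result)
    # Step 3: drop the leading/trailing U+0020 (at most one remains per side).
    result = result.strip(" ")
    # Step 4: always a string, possibly empty.
    return result
```

Note that the rule only touches whitespace: `get_single_attribute_value(" %C5%BC ")` returns `"%C5%BC"` with the escapes intact, which is why this rule on its own cannot make a percent-encoded feature name match its non-ASCII form.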

c)
10.1.18 Processing Algorithm
...
- Let feature-name be the result of applying the rule for getting a single attribute value to the value of the name attribute.
- If a required attribute is used, let required-feature be the result of applying the rule for getting a single attribute value to the required attribute. If required-feature is not a valid boolean value, let the value of required-feature be the value 'true'.
- If feature-name is not a valid IRI, and required-feature is true, then treat this widget as an invalid Zip archive.
- If feature-name is not supported by the user agent, and required-feature is true, then treat this widget as an invalid Zip archive.
- If feature-name is not supported by the user agent, and required-feature is false, then this element, its attributes, and its children are in error and must be ignored by the user agent. Stop processing this element and proceed to the next element in the elements list.
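The quoted steps could be sketched in Python with normalization applied before the "supported by the user agent" check. This is a sketch under several assumptions, all hypothetical and not taken from P&C: `is_valid_iri` is a crude stand-in for the RFC 3987 IRI production; an absent required attribute is taken to mean required-feature is true (that case falls in the elided part of the excerpt); and `normalize` uses the standard-library `urllib.parse.unquote`, which decodes every escape and is therefore cruder than RFC 3987.

```python
import re
from urllib.parse import unquote

# Assumption: "space characters" = U+0020, TAB, LF, FF, CR.
_SPACES = re.compile("[ \t\n\f\r]+")


def single_attribute_value(value: str) -> str:
    """Rule for getting a single attribute value (10.1.4)."""
    return _SPACES.sub(" ", value).strip(" ")


def is_valid_iri(name: str) -> bool:
    """Hypothetical, crude stand-in for the RFC 3987 IRI production:
    a scheme, a colon, then at least one non-whitespace character."""
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9+.\-]*:\S+", name) is not None


def normalize(name: str) -> str:
    """Simplified normalization: unquote() decodes every %XX escape as UTF-8,
    so unlike RFC 3987 it also equates e.g. %2F with '/'."""
    return unquote(name)


def process_feature(name_attr: str, required_attr, supported) -> str:
    """Return 'process', 'invalid-widget' or 'ignore' for one feature element."""
    feature_name = single_attribute_value(name_attr)
    # Assumption: an absent required attribute means required-feature is true.
    required = True
    if required_attr is not None:
        # Only "false" turns the requirement off; "true" and any invalid
        # boolean value both leave required-feature true.
        required = single_attribute_value(required_attr) != "false"

    if not is_valid_iri(feature_name) and required:
        return "invalid-widget"  # treat this widget as an invalid Zip archive

    # Compare normalized forms instead of matching characters directly.
    if normalize(feature_name) not in {normalize(s) for s in supported}:
        return "invalid-widget" if required else "ignore"
    return "process"
```

With `supported = {"http://example.com/żółw"}`, the percent-encoded name `http://example.com/%C5%BC%C3%B3%C5%82w` is then processed rather than rejected.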


Marcin Hanclik
ACCESS Systems Germany GmbH
Tel: +49-208-8290-6452  |  Fax: +49-208-8290-6465
Mobile: +49-163-8290-646
E-Mail: marcin.hanclik@access-company.com

________________________________________

Access Systems Germany GmbH
Essener Strasse 5  |  D-46047 Oberhausen
HRB 13548 Amtsgericht Duisburg
Geschaeftsfuehrer: Michel Piquemal, Tomonori Watanabe, Yusuke Kanda

www.access-company.com

Received on Friday, 14 August 2009 14:14:37 GMT
