
RE: [Widget URI] Internationalization, widget IRI?

From: Marcin Hanclik <Marcin.Hanclik@access-company.com>
Date: Mon, 7 Sep 2009 13:27:26 +0200
To: "marcosc@opera.com" <marcosc@opera.com>
CC: Robin Berjon <robin@berjon.com>, public-webapps WG <public-webapps@w3.org>
Message-ID: <FAA1D89C5BAF1142A74AF116630A9F2C2890C66026@OBEEX01.obe.access-company.com>
Hi Marcos,

To summarize the URI/IRI-related issues, as far as I can tell we currently have the following:
1. URI/IRI normalization in P&C [1], currently under discussion in I18N [2]
2. Widget URI issues related to internationalization [3]

The URI/IRI normalization in P&C mainly concerns attribute values that are meant to be IRIs. At present these are:
a) @id in <widget>
b) @href in <author>
c) @href in <license>
d) @name in <feature>

Your use cases seem to be related to the above, since you use a non-ASCII character in the @src attribute of <content> [5].
They are exactly the same with regard to issues 1 and 2 above.
They differ only at the CP437/UTF-8 level.

The widget URI operates at the character level, and my point was about calling it a URI (a URI is octet-level, whereas an IRI clearly operates at the character level).

My comments on the details:

P&C addresses the transcoding from CP437 to UTF-8 [4] (however, only as a SHOULD; perhaps it should also be a SHALL? This has not been raised yet, and it is probably too late now):
" For the sake of comparison and matching, it is recommended that a user agent treat all Zip-relative paths as [UTF-8]."

The "problematic" character in your case is 'ñ', U+00F1.
In CP437 it has the value 0xA4; in ISO-8859-1 it is 0xF1.
In UTF-8 this character is encoded as the two-octet sequence 0xC3 0xB1.
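These byte values are easy to check by encoding the character with each codec. A minimal Python sketch (the helper name encodings_of is hypothetical, introduced only for illustration; only standard Python codecs are used):

```python
# The character under discussion: LATIN SMALL LETTER N WITH TILDE ('ñ').
N_TILDE = "\u00f1"


def encodings_of(ch: str) -> dict:
    """Return the byte representation of ch in each encoding discussed above."""
    return {name: ch.encode(name) for name in ("cp437", "iso-8859-1", "utf-8")}


if __name__ == "__main__":
    for name, raw in encodings_of(N_TILDE).items():
        # bytes.hex(" ") separates the octets with spaces (Python 3.8+).
        print(f"{name:>10}: {raw.hex(' ')}")
```

Note that the CP437 byte 0xA4, if wrongly decoded as ISO-8859-1, yields the currency sign U+00A4 instead of 'ñ', which is the kind of mismatch that appears when a user agent guesses the wrong character set for a Zip file name.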

P&C seems to assume that everything gets converted to UTF-8.
The only issue is that this is merely an assumption.
My IRI case and your file-name cases are similar with respect to this assumption.

Specifically, in the IRI case we also have the issue of percent-encoding, whereas in your cases we have "just" character-set transcoding.
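To illustrate the difference, here is a minimal Python sketch, assuming a simplified IRI-to-URI mapping that only percent-encodes the UTF-8 octets of non-ASCII characters (RFC 3987 defines the full mapping; iri_to_uri and transcode_cp437_name are hypothetical helpers):

```python
from urllib.parse import quote, unquote


def iri_to_uri(iri: str) -> str:
    """Simplified IRI-to-URI mapping: percent-encode the UTF-8 octets of
    every non-ASCII character and leave ASCII characters untouched.
    (Assumption: ignores the finer points of RFC 3987, section 3.1.)"""
    return "".join(c if ord(c) < 0x80 else quote(c, safe="") for c in iri)


def transcode_cp437_name(raw: bytes) -> bytes:
    """Character-set transcoding only: CP437 bytes in, UTF-8 bytes out."""
    return raw.decode("cp437").encode("utf-8")


if __name__ == "__main__":
    name = "ma\u00f1ana.html"  # 'mañana.html'
    uri = iri_to_uri(name)
    print(uri)                                      # percent-encoded URI form
    print(unquote(uri) == name)                     # percent-decoding recovers the IRI
    print(transcode_cp437_name(b"ma\xa4ana.html"))  # same name, re-encoded as UTF-8
```

The percent-encoded form is what an octet-level (URI) comparison sees, while the file-name cases only need the bytes re-encoded.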

I hope it is clearer now.

Thanks,
Marcin


[1] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0644.html

[2] http://lists.w3.org/Archives/Public/public-i18n-core/2009JulSep/0065.html

[3] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0339.html

[4] http://www.w3.org/TR/widgets/#zip-relative-paths

[5] http://www.w3.org/TR/widgets/#the-content-element


Marcin Hanclik
ACCESS Systems Germany GmbH
Tel: +49-208-8290-6452  |  Fax: +49-208-8290-6465
Mobile: +49-163-8290-646
E-Mail: marcin.hanclik@access-company.com

-----Original Message-----
From: marcosscaceres@gmail.com [mailto:marcosscaceres@gmail.com] On Behalf Of Marcos Caceres
Sent: Friday, September 04, 2009 11:11 AM
To: Marcin Hanclik
Cc: Robin Berjon; public-webapps WG
Subject: Re: [Widget URI] Internationalization, widget IRI?

On Thu, Sep 3, 2009 at 1:32 PM, Marcin Hanclik
<Marcin.Hanclik@access-company.com> wrote:
> Hi Robin,
>
> Thanks for your comments.
>
> I believe the terminology could be clarified once the IRI/URI issue from P&C gets solved in I18N, hopefully together with HREF and all related stuff.
>
> +1 for simplification.
>

I'm still not understanding the problem in the P&C spec.

Let me try to walk through a simple widget. Marcin, pretend I'm 9
years old and explain the problem to me in the simplest terms
possible (i.e., don't cite URI/IRI spec stuff to me, because all that
stuff makes no sense; just talk to me about bytes... I'm one of those
smarty 9-year-olds who knows about bytes, but as a consequence gets
pushed around by bullies...:)).

USE CASE 1
1. I have a widget called foo.wgt. The widget contains 2 files:
mañana.html and config.xml.
2. The file names of both files are encoded in the zip archive as
UTF-8 (explicitly marked as such by the presence of a flag).
3. In the config doc, which is encoded in iso-8859-1, it says:
   <content src="mañana.html"/>
4. The UA reads the value of src attribute and converts it to UTF-8.
5. The UA matches the string that represents the value of src to the
"mañana.html" file entry.
6. done?
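Use case 1 can be sketched with Python's standard zipfile and xml.etree.ElementTree modules. This is an illustration under assumptions, not the P&C processing model: the widget namespace URI and the helper names build_widget and find_start_file are hypothetical choices for the example, and Python compares decoded Unicode strings rather than UTF-8 byte sequences, which is equivalent here because both sides decode to the same characters.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Optional

# Assumption: the P&C widget namespace; adjust if your config.xml differs.
WIDGETS_NS = "http://www.w3.org/ns/widgets"

# config.xml serialized as ISO-8859-1, so 'ñ' is the single byte 0xF1.
CONFIG_XML = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    f'<widget xmlns="{WIDGETS_NS}"><content src="ma\u00f1ana.html"/></widget>\n'
).encode("iso-8859-1")


def build_widget() -> bytes:
    """Create an in-memory foo.wgt containing mañana.html and config.xml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # For a non-ASCII name, zipfile writes UTF-8 and sets flag bit 11 (0x800).
        zf.writestr("ma\u00f1ana.html", "<p>Hola</p>")
        zf.writestr("config.xml", CONFIG_XML)
    return buf.getvalue()


def find_start_file(wgt: bytes) -> Optional[str]:
    """Return the archive entry named by <content src>, or None if absent."""
    with zipfile.ZipFile(io.BytesIO(wgt)) as zf:
        # The XML parser honours encoding="iso-8859-1", so src is already text.
        root = ET.fromstring(zf.read("config.xml"))
        content = root.find(f"{{{WIDGETS_NS}}}content")
        if content is None:
            return None
        src = content.get("src")
        # Entry names were decoded from UTF-8 because bit 11 is set.
        for info in zf.infolist():
            if info.filename == src:
                return info.filename
    return None


if __name__ == "__main__":
    print(find_start_file(build_widget()))
```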


USE CASE 2
1. I have a widget called foo.wgt. The widget contains 2 files:
mañana.html and config.xml.
2. The file names of both files are encoded in the zip archive as
CP-437 (explicitly marked as such).
2.1 The UA maps all the files names in the zip archive to UTF-8 equivalents.
3. In the config doc, which is encoded in iso-8859-1, it says:
   <content src="mañana.html"/>
4. The UA reads the value of the src attribute and converts it to UTF-8.
5. The UA matches the string that represents the value of src to the
"mañana.html" file entry.
6. done?
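Use case 2 differs only in how the raw file-name bytes are decoded (step 2.1). A minimal sketch, assuming that "marked as CP437" means general-purpose flag bit 11 is clear (this is how Python's zipfile interprets it), with hypothetical helpers decode_zip_name and match_src standing in for a real archive reader:

```python
from typing import List, Optional, Tuple

UTF8_NAME_FLAG = 0x800  # general-purpose flag bit 11: file name is UTF-8


def decode_zip_name(raw: bytes, flag_bits: int) -> str:
    """Step 2.1: map a raw ZIP file-name field to text (UTF-8 or CP437)."""
    return raw.decode("utf-8" if flag_bits & UTF8_NAME_FLAG else "cp437")


def match_src(src: str, entries: List[Tuple[bytes, int]]) -> Optional[str]:
    """Step 5: return the decoded entry name equal to src, or None."""
    for raw, flag_bits in entries:
        name = decode_zip_name(raw, flag_bits)
        if name == src:
            return name
    return None


if __name__ == "__main__":
    # Raw names as CP437 bytes (0xA4 is 'ñ'), bit 11 clear.
    entries = [(b"ma\xa4ana.html", 0), (b"config.xml", 0)]
    # Step 4: the src value from the ISO-8859-1 config doc (0xF1 is 'ñ').
    src = b"ma\xf1ana.html".decode("iso-8859-1")
    print(match_src(src, entries))
    # Without step 2.1, comparing raw bytes fails: 0xA4 != 0xC3 0xB1.
    print(entries[0][0] == src.encode("utf-8"))
```

If step 2.1 is skipped, or the names are wrongly decoded as ISO-8859-1, the entry reads as "ma¤ana.html" and the match fails.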


--
Marcos Caceres
http://datadriven.com.au


Received on Monday, 7 September 2009 11:28:35 GMT
