RE: [P&C] Low-level internationalization, XML deserialization, IRI or URI, IRI normalization from Marcin Hanclik on 2009-08-13 (public-webapps@w3.org from July to September 2009)

From: Marcin Hanclik <Marcin.Hanclik@access-company.com>
Date: Thu, 13 Aug 2009 14:31:52 +0200
To: "marcosc@opera.com" <marcosc@opera.com>
CC: "public-webapps@w3.org" <public-webapps@w3.org>, "connolly@w3.org" <connolly@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
Message-ID: <FAA1D89C5BAF1142A74AF116630A9F2C2890C657C9@OBEEX01.obe.access-company.com>
Hi Marcos,

Thanks for your comments.

>>Which developers in particular did you have in mind?
I meant developers who could develop widgets "by hand", without any automation.
(These may be the same people as here:
http://lists.w3.org/Archives/Public/public-device-apis/2009Apr/0013.html)

>>How do you
>>propose we conduct this survey?
I do not know.
I thought that some developers read this mail thread and could simply respond if they care.
If not, then maybe it is early enough to simply specify normalization, mandatory usage of UTF-8 or anything else clarifying the P&C spec?

>>Maybe we can ask BONDI to conduct a
>>survey as part of their upcoming Widget coding thing?
Good idea.

>>Sure. Please feel free to start this discussion with them.
I will draft an email.

>>I'm ok to add this. So, in Step 7, for a given attribute (x):
>>
>>1. if the value of attribute x is not UTF-8, let value-x be the value
>>of attribute x encoded as UTF-8.
>>2. normalize value-x by applying section 5 of RFC3987.

OK.
Ad 1. US-ASCII is already valid UTF-8, so for me 2. = 1.
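
A minimal sketch of step 1, assuming Python and a hypothetical helper `to_utf8` (neither is part of the P&C spec). It suggests that re-encoding is a no-op for a US-ASCII value, while a value in a legacy charset such as Shift-JIS does change:

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw attribute bytes from source_encoding and re-encode as UTF-8.

    Hypothetical helper for illustration only.
    """
    return raw.decode(source_encoding).encode("utf-8")


# Made-up feature names; example.org is a placeholder.
ascii_value = "http://example.org/feature".encode("ascii")
sjis_value = "http://example.org/機能".encode("shift_jis")

# US-ASCII bytes are already valid UTF-8, so nothing changes.
print(to_utf8(ascii_value, "ascii") == ascii_value)

# Shift-JIS bytes for non-ASCII characters differ from their UTF-8 form.
print(to_utf8(sjis_value, "shift_jis") == sjis_value)
```

In practice an XML parser usually hands attribute values over as Unicode strings already, so this step may only matter when working with raw bytes.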


>>WRT 2 above, that is not clear enough for me. I will need help to
>>specify that properly.
I think 2. can be left as is, 1. could be removed.
I will put this into the email to i18n and I will help you in specifying this fragment.

Thanks.

Kind regards,
Marcin

Marcin Hanclik
ACCESS Systems Germany GmbH
Tel: +49-208-8290-6452  |  Fax: +49-208-8290-6465
Mobile: +49-163-8290-646
E-Mail: marcin.hanclik@access-company.com

-----Original Message-----
From: marcosscaceres@gmail.com [mailto:marcosscaceres@gmail.com] On Behalf Of Marcos Caceres
Sent: Friday, August 07, 2009 2:00 PM
To: Marcin Hanclik
Cc: public-webapps@w3.org; connolly@w3.org; public-iri@w3.org
Subject: Re: [P&C] Low-level internationalization, XML deserialization, IRI or URI, IRI normalization

On Tue, Jul 28, 2009 at 12:27 PM, Marcin
Hanclik<Marcin.Hanclik@access-company.com> wrote:
> Hi Marcos,
>
>>>Yeah, that seems reasonable. I've added it.
> I have not seen your change, I do not know where to look for it.

See http://dev.w3.org/2006/waf/widgets/Overview_TSE.html#iri-attribute


> Anyway, I think the actual problem remains unsolved.
>
> Maybe we should ask the developers?

Which developers in particular did you have in mind? How do you
propose we conduct this survey? Maybe we can ask BONDI to conduct a
survey as part of their upcoming Widget coding thing?

> The target of widgets seems to be to address the most possible audience of developers and create a simple solution mainly for them. WUA may get a bit complex, I think.
>

I don't understand the above.

> If e.g. a WUA exposes some nice feature specified as IRI (i.e. with characters outside of US-ASCII), then it mainly affects the developers who potentially may not have a Unicode-capable editor.
>

Right.

> config.xml can be specified e.g. in Shift-JIS or some other locale-specific charset (that people are used to), so a Unicode requirement also seems not to solve the problem, but just imposes something new.
>

UTF-8 just serves as a base line... yes, there will be interop
problems if UAs don't support the encodings used in the wild. However,
that is an implementation detail, methinks. We have prescribed at
least one encoding to be present (all UAs MUST support UTF-8), which I
think is a good basis for interop.

> Also maybe we should ask i18n people?

Maybe we should. Please do.

> So we may want to specify the following (one of, more ideas welcome):
>
> a) the definition of feature names SHOULD/MUST use only US-ASCII characters. IMHO, this would mean that we actually assume that feature names are URIs, not IRIs.
>

I don't like this. Authors will do whatever they want and implementers
of features should be able to call features whatever they want. I
think the best we could do here is just provide an authoring guideline
saying, "use US-ASCII for feature names".

> Rationale/advantage: simplicity
> Disadvantage: we are not i18n anymore

right.

> b) Specify in P&C that the IRI/URI - once retrieved from configuration document - must be normalized according to the algorithm specified in RFC3987 section 5.
>

I'm ok to add this. So, in Step 7, for a given attribute (x):

1. if the value of attribute x is not UTF-8, let value-x be the value
of attribute x encoded as UTF-8.
2. normalize value-x by applying section 5 of RFC3987.

WRT 2 above, that is not clear enough for me. I will need help to
specify that properly.

> Section 5.1 http://tools.ietf.org/html/rfc3987#section-5.1 says:
> "Applications using IRIs as identity tokens with no relationship to a
>   protocol MUST use the Simple String Comparison (see section 5.3.1)."
> It may be valid for P&C.

Yes, that would be valid for widget@id, for instance. As it would be
valid for feature@name.

> My preference is b), but I think that prior to a potential update of the P&C, we need some discussion, as said e.g. with i18n.
>

Sure. Please feel free to start this discussion with them.

> Additionally - but this may be out of the scope of P&C - we may have to specify how to compare the feature names (and probably the other attributes) with the IRIs/features implemented in the WUA.
>

We already went through this in the past. Comparison was going to be 1
to 1 (URI identifiers are treated as opaque strings or namespace
identifiers, hence use literal comparisons). No normalization is done.
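
A minimal sketch of what literal comparison implies, assuming Python; the function name `simple_string_compare` and the example IRIs are hypothetical. Two identifiers that look the same, or differ only in case, are treated as different unless they match character for character:

```python
import unicodedata


def simple_string_compare(iri_a: str, iri_b: str) -> bool:
    """Character-by-character comparison with no normalization applied."""
    return iri_a == iri_b


# Same visible text, different code points:
# precomposed U+00E9 versus "e" followed by combining U+0301.
precomposed = "http://example.org/caf\u00e9"
decomposed = "http://example.org/cafe\u0301"

print(simple_string_compare(precomposed, decomposed))

# Only after Unicode NFC normalization do the two strings become identical.
print(
    simple_string_compare(
        unicodedata.normalize("NFC", precomposed),
        unicodedata.normalize("NFC", decomposed),
    )
)

# Case differences in scheme or host are also significant under literal comparison.
print(simple_string_compare("HTTP://Example.org/f", "http://example.org/f"))
```

Under this reading, authors would need to write a feature name exactly as the WUA declares it.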

Kind regards,
Marcos
--
Marcos Caceres
http://datadriven.com.au


Received on Thursday, 13 August 2009 12:33:22 UTC