Re: [P&C] Low-level internationalization, XML deserialization, IRI or URI, IRI normalization

On Tue, Jul 28, 2009 at 12:27 PM, Marcin Hanclik wrote:
> Hi Marcos,
>>>Yeah, that seems reasonable. I've added it.
> I have not seen your change; I do not know where to look for it.


> Anyway, I think the actual problem remains unsolved.
> Maybe we should ask the developers?

Which developers in particular did you have in mind? How do you
propose we conduct this survey? Maybe we can ask BONDI to conduct a
survey as part of their upcoming Widget coding thing?

> The aim of widgets seems to be to reach the widest possible audience of developers and to create a simple solution mainly for them. The WUA may get a bit complex, I think.

I don't understand the above.

> If, e.g., a WUA exposes some nice feature specified as an IRI (i.e. with characters outside of US-ASCII), then it mainly affects the developers, who may not have a Unicode-capable editor.


> config.xml can be encoded, e.g., in Shift-JIS or some other locale-specific charset (that people are used to), so a Unicode requirement also seems not to solve the problem, but just imposes something new.

UTF-8 just serves as a baseline... yes, there will be interop
problems if UAs don't support the encodings used in the wild. However,
that is an implementation detail, methinks. We have prescribed at
least one encoding that must be supported (all UAs MUST support UTF-8),
which I think is a good basis for interop.

> Also maybe we should ask i18n people?

Maybe we should. Please do.

> So we may want to specify one of the following (more ideas welcome):
> a) the definition of feature names SHOULD/MUST use only US-ASCII characters. IMHO, this would mean that we actually assume that feature names are URIs, not IRIs.

I don't like this. Authors will do whatever they want and implementers
of features should be able to call features whatever they want. I
think the best we could do here is just provide an authoring guideline
saying, "use US-ASCII for feature names".
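For illustration only: such a guideline could be backed by a conformance-checker warning rather than a parser error, so features with non-ASCII names still work. A minimal Python sketch under that assumption; the function name feature_name_warnings and the example.org feature IRIs are hypothetical, not anything defined in P&C.

```python
from __future__ import annotations


def feature_name_warnings(name: str) -> list[str]:
    """Return authoring warnings for a feature name (hypothetical checker)."""
    messages: list[str] = []
    if not name.isascii():
        # Advisory only: report each distinct non-ASCII code point,
        # but the feature name itself is still accepted.
        offending = sorted({ch for ch in name if ord(ch) > 0x7F})
        listed = ", ".join(f"U+{ord(ch):04X}" for ch in offending)
        messages.append(
            f"feature name {name!r} uses non-ASCII characters ({listed}); "
            "the authoring guideline recommends US-ASCII"
        )
    return messages


print(feature_name_warnings("http://example.org/api/camera"))
print(feature_name_warnings("http://example.org/api/カメラ"))
```

A warning keeps option a) as advice to authors without taking i18n away from implementers.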

> Rationale/advantage: simplicity
> Disadvantage: we are not i18n anymore


> b) Specify in P&C that the IRI/URI - once retrieved from the configuration document - must be normalized according to the algorithm specified in RFC3987 section 5.

I'm ok to add this. So, in Step 7, for a given attribute (x):

1. If the value of attribute x is not UTF-8, let value-x be the value
of attribute x encoded as UTF-8.
2. Normalize value-x by applying section 5 of RFC3987.
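A rough Python sketch of the two steps above, assuming the attribute value is available as raw bytes together with the charset declared for config.xml. It covers only part of RFC3987 section 5: NFC character normalization (5.3.2.2) plus case normalization of the scheme and of percent-encoding hex digits (5.3.2.1). Host case folding, percent-encoding normalization (5.3.2.3) and scheme-based normalization are left out. The function names are hypothetical.

```python
import re
import unicodedata

# Scheme at the start of the value, e.g. "HTTP:".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")
# A percent-encoding triplet such as "%2f".
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def to_utf8(raw: bytes, charset: str) -> bytes:
    """Step 1: decode with the document's charset, re-encode as UTF-8."""
    return raw.decode(charset).encode("utf-8")


def normalize_value(value_x: bytes) -> str:
    """Step 2 (partial): a subset of RFC 3987 section 5.3.2 normalization."""
    text = value_x.decode("utf-8")
    # 5.3.2.2 character normalization: Unicode NFC.
    text = unicodedata.normalize("NFC", text)
    # 5.3.2.1 case normalization: lowercase the scheme...
    text = _SCHEME.sub(lambda m: m.group(0).lower(), text, count=1)
    # ...and uppercase the hex digits of percent-encodings.
    text = _PCT.sub(lambda m: m.group(0).upper(), text)
    return text


raw = "HTTP://example.org/機能%2f".encode("shift_jis")
print(normalize_value(to_utf8(raw, "shift_jis")))
```

Doing NFC right at the transcoding step fits the Shift-JIS case raised earlier, since that is where a non-Unicode value first becomes Unicode.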

WRT 2 above, that is not clear enough for me. I will need help to
specify that properly.

> Section 5.1 says:
> "Applications using IRIs as identity tokens with no relationship to a
>   protocol MUST use the Simple String Comparison (see section 5.3.1)."
> It may be valid for P&C.

Yes, that would be valid for widget@id, for instance, just as it would
be valid for feature@name.

> My preference is b), but I think that prior to a potential update of the P&C, we need some discussion, as said e.g. with i18n.

Sure. Please feel free to start this discussion with them.

> Additionally - but this may be out of the scope of P&C - we may have to specify how to compare the feature names (and probably the other attributes) with the IRIs/features implemented in the WUA.

We already went through this in the past. Comparison was going to be
one-to-one (URI identifiers are treated as opaque strings or namespace
identifiers, hence literal comparison is used). No normalization is done.
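Under that one-to-one rule (what RFC3987 section 5.3.1 calls Simple String Comparison), a WUA's feature lookup reduces to exact string membership. A minimal Python sketch, assuming a hypothetical set of implemented feature IRIs; the names are made up for illustration.

```python
# Hypothetical set of feature IRIs implemented by a WUA (illustrative only).
IMPLEMENTED_FEATURES = frozenset({
    "http://example.org/api/camera",
    "http://example.org/api/caf\u00e9",  # stored in NFC form
})


def is_supported(feature_name: str) -> bool:
    """One-to-one lookup: the name is an opaque string, compared
    code point by code point with no normalization."""
    return feature_name in IMPLEMENTED_FEATURES


print(is_supported("http://example.org/api/camera"))      # True
print(is_supported("HTTP://example.org/api/camera"))      # False
print(is_supported("http://example.org/api/cafe\u0301"))  # False (NFD form)
```

With no normalization, a name that differs only in case or in Unicode normalization form does not match. That is the interop risk this thread is about, and the reason for either normalizing at parse time (option b) or steering authors to US-ASCII.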

Kind regards,
Marcos Caceres

Received on Friday, 7 August 2009 12:01:22 UTC