RE: Comments on "Guidelines and Registration Procedures for new URI Schemes"

Hi Roy, et al,

This is the W3C I18N Core WG's response to your message. 

Roy Fielding wrote:

> > [1] The passage "In particular, the mapping should describe the
> > mechanisms for encoding binary or character strings within valid
> > character sequences in a URI.". There is already a mapping mechanism
> > in rfc 3987, sec. 3.1. It should be made sure in your document that
> > the mechanism you are describing is compatible with the rfc 3987
> > mechanism. One reason for this is the role of UTF-8, which is handled
> > in the mechanism of rfc 3987.
> 
> Felix, all of your comments are requesting that the document defining
> the URI scheme registry should have dependencies on the IRI RFC.
> That is neither appropriate nor necessary, since IRI already defines
> the mapping from URI to IRI in 3987. 

It is not the mapping from URI to IRI that concerns us here, but rather the reverse. In the item above, the draft recommends (good!) that schemes specify the mapping from character strings to URIs and, in section 2.6, requires that compatibility with IRI be considered (good!). The draft already makes a normative reference to IRI there, and it would be useful, in our opinion, to reference IRI in the other places cited in our comments.

For the security-related comments, it might be overkill to mention RFC 3987 in section 2.7 (which, thank goodness, does not attempt to give an exhaustive list of places to look). In any case, a good source for character spoofing information might be Unicode Technical Report #36 [1].

> RFC 3987 is not at the same
> level of standardization as URIs: it is a new technology that is
> defined as a mapping from URIs, not something that determines the
> requirements for URIs. 

But the URI scheme registry is at a lower level of standardization than even IRI, no? And new URI schemes really should consider the ramifications for IRI. What reason would there be for allowing *new* URI schemes to be registered that allow (or require!) IRI-incompatible mappings from character strings to URIs?
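
As a rough illustration of what an IRI-incompatible mapping looks like in practice, the Python sketch below compares UTF-8 percent-encoding with a hypothetical scheme that percent-encodes ISO-8859-1 octets instead. The Latin-1 scheme is an assumption made up for this example; it is not taken from the draft or from any registered scheme.

```python
from urllib.parse import quote, unquote

text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Percent-encoding of the UTF-8 octets, as an IRI-derived URI would carry them.
utf8_form = quote(text, encoding="utf-8")

# Percent-encoding of the ISO-8859-1 octet, as the hypothetical scheme would.
latin1_form = quote(text, encoding="latin-1")

# The hypothetical Latin-1 scheme misreads the UTF-8 octets as two characters.
misread = unquote(utf8_form, encoding="latin-1")

# A UTF-8 consumer cannot decode the Latin-1 octet; errors="replace"
# substitutes U+FFFD REPLACEMENT CHARACTER for it.
unreadable = unquote(latin1_form, encoding="utf-8", errors="replace")
```

The two forms differ (`%C3%A9` versus `%E9`), and neither side recovers the original character from the other's encoding.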

The draft in question raises the issue of character encodings. It just doesn't consistently cite IRI as a resource for mapping text to URI. We think that promoting adoption of URI schemes that are wholly compatible with IRI is a good thing, and one way to help ensure this would be to recommend (more strongly) the use of UTF-8 via RFC 3987, Section 3.1.
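
The mapping that RFC 3987, Section 3.1 describes can be sketched roughly as follows. This is a simplified illustration, not a conforming implementation: the function name `iri_to_uri` is hypothetical, it percent-encodes every non-ASCII character rather than checking the `ucschar` and `iprivate` productions, and it omits normalization and any special handling of the host component.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character.

    Simplified sketch of the core step in RFC 3987, Section 3.1.
    ASCII characters, including existing percent-escapes, pass through
    unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII is already legal URI material at this level; copy it.
            out.append(ch)
        else:
            # One %XX triplet per UTF-8 octet, with uppercase hex digits.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)
```

For example, `iri_to_uri("http://example.org/résumé")` gives `http://example.org/r%C3%A9sum%C3%A9`, while input that is already pure ASCII comes back unchanged.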

> Introducing dependencies on new technology
> RFCs is unwise given that the actual requirements for URI schemes
> are already defined in a full standard.

Again, we don't think this is what we're requesting. What we're saying is, effectively: if you wish to register a new URI scheme, then you really should make it compatible with IRI. URI, for very good technical and historical reasons, does not mandate anything in relation to IRI. But we see no reason not to recommend that new URI schemes adopt RFC 3987 as the process for mapping characters to URIs (and we think there are, on the contrary, very good reasons to actively recommend it).

Thus we think that our comments #1 and #2 ought to be incorporated (and #3 be given serious consideration).

Best Regards,

Addison

[1] http://www.unicode.org/reports/tr36/

Addison P. Phillips
Globalization Architect, Quest Software
Chair, W3C Internationalization Core Working Group

Internationalization is not a feature.
It is an architecture. 

Received on Tuesday, 6 September 2005 16:27:56 UTC