RE: Comments on "Guidelines and Registration Procedures for new URI Schemes" from Martin Duerst on 2005-09-06 (public-i18n-core@w3.org from July to September 2005)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Wed, 07 Sep 2005 06:39:06 +0900
To: Ted Hardie <hardie@qualcomm.com>, "Addison Phillips" <addison.phillips@quest.com>, "Roy T. Fielding" <fielding@gbiv.com>, "Felix Sasaki" <fsasaki@w3.org>
Cc: <iesg@ietf.org>, <tony+urireg@maillennium.att.com>, <uri@w3.org>, <LMM@acm.org>, <public-i18n-core@w3.org>
Message-Id: <6.0.0.20.2.20050907061023.04e2aa20@localhost>
Hello Ted, Roy, Addison, others,

I agree to some extent with both sides. When I read the draft, I
definitely got the impression that IRIs should be mentioned more clearly.
But I also agree that this has to be done carefully.

More below.

At 02:56 05/09/07, Ted Hardie wrote:
 >
 >At 9:27 AM -0700 9/6/05, Addison Phillips wrote:
 >>
 >>But the URI scheme registry is at a lower level of standardization than
 >even IRI is at, no?
 >
 >The current proposal is that the registry document be a BCP, which is
 >somewhat outside
 >the proposed-draft-full stairstep.  As you know, that gets complicated, but
 >"lower level of
 >standardization" doesn't sound right.
 >
 >>And new URI schemes should really consider IRI ramifications.
 >
 >I believe that the draft does ask them to consider IRI ramifications.
 >
 >>What would the reason be for allowing *new* URI schemes to be registered
 >that allow (or require!) IRI-incompatible mappings from character strings
 >to URI?
 >
 >How could this happen?  All URIs are IRIs.  If they have specified a
 >mapping from character
 >strings to URIs that is valid, it is automatically valid as an IRI.

Well, yes, there are basically two ways you can be compatible with IRIs.
One is that any URI is an IRI, so whatever character encoding you
choose, IRIs will be able to handle that as %-escapes.

The second is that you choose UTF-8 to encode your characters, and
then IRIs actually allow you to use the original characters.
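
To illustrate the two ways, here is a small Python sketch. It is only a
rough illustration, not the conversion procedure of the IRI spec: the
helper name to_iri is hypothetical, and it simply turns runs of escaped
non-ASCII bytes back into characters when they happen to be valid UTF-8,
leaving everything else as %-escapes.

```python
import re
from urllib.parse import quote

# Runs of percent-escapes whose byte values are all >= 0x80 (non-ASCII bytes).
_HIGH_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")

def to_iri(uri: str) -> str:
    """Hypothetical helper: show escaped non-ASCII bytes as characters
    only when the run of bytes is valid UTF-8; otherwise keep the escapes."""
    def decode_run(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)  # not UTF-8: stays as %-escapes
    return _HIGH_ESCAPES.sub(decode_run, uri)

# First way: a legacy encoding (Latin-1 is just an example choice).
latin1_uri = quote("café", encoding="latin-1")
# Second way: UTF-8.
utf8_uri = quote("café", encoding="utf-8")

print(latin1_uri, "->", to_iri(latin1_uri))
print(utf8_uri, "->", to_iri(utf8_uri))
```

Both results are valid URIs and therefore valid IRIs, but only the UTF-8
one can be shown with the original character in an IRI.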

The main problem with the current sentence:
"URI scheme definitions SHOULD be compatible with that specification."
may not be the SHOULD. (I agree with Ted that trying to force people
with a MUST may not be the right thing, especially in the context and
history of URI scheme registration.)

The main problem of the sentence above is that it doesn't say in which
way new URI schemes SHOULD be compatible with the IRI spec. That's
something that definitely should be fixed, because I'm assuming that
what is meant is the second way, not the first one (which would be
just an empty statement).

So what about changing this sentence to something like:
URI scheme definitions SHOULD be compatible with that specification,
which primarily means to use UTF-8 [5] for encoding text fields.

[I'm open to different wordings that say the same thing, of course.]

This has the additional beneficial side-effect of citing reference
[5], which otherwise goes un-cited. It also brings up the most
important aspect of serious (second variant above) IRI compatibility,
namely the use of UTF-8. Unfortunately, the IRI spec, like most
specs, requires quite some time to read, and mentioning UTF-8
will help a lot of people who just read these guidelines, the
same way many other sections of the guidelines mention other
salient points that help with URI conformance.


 >>The draft in question raises the issue of character encodings. It just
 >doesn't consistently cite IRI as a resource for mapping text to URI. We
 >think that promoting adoption of URI schemes that are wholly compatible
 >with IRI is a good thing and one way
 >to help ensure this would be by (more strongly) recommending the use of
 >UTF-8 via 3987 Section 3.1.
 >
 ><snip>
 >
 >>
 >>Again, we don't think this is what we're requesting. What we're saying is,
 >effectively: if you wish to register a new URI scheme then you really
 >should make it compatible with IRI. URI, for very good technical and
 >historical reasons, does not mandate anything
 >in relation to IRI. But we don't think there is a reason not to
 >recommend that new URI schemes adopt RFC 3987 as the process for mapping
 >characters to URIs (and think there are very good reasons, on the contrary,
 >to actively recommend it).
 >
 >The problem we're facing, though, is that every effort in the past to be
 >prescriptive about
 >URI schemes during the registration process has resulted in folks minting
 >URI schemes
 >without registration.  Some of those turned out to be in conflict, and some
 >not really
 >valid URIs.  Trying to get this down to a process that folks will actually use
 >and which will actually result in syntactically valid URIs is the key goal
 >(at least for
 >me, personally).  Making folks aware of the IRI work and that there are
 >parts of it
 >they may re-use makes lots of sense to me; there will likely be schemes
 >registered by folks who weren't aware of it.  But I'm very reluctant to
 >strengthen this:
 >
 >>[3] Sec. 2.6, you write: "URI scheme definitions SHOULD be compatible with
 >that specification.".
 >>It would be good to have a MUST here.
 >
 >If folks wish to register syntactically valid URI schemes that don't meet
 >that MUST, I think
 >we need to let them.

Yes. A SHOULD is a clear RECOMMENDATION for using IRI compatibility.
It also makes sense because on a lower level, the URI spec doesn't
require the use of US-ASCII for encoding character data into URIs.
It would be perfectly okay according to the URI spec if somebody
used EBCDIC, and got a lot of %-encodings. Of course the URI spec
recommends against that, but doesn't forbid that.
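
As a hedged illustration of the EBCDIC case, the following Python sketch
encodes the same plain letters once via an EBCDIC code page and once via
the default. The choice of the cp037 codec is my assumption, just one
EBCDIC variant picked for the example.

```python
from urllib.parse import quote

# "abc" encoded through an EBCDIC code page (cp037, an assumed example):
# every letter becomes a non-ASCII byte and has to be %-escaped.
ebcdic_uri = quote("abc", encoding="cp037")

# The same letters with the default (UTF-8, identical to US-ASCII here)
# need no escaping at all.
ascii_uri = quote("abc")

print(ebcdic_uri)
print(ascii_uri)
```

The EBCDIC result is still a syntactically valid URI, just an opaque one.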

 >the IRI documents make clear that those can be
 >treated as IRIs, and
 >any effort to force further compliance seems likely to get us back to the
 >previous problem.

Yes. It is more important to say exactly what we mean by compliance.

Regards,    Martin.


 >Certainly, they should be given the pointers and asked to consider it, but
 >if they still say
 >"no", I don't think using the registration mechanism to seek compliance
 >will succeed.  It
 >has not, I believe, so far.
 >
 >Just my personal opinion,
 >			regards,
 >				Ted Hardie
Received on Tuesday, 6 September 2005 22:18:00 UTC