Re: feature negotiation syntax

On Mon, 12 May 1997, Koen Holtman wrote:

> Larry Masinter:
> >
> >I suggest avoiding any of the complexities of URI comparison
> >and make feature tag comparison be exact, octet-by-octet.
> 
> Yes, that would simplify things without losing any of the power.
> I'll put it in the next version.  I do assume that with
> octet-by-octet, you mean after interpretation of % escapes.  I think
> it is a good idea to allow these, especially in the tag values.
> 
> >Given the enormous flamage around UTF8-URLs, I think
> >you might be in trouble unless you specify very carefully
> >exactly which subset of URIs you're actually going to allow.
> 
> I'm taking the PEP approach of allowing *any* URI.  Do you expect this
> to cause flamage?

I don't expect this to cause flamage, but a *limitation* of
the allowed values may very well do so.

This probably won't happen soon, because it usually takes some
time for people to use new technology in localized contexts.
Then after they use it localized, it takes some more time
for people to realize that the various localizing solutions
don't really fit together well. Thinking ahead will pay off!

As for the %-escapes, they are indeed a problem for generic
URI comparison. But discussion in the URN group has shown
that this problem is not related to internationalization.
Non-ASCII characters in URIs use the upper half of the 8-bit
range, and for this range, %HH is a pure transfer encoding.

If headers are strictly 7-bit, then it will always be %HH
for these cases (then you only have to worry about %ab vs.
%AB,...). If headers can contain 8-bit (the warnings already
can, and this is the direction I think we should move in),
then we can always specify 8-bit for transfer, don't need
%HH for these cases, and can indeed compare on octets.
If we allow both (that's what usually happens in practice,
even if it's not in the spec) then we can just normalize
on one or the other.
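
To make the "normalize on one or the other" case concrete, here
is a minimal sketch in Python. It is only an illustration, not
taken from any draft; the names to_raw_octets and same_tag are
made up. It assumes tags arrive as byte strings and normalizes
towards raw 8-bit octets, touching only escapes in the upper half:

```python
import re

# Matches %HH where the first hex digit is 8-F, i.e. an octet in 0x80-0xFF.
# Escapes for ASCII octets (%00-%7F) are deliberately left alone.
_HIGH_ESCAPE = re.compile(rb"%([89A-Fa-f][0-9A-Fa-f])")


def to_raw_octets(tag: bytes) -> bytes:
    """Replace every %HH escape for an octet >= 0x80 with the raw octet."""
    return _HIGH_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), tag)


def same_tag(a: bytes, b: bytes) -> bool:
    """Compare two tags octet by octet after normalizing to raw 8-bit."""
    return to_raw_octets(a) == to_raw_octets(b)
```

With this, same_tag(b"caf%c3%a9", b"caf\xc3\xa9") holds, and the
%ab vs. %AB question disappears for the upper half because the
escapes are gone before the comparison.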

The problem with %HH is the syntactically significant
ASCII characters, for which %HH is a true escaping
rather than a transport mechanism. This doesn't affect
internationalization.
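
A small sketch of the ASCII side, again in Python and again only
an illustration (the function name upper_ascii_escapes is made
up, and uppercasing the hex digits is my assumption of one
possible convention). The escapes in the ASCII range are kept as
escapes, because "%2F" and "/" do not mean the same thing; only
the case of the hex digits is made uniform:

```python
import re

# Matches %HH where the octet is in the ASCII range 0x00-0x7F.
_ASCII_ESCAPE = re.compile(rb"%([0-7][0-9A-Fa-f])")


def upper_ascii_escapes(tag: bytes) -> bytes:
    """Uppercase the hex digits of ASCII escapes without decoding them."""
    return _ASCII_ESCAPE.sub(lambda m: b"%" + m.group(1).upper(), tag)
```

So b"a%2fb" and b"a%2Fb" compare equal after this step, but
neither compares equal to b"a/b".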

And of course, in accordance with the URN syntax, the
URL process draft, and other work in this area, the
mapping/encoding from characters to octets should be
UTF-8.
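
For the strictly 7-bit case, the mapping from characters to
octets to %HH can be sketched like this (Python, illustration
only; the name escape_non_ascii is made up, and the sketch does
nothing about syntactically significant ASCII characters):

```python
def escape_non_ascii(text: str) -> str:
    """Encode text as UTF-8 and write every octet >= 0x80 as %HH."""
    out = []
    for octet in text.encode("utf-8"):
        if octet < 0x80:
            out.append(chr(octet))        # ASCII octets pass through unchanged
        else:
            out.append(f"%{octet:02X}")   # upper-half octets become %HH
    return "".join(out)
```

For example, escape_non_ascii("café") gives "caf%C3%A9", since
é (U+00E9) is the two octets C3 A9 in UTF-8.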

Regards,	Martin.

Received on Wednesday, 14 May 1997 09:16:51 UTC