Re: I-D ACTION:draft-fenner-literal-zone-02.txt from Martin Duerst on 2005-11-08 (uri@w3.org from November 2005)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Tue, 08 Nov 2005 19:22:44 +0900
To: JINMEI Tatuya / 神明達哉 <jinmei@isl.rdc.toshiba.co.jp>
Cc: Bill Fenner <fenner@research.att.com>, ipv6@ietf.org, uri@w3.org, "Roy T. Fielding" <fielding@gbiv.com>
Message-Id: <6.0.0.20.2.20051108184213.06abb260@localhost>
Hello Tatsuya,

I think Roy Fielding has expressed the URI side of this
story way more succinctly than I could ever do. I fully
agree with him. Below are a few additional points.

At 11:17 05/11/08, JINMEI Tatuya / 神明達哉 wrote:
 >>>>>> On Mon, 07 Nov 2005 19:04:13 +0900,
 >>>>>> Martin Duerst <duerst@it.aoyama.ac.jp> said:
 >
 >>> It would be very confusing for the user to see they can simply reuse
 >>> the output of the diagnostic tool in some cases and they need to
 >>> convert the output in some other cases.
 >
 >> An additional idea would be to change some of the tools such as
 >> ping6 to accept and use '+' rather than '%'. Given the software
 >> counts for URI-processing software and IPv6 software, that's
 >> probably much easier than trying to force the non-escaping
 >> '%' into URI syntax (already a full standard).
 >
 >IMO (admitting YMMV), URI-processing software and IPv6 software are
 >both so deployed that we cannot simply make "this one is not fully
 >deployed so fixing this side should be easier".  I indeed made a
 >similar argument about a year ago:
 >http://www1.ietf.org/mail-archive/web/ipv6/current/msg03987.html
 >
 >In addition, while I might buy this argument if the proposed syntax in
 >draft-fenner-... could avoid forcing special processing in
 >URI-processing software, it actually doesn't.  The fact is that
 >"URI-processing software" will need modification anyway, whether we
 >adopt the draft-fenner-... syntax or just allow the RFC4007 format.

Yes. But as Roy has explained, it's the effect of this syntax
on URI-processing software that isn't updated that is the main
concern. We can't expect a user to know which software is updated
and which is not.

 >Meanwhile, requiring the existing tools that understand the RFC4007
 >'%' format to support '+' effectively means deprecating the current
 >description of RFC4007 and updating the RFC itself, since this is
 >exactly the case when the proposed format defined in RFC4007 is
 >expected to be used.

Well, it wouldn't be the first RFC to be updated. The URI spec
was updated several times. And if zone ids in URIs are not
an interoperability issue, then zone ids in other places
shouldn't be an interoperability issue either.

 >On the other hand, I'm not sure whether the 'special processing'
 >required for the URI-processing software means requiring of the URI
 >standard itself.  If we regard this as a user interface issue for
 >applications (see below), can't we regard the conversion from
 >"http://[fe80::abcd%fxp0]/" to "http://[fe80::abcd]/ within the
 >application as a "pre-processing before URI-processing", without
 >breaking the URI standard?  (I'm afraid this 'wording trick' is
 >actually not acceptable by the URI community, but I'll see
 >anyway...)

Well, there are indeed some processing steps that happen in
that way. The best example I know is that it's possible to
put a space e.g. in a src attribute in an <img> tag, and
browsers will just convert that to %20. The same happens in
the address/location bar of a browser. But that's something
that can happen on a uniform basis, with any URI.

What you are asking for would be much more special, and
would require careful parsing. And it would mean that it
has to be added to *every* URI processor, otherwise the
'%' will confuse the further processing of the URI. But
adding it to every processor isn't really possible, of
course. Please note that '%' is the only character that
has a special function in every part of a URI.
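
To make the concern concrete, here is a minimal sketch of what
a generic percent-decoding step does to an RFC 4007 style host
string. It assumes Python's standard urllib.parse.unquote as a
stand-in for "a URI processor that wasn't updated"; the zone
ids are made up for illustration, and other decoders may handle
the malformed escape differently.

```python
from urllib.parse import unquote

# Hypothetical host strings; the zone ids are illustrative only.
numeric_zone = "fe80::abcd%11"    # '%' followed by two hex digits
named_zone = "fe80::abcd%fxp0"    # '%' followed by a non-hex pair

# A processor that knows nothing about zone ids simply applies
# percent-decoding, as it would to any other part of a URI.
decoded_numeric = unquote(numeric_zone)
decoded_named = unquote(named_zone)

# repr() makes any control character visible.
print(repr(decoded_numeric))
print(repr(decoded_named))
```

With this particular decoder, the numeric zone is silently
turned into a control character, while the malformed "%fx" is
passed through untouched. Two inputs of the same kind give two
different outcomes, and the user has no way to predict which.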

If it's only about changing "http://[fe80::abcd%fxp0]/"
to "http://[fe80::abcd]/", I don't see why the user
can't do that. And how many users are there actually
who will use raw IPv6 addresses with zone identifiers
in URIs? I'm all for making it easy for users, if there
are really lots of them.

What we probably could do as a compromise would be to
keep the 'gethostbyname' interface using '%', if that
is already strongly established (the new URI implementations
can easily convert from a '+' in a URI to a '%' in a
'gethostbyname', in particular if our draft tells them
to do that). On the other hand, visible notation would
then move to use the '+' for overall user convenience.
I haven't seen any inherent arguments against the '+'
from your side, and I hope I haven't overlooked anything.
If this is true, then the situation is actually quite
asymmetric:

RFC 4007 uses '%', but any other character would work
as well (and libraries could easily accept more than
one character, if that was necessary for backwards
compatibility). On the other hand, RFC 3986 already
uses '%' for something else, so that character is
no longer available. Many others would work; indeed,
'+' is only one example. If you want another one,
that might work, too.
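
As a rough sketch of the compromise described above, the two
helpers below show (a) a library-side parser that accepts more
than one separator character, and (b) a URI-side step that maps
a '+' in a bracketed host literal to the '%' form before the
string is handed to the resolver. The function names are
hypothetical, and the sketch does not handle the "v1." prefix
that appears in some examples in this thread.

```python
def split_zone(literal, separators="%+"):
    """Split 'address<sep>zone' at the first separator character.

    Returns (address, zone); zone is None if no separator is found.
    """
    for i, ch in enumerate(literal):
        if ch in separators:
            return literal[:i], literal[i + 1:]
    return literal, None


def uri_host_to_rfc4007(uri_host):
    """Map a URI host such as '[fe80::abcd+fxp0]' to 'fe80::abcd%fxp0'."""
    # Strip the square brackets that surround an IP literal in a URI.
    if uri_host.startswith("[") and uri_host.endswith("]"):
        inner = uri_host[1:-1]
    else:
        inner = uri_host
    # On the URI side only '+' is treated as the zone separator.
    address, zone = split_zone(inner, separators="+")
    return address if zone is None else address + "%" + zone


print(uri_host_to_rfc4007("[fe80::abcd+fxp0]"))
print(split_zone("fe80::abcd%fxp0"))
print(split_zone("fe80::abcd+fxp0"))
```

The point is only that the conversion is a few lines on either
side; which side carries it is the actual question.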

 >> Also, this is not a matter of formality, it is a matter of
 >> deployment. What if something like "http://[v1.fe80::abcd%fxp0]/"
 >> suddenly gets converted into "http://[v1.fe80::abcd<0x0F>xp0]/"
 >> (<0x0F> standing for a 0F byte, which is Shift In).
 >
 >This would be bad, of course.  But I don't think that matter much
 >because "http://[v1.fe80::abcd+fxp0]/" doesn't work either with
 >today's URI parsers.

If "doesn't work" means "maybe doesn't resolve", then yes.
This is true even for something like http://www.ietf.org.
The network is never perfect. But as Roy described, for
'%', we are looking at a much more varied, and troubling,
pattern of failure.
The same argument applies to the various schemes: no URI
resolver is required to understand all schemes (how could it?).


Regards,   Martin.
Received on Tuesday, 8 November 2005 10:26:45 UTC