<unwise> VCHARs (was: RFC 2822 email addresses in tag URIs)

Roy T. Fielding wrote:

>> can I now submit the RFC 3986 appendix D.2 problems as an erratum?

> No, it isn't an error.

IBTD (I beg to differ): 2396 <uric> and its predecessor, 1738
<xchar>, used to be "any ASCII you might find in a URL", except
'#', because fragments were not considered part of the URL.

But 3986 D.2 removed ! ' ( ) * from its new <uric> set instead
of adding # [ ].

Similarly, 2396 <mark> and its predecessor, 1738 <safe> + <extra>,
used to be the same as <unreserved> minus (ALPHA + DIGIT).

3986 D.2 kept <mark> as is instead of moving ! ' ( ) * to <uric>.
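For comparison, a minimal Python sketch of how the 2396 <uric> and
<mark> sets relate to the rules in the body of 3986.  The set
literals are my own transcription of the ABNF, and '%' stands in
for <escaped> / <pct-encoded>, so treat this as a rough check, not
as a normative table:

```python
import string

ALNUM = set(string.ascii_letters + string.digits)

# RFC 2396, section 2: reserved, mark, and uric ('%' models <escaped>).
RFC2396_RESERVED = set(";/?:@&=+$,")
RFC2396_MARK = set("-_.!~*'()")
RFC2396_URIC = RFC2396_RESERVED | RFC2396_MARK | ALNUM | {"%"}

# RFC 3986, sections 2.1-2.3 ('%' models <pct-encoded>).
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
UNRESERVED = ALNUM | set("-._~")
RFC3986_ALLOWED = GEN_DELIMS | SUB_DELIMS | UNRESERVED | {"%"}

# Characters 3986 allows literally that 2396 <uric> did not.
added = sorted(RFC3986_ALLOWED - RFC2396_URIC)
# 2396 <mark> characters that are no longer <unreserved> in 3986.
moved = sorted(RFC2396_MARK - UNRESERVED)

print("added:", "".join(added))
print("moved:", "".join(moved))
```

In the body of 3986 the set only grows by # [ ], while ! ' ( ) *
stay allowed but move from <unreserved> to <sub-delims>.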

> You are not supposed to use the terms in appendix D for new
> specifications.

That's clear; they are used to interpret pre-3986 URI schemes.

> intentionally conservative to avoid the creation of bad URIs.

But it's no longer okay to use ! ' ( ) * freely in URLs.  And
the old rule "%-encode everything that's not <uric>" would miss
the special case IPv6address.
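A minimal sketch of the IPv6address problem, assuming a naive
encoder that applies the old rule literally.  The helper name
old_rule_encode is hypothetical, and letting '%' pass through
unchanged is a simplification of this sketch:

```python
import string

# RFC 2396 <uric>, simplified: reserved + alphanum + mark, plus '%'
# so that existing escapes pass through (assumption of this sketch).
URIC_2396 = set(";/?:@&=+$," + "-_.!~*'()"
                + string.ascii_letters + string.digits + "%")


def old_rule_encode(uri: str) -> str:
    """Hypothetical helper: %-encode every character not in 2396 <uric>."""
    out = []
    for ch in uri:
        if ch in URIC_2396:
            out.append(ch)
        else:
            # Encode each UTF-8 byte as %XX with upper-case hex digits.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(old_rule_encode("http://[2001:db8::1]/"))
```

The brackets of the IP literal come out as %5B and %5D, which
breaks the host syntax that 3986 (following RFC 2732) requires.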

> Why don't you simply use the current rules in the body of
> 3986?

That's exactly what I want, but there's no list of the new
non-<uric> VCHARs, i.e. the equivalent of the old 2396 <delims> +
<unwise> and its predecessor, 1738 <unsafe>.

> Just ignore appendix D for new specifications.

That doesn't help when porting old specifications, especially
some 1738 schemes that are still missing.  It's also relevant for
2368bis: its last draft still used <uric>, unless I'm confusing
it with another I-D.

If I've got it right, then the following nine VCHARs, in addition
to a literal '%', always have to be %-encoded under 3986:
" < > \ ^ ` { | }

3986 no longer mentions this.  Determining the nine "ugly"
characters, formerly known as <delims> + <unwise> or <unsafe>,
now takes a script doing set operations on VCHAR.
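Such a script can be quite short.  A minimal Python sketch,
assuming VCHAR as defined in RFC 2234 (%x21-7E) and my own
transcription of the 3986 character sets:

```python
import string

# VCHAR (RFC 2234): visible US-ASCII characters, %x21-7E.
VCHAR = {chr(c) for c in range(0x21, 0x7F)}

# Characters RFC 3986 allows literally in a URI (sections 2.1-2.3).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
PCT = {"%"}  # allowed only as the start of a pct-encoded triplet

# Whatever is left can never appear unencoded in a 3986 URI.
ugly = sorted(VCHAR - (UNRESERVED | GEN_DELIMS | SUB_DELIMS | PCT))
print(len(ugly), " ".join(ugly))
```

This prints the nine characters in code-point order.  '%' drops
out of the result only because it is needed as the escape
introducer; a '%' meant literally still has to be written as %25.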

                              Bye, Frank

Received on Friday, 14 October 2005 21:42:13 UTC