Re: Rewrite of feature tag syntax rules

On Wed, 14 May 1997, Larry Masinter wrote:

> It's a common difficulty with URL processing, to the point
> where we wrote a special section in the revised URL draft
> on "when to escape and when not to escape".

This is a very good section. Implicitly, it says "don't
escape if you don't have to". Maybe that should be made
more explicit, in an attempt to make equivalence easier
(of course with due wording for special cases such as "~").


> But the short
> answer is that you cannot 'unescape' a URL except when you are
> parsing it into its component parts. Thus, URL equivalence
> using %XX = <character represented by that byte> is unacceptable. 

Taken exactly as it stands, it is unacceptable. But with some
modification, it would work rather well. Divide octets
into two categories, reserved and non-reserved. For reserved
octets, compare the hex digits in %XX case-insensitively
(so %2f = %2F, but neither equals a literal "/"). For the
non-reserved, use %XX = <octet represented by that byte>.
The "reserved" category could be the broad collection of
characters possibly reserved according to the URL syntax
draft. The rest would be non-reserved.
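
To make the rule concrete, here is a rough sketch in Python. It is
only an illustration, not text from any draft: the RESERVED set is my
assumption about what "possibly reserved" would cover, and the names
tokens() and equivalent() are made up for this example. Input is
assumed to be an ASCII URL string.

```python
# Assumed reserved set (hypothetical; the real list would come from
# the URL syntax draft). "%" is included so an escaped percent sign
# is never confused with a stray literal one.
RESERVED = frozenset(b";/?:@&=+$,%")
HEX_DIGITS = "0123456789abcdefABCDEF"


def tokens(url):
    """Split a URL into (octet, kept_escaped) pairs.

    A %XX escape of a reserved octet stays marked as escaped, so it
    only matches another escape of the same octet (hex case ignored).
    A %XX escape of a non-reserved octet is treated as that octet.
    """
    out = []
    i = 0
    while i < len(url):
        pair = url[i + 1:i + 3]
        is_escape = (
            url[i] == "%"
            and len(pair) == 2
            and all(c in HEX_DIGITS for c in pair)
        )
        if is_escape:
            octet = int(pair, 16)  # int() accepts either hex case
            out.append((octet, octet in RESERVED))
            i += 3
        else:
            # Literal character (or a malformed "%"): taken as-is.
            out.append((ord(url[i]), False))
            i += 1
    return out


def equivalent(a, b):
    """Compare two URLs under the reserved/non-reserved rule."""
    return tokens(a) == tokens(b)
```

With this, equivalent("http://host/%7Euser", "http://host/~user")
holds, as does equivalent("/a%2fb", "/a%2Fb"), while "/a%2Fb" and
"/a/b" stay distinct.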

There are of course failure cases, but it is extremely
rare that somebody writes a "/" as %2F in a context
where it can but doesn't need to be escaped. The only
actual case where I have seen both %HH and direct encoding
used is "~", which is a very special case and not reserved.

The failure cases, if they really turn up, would weed
themselves out naturally. There is no reason to restrict
ourselves and limit functionality.

Regards,	Martin.

Received on Thursday, 15 May 1997 11:48:43 UTC