Re: draft-ietf-http-state-man-mec-03: $Version and path from Foteos Macrides on 1997-09-23 (ietf-http-wg@w3.org from July to September 1997)

From: Foteos Macrides <MACRIDES@sci.wfbr.edu>
Date: Tue, 23 Sep 1997 13:59:10 -0500 (EST)
To: dmk@research.bell-labs.com
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com, http-state@research.bell-labs.com
Message-Id: <01INZFVN2ICI002N2Y@SCI.WFBR.EDU>
dmk@research.bell-labs.com (Dave Kristol) wrote:
>Gisle Aas <aas@bergen.sn.no> wrote:
>
>> dmk@research.bell-labs.com (Dave Kristol) writes:
>> 
>> > Gisle Aas <aas@bergen.sn.no> wrote on 19 Sep 1997 11:59:27 +0200:
>> > 
>> >   > what I have done now is to let URL-encoded chars and unencoded chars
>> >   > match and then let "%2F" and "/" be the exception.  Perhaps ";" should
>> >   > be special too?
>> > 
>> > I'm not sure I understand what you're saying.  It sounds like you're saying
>> > the % escapes wouldn't be interpreted, except sometimes.  So
>> > 	Path="/%66oo"
>> > and	Path="/foo" 
>> > would be different, but
>> > 	Path="/foo%2F"
>> > and	Path="/foo/"
>> > would be the same.
>> 
>> No, I meant the opposite.  "%2F" denotes a literal slash in a path
>> segment.  "/" is the path segment separator.  They are not the same
>> thing and I don't think they should match (but "%2f" and "%2F" is the
>> same thing).  "%66" and "f" is encodings for the same unreserved
>> character, so they match.
>> 
>> draft-fielding-url-syntax-07.txt says:
>> 
>> | 2.3. Unreserved Characters
>> |
>> |   Data characters which are allowed in a URL but do not have a reserved
>> |   purpose are called unreserved.  These include upper and lower case
>> |   letters, decimal digits, and a limited set of punctuation marks and
>> |   symbols.
>> [...]
>> | 4.3.2. Path Component
>> |
>> |   The path component contains data, specific to the site (or the scheme
>> |   if there is no site component), identifying the resource within the
>> |   scope of that scheme and site.
>> |
>> |      path          = [ "/" ] path_segments
>> |
>> |      path_segments = segment *( "/" segment )
>> |      segment       = *pchar *( ";" param )
>> |      param         = *pchar
>> |
>> |      pchar         = unreserved | escaped | ":" | "@" | "&" | "=" | "+"
>> [...]
>> Which might suggest that "/", ";", "=", and "?" should not match their
>> URL-encoded versions.
>>
>> [DMK's inappropriate citation of RFC 822 deleted]
>
>Okay, I see your point now.  I think what you're saying, in fact, is
>any character that is not a pchar must be escaped, and that an explicit
>non-pchar does not match its escaped form.
>
>So now we need words to say that.  How about these, for 4.3.4, under
>Path Selection:
>
>Two paths match if
>	- respective characters match exactly; or
>	- respective characters are represented by % escapes, and the values
>		of the % escapes are equal; or
>	- one path has an explicit character, the second path has a % escape,
>		the % escape evaluates to the character, and the character
>		is a pchar (RFC 2068).
>Otherwise the paths do not match.

	You're getting into the can of worms with which the MHTML-WG has
been struggling, and would be unwise, IHMO, to take it on explicily in
the cookie specs as well, rather than leaving it as now, with an implicit
definition of "match" as "whatever the current HTTP specs define that
to mean for the path field of http[s] URLs (presently in Section 3.2.3
of RFC 2068 and the -08 revision draft)".

	Technically, semi-colons should be hex escaped when they are
not parameter delimiters, but a URL parameter presently is defined
only for ftp URLs (;type=[A, I, or D]) and virutally all deployed UAs
treat any unescaped semi-colons "in"/"near" http[s] URL path fields
as ordinary characters belonging to the immediately preceding element.
Note also, that Section 3.2.3, through the -08 draft, specifies parameter
handling in http URLs according to RFC 1738 and RFC 1808, but the -07
URL draft (as in its preceding -0n versions) has changed that in
relation to "current practice", i.e., it now describes the apparant
resolving behavior of deployed UAs which in fact are not paying any
attention to semi-colons as parameter delimiters in http[s] URLs
(because there aren't any parameters defined for that scheme :).
Presumeably, the HTTP/1.1 draft's Section 3.2.3 will be updated
homologously before it moves on to Draft Standard.

	The -07 URL draft also has wording changes to make clear
that URLs are sequences of characters, not necessarily octets, in
anticipation of explicit standards for internationalized URLs.  When
the latter standards are forthcoming (perhaps with parameters to
indicate charsets) the matching and parameter handling rules might
need further revisions.  So, again, it MAY be better to handle the
issue of path matching via indirect reference to HTTP specs, rather
than trying to spell it out explicitly within the cookie specs. :)

				Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 MACRIDES@SCI.WFBR.EDU         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================
Received on Tuesday, 23 September 1997 11:07:27 UTC