Re: Use of ";" in relative URLs: procedural issue?

Some background... In RFC 1808 I defined the handling of ";" within
URLs to match the way it was implemented in libwww-perl.
My testing prior to the URL syntax revision, as seen at
<http://www.ics.uci.edu/~fielding/url/>, revealed that none of the
other major applications of URLs implemented ";" according to RFC 1808.
So, I removed the distinction from the new URL syntax document.

The current definition of URLs is that ";" is treated as part of the
path segment for the purpose of relative URL parsing.
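
To illustrate the difference, here is a minimal Python sketch of the two
treatments of a reference that begins with ";".  The function names are
hypothetical, and the sketch assumes a base URL with no query or fragment
and with parameters only on its last path segment.  It is an illustration,
not a full resolver.

```python
def resolve_as_segment(base: str, reference: str) -> str:
    """Current definition: ';' is ordinary path-segment data, so the
    reference replaces the whole last segment of the base path."""
    head, _, _last_segment = base.rpartition("/")
    return head + "/" + reference


def resolve_rfc1808_style(base: str, reference: str) -> str:
    """RFC 1808 style: a reference starting with ';' keeps the base
    path and replaces only the base parameters."""
    path, _, _old_params = base.partition(";")
    return path + reference


base = "http://a/b/c/d;p"
print(resolve_as_segment(base, ";x"))     # http://a/b/c/;x
print(resolve_rfc1808_style(base, ";x"))  # http://a/b/c/d;x
```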

>> Semicolons were introduced to allow elements to be specified by name
>> rather than position, for spaces which were best seen as matrices
>> rather than trees.  In this case it is only sensible for relative
>> URLs which start with ";" to take a set of attribute values which
>> are different.  This implies
>>
>>  1. attributes can only occur once (unless you have a syntax for
>>     removing a particular occurrence) and
>>  2. a missed value is equivalent to an unspecified value (so you can
>>     remove an occurrence by setting its value to empty)
>>  3. attributes are unordered
> 
> This is quite attractive, and is important for some schemes.  No one
> seems to implement this currently, though, so introducing it at this
> point seems like we would at least have to start over at "Proposed
> Standard", and should only do so if the consensus of the community is
> that this is the right thing to do.

I think that is the crux of the problem.  While it may seem like a good
idea if everyone implements it, a relative URL which made use of that
property cannot be used until almost all consumers of that relative URL
(the client applications) are capable of resolving it consistently.
Since that is not the case for all of today's applications, we will have
a significant lag between specifying that behavior and actually being
able to use it on public systems.

Unlike a new protocol, we cannot make design decisions by fiat.  Before
we can change the algorithm, we will need to get buy-in from the people
who produce the software.  That is what Larry is doing.

Speaking as a developer, the proposed change would add considerable
complexity to what is currently a trivial algorithm.  In short, if the
URL reference began with ";", the client would need to extract the
parameters from the base URL, hash them into a matrix (may be a simple list),
do the same with the reference parameters and place them into the same
matrix (overriding any that duplicate those of the base URL), and then
regenerate the URL from the matrix.  Aside from code bloat, I would not
anticipate any significant decrease in performance, since there would
be no difference from the existing algorithm for all URL references that
did not start with ";", as is the case with almost all current references.
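
The merge step described above can be sketched roughly as follows.  This
is a hedged illustration only: the function names are hypothetical, the
base URL is assumed to carry no query or fragment and to have parameters
only at its end, and the sorted output order is an arbitrary choice made
here because the proposal treats attributes as unordered.

```python
def split_params(text):
    """Split 'path;a=1;b=2' into ('path', {'a': '1', 'b': '2'})."""
    head, *raw = text.split(";")
    params = {}
    for item in raw:
        if item:
            name, _, value = item.partition("=")
            params[name] = value  # a later duplicate overrides an earlier one
    return head, params


def resolve_param_reference(base, reference):
    """Merge the parameters of a ';'-leading reference into the base."""
    if not reference.startswith(";"):
        raise ValueError("only references starting with ';' are handled here")
    path, merged = split_params(base)
    _, ref_params = split_params(reference)
    merged.update(ref_params)  # reference values override base values
    # An empty value means "unspecified", which removes the attribute.
    kept = sorted((n, v) for n, v in merged.items() if v != "")
    return path + "".join(f";{n}={v}" for n, v in kept)


print(resolve_param_reference("http://host/map;x=1;y=2", ";y=5;z=9"))
# http://host/map;x=1;y=5;z=9
print(resolve_param_reference("http://host/map;x=1;y=2", ";x="))
# http://host/map;y=2
```

References that do not start with ";" never reach this code path, which
is why the extra work should not affect the common case.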

My opinion is that I would like to see a need for this change before
trying to make it a standard.  In other words, I have yet to see a
compelling application espoused that would require the use of unordered
attribute-value pairs within a URL-using application, other than FORM
entry, which already has a different syntax.  Combining that with the
inability to use such a feature until all current software is upgraded,
my current opinion is that we should not make the change.

.....Roy

p.s. <uri@bunyip.com> is the correct place to discuss this.  It was part
     of the agreement on proposing the URL WG that the new list would
     not be used to argue over the generic syntax, but instead be limited
     to scheme-specific standardization issues.

Received on Monday, 3 February 1997 22:41:31 UTC