3 last issues with the URI syntax

Patrik Fältström (paf@swip.net)
Sat, 11 Jul 1998 10:55:46 +0200


Message-Id: <v04011706b1ccd6b73a32@[193.12.104.1]>
Date: Sat, 11 Jul 1998 10:55:46 +0200
To: Larry Masinter <masinter@parc.xerox.com>,
From: Patrik Fältström <paf@swip.net>
Cc: uri@Bunyip.Com
Subject: 3 last issues with the URI syntax

I am resending this message, as I sent it to completely the wrong addresses
last time, on the 29th of June. I am sorry for the delay this caused, and
thank Daniel (who received the mail) for pointing this out to me.


These are my last questions regarding the document, and the changes are
minor. I have also tried as much as possible to suggest new text. We are
talking about one clarification, one change in the grammar (there was a bug
in the grammar), and one question I have regarding how the HTTP protocol
actually works: the use of the "foo/../bar" construction in relative and
absolute URIs, whose handling seems (according to Larry) to differ between
implementations. I just want this paper to be extremely clear about how it
should be handled, so we can bash the clients that do not follow the rules
in this paper.

     Patrik


After talking with several different people over and over again, like a
mediator, I think I see consensus on the URI syntax paper
draft-fielding-uri-syntax-03 with only three changes. The changes are for
clarification of three things. Only one of these might be controversial,
but I'll explain what the problem is, and what my suggestion is.

(1) The "Generic URI Syntax"

>3. URI Syntactic Components
>
>   The URI syntax is dependent upon the scheme.  In general, absolute
>   URI are written as follows:
>
>      <scheme>:<scheme-specific-part>
>
>   An absolute URI contains the name of the scheme being used (<scheme>)
>   followed by a colon (":") and then a string (the <scheme-specific-
>   part>) whose interpretation depends on the scheme.
>
>   The URI syntax does not require that the scheme-specific-part have
>   any general structure or set of semantics which is common among all
>   URI.  However, a subset of URI do share a common syntax for
>   representing hierarchical relationships within the namespace.  This
>   "generic URI" syntax consists of a sequence of four main components:
>
>      <scheme>://<authority><path>?<query>

I think this would be clearer if, after this paragraph, a new paragraph
were inserted that points directly to the part of the grammar that other
URI schemes must support. Something like the following:

    URI schemes which do not follow this syntax have to follow the
    following syntax, where the scheme-specific-part does not begin with
    a '/':

       <scheme>:<uric-no-slash> *<uric>
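
As an illustration of the distinction this paragraph would draw, here is a
minimal Python sketch. The helper name classify and the scheme pattern
(a letter followed by letters, digits, "+", "-" or ".") are assumptions made
for the example, not text from the draft. It looks only at the first
character of the scheme-specific-part and does not validate the remaining
characters.

```python
import re

# Assumed scheme pattern: a letter, then letters, digits, "+", "-" or ".".
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(.*)$", re.DOTALL)


def classify(uri):
    """Return (scheme, kind, scheme_specific_part), or None if no scheme.

    kind is "generic" when the scheme-specific-part begins with "/",
    "other" when it begins with any other character, and "empty" when
    nothing follows the colon.
    """
    match = SCHEME_RE.match(uri)
    if match is None:
        return None  # no "<scheme>:" prefix, so not an absolute URI
    scheme, rest = match.group(1), match.group(2)
    if rest.startswith("/"):
        kind = "generic"
    elif rest:
        kind = "other"
    else:
        kind = "empty"
    return scheme, kind, rest


print(classify("foo://bar/a/b"))        # ('foo', 'generic', '//bar/a/b')
print(classify("mailto:paf@swip.net"))  # ('mailto', 'other', 'paf@swip.net')
print(classify("./a/../b"))             # None
```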


(2) The specification of <query> in the grammar.

>3.4. Query Component
>
>   The query component is a string of information to be interpreted by
>   the resource.
>
>      query         = *uric
>
>   Within a query component, the characters ";", "/", "?", ":", "@",
>   "&", "=", "+", ",", and "$" are reserved.

My suggestion is that query should be specified not with *uric, but instead
with the group of characters that is actually allowed. As far as I can
see, the rule should instead be:

       query         = * (unreserved | escaped)
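
To show what the suggested rule would accept, here is a small Python sketch.
It assumes that unreserved means the alphanumerics plus the marks
- _ . ! ~ * ' ( ) and that escaped means "%" followed by two hex digits;
both should be checked against the draft's own grammar. The name
is_valid_query is made up for the example.

```python
import re

# Assumed character classes; verify against the draft's grammar.
UNRESERVED = r"A-Za-z0-9\-_.!~*'()"
ESCAPED = r"%[0-9A-Fa-f]{2}"

# query = *(unreserved | escaped)
QUERY_RE = re.compile(r"(?:[" + UNRESERVED + r"]|" + ESCAPED + r")*")


def is_valid_query(query):
    """True if every character is unreserved or part of a %XX escape."""
    return QUERY_RE.fullmatch(query) is not None


print(is_valid_query("abc%20def"))  # True
print(is_valid_query("a=b&c=d"))    # False: "=" and "&" are reserved
print(is_valid_query("a%3Db"))      # True: "=" written as an escape
```

Under this reading, a reserved character such as "=" or "&" could appear in
a query only in its escaped form.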


(3) relative-path reference

This is a nightmare. According to the current draft, and _some_
implementations, the constructions '.' and '..' are treated differently in
different situations. The problem is that implementations do not work
according to the draft, which therefore has to be more explicit, so that
implementations can be said to be compliant or not. This is the problem:

Case A:
   Base: whatever
   URI: foo://bar/a/../b
   Resulting URI: foo://bar/a/../b

Case B:
   Base: foo://bar
   URI: ./a/../b
   Resulting URI: foo://bar/b

This is how the current draft is written, and according to Larry, this is
how some implementations handle the case. Some other browsers do the
following:

Case C:
   Base: whatever
   URI: foo://bar/a/../b
   Resulting URI: foo://bar/b

I.e. some implementations treat '.' and '..' specially as soon as the
scheme and URI use the generic URI syntax, and not only inside a relative
URI!
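
The three cases can be reproduced with the following Python sketch. It is a
simplified illustration, not the resolution algorithm from the draft: the
function names and the normalize_absolute flag are invented for the example,
it handles only "scheme://authority/path" URIs and relative-path references
such as the one in case B, and it ignores queries and trailing-slash
subtleties.

```python
def remove_dot_segments(path):
    """Collapse "." and ".." segments in a slash-separated path."""
    out = []
    for segment in path.split("/"):
        if segment == ".":
            continue
        if segment == "..":
            if len(out) > 1:  # keep the leading "" that marks the root
                out.pop()
            continue
        out.append(segment)
    return "/".join(out)


def split_uri(uri):
    """Split into ("scheme://authority", path); prefix is "" if relative."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        return "", uri
    authority, slash, path = rest.partition("/")
    return scheme + "://" + authority, slash + path


def resolve(base, ref, normalize_absolute):
    """Resolve ref against base.

    normalize_absolute=False: "." and ".." are special only in relative
    references (cases A and B). normalize_absolute=True: they are also
    collapsed in absolute URIs (case C).
    """
    prefix, path = split_uri(ref)
    if prefix:  # ref is already absolute, base is ignored
        if normalize_absolute:
            path = remove_dot_segments(path)
        return prefix + path
    base_prefix, base_path = split_uri(base)
    directory = base_path.rsplit("/", 1)[0] + "/"  # drop last base segment
    return base_prefix + remove_dot_segments(directory + ref)


# Case A
print(resolve("whatever", "foo://bar/a/../b", normalize_absolute=False))
# foo://bar/a/../b
# Case B
print(resolve("foo://bar", "./a/../b", normalize_absolute=False))
# foo://bar/b
# Case C
print(resolve("whatever", "foo://bar/a/../b", normalize_absolute=True))
# foo://bar/b
```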

I personally cannot see any logic in treating '.' and '..' differently
depending on whether or not the URI is relative to a base. I.e. the '.' and
'..' should be handled the same way in both situations: either cases B and C
above are the correct way of parsing a URI, or case A together with a
variant of case B in which '..' is not treated specially.

BUT, what I now want is very explicit text stating how this is to be
handled. I suggest the following text in section 4:

   The syntax for relative URI is a shortened form of that for absolute
   URI, where some prefix of the URI is missing and certain path
   components ("." and "..") have a special meaning when interpreting a
   relative path.  The relative URI syntax is defined in Section 5.

To change to:

   The syntax for relative URI is a shortened form of that for absolute
   URI, where some prefix of the URI is missing and certain path
   components ("." and "..") have a special meaning when, and only when,
   interpreting a relative path.  The relative URI syntax is defined
   in Section 5.

I.e. the suggested text states not only that "." and ".." are to be
resolved when a relative URI is used, but also that they are to be
resolved only in that case.

Larry, Roy, can you resolve these three problems (or answer (3) explicitly,
so I can argue the case to the IESG)? I will then take this up on the IESG
agenda on July 16.

     Patrik