suggestions for path component clarification

Hello,

This is in reference to
http://cvs.apache.org/viewcvs.cgi/*checkout*/ietf-uri/rev-2002/rfc2396bis.html?rev=1.64

Appendix D.2 says

   'All references to "opaque" URIs have been replaced with
   a better description of how the path component may be
   opaque to hierarchy'

yet section 3.3 starts off with

   "The path component contains hierarchical data..."

The next paragraph says

   "The path consists of a sequence of path segments separated
   by a slash ("/") character."

but then goes on to say

   "the ... path may be empty (zero length) or opaque..."

IMHO this does not do a very good job of defining the path component,
nor does it make clear to the reader that "path" in this spec doesn't
always mean the kind of path that pretty much everyone, even my mom,
thinks of when someone refers to a path.

I never liked overloading the term "path" to mean "an absolute or relative
URL path, or maybe a URN opaque part". It seems like we're (well, you're)
trying to retain URL-specific terminology when talking about URIs
generically, even when the URI is a URN, for which such terminology is
inappropriate, if not downright counterintuitive.

For that reason, in 4Suite I chose to implement a URI parser and
validator that treats the path and the opaque part as separate URI
components. It's not too late to change my implementation (that's why I
am here), but I do want to register my opposition to the direction in
which the spec is moving.
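To illustrate the distinction, here is a minimal Python sketch (not the actual 4Suite code) of a splitter that reports the path and the opaque part as separate components, loosely following the RFC 2396 grammar, where absoluteURI = scheme ":" ( hier_part | opaque_part ). The function name split_uri and the URIParts fields are hypothetical, and the sketch only locates component boundaries; it does not validate characters.

```python
import re
from typing import NamedTuple, Optional

# scheme = alpha *( alpha | digit | "+" | "-" | "." ), per RFC 2396
_SCHEME = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]*):")
# net_path authority: everything after "//" up to the next "/" or "?"
_AUTHORITY = re.compile(r"//([^/?]*)")


class URIParts(NamedTuple):  # hypothetical result type
    scheme: Optional[str]
    authority: Optional[str]
    path: Optional[str]    # set only for hierarchical references
    opaque: Optional[str]  # set only for opaque URIs such as "mailto:..."
    query: Optional[str]
    fragment: Optional[str]


def split_uri(ref: str) -> URIParts:
    """Split a URI reference, keeping the path and opaque part separate.

    Locates component boundaries only; no character validation is done.
    """
    # The fragment follows the first "#" in any URI reference.
    ref, hash_sep, frag = ref.partition("#")
    fragment = frag if hash_sep else None

    scheme = None
    m = _SCHEME.match(ref)
    if m:
        scheme = m.group(1)
        ref = ref[m.end():]
        if not ref.startswith("/"):
            # opaque_part: the rest is one unit; "?" is not split off.
            return URIParts(scheme, None, None, ref, None, fragment)

    authority = None
    a = _AUTHORITY.match(ref)
    if a:
        authority = a.group(1)
        ref = ref[a.end():]

    # What remains is abs_path or rel_path, optionally followed by a query.
    path, q_sep, q = ref.partition("?")
    query = q if q_sep else None
    return URIParts(scheme, authority, path, None, query, fragment)
```

With this split, "mailto:user@example.org" yields opaque="user@example.org" and path=None, while "http://example.com/a/b?x=1" yields path="/a/b", so callers never have to guess which kind of "path" they were handed.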

At the very least, if the ambiguous semantics of "path" are to be set in
stone, I would much rather see the term replaced than see this one
overloaded, since it is pretty much universally understood to have a
narrower meaning than the one you are defining here.

Short of that, I suggest starting section 3.3 with something like

   The path component of a URI, along with data in the optional query
   component, serves to identify a resource within the scope of that URI's
   scheme and naming authority (if any).

   The path typically contains hierarchical data in the form of path
   segments separated by a slash ("/") character, but may sometimes be
   empty (zero length) or opaque (not containing any "/" separators).
   A path is always defined for a URI, regardless of whether or not it
   is hierarchical, empty, or opaque.

   There is no specific "path" syntax production in the generic URI
   syntax. Instead, the path component is that part of the parsed URI
   string matching either the abs-path or the rel-path production,
   since they are mutually exclusive for any given URI and can be
   parsed as a single component. The rel-path production encompasses
   relative, empty and opaque paths. The path is terminated by the
   first question-mark ("?") or number-sign ("#") character, or by the
   end of the URI.

   In a hierarchical path, the path segments "." and ".." are defined
   for relative reference within the path name hierarchy.
   [...]
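For comparison, the proposed wording above can be sketched the same way: the path is always defined, it runs from the end of the scheme and authority (if any) to the first "?" or "#" or the end of the string, and it is then classified as empty, opaque (no "/"), or hierarchical. This is a hypothetical Python illustration; proposed_path and classify_path are made-up names, and the classification follows the parenthetical definitions in the proposed text literally, so a single-segment relative path such as "foo" also comes out as "opaque".

```python
import re

# Leading "scheme:" (scheme chars as in RFC 2396), if present
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")
# Authority after "//", up to the next "/", "?", "#" or end
_AUTHORITY = re.compile(r"//[^/?#]*")
# Path: everything up to the first "?" or "#", or the end of the URI
_PATH = re.compile(r"[^?#]*")


def proposed_path(uri: str) -> str:
    """Return the path as the proposed section 3.3 text delimits it."""
    m = _SCHEME.match(uri)
    rest = uri[m.end():] if m else uri
    if rest.startswith("//"):
        # Skip the authority; it is not part of the path.
        rest = rest[_AUTHORITY.match(rest).end():]
    return _PATH.match(rest).group(0)


def classify_path(path: str) -> str:
    """Label a path using the proposal's parenthetical definitions."""
    if path == "":
        return "empty"
    if "/" not in path:
        return "opaque"
    return "hierarchical"
```

Under this reading, "mailto:user@example.org?subject=hi" has the path "user@example.org" followed by a separate query, which is exactly the kind of result a reader would not expect from the everyday meaning of "path".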

- Mike

Received on Thursday, 25 September 2003 08:35:57 UTC