W3C home > Mailing lists > Public > uri@w3.org > June 2003

non-hierarchical URIs and square brackets

From: Graham Klyne <gk@ninebynine.org>
Date: Thu, 19 Jun 2003 10:45:40 +0100
Message-Id: <5.1.0.14.2.20030619093000.00bc5048@127.0.0.1>
To: uri@w3.org

This message is prompted by some questions/comments I received from someone 
recently.  I've not yet read the -03 draft in detail, so I may have 
overlooked some material.

...

(1) hierarchical and non-hierarchical URIs

I notice that in -03 the opaque-part syntax distinction has been 
dropped.  My concern is that it may now not be clear when 
relative-to-absolute URI conversion should take part of hierarchical '/' 
characters in the URIs.  Previously, my understanding was that an algorithm 
would look at the path component of the base URI and, if it starts with a 
'/', assume the base and relative URIs are hierarchical and apply the 
path-merging logic;  otherwise, the opaque-part in the base URI is used in 
all-or-nothing fashion depending on what is present in the "relative" URI.

Section 3 says "a non-hierarchical path will be treated as opaque data by a 
generic URI parser", but it's not clear at this point what constitutes a 
"non-hierarchical path".

Section 3.3 says " A path is always defined for a URI, though the defined 
path may be empty (zero length) or opaque (not containing any "/" 
delimiters)", which suggests that an opaque path may not contain *any* 
un-escaped '/' characters.  This seems like an onerous restriction, and in 
conflict with existing URI scheme usage; e.g. news: according to IANA is 
currently specified by RFC 1738, and has:
[[
    A <newsgroup-name> is a period-delimited hierarchical name, such as
    "comp.infosystems.www.misc". A <message-id> corresponds to the
    Message-ID of section 2.1.5 of RFC 1036, without the enclosing "<"
    and ">"; it takes the form <unique>@<full_domain_name>.  A message
    identifier may be distinguished from a news group name by the
    presence of the commercial at "@" character. No additional characters
    are reserved within the components of a news URL.
]]
-- http://www.faqs.org/rfcs/rfc1738.html

or mid:, defined by RFC 2392, which is clearly non-hierarchical, but:
[[
      mid-url       = "mid" ":" message-id [ "/" content-id ]
]]
-- http://www.faqs.org/rfcs/rfc2392.html

Other non-hier URI schemes using '/' are:
service:  http://www.faqs.org/rfcs/rfc2609.html




My suggestion would be to add a brief comment in section 3 clarifying the 
intent (as to what constitutes a hierarchical URI).  My preference is that 
an absolute URI with net-path or abs-path form is hierarchical, otherwise not.

...

(2) square brackets

Is it necessary for square brackets to be reserved outside the net-path 
component?  I personally use them quite often in fragment identifiers for 
references.  My correspondent had another use for them in the path 
component of a URI scheme.  I think there are several instances of them 
occurring (unescaped) in message-IDs, and the mid: spec doesn't require 
them to be escaped.  I think this also applies to the ldap: scheme.
-- http://www.faqs.org/rfcs/rfc2392.html
-- http://www.faqs.org/rfcs/rfc2255.html

...

#g



-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Thursday, 19 June 2003 07:58:31 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Thursday, 13 January 2011 12:15:31 GMT