Colons in URIs (rfc2396) for Universal Media URNs in URLs from Aaron E. Walsh on 2000-08-18 (uri@w3.org from August 2000)

From: Aaron E. Walsh <aaron@mantiscorp.com>
Date: Fri, 18 Aug 2000 17:07:25 -0400
To: uri@w3.org
Message-ID: <399DA58D.6BE4EEB4@mantiscorp.com>
Hello everyone,

I've just recently joined the uri@w3.org list, so I apologize in
advance if this subject has been discussed before (I reviewed the
archives but couldn't find an answer, although I imagine it's been
tackled before).

I wonder if un-escaped colons are legal in the path portion of a URL so
long as they're not in the scheme or domain? For example:

http://www.web3dmedia.com/urn:web3d:media:/textures/nature/grass_1.jpg
http://www.officetowers.com/urn:web3d:media:/textures/nature/rocks_3.jpg

I ask because our Web3D Universal Media Working Group uses URNs for
media referencing, which we'd like to extend to single URL environments
such as standard HTML browsers and authoring tools (our use of URNs is
based on the VRML97 ISO standard, which supports multiple URLs/URNs).
The two URLs above show how we might embed a URN into a URL so that the
media can be resolved via http (over the net) by products that don't
understand Universal Media (just like a normal URL) while also giving
products that understand our system what they need to know in order to
fetch the media from the user's local system (the URN identifier
"urn:web3d:media:" is the key; this tells Universal Media products that
a piece of media is likely to be locally resident and so they'll attempt
to resolve it locally first before trying the Web).
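
As an illustration only, the detection step described above might be
sketched in Python as follows. The function name split_embedded_urn and
the choice to return the URN together with the media path are
assumptions made for this sketch; they are not part of the Universal
Media practice or of VRML97.

```python
from urllib.parse import urlsplit

# Prefix that marks an embedded Universal Media URN (taken from the
# example URLs above).
URN_PREFIX = "urn:web3d:media:"


def split_embedded_urn(url):
    """Return (urn, media_path) if the URL path embeds a Universal Media
    URN, otherwise None. Hypothetical helper, for illustration only."""
    path = urlsplit(url).path          # e.g. "/urn:web3d:media:/textures/..."
    candidate = path.lstrip("/")
    if not candidate.startswith(URN_PREFIX):
        return None                    # ordinary URL: just fetch it over http
    media_path = candidate[len(URN_PREFIX):]
    return candidate, media_path
```

A product that understands Universal Media could try a local copy for
the returned media path first and fall back to the full URL; any other
product simply fetches the URL as written.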

I chair the Universal Media Working Group within the Web3D Consortium,
and I'd like to extend our media system to URL/URI environments
*without* conflicting with standard use of URIs/URLs. If you'd like some
background on our work before commenting please feel free to visit our
site at:

  http://www.web3dmedia.com/UniversalMedia/

To see how we deal with URNs, you can read our recommended practice:
  http://www.web3dmedia.com/UniversalMedia/course/
  see "VRML, URNs and Universal Media Recommended Practice Proposal"


Below my signature is a message I recently sent to our group regarding
using colons in URLs so that our URNs can be used by a wider audience.
I've since read rfc2396 again and would like to know if it's possible to
include colons in URLs without escaping them, like these:

http://www.web3dmedia.com/urn:web3d:media:/textures/nature/grass_1.jpg
http://www.officetowers.com/urn:web3d:media:/textures/nature/rocks_3.jpg

Is this legal so long as the colon appears in the path (as above) and
not in the scheme or domain?

With thanks for your advice,
Aaron
-- 
---------------------------------------------------------------------
Aaron E. Walsh   http://www.mantiscorp.com/people/aew/   617.350.7119
---------------------------------------------------------------------

Subject: Colons in URIs (rfc2396) for Universal Media URNs in URLs
        Date:  Wed, 16 Aug 2000 15:34:02 -0400
        From:  "Aaron E. Walsh" <aaron@mantiscorp.com>
         To:  Universal Media Working Group <media@web3d.org>
         CC:  vrml list <www-vrml@web3d.org>,
              Content Working Group <content@web3d.org>

Hello everyone,

I haven't read rfc2396 in great detail yet, but after a few quick passes
it looks like our use of colons in URLs for the purpose of transparently
encoding URNs is legal as long as we escape them (escape the colons). I
wonder if anyone else comes to the same conclusion, or a different one?
Following are three excerpts (sections 2.2, 3, and 3.3) from rfc2396 that
made me think we're ok as long as our colons are escaped (look for !!!!!
at the very bottom of this message where I explain what makes me think
so).

Please note that I plan to ask for comment from the W3C URI group as
well (after more investigation), but would first prefer to have feedback
from our own community.

---->

Excerpts from "Uniform Resource Identifiers (URI): Generic Syntax"
See:  http://www.ietf.org/rfc/rfc2396.txt

---------------------------
2.2. Reserved Characters
---------------------------
Many URI include components consisting of or delimited by, certain
special characters.  These characters are called "reserved", since
their usage within the URI component is limited to their reserved
purpose.  If the data for a URI component would conflict with the
reserved purpose, then the conflicting data must be escaped before
forming the URI.

      reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                    "$" | ","

The "reserved" syntax class above refers to those characters that are
allowed within a URI, but which may not be allowed within a
particular component of the generic URI syntax; they are used as
delimiters of the components described in Section 3.
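
For comparison, this is roughly what escaping the colons would look like
in practice. It is a small sketch using quote and unquote from Python's
standard urllib.parse module; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

urn_path = "urn:web3d:media:/textures/nature/grass_1.jpg"

# Keep "/" as the hierarchy delimiter, but escape every other reserved
# character, including ":".
escaped = quote(urn_path, safe="/")
print(escaped)   # urn%3Aweb3d%3Amedia%3A/textures/nature/grass_1.jpg

# Unescaping recovers the original text.
restored = unquote(escaped)
print(restored == urn_path)   # True
```

The escaped form no longer contains the literal string "urn:web3d:media:",
so a product looking for that identifier would have to unescape the path
before testing for it.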

---------------------------
3. URI Syntactic Components
---------------------------
The URI syntax is dependent upon the scheme.  In general, absolute
URI are written as follows:

      <scheme>:<scheme-specific-part>

An absolute URI contains the name of the scheme being used (<scheme>)
followed by a colon (":") and then a string (the <scheme-specific-
part>) whose interpretation depends on the scheme.

The URI syntax does not require that the scheme-specific-part have
any general structure or set of semantics which is common among all
URI.  However, a subset of URI do share a common syntax for
representing hierarchical relationships within the namespace.  This
"generic URI" syntax consists of a sequence of four main components:

      <scheme>://<authority><path>?<query>

each of which, except <scheme>, may be absent from a particular URI.
For example, some URI schemes do not allow an <authority> component,
and others do not use a <query> component.

      absoluteURI   = scheme ":" ( hier_part | opaque_part )

URI that are hierarchical in nature use the slash "/" character for
separating hierarchical components.  For some file systems, a "/"
character (used to denote the hierarchical structure of a URI) is the
delimiter used to construct a file name hierarchy, and thus the URI
path will look similar to a file pathname.  This does NOT imply that
the resource is a file or that the URI maps to an actual filesystem
pathname.

      hier_part     = ( net_path | abs_path ) [ "?" query ]

      net_path      = "//" authority [ abs_path ]

      abs_path      = "/"  path_segments

URI that do not make use of the slash "/" character for separating
hierarchical components are considered opaque by the generic URI
parser.

      opaque_part   = uric_no_slash *uric

      uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                      "&" | "=" | "+" | "$" | ","

We use the term <path> to refer to both the <abs_path> and
<opaque_part> constructs, since they are mutually exclusive for any
given URI and can be parsed as a single component.

---------------------------
3.3. Path Component
---------------------------
The path component contains data, specific to the authority (or the
scheme if there is no authority component), identifying the resource
within the scope of that scheme and authority.

      path          = [ abs_path | opaque_part ]

      path_segments = segment *( "/" segment )
      segment       = *pchar *( ";" param )
      param         = *pchar

      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","

The path may consist of a sequence of path segments separated by a
single slash "/" character.  Within a path segment, the characters
"/", ";", "=", and "?" are reserved.  Each path segment may include a
sequence of parameters, indicated by the semicolon ";" character.
The parameters are not significant to the parsing of relative
references.
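
One way to test a candidate path against the productions quoted above is
to transcribe them into a regular expression. The following Python
sketch does that. The definitions of "unreserved" and "escaped" are not
in the excerpt and are taken from sections 2.3 and 2.4.1 of rfc2396; the
function name is_abs_path is an assumption made for this example. It
checks the generic syntax only, not how any particular client or server
treats such a path.

```python
import re

# unreserved and escaped as defined in rfc2396 sections 2.3 and 2.4.1.
UNRESERVED = r"[A-Za-z0-9\-_.!~*'()]"
ESCAPED = r"%[0-9A-Fa-f]{2}"

# pchar, segment and abs_path follow the productions quoted above.
PCHAR = rf"(?:{UNRESERVED}|{ESCAPED}|[:@&=+$,])"
SEGMENT = rf"{PCHAR}*(?:;{PCHAR}*)*"
ABS_PATH = re.compile(rf"/{SEGMENT}(?:/{SEGMENT})*")


def is_abs_path(path):
    """True if `path` matches the abs_path production (generic syntax only)."""
    return ABS_PATH.fullmatch(path) is not None


print(is_abs_path("/urn:web3d:media:/textures/nature/grass_1.jpg"))        # True
print(is_abs_path("/urn%3Aweb3d%3Amedia%3A/textures/nature/grass_1.jpg"))  # True
print(is_abs_path("/textures/nature/grass 1.jpg"))                         # False
```

Both the unescaped and the escaped forms match, because ":" is listed
literally among the pchar alternatives; the last path fails only because
of the unescaped space.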

!!!!! If I read the above correctly, in particular the first 2 sentences
of the very last paragraph immediately above (after considering the
previous material) then ":" is not a reserved character when used within
a path segment.  But the following definition implies that our colons
should be escaped, I think:

      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","

Do others gather the same from the RFC and these sections? If not,
what's your take on our use of colons in URLs when used to transparently
embed a URN like so:


http://www.officetowers.com/urn:web3d:media:/textures/nature/rocks_3.jpg
http://www.web3dmedia.com/urn:web3d:media:/textures/nature/grass_1.jpg

Comments?
Aaron
---------------------------------------------------------------------
Aaron E. Walsh   http://www.mantiscorp.com/people/aew/   617.350.7119
---------------------------------------------------------------------
Received on Friday, 18 August 2000 16:58:30 UTC