- From: Aaron E. Walsh <aaron@mantiscorp.com>
- Date: Fri, 18 Aug 2000 17:07:25 -0400
- To: uri@w3.org
Hello everyone,

I've just recently joined the uri@w3.org list, and so apologize in
advance if this subject has been discussed before (I reviewed the
archives but couldn't find an answer, although I imagine it's been
tackled before).

I wonder if un-escaped colons are legal in the path portion of a URL so
long as they're not in the scheme or domain? For example:

http://www.web3dmedia.com/urn:web3d:media:/textures/nature/grass_1.jpg
http://www.officetowers.com/urn:web3d:media:/textures/nature/rocks_3.jpg

I ask because our Web3D Universal Media Working Group uses URNs for
media referencing, which we'd like to extend to single URL environments
such as standard HTML browsers and authoring tools (our use of URNs is
based on the VRML97 ISO standard, which supports multiple URLs/URNs).

The two URLs above show how we might embed a URN into a URL so that the
media can be resolved via http (over the net) by products that don't
understand Universal Media (just like a normal URL) while also giving
products that understand our system what they need to know in order to
fetch the media from the user's local system (the URN identifier
"urn:web3d:media:" is the key; this tells Universal Media products that
a piece of media is likely to be locally resident and so they'll attempt
to resolve it locally first before trying the Web).

I chair the Universal Media Working Group within the Web3D Consortium,
and I'd like to extend our media system to URL/URI environments
*without* conflicting with standard use of URIs/URLs. If you'd like some
background on our work before commenting please feel free to visit our
site at:

http://www.web3dmedia.com/UniversalMedia/

To see how we deal with URNs, you can read our recommended practice:

http://www.web3dmedia.com/UniversalMedia/course/
see "VRML, URNs and Universal Media Recommended Practice Proposal"

Below my signature is a message I recently sent to our group regarding
using colons in URLs so that our URNs can be used by a wider audience.
I've since read rfc2396 again and would like to know if it's possible to
include colons in URLs without escaping them, like these:

http://www.web3dmedia.com/urn:web3d:media:/textures/nature/grass_1.jpg
http://www.officetowers.com/urn:web3d:media:/textures/nature/rocks_3.jpg

Is this legal so long as the colon appears in the path (as above) and
not in the scheme or domain?

With thanks for your advice,

Aaron

--
---------------------------------------------------------------------
Aaron E. Walsh
http://www.mantiscorp.com/people/aew/
617.350.7119
---------------------------------------------------------------------

Subject: Colons in URIs (rfc2396) for Universal Media URNs in URLs
Date: Wed, 16 Aug 2000 15:34:02 -0400
From: "Aaron E. Walsh" <aaron@mantiscorp.com>
To: Universal Media Working Group <media@web3d.org>
CC: vrml list <www-vrml@web3d.org>,
    Content Working Group <content@web3d.org>

Hello everyone,

I haven't read rfc2396 in great detail yet, but after a few quick passes
it looks like our use of colons in URLs for the purpose of transparently
encoding URNs is legal as long as we escape them (escape the colons). I
wonder if anyone else comes to the same conclusion, or different?

Following are three excerpts (sections 2.2, 3, and 3.3) from rfc2396
that made me think we're ok as long as our colons are escaped (look for
!!!!! at the very bottom of this message where I explain what makes me
think so).

Please note that I plan to ask for comment from the W3C URI group as
well (after more investigation), but would first prefer to have feedback
from our own community.

----> Excerpts from "Uniform Resource Identifiers (URI): Generic Syntax"
See: http://www.ietf.org/rfc/rfc2396.txt

---------------------------
2.2. Reserved Characters
---------------------------

Many URI include components consisting of or delimited by, certain
special characters. These characters are called "reserved", since
their usage within the URI component is limited to their reserved
purpose.
If the data for a URI component would conflict with the reserved
purpose, then the conflicting data must be escaped before forming the
URI.

   reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                 "$" | ","

The "reserved" syntax class above refers to those characters that are
allowed within a URI, but which may not be allowed within a particular
component of the generic URI syntax; they are used as delimiters of the
components described in Section 3.

---------------------------
3. URI Syntactic Components
---------------------------

The URI syntax is dependent upon the scheme. In general, absolute URI
are written as follows:

   <scheme>:<scheme-specific-part>

An absolute URI contains the name of the scheme being used (<scheme>)
followed by a colon (":") and then a string (the
<scheme-specific-part>) whose interpretation depends on the scheme.

The URI syntax does not require that the scheme-specific-part have any
general structure or set of semantics which is common among all URI.
However, a subset of URI do share a common syntax for representing
hierarchical relationships within the namespace. This "generic URI"
syntax consists of a sequence of four main components:

   <scheme>://<authority><path>?<query>

each of which, except <scheme>, may be absent from a particular URI.
For example, some URI schemes do not allow an <authority> component,
and others do not use a <query> component.

   absoluteURI   = scheme ":" ( hier_part | opaque_part )

URI that are hierarchical in nature use the slash "/" character for
separating hierarchical components. For some file systems, a "/"
character (used to denote the hierarchical structure of a URI) is the
delimiter used to construct a file name hierarchy, and thus the URI
path will look similar to a file pathname. This does NOT imply that the
resource is a file or that the URI maps to an actual filesystem
pathname.

   hier_part     = ( net_path | abs_path ) [ "?" query ]

   net_path      = "//" authority [ abs_path ]

   abs_path      = "/" path_segments

URI that do not make use of the slash "/" character for separating
hierarchical components are considered opaque by the generic URI
parser.

   opaque_part   = uric_no_slash *uric

   uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                   "&" | "=" | "+" | "$" | ","

We use the term <path> to refer to both the <abs_path> and
<opaque_part> constructs, since they are mutually exclusive for any
given URI and can be parsed as a single component.

---------------------------
3.3. Path Component
---------------------------

The path component contains data, specific to the authority (or the
scheme if there is no authority component), identifying the resource
within the scope of that scheme and authority.

   path          = [ abs_path | opaque_part ]

   path_segments = segment *( "/" segment )
   segment       = *pchar *( ";" param )
   param         = *pchar

   pchar         = unreserved | escaped |
                   ":" | "@" | "&" | "=" | "+" | "$" | ","

The path may consist of a sequence of path segments separated by a
single slash "/" character. Within a path segment, the characters "/",
";", "=", and "?" are reserved. Each path segment may include a
sequence of parameters, indicated by the semicolon ";" character. The
parameters are not significant to the parsing of relative references.

!!!!! If I read the above correctly, in particular the first 2
sentences of the very last paragraph immediately above (after
considering the previous material) then ":" is not a reserved character
when used within a path segment. But the following definition implies
that our colons should be escaped, I think:

   pchar         = unreserved | escaped |
                   ":" | "@" | "&" | "=" | "+" | "$" | ","

Do others gather the same from the RFC and these sections?
If not, what's your take on our use of colons in URLs when used to
transparently embed a URN like so:

http://www.officetowers.com/urn:web3d:media:/textures/nature/rocks_3.jpg
http://www.web3dmedia.com/urn:web3d:media:/textures/nature/grass_1.jpg

Comments?

Aaron

---------------------------------------------------------------------
Aaron E. Walsh
http://www.mantiscorp.com/people/aew/
617.350.7119
---------------------------------------------------------------------
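
One way to test the question mechanically is to transcribe the quoted
rfc2396 productions (pchar, segment, abs_path) into a small checker and
run the example URLs through it. The Python sketch below does that and
nothing more. It assumes the rfc2396 definitions of "unreserved"
(alphanumerics plus the marks - _ . ! ~ * ' ( )) and "escaped" ("%" and
two hex digits), which the excerpts above reference but do not quote.
The names is_valid_abs_path, split_embedded_urn and URN_PREFIX are
hypothetical, chosen for illustration; they are not part of Universal
Media or of any library. It says nothing about how particular servers
or browsers treat such paths.

```python
import re
from urllib.parse import urlsplit

# rfc2396: unreserved = alphanum | mark, where mark = - _ . ! ~ * ' ( )
UNRESERVED = r"A-Za-z0-9\-_.!~*'()"
# pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
# Note that ":" is listed literally, next to the "escaped" alternative.
PCHAR = rf"(?:[{UNRESERVED}:@&=+$,]|%[0-9A-Fa-f]{{2}})"
# segment = *pchar *( ";" param ), with param = *pchar
SEGMENT = re.compile(rf"{PCHAR}*(?:;{PCHAR}*)*")

# Hypothetical constant: the URN identifier used in the message.
URN_PREFIX = "urn:web3d:media:"


def is_valid_abs_path(path):
    """Return True if path matches abs_path = "/" path_segments."""
    if not path.startswith("/"):
        return False
    segments = path[1:].split("/")
    return all(SEGMENT.fullmatch(seg) is not None for seg in segments)


def split_embedded_urn(url):
    """Return (urn_prefix, media_path) if the URL path embeds the URN
    identifier as its first segment, otherwise None."""
    path = urlsplit(url).path
    if not is_valid_abs_path(path):
        return None
    rest = path[1:]
    if not rest.startswith(URN_PREFIX):
        return None
    return URN_PREFIX, rest[len(URN_PREFIX):]


if __name__ == "__main__":
    for url in (
        "http://www.web3dmedia.com/urn:web3d:media:/textures/nature/grass_1.jpg",
        "http://www.officetowers.com/urn:web3d:media:/textures/nature/rocks_3.jpg",
    ):
        print(url, is_valid_abs_path(urlsplit(url).path), split_embedded_urn(url))
```

Under this transcription both example paths match with their colons
left unescaped, because ":" is one of the literal alternatives in
pchar; the same paths with "%3A" in place of ":" match as well, through
the "escaped" alternative.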
Received on Friday, 18 August 2000 16:58:30 UTC