W3C home > Mailing lists > Public > uri@w3.org > January 2001

Re: Generic Support for New URI Schemes with Apache?

From: Al Gilman <asgilman@iamdigex.net>
Date: Mon, 22 Jan 2001 23:22:59 -0500
Message-Id: <200101230419.XAA339912@smtp1.mail.iamworld.net>
To: Aaron Swartz <aswartz@swartzfam.com>, <duerst@w3.org>, Rob Cameron <cameron@cs.sfu.ca>, <uri@w3.org>
At 08:38 PM 2001-01-22 -0600, Aaron Swartz wrote:
>Martin Duerst <duerst@w3.org> wrote:
>
>> This is really a case where we should work with the browser
>> providers to fix the problem, rather than hacking around it.
>
>Is there an easy regexp or something to detect new URI schemes from relative
>links? Do we just ban colons in URIs? (I believe there was a recent question
>about that...)
>

You just ban relative truncations that start with something that looks like
'term:'.
There is always a somewhat longer form that avoids starting with this path
segment if you have a colon in there.  It can be as simple as prepending "./"
as noted in the RFC.


   Authors should be aware that a path segment which contains a colon
   character cannot be used as the first segment of a relative URI path
   (e.g., "this:that"), because it would be mistaken for a scheme name.

   It is therefore necessary to precede such segments with other
   segments (e.g., "./this:that") in order for them to be referenced as
   a relative path.

 <http://www.faqs.org/rfcs/rfc2396.html>http://www.faqs.org/rfcs/rfc2396.html

>I actually need such detection for some software I'm working on.
>

See also

B. Parsing a URI Reference with a Regular Expression

   As described in Section 4.3, the generic URI syntax is not sufficient
   to disambiguate the components of some forms of URI.  Since the
   "greedy algorithm" described in that section is identical to the
   disambiguation method used by POSIX regular expressions, it is
   natural and commonplace to use a regular expression for parsing the
   potential four components and fragment identifier of a URI reference.

   The following line is the regular expression for breaking-down a URI
   reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9


from the same source

Al
>-- 
>[ Aaron Swartz | me@aaronsw.com |
<http://www.aaronsw.com/>http://www.aaronsw.com ]
>  
Received on Monday, 22 January 2001 23:13:37 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:25:02 UTC