- From: Al Gilman <asgilman@iamdigex.net>
- Date: Mon, 22 Jan 2001 23:22:59 -0500
- To: Aaron Swartz <aswartz@swartzfam.com>, <duerst@w3.org>, Rob Cameron <cameron@cs.sfu.ca>, <uri@w3.org>
At 08:38 PM 2001-01-22 -0600, Aaron Swartz wrote: >Martin Duerst <duerst@w3.org> wrote: > >> This is really a case where we should work with the browser >> providers to fix the problem, rather than hacking around it. > >Is there an easy regexp or something to detect new URI schemes from relative >links? Do we just ban colons in URIs? (I believe there was a recent question >about that...) > You just ban relative truncations that start with something that looks like 'term:'. There is always a somewhat longer form that avoids starting with this path segment if you have a colon in there. It can be as simple as prepending "./" as noted in the RFC. Authors should be aware that a path segment which contains a colon character cannot be used as the first segment of a relative URI path (e.g., "this:that"), because it would be mistaken for a scheme name. It is therefore necessary to precede such segments with other segments (e.g., "./this:that") in order for them to be referenced as a relative path. <http://www.faqs.org/rfcs/rfc2396.html>http://www.faqs.org/rfcs/rfc2396.html >I actually need such detection for some software I'm working on. > See also B. Parsing a URI Reference with a Regular Expression As described in Section 4.3, the generic URI syntax is not sufficient to disambiguate the components of some forms of URI. Since the "greedy algorithm" described in that section is identical to the disambiguation method used by POSIX regular expressions, it is natural and commonplace to use a regular expression for parsing the potential four components and fragment identifier of a URI reference. The following line is the regular expression for breaking-down a URI reference into its components. ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? 12 3 4 5 6 7 8 9 from the same source Al >-- >[ Aaron Swartz | me@aaronsw.com | <http://www.aaronsw.com/>http://www.aaronsw.com ] >
Received on Monday, 22 January 2001 23:13:37 UTC