URL Patterns

[ Greetings to all. I've been away from this list for a while now due to
  workload, so I apologize in advance if this has been discussed before.
  I did go back over the last year's mail digests and didn't see anything,
  but in case I missed it, I would appreciate pointers to any relevant
  discussions -- Ramin. ]

I have a need to support, for lack of a better term, "URL Patterns." Given
a URL Pattern and a valid URL, the matching mechanism would return a "true"
or "false" reply to the question "does this URL match the pattern?"

Here's an example (assuming the "*" asterisk character is a wildcard that
matches any number of characters):

Given a URL Pattern:

      http://*.nasa.gov/*.html

The following URLs might match:

      http://arc.nasa.gov/index.html
      http://queue.nasa.gov/foo/bar/file.html

But the following would not:

      http://arc.nasa.gov/
      http://nasa.gov/file.html

and so on.
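
To make the "dumb" character matcher concrete, here is a minimal sketch in
Python. It assumes "*" is the only meta-character and that it matches any run
of characters, including none; the function name wildcard_match is made up
for illustration and is not part of any proposal.

```python
import re


def wildcard_match(pattern: str, url: str) -> bool:
    """Return True if url matches pattern, with "*" meaning "any run of characters"."""
    # Escape each literal piece so "." and similar characters lose their
    # regular-expression meaning, then rejoin the pieces with ".*" where
    # each "*" stood in the pattern.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # fullmatch requires the whole URL to match, not just a prefix.
    return re.fullmatch(regex, url) is not None


PATTERN = "http://*.nasa.gov/*.html"

for candidate in (
    "http://arc.nasa.gov/index.html",
    "http://queue.nasa.gov/foo/bar/file.html",
    "http://arc.nasa.gov/",
    "http://nasa.gov/file.html",
):
    print(candidate, wildcard_match(PATTERN, candidate))
```

Because "*" is allowed to swallow "/" here, this sketch also accepts
http://example.com/x.nasa.gov/y.html, which is probably not what the pattern
author meant.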

This is just the tip of the proverbial iceberg. The pattern-matching
mechanism may need to go beyond dumb character matching. Depending on where
it appears, the "*" character may need to match an entire section of a URL.
Given the above pattern, for instance, we may want to match
"http://user:password@arc.nasa.gov/file.html", since "user:password" is
technically still part of the "login" portion of the URL. The syntax of the
pattern meta-characters might be a little "tortured," considering the
characters already used up in RFCs 1738 and 1808.
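
One possible reading of "match an entire section" is to split both the
pattern and the URL into components and compare them section by section.
The sketch below does that in Python with the standard urllib.parse module.
The name component_match and the choice to compare only scheme, host, and
path are assumptions made for illustration, not settled semantics.

```python
import re
from urllib.parse import urlsplit


def _wildcard(pattern: str, text: str) -> bool:
    """Match text against pattern, where "*" stands for any run of characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, text) is not None


def component_match(pattern: str, url: str) -> bool:
    """Compare pattern and URL one component at a time (hypothetical semantics)."""
    p, u = urlsplit(pattern), urlsplit(url)
    # Only scheme, host, and path are compared; anything else in the URL,
    # such as "user:password@", is ignored by this sketch.
    return (
        p.scheme == u.scheme
        and _wildcard(p.hostname or "", u.hostname or "")
        and _wildcard(p.path, u.path)
    )


PATTERN = "http://*.nasa.gov/*.html"

print(component_match(PATTERN, "http://user:password@arc.nasa.gov/file.html"))
print(component_match(PATTERN, "http://example.com/x.nasa.gov/y.html"))
```

Matching per component means a "*" in the host section can never spill over
into the path, so the second URL is rejected because its host is example.com.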

Another issue that would need to be resolved: would the above pattern match
something like "http://arc.nasa.gov/file.html?query=world#Section1" or not?
Hmm... The smart pattern matcher might go ahead and say "true," whereas the
dumb character matcher would say "false," because the URL no longer ends in
".html". I'm much more interested in the smart matcher.

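The two answers to the query-and-fragment question can be shown side by side.
This Python sketch assumes one possible policy: a pattern that mentions
neither a query nor a fragment is compared against the URL with those parts
dropped. The helper names are hypothetical.

```python
import re
from urllib.parse import urlsplit


def wildcard_match(pattern: str, text: str) -> bool:
    """Plain character matching, with "*" meaning "any run of characters"."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, text) is not None


def strip_query_and_fragment(url: str) -> str:
    """Return the URL without its "?query" and "#fragment" parts."""
    return urlsplit(url)._replace(query="", fragment="").geturl()


PATTERN = "http://*.nasa.gov/*.html"
URL = "http://arc.nasa.gov/file.html?query=world#Section1"

# The dumb matcher sees the whole string, trailing query and fragment included.
dumb_answer = wildcard_match(PATTERN, URL)
# The "smart" policy assumed here drops both parts before comparing.
smart_answer = wildcard_match(PATTERN, strip_query_and_fragment(URL))

print(dumb_answer, smart_answer)
```

Whether dropping the query is the right default is exactly the kind of
semantic question that would need to be settled.
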
Obviously, there are a number of other questions that need to be addressed,
among them the pattern syntax, the semantics of what constitutes a match,
the scope of the pattern, and the relevance of relative URLs and
scheme dependencies.

I won't overload this message too much. If work has been done on this
already, please point me in the right direction. If not, I would be happy to
initiate a discussion and share some thoughts that have been discussed at
this end.

Cheers,

Ramin Firoozye
ramin@walkaboutsoft.com

Received on Saturday, 10 February 1996 21:03:51 UTC