URI Pattern Syntax

There being no URI group this week, I thought I would send this here.
If there is interest I'll turn it into an RFC or something.

There are a lot of specs which IMHO would benefit from a
pattern matching procedure. Here is my proposal:

URI Pattern Syntax

Dr Phillip M. Hallam-Baker
World Wide Web Consortium

Abstract

It is often convenient to refer to a group of URIs based on a syntactic
match. This note describes a generic syntax for defining such matches which
may be used as a common basis for applications requiring variable degrees of
flexibility.

Introduction

The World Wide Web uses URIs [RFC1630] to identify resources. It is often
useful to be able to define a set of URIs on the basis of a simple
syntactic expression such as a prefix string or a wildcard pattern.
Depending upon the application, it may be desirable to trade expressiveness
against simplicity of implementation.

Syntax

Although wildcard matching is of general use, the URI specification does not
anticipate it, and hence the scheme described in this specification is
applicable only to HTTP. It is hoped, however, that future URI
specifications will anticipate the needs of wildcard matching.

The sequence %* is used to indicate a wildcard. This character sequence is
not legal in an HTTP URI, so it cannot be confused with literal URI content.
Thus the pattern:

http://www.w3.org/pub/%*

matches the URIs

http://www.w3.org/pub/WWW
http://www.w3.org/pub/WWW/TR/WD-uri-patterns.html
http://www.w3.org/pub/WWW/TR/WD-uri-patterns
http://www.w3.org/pub/WWW/People/Hallam

but not

http://www.w3.org/WWW/TR/WD-uri-patterns
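
The reason %* cannot appear in a legal HTTP URI is that a percent sign must
be followed by two hexadecimal digits. A minimal sketch of that check in
Python follows; the function name has_only_valid_escapes is hypothetical and
is not part of this proposal.

```python
import re

# A '%' that is NOT followed by two hex digits is an illegal escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_only_valid_escapes(uri: str) -> bool:
    """Return True if every '%' in uri starts a well-formed %XX escape."""
    return _BAD_ESCAPE.search(uri) is None
```

Under this check a string containing %* is rejected as a plain URI, which is
what lets a parser tell a pattern from a literal URI.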

Level 0 Specification - Prefix Matching.

The most common need for template matching is to specify a prefix.

In the Level 0 specification a wildcard is valid only at the end of the
pattern, so implementations need only match on a prefix. The following are
valid Level 0 patterns:

http://www.w3.org/pub/%*
http://www.w3.org/p%*
http://www.w3.org/pub/
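
A Level 0 matcher can be sketched in a few lines of Python. This is an
illustration only, under two assumptions the text above does not settle:
the wildcard may match the empty string, and a pattern with no wildcard
matches only the identical URI. The name match_level0 is hypothetical.

```python
WILDCARD = "%*"


def match_level0(pattern: str, uri: str) -> bool:
    """Match uri against a Level 0 pattern (wildcard only at the end)."""
    if pattern.endswith(WILDCARD):
        prefix = pattern[: -len(WILDCARD)]
        if WILDCARD in prefix:
            raise ValueError("Level 0 allows a single trailing wildcard")
        # Assumption: the wildcard may match zero characters.
        return uri.startswith(prefix)
    if WILDCARD in pattern:
        raise ValueError("Level 0 allows the wildcard only at the end")
    # Assumption: no wildcard means an exact match is required.
    return uri == pattern
```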

Level 1 Specification - Prefix and Suffix Matching.

Many URIs have a type-dependent suffix. It is therefore convenient to allow
URIs to be selected by suffix as well as by prefix.

In the Level 1 specification a pattern may contain no more than one
wildcard. Implementations must match both prefix and suffix. The following
are valid Level 1 patterns:

http://www.w3.org/pub/%*
http://www.w3.org/p%*
http://www.w3.org/pub/
http://www.w3.org/pub/%*.html
http:%*.html
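
With a single wildcard, matching reduces to one prefix test and one suffix
test. The Python sketch below illustrates this; match_level1 is a
hypothetical name, and it assumes the wildcard may match the empty string
and that a pattern with no wildcard requires an exact match.

```python
WILDCARD = "%*"


def match_level1(pattern: str, uri: str) -> bool:
    """Match uri against a Level 1 pattern (at most one wildcard)."""
    if pattern.count(WILDCARD) > 1:
        raise ValueError("Level 1 allows at most one wildcard")
    if WILDCARD not in pattern:
        # Assumption: no wildcard means an exact match is required.
        return uri == pattern
    prefix, suffix = pattern.split(WILDCARD)
    # The length check stops the prefix and suffix from overlapping
    # inside a URI that is too short to contain both.
    return (
        len(uri) >= len(prefix) + len(suffix)
        and uri.startswith(prefix)
        and uri.endswith(suffix)
    )
```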

Level 2 Specification - Multiple Wildcards

In some cases the power of arbitrary wildcard matching is useful.

In the Level 2 specification a pattern may contain any number of wildcards.
Implementations must match the prefix, the suffix, and each literal segment
between wildcards, in order. The following are valid Level 2 patterns:

http://www.w3.org/pub/%*
http://www.w3.org/p%*
http://www.w3.org/pub/
http://www.w3.org/pub/%*.html
http://www.w3.org/%*WWW/%*.html
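
One way to implement Level 2 is to translate the pattern into a regular
expression. The Python sketch below does this; it is only one possible
approach, the names compile_pattern and match_level2 are hypothetical, and
it assumes each wildcard may match any run of characters, including none.

```python
import re

WILDCARD = "%*"


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Translate a Level 2 pattern into a compiled regular expression."""
    # Literal segments are escaped; each wildcard becomes '.*'.
    literals = [re.escape(part) for part in pattern.split(WILDCARD)]
    return re.compile(".*".join(literals), re.DOTALL)


def match_level2(pattern: str, uri: str) -> bool:
    """Match uri against a pattern with any number of wildcards."""
    return compile_pattern(pattern).fullmatch(uri) is not None
```

Because fullmatch anchors both ends, the first literal segment acts as the
required prefix and the last as the required suffix.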

References

[RFC1737]
     K. Sollins, L. Masinter, Functional Requirements for Uniform Resource
     Names.
[RFC1630]
     T. Berners-Lee, Universal Resource Identifiers in WWW.

Received on Friday, 23 February 1996 11:48:34 UTC