- From: <hallam@w3.org>
- Date: Fri, 23 Feb 96 19:24:50 -0500
- To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
- Cc: hallam@w3.org
I would be very happy for us to have a very expressive URI template syntax available. But I would still like the simpler forms to be available as well. I can imagine that in many cases a spec will not wish to permit the full flexibility of regular expressions because they imply a heavier overhead. I'm also nervous of the complexity of regular expressions. Most users can't work out regular expressions in my experience :-(

I'm also keen that URIs be a subset of templates. This is not possible if people want to use * and other characters which are valid in URIs. I chose %* because it is an illegal character sequence: no browser has any business generating it as output, and no document should include it.

Ari gives as an example:

> and this shell expression:
>
>     http://*.netscape.com*
>
> is not the same as RE:
>
>     http://[^.:/]*\.netscape\.com*

It would simplify a lot of things if we had a wildcard which intelligently used the structure of the URI. John Mallery suggested a lexically scoped match: it would wildcard only within a segment of a URI. Call it $ for now.

    http://%$.netscape.com/

would by definition only match against DNS names with a .netscape.com suffix.

The structural rather than the syntactic approach has other advantages: it would allow URLs to be automatically regarded as equivalent if they differed only in the case of the DNS part. I.e. we would also match WWW.NETSCAPE.COM, because DNS is case insensitive. Do people really need full REs, or are they only going to use them to achieve the structurally based matches?

Note that this suggestion also ties in with Dave K.'s suggestion. I thought about putting in a switch to turn off case sensitivity, call it %#

    http://www.w3.org/%#pub/WWW%*

would match

    http://www.w3.org/pub/WWW/FRED
    http://www.w3.org/pub/WWW/Fred
    http://www.w3.org/PUB/www/Fred

and so on. Oh that the Web had not been made case sensitive! Ever tried to tell someone how to access a W3C URL? "Capital W Capital W Capital W Slash people with a Capital P" AGGGHHHHHHH! It would be nice if there was a way to turn off case sensitivity inside the servers... :-)

Current scorecard:

    Structural matches
    %*    Syntactic wildcard, obvious semantics
    %?    Structural wildcard - can we achieve DMK and PHB matching with the same char?
    %#    Turn off case sensitivity

Should we include [...] and [^...] analogues as well?

I will look into the question of URLs, legal characters and so on. My belief is that * is a perfectly legal URL character :-( [ and ] may not be, but are effectively off limits due to a very large number of VMS servers which will break if people attempt to co-opt them. { and } look very tempting indeed :-)

It's easy to read the spec and decide whether a character is legal; that does not mean that we won't end up with lots of things breaking. While I'm not known for tolerance of broken implementations, I think we should seriously consider the issue. Perhaps we could ask the AltaVista folks nicely to run a check to see just how many URLs out there might be affected. If they discovered that only 200 URLs actually included a *, then we could co-opt the character even if it was not a reserved character. If we found 200,000, then that's an awful lot of irate users and we should look at the DNS names of their sites to decide if they matter :-)

Phill
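A minimal sketch, not part of Phill's message, of how the proposed markers might compile down to a regular expression. The spellings %* (syntactic wildcard), %$ (segment-scoped structural wildcard, written %? in the scorecard) and %# (case-insensitivity switch), the segment delimiters `.`, `/` and `:`, and the decision to apply the case switch to the whole template are all assumptions for illustration; the message leaves those details open.

```python
import re

def compile_template(template):
    """Compile a hypothetical URI template into a regex (illustrative only).

    %*  - syntactic wildcard, matches any run of characters
    %$  - structural wildcard, stays inside one URI segment
          (assumed delimiters: '.', '/', ':')
    %#  - switch off case sensitivity (simplified here to cover
          the whole template rather than only the remainder)
    """
    pattern = []
    case_insensitive = False
    i = 0
    while i < len(template):
        if template.startswith("%*", i):
            pattern.append(".*")
            i += 2
        elif template.startswith("%$", i):
            pattern.append(r"[^./:]*")
            i += 2
        elif template.startswith("%#", i):
            case_insensitive = True
            i += 2
        else:
            pattern.append(re.escape(template[i]))
            i += 1
    flags = re.IGNORECASE if case_insensitive else 0
    return re.compile("^" + "".join(pattern) + "$", flags)

# Segment-scoped match from the message: only one DNS label may vary.
t = compile_template("http://%$.netscape.com/")
assert t.match("http://www.netscape.com/")
assert not t.match("http://www.evil.com/x.netscape.com/")

# Case-insensitivity switch plus trailing syntactic wildcard.
t2 = compile_template("http://www.w3.org/%#pub/WWW%*")
assert t2.match("http://www.w3.org/PUB/www/Fred")
```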
Received on Friday, 23 February 1996 16:27:16 UTC