- From: <hallam@w3.org>
- Date: Fri, 23 Feb 96 19:24:50 -0500
- To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
- Cc: hallam@w3.org
I would be very happy for us to have a very expressive URI template syntax available. But I would still like the simpler forms to be available as well. I can imagine that in many cases a spec will not wish to permit the full flexibility of regular expressions because they imply a heavier overhead. I'm also nervous of the complexity of regular expressions. Most users can't work out regular expressions in my experience :-(

I'm also keen that URIs be a subset of templates. This is not possible if people want to use * and other characters which are valid in URIs. I chose %* because it is an illegal character sequence: no browser has any business generating it as output, and no document should include it.

Ari gives as an example:

> and this shell expression:
>
>     http://*.netscape.com*
>
> is not the same as RE:
>
>     http://[^.:/]*\.netscape\.com*

It would simplify a lot of things if we had a wildcard which intelligently used the structure of the URI. John Mallery suggested a lexically scoped match: it would wildcard only within a segment of a URI. Call it $ for now.

    http://%$.netscape.com/

would by definition only match against DNS names with a .netscape.com suffix.

The structural rather than the syntactic approach has other advantages: it would allow URLs to be automatically regarded as equivalent if they differed only in the case of the DNS part. I.e. we would also match WWW.NETSCAPE.COM, because DNS is case insensitive. Do people really need full REs, or are they only going to use them to achieve the structurally based matches?

Note that this suggestion also ties in with Dave K.'s suggestion. I thought about putting in a switch to turn off case sensitivity, call it %#

    http://www.w3.org/%#pub/WWW%*

would match

    http://www.w3.org/pub/WWW/FRED
    http://www.w3.org/pub/WWW/Fred
    http://www.w3.org/PUB/www/Fred

and so on. Oh that the Web had not been made case sensitive! Ever tried to tell someone how to access a W3C URL? "Capital W Capital W Capital W Slash people with a Capital P" AGGGHHHHHHH! It would be nice if there was a way to turn off case sensitivity inside the servers... :-)

Current scorecard:

    Structural matches
    %*    Syntactic wildcard, obvious semantics
    %?    Structural wildcard - can we achieve DMK and PHB matching with the same char?
    %#    Turn off case sensitivity

Should we include [...] and [^...] analogues as well?

I will look into the question of URLs, legal characters and so on. My belief is that * is a perfectly legal URL character :-( [ and ] may not be, but are effectively off limits due to a very large number of VMS servers which will break if people attempt to co-opt them. { and } look very tempting indeed :-)

It's easy to read the spec and decide whether a character is legal; that does not mean that we won't end up with lots of things breaking. While I'm not known for tolerance of broken implementations, I think we should seriously consider the issue. Perhaps we could ask the AltaVista folks nicely to run a check to see just how many URLs out there might be affected. If they discovered that only 200 URLs actually included a *, then we could co-opt the character even if it was not a reserved character. If we found 200,000, then that's an awful lot of irate users and we should look at the DNS names of their sites to decide if they matter :-)

Phill
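A minimal sketch, not part of Phill's message, of how the proposed markers might compile down to a regular expression. The spellings %* (syntactic wildcard), %$ (segment-scoped structural wildcard, written %? in the scorecard) and %# (case-insensitivity switch), the segment delimiters `.`, `/` and `:`, and the decision to apply the case switch to the whole template are all assumptions for illustration; the message leaves those details open.

```python
import re

def compile_template(template):
    """Compile a hypothetical URI template into a regex (illustrative only).

    %*  - syntactic wildcard, matches any run of characters
    %$  - structural wildcard, stays inside one URI segment
          (assumed delimiters: '.', '/', ':')
    %#  - switch off case sensitivity (simplified here to cover
          the whole template rather than only the remainder)
    """
    pattern = []
    case_insensitive = False
    i = 0
    while i < len(template):
        if template.startswith("%*", i):
            pattern.append(".*")
            i += 2
        elif template.startswith("%$", i):
            pattern.append(r"[^./:]*")
            i += 2
        elif template.startswith("%#", i):
            case_insensitive = True
            i += 2
        else:
            pattern.append(re.escape(template[i]))
            i += 1
    flags = re.IGNORECASE if case_insensitive else 0
    return re.compile("^" + "".join(pattern) + "$", flags)

# Segment-scoped match from the message: only one DNS label may vary.
t = compile_template("http://%$.netscape.com/")
assert t.match("http://www.netscape.com/")
assert not t.match("http://www.evil.com/x.netscape.com/")

# Case-insensitivity switch plus trailing syntactic wildcard.
t2 = compile_template("http://www.w3.org/%#pub/WWW%*")
assert t2.match("http://www.w3.org/PUB/www/Fred")
```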
Received on Friday, 23 February 1996 16:27:16 UTC