- From: Mark Nottingham <mnot@akamai.com>
- Date: Fri, 10 Aug 2001 20:43:42 -0700
- To: Aaron Swartz <aswartz@upclink.com>
- Cc: uri@w3.org
You remind me - I logged a bug with Python a while back regarding their urlparse module; it represents a URI as a six-tuple ( scheme, authority, path, parameters, query, fragment ) As I read 2396, parameters can occur as a suffix to *each* path segment, not only on the final one. Guido's response was "great! fix it!", but I haven't had time to make a go of it. URI list, is this reading correct? Any opinions as to what data structure is best to represent a parsed URI with? (I'm thinking just make it a 5-tuple, taking out 'parameters', and optionally making 'path' a list of 2-tuples, but that's probably not useful in a lot of scenarios). On Wed, Aug 08, 2001 at 04:15:20PM -0500, Aaron Swartz wrote: > Below is a test suite for Python based on Appendix C of RFC2396. > Python's built-in urlparse module fails the test suite in a > number of instances. I used a similar test suite to validate my > Tcl URI parser. > > Enjoy, > -- > [ "Aaron Swartz" ; <mailto:me@aaronsw.com> ; <http://www.aaronsw.com/> ] > > import urlparse > > base = 'http://a/b/c/d;p?q' > > assert urlparse.urljoin(base, 'g:h') == 'g:h' > assert urlparse.urljoin(base, 'g') == 'http://a/b/c/g' > assert urlparse.urljoin(base, './g') == 'http://a/b/c/g' > assert urlparse.urljoin(base, 'g/') == 'http://a/b/c/g/' > assert urlparse.urljoin(base, '/g') == 'http://a/g' > assert urlparse.urljoin(base, '//g') == 'http://g' > assert urlparse.urljoin(base, '?y') == 'http://a/b/c/?y' > assert urlparse.urljoin(base, 'g?y') == 'http://a/b/c/g?y' > assert urlparse.urljoin(base, '#s') == 'http://a/b/c/d;p?q#s' > assert urlparse.urljoin(base, 'g#s') == 'http://a/b/c/g#s' > assert urlparse.urljoin(base, 'g?y#s') == 'http://a/b/c/g?y#s' > assert urlparse.urljoin(base, ';x') == 'http://a/b/c/;x' > assert urlparse.urljoin(base, 'g;x') == 'http://a/b/c/g;x' > assert urlparse.urljoin(base, 'g;x?y#s') == 'http://a/b/c/g;x?y#s' > assert urlparse.urljoin(base, '.') == 'http://a/b/c/' > assert urlparse.urljoin(base, './') == 'http://a/b/c/' > assert urlparse.urljoin(base, '..') == 'http://a/b/' > assert urlparse.urljoin(base, '../') == 'http://a/b/' > assert urlparse.urljoin(base, '../g') == 'http://a/b/g' > assert urlparse.urljoin(base, '../..') == 'http://a/' > assert urlparse.urljoin(base, '../../') == 'http://a/' > assert urlparse.urljoin(base, '../../g') == 'http://a/g' > > assert urlparse.urljoin(base, '') == base > > assert urlparse.urljoin(base, '../../../g') == 'http://a/../g' > assert urlparse.urljoin(base, '../../../../g') == 'http://a/../../g' > > assert urlparse.urljoin(base, '/./g') == 'http://a/./g' > assert urlparse.urljoin(base, '/../g') == 'http://a/../g' > assert urlparse.urljoin(base, 'g.') == 'http://a/b/c/g.' > assert urlparse.urljoin(base, '.g') == 'http://a/b/c/.g' > assert urlparse.urljoin(base, 'g..') == 'http://a/b/c/g..' > assert urlparse.urljoin(base, '..g') == 'http://a/b/c/..g' > > assert urlparse.urljoin(base, './../g') == 'http://a/b/g' > assert urlparse.urljoin(base, './g/.') == 'http://a/b/c/g/' > assert urlparse.urljoin(base, 'g/./h') == 'http://a/b/c/g/h' > assert urlparse.urljoin(base, 'g/../h') == 'http://a/b/c/h' > assert urlparse.urljoin(base, 'g;x=1/./y') == > 'http://a/b/c/g;x=1/y' > assert urlparse.urljoin(base, 'g;x=1/../y') == 'http://a/b/c/y' > > assert urlparse.urljoin(base, 'g?y/./x') == > 'http://a/b/c/g?y/./x' > assert urlparse.urljoin(base, 'g?y/../x') == > 'http://a/b/c/g?y/../x' > assert urlparse.urljoin(base, 'g#s/./x') == > 'http://a/b/c/g#s/./x' > assert urlparse.urljoin(base, 'g#s/../x') == > 'http://a/b/c/g#s/../x' > -- Mark Nottingham, Research Scientist Akamai Technologies (San Mateo, CA USA)
Received on Friday, 10 August 2001 23:43:48 UTC