implementation report: Ft.Lib.Uri in 4Suite

I am the primary author of the Python module Ft.Lib.Uri, which is part of
4Suite, a somewhat popular XML and RDF processing toolkit.

At first, the Ft.Lib.Uri module existed mainly just to make available a URI
resolver class that would perform the typical duties of resolution of URI
references to absolute form, and retrieval of resource representations, all
according to the published RFC 2396 spec.

The implementation relied heavily upon both public and undocumented APIs in
Python's core urllib module. Over time, it became evident that urllib, even as
of Python 2.3, is way out of date, not even implementing RFC 2396 properly. It
also has a number of quirks that I have been working around as I discover them.

As the implementation matured, it became evident that I was going to have to
ditch urllib altogether and reimplement various functions, such as
percent-encoding (urllib.quote), from scratch. I put off the reimplementation as
long as I could, instead concentrating on keeping up with the RFC 2396bis
drafts. But then, a couple weeks ago, after I responded to urllib bug reports
on the Python project page on SourceForge, I ended up persuading someone to
update the urllib.unquote function with a similar, slightly better function
from the urlparse module.
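
To give a rough idea of what reimplementing percent-encoding from scratch
involves, here is a minimal sketch. It is not the code in Ft.Lib.Uri: the name
percent_encode and its parameters are hypothetical, the unreserved set shown
is an assumption (the exact set depends on which draft you follow), and UTF-8
is assumed as the character encoding. It is written for a current Python
interpreter rather than the Python 2.2 that the module targets.

```python
# Hypothetical sketch of percent-encoding; not the code in Ft.Lib.Uri.
# Assumption: the unreserved characters are letters, digits, "-", ".", "_", "~".
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text, safe="", encoding="utf-8"):
    """Percent-encode every character that is neither unreserved nor in `safe`."""
    keep = UNRESERVED.union(safe)
    out = []
    for ch in text:
        if ch in keep:
            out.append(ch)
        else:
            # Assumption: a character is converted to octets with `encoding`
            # first, and each octet becomes one %XX triplet.
            for octet in ch.encode(encoding):
                out.append("%%%02X" % octet)
    return "".join(out)
```

For example, percent_encode("/x y", safe="/") leaves the slash alone and turns
the space into %20.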

Both urlparse and urllib are still languishing in a pre-RFC 2396 world, but
since there does seem to be some interest in improving them, I decided to stop
working around urllib's quirks, and instead start phasing it out as a
dependency of Ft.Lib.Uri.

As I started working on implementing the percent-encoding/decoding functions,
and making them deal smartly with both Unicode and encoded strings, it became
evident that RFC 2396bis, as of draft 04, left many decisions up to me,
perhaps more so than intended. I will be posting here in a bit with more on
that.
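
One example of the kind of decision involved: percent-decoding yields octets,
and something has to decide which character encoding turns them back into
text. The sketch below is only an illustration under stated assumptions, not
the Ft.Lib.Uri implementation. The name percent_decode, the UTF-8 default, and
the policy of replacing malformed sequences instead of raising an error are
all hypothetical choices, and the code is written for a current Python
interpreter.

```python
import re

# Matches one %XX triplet in a byte string.
_PCT = re.compile(rb"%([0-9A-Fa-f]{2})")


def percent_decode(text, encoding="utf-8"):
    """Hypothetical helper: undo percent-encoding, then decode the octets."""
    # Assumption: any non-ASCII characters already present in the input
    # are converted to octets with the same encoding.
    raw = text.encode(encoding)
    # Replace each %XX triplet with the single octet it stands for;
    # a stray "%" that is not followed by two hex digits is left as-is.
    octets = _PCT.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    # Assumption: malformed sequences are replaced rather than rejected.
    return octets.decode(encoding, "replace")
```

The same input gives different text depending on the encoding the caller
picks: "%E9" is a complete character in Latin-1 but a malformed sequence in
UTF-8.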

I regret that I am about to leave on vacation and will not have time to bring
Ft.Lib.Uri up to date with draft 05 until after I get back in mid-May. The
module's functions are, however, compliant with draft 04, if you get the
current implementation from anonymous CVS. The files you will want to look at
are 4Suite/Ft/Lib/Uri.py and 4Suite/test/Lib/test_uri.py. The regression tests
work in an old homegrown framework (predating pyunit), but anyone familiar
with Python should be able to figure out what the tests do, if not exactly how
they work.

To fully test it out, on Unix, if you have Python 2.2 or higher (and CVS and a
C compiler), do this:

cvs -d:pserver:anonymous@cvs.4suite.org:/var/local/cvsroot login
(no password, just hit enter at the prompt)
cvs -d:pserver:anonymous@cvs.4suite.org:/var/local/cvsroot get 4Suite
cd 4Suite

To build and install in the normal locations, as root, just do
python setup.py install

To instead build and install to a folder in your home dir
(e.g., /home/mike/pythonlibs), do
python setup.py config --home=/home/mike/pythonlibs
python setup.py install
...and then make sure that folder is in your PYTHONPATH environment variable.

Once installed, go to wherever the test suites ended up. In these examples
they'll be something like /usr/local/lib/4Suite/tests or
/home/mike/pythonlibs/lib/4Suite/tests. Then run
python test.py -v Lib/test_uri.py

The only failures should be in OsPathToUri with platform 'nt', reflecting a
bit of indecision on my part as to whether it's prudent to treat "/" as
synonymous with "\" when the "/" is given in a path that is purportedly for
Windows.
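
The two readings can be illustrated with a small sketch. This is not what
OsPathToUri does; the function nt_path_to_uri and its slash_is_separator flag
are hypothetical, it assumes an absolute path that starts with a drive letter,
and it escapes only "%", "/" and space to keep the example short.

```python
def nt_path_to_uri(path, slash_is_separator=True):
    """Hypothetical sketch: turn an absolute Windows path into a file URI."""
    if slash_is_separator:
        # Lenient reading: "/" is treated as a synonym for "\".
        path = path.replace("/", "\\")
    segments = path.split("\\")
    escaped = []
    for seg in segments:
        # Under the strict reading, a "/" that survives to this point is
        # data inside a path segment, so it is escaped as %2F.
        seg = seg.replace("%", "%25").replace("/", "%2F").replace(" ", "%20")
        escaped.append(seg)
    return "file:///" + "/".join(escaped)
```

With the lenient reading, "C:/Temp/x.txt" and "C:\Temp\x.txt" give the same
URI; with slash_is_separator=False the first one becomes a single segment
containing %2F.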

4Suite info:
http://4suite.org/

browse 4Suite CVS:
http://cvs.4suite.org/

Ft.Lib.Uri latest version:
http://cvs.4suite.org/cgi-bin/viewcvs.cgi/~checkout~/4Suite/Ft/Lib/Uri.py?rev=HEAD&content-type=text/plain

test_uri.py latest version:
http://cvs.4suite.org/cgi-bin/viewcvs.cgi/~checkout~/4Suite/test/Lib/test_uri.py?rev=HEAD&content-type=text/plain


Please email me with any questions and I'll get to them when I get back from
vacation.

-Mike

Received on Tuesday, 20 April 2004 01:55:58 UTC