- From: Phil Archer <phil@philarcher.org>
- Date: Mon, 06 Apr 2009 17:22:25 +0100
- To: Thomas Roessler <tlr@w3.org>
- CC: Public POWDER <public-powderwg@w3.org>
Thomas,
Having got the docs published on Friday and all the e-mails sent today,
I turned to revising my implementation of the canonicalisation steps
(http://i-sieve.com/cgi-bin/canon.cgi). My short term aim is to create a
standalone tool that demonstrates the canonicalisation steps for
candidate IRIs and POWDER doc data so we can play around, create some
test data etc.
OK, so first things first. I think I now see where my confusion has been
wrt. %decoding and +/space transliteration - it's all in the form/CGI
stuff. The form parsing script I use includes these lines:
tr/+/ /;
s/%(..)/pack("c",hex($1))/ge;
which does the transliteration and then the %decoding. That is correct
for form data, which is why it's in the doc and so firmly fixed in my head.
However... I see what you're concerned about, and you're correct too -
all that does is make sure that the encoding the browser does is
reversed, so if I put in
http://example.com/staff/Fran%c3%a7ois
I get out...
http://example.com/staff/Fran%c3%a7ois
Hmmm... not the desired outcome. It's necessary _after_ the initial form
decoding to _then_ do a second round of %decoding to get to what we
actually want, which is the c cedilla in the name.
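To make the two rounds concrete, here is a minimal sketch in Python (not the Perl of the form script). It assumes the browser's form encoding can be approximated by `quote_plus`; the variable names are just for illustration.

```python
from urllib.parse import quote_plus, unquote, unquote_plus

# What gets typed into the form field.
typed = "http://example.com/staff/Fran%c3%a7ois"

# Assumption: the browser form-encodes the field, so '%' itself becomes %25.
submitted = quote_plus(typed)

# Round 1: form decoding (+ to space, then %decoding) only undoes the
# browser's encoding, giving back exactly what was typed.
form_value = unquote_plus(submitted)

# Round 2: %decoding the IRI itself turns %c3%a7 into the c cedilla.
decoded = unquote(form_value)

print(form_value)
print(decoded)
```

Round 1 on its own returns the input unchanged, which matches the behaviour described above.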
OK, so in terms of the spec, I can see that the line in section 2.1.4.1
that says:
"Percent encoded triples are converted into the characters they
represent..."
is correct and should stay. But
"+ characters in the query string are converted to spaces"
is not correct and should go (an IRI with me%20+%20you in the query
string would map to one with three spaces, which is wrong).
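The difference is easy to see in a small Python sketch (again, not the Perl used on the server), comparing plain %decoding with the form-style decoding that also converts + to a space:

```python
from urllib.parse import unquote, unquote_plus

query = "me%20+%20you"

# %decoding only: the literal '+' survives between two spaces.
percent_only = unquote(query)

# Form-style decoding: '+' is converted as well, giving three spaces in a row.
with_plus_rule = unquote_plus(query)

print(repr(percent_only))
print(repr(with_plus_rule))
```

With the + rule applied, the original + can no longer be told apart from an encoded space.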
Right, moving on through the canonicalisation steps.
I found the Perl module that does the LibIDN stuff and got the i-sieve
hosting company to install it successfully, so that something like this:
€ürö.example.com.?me+you=them,finally=this
is canonicalised properly to
http://xn--r-1gaq1653a.example.com/?me+you=them,finally=this
Whoopee!
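For anyone wanting to check the host part independently, here is a rough sketch in Python. It is not the LibIDN code path: it applies only the Punycode step to each non-ASCII label and skips Nameprep entirely, so it should be read as a simplification rather than a full ToASCII. The function name `host_to_ascii` is hypothetical.

```python
def host_to_ascii(host: str) -> str:
    """Simplified host conversion: drop the trailing dot, Punycode each
    non-ASCII label and add the ACE prefix. Nameprep is NOT applied."""
    labels = host.rstrip(".").split(".")
    converted = []
    for label in labels:
        if label.isascii():
            # ASCII labels are only lowercased.
            converted.append(label.lower())
        else:
            # Python's built-in 'punycode' codec does the raw encoding.
            ace = label.encode("punycode").decode("ascii")
            converted.append("xn--" + ace)
    return ".".join(converted)

canonical_host = host_to_ascii("\u20ac\u00fcr\u00f6.example.com.")
print(canonical_host)
```

For this particular label the sketch gives the same ACE form as the canon.cgi output above, but labels that Nameprep would change (upper case, compatibility characters) could differ.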
Aha... but I missed something. I had to get the hosting company to
install a library to normalise the string to Form C as well and that
took a little longer. OK, now it is in place so these work with or
without normalisation to Form C:
http://example.com/staff/Fran%c3%a7ois
http://example.com/my%20doc.doc
http://www.example.com/foo/his%2Fhers
See for yourself at http://i-sieve.com/cgi-bin/canon.cgi
Now try
€ürö.example.com.?me+you=them,finally=this
That works too, right? That's because I've made the normalisation to
Form C optional. Switch it on and the ToASCII function fails (verbose
output is switched on in case of an error).
Now, it would help enormously if I could test whether the Form C thing
is working properly or whether I need to do something more to the code.
Is that something you can help with, perhaps?
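As a possible starting point for such a test, here is a small Python sketch (not the Perl library installed on the server) that uses the standard `unicodedata` module. The test strings are my own assumptions: a decomposed spelling of the name used earlier, and the IDN label from above written with escapes.

```python
import unicodedata

# 'c' followed by U+0327 COMBINING CEDILLA: a decomposed spelling.
decomposed = "Franc\u0327ois"

# Form C should compose the pair into the single character U+00E7.
composed = unicodedata.normalize("NFC", decomposed)

# The IDN label from above: euro sign, u-umlaut, 'r', o-umlaut.
label = "\u20ac\u00fcr\u00f6"
label_nfc = unicodedata.normalize("NFC", label)

print(len(decomposed), len(composed))
print(label_nfc == label)
```

The label is already in Form C, so normalising it should leave it unchanged. If ToASCII fails only when normalisation is switched on, one thing worth checking (though this is only a guess) is whether the normalisation step hands back the string in a different encoding from the one the ToASCII call expects.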
Phil.
--
Phil Archer
http://philarcher.org/www@20/
i-sieve technologies | W3C Mobile Web Initiative
Making Sense of the Buzz | www.w3.org/Mobile
Received on Monday, 6 April 2009 16:23:02 UTC