- From: Phil Archer <phil@philarcher.org>
- Date: Mon, 06 Apr 2009 17:22:25 +0100
- To: Thomas Roessler <tlr@w3.org>
- CC: Public POWDER <public-powderwg@w3.org>
Thomas,

Having got the docs published on Friday and all the e-mails sent today, I turned to revising my implementation of the canonicalisation steps (http://i-sieve.com/cgi-bin/canon.cgi). My short term aim is to create a standalone tool that demonstrates the canonicalisation steps for candidate IRIs and POWDER doc data so we can play around, create some test data etc.

OK, so first things first. I think I now see where my confusion has been wrt. %-decoding and +/space transliteration: it's all in the form/cgi stuff. The form parsing script I use includes these lines:

    tr/+/ /;
    s/%(..)/pack("c",hex($1))/ge;

which does the transliteration and then the %-decoding. That is correct for form data, which is why it's in the doc and so firmly fixed in my head.

However... I see why you're concerned, and you're correct too. All that does is make sure that the encoding the browser does is reversed, so if I put in

    http://example.com/staff/Fran%c3%a7ois

I get out...

    http://example.com/staff/Fran%c3%a7ois

Hmmm... not the desired outcome. It's necessary _after_ the initial form decoding to _then_ do a second round of %-decoding to get to what we actually want, which is the c cedilla in the name.

OK, so in terms of the spec, I can see that the line in section 2.1.4.1 that says:

    Percent encoded triples are converted into the characters they represent...

is correct and should stay. But

    + characters in the query string are converted to spaces

is not correct and should go (an IRI with me%20+%20you in the query string would map to one with 3 spaces, which is wrong).

Right, moving on through the canonicalisation steps. I found the Perl module that does the LibIDN stuff and got the i-sieve hosting company to install that successfully, so that something like this:

    €ürö.example.com.?me+you=them,finally=this

is canonicalised properly to

    http://xn--r-1gaq1653a.example.com/?me+you=them,finally=this

Whoopee!

Aha... but I missed something.
I had to get the hosting company to install a library to normalise the string to Form C as well and that took a little longer. OK, now it is in place so these work with or without normalisation to Form C:

    http://example.com/staff/Fran%c3%a7ois
    http://example.com/my%20doc.doc
    http://www.example.com/foo/his%2Fhers

See for yourself at http://i-sieve.com/cgi-bin/canon.cgi

Now try

    €ürö.example.com.?me+you=them,finally=this

That works too, right? That's because I've made the normalisation to Form C optional. Switch it on and the ToASCII function fails (verbose output is switched on in case of an error).

Now, it would help enormously if I could test whether the Form C thing is working properly or whether I need to do something more to the code to make it work properly. Something you can help with perhaps please?

Phil.

-- 
Phil Archer
http://philarcher.org/www@20/
i-sieve technologies | W3C Mobile Web Initiative
Making Sense of the Buzz | www.w3.org/Mobile
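
For anyone wanting to poke at the decoding and Form C questions offline, here is a rough sketch in Python (not the Perl used by canon.cgi, and not taken from it). The function names form_decode and decode_and_normalise are made up for illustration, and it assumes percent-encoded octets are UTF-8; it covers only the %-decoding and Form C steps, not the IDN/ToASCII step.

```python
import unicodedata
from urllib.parse import quote_plus, unquote, unquote_plus


def form_decode(raw: str) -> str:
    """Hypothetical stand-in for the form parser: '+' becomes a space,
    then one round of %-decoding. This only undoes the browser's encoding."""
    return unquote_plus(raw)


def decode_and_normalise(iri: str, form_c: bool = True) -> str:
    """Hypothetical canonicalisation fragment: %-decode as UTF-8 while
    leaving '+' untouched, then optionally normalise to Form C (NFC)."""
    decoded = unquote(iri)
    return unicodedata.normalize("NFC", decoded) if form_c else decoded


if __name__ == "__main__":
    typed = "http://example.com/staff/Fran%c3%a7ois"

    # Roughly what a browser submits: '%' itself is escaped as %25.
    submitted = quote_plus(typed)

    # Round one (form decoding) just gives back what was typed.
    assert form_decode(submitted) == typed

    # Round two yields the c cedilla.
    print(decode_and_normalise(form_decode(submitted)))

    # Treating '+' as a space in an IRI turns me%20+%20you into three spaces.
    print(repr(unquote_plus("me%20+%20you")))
    print(repr(decode_and_normalise("me%20+%20you")))

    # A possible Form C test: 'c' followed by U+0327 (combining cedilla,
    # %cc%a7 in UTF-8) should end up identical to the precomposed form.
    decomposed = "http://example.com/staff/Franc%cc%a7ois"
    print(decode_and_normalise(decomposed) == decode_and_normalise(typed))
    print(decode_and_normalise(decomposed, form_c=False)
          == decode_and_normalise(typed))
```

The last two comparisons are the interesting ones for the Form C question: with normalisation on, the decomposed and precomposed spellings should compare equal; with it off, they should not. Feeding the same pair of inputs to the online tool would be one way to check whether its Form C step is doing anything.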
Received on Monday, 6 April 2009 16:23:02 UTC