- From: Ashok Malhotra <ashok.malhotra@oracle.com>
- Date: Tue, 13 Dec 2005 12:35:16 -0800
- To: Dan Connolly <connolly@w3.org>
- CC: www-tag@w3.org, w3c-xsl-query@w3c.org
Dan:
Thanks! We seem agreed on escaping the # mark in encode-for-uri.

Mike had also made a suggestion re. iri-to-uri. Any thoughts on this?
Details below:
===========================================================
We would expect to find the spec for iri-to-uri() in RFC3987, and sure
enough, it's there. What it says is that every character in "ucschar" or
"iprivate" must be %-encoded. That's defined like this:

   ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD

   iprivate       = %xE000-F8FF / %xF0000-FFFFD / %x100000-10FFFD

which is pretty much the same as saying "non-ASCII characters" (and thus
overlaps rather with escape-html-uri()).

Since we now have a function called iri-to-uri(), it would seem that it
ought to do what the IRI spec says.

All the best, Ashok

> -----Original Message-----
> From: Dan Connolly [mailto:connolly@w3.org]
> Sent: Thursday, December 08, 2005 7:07 AM
> To: ashok.malhotra@oracle.com
> Cc: www-tag@w3.org
> Subject: Re: FW: Escaping the # mark
>
> On Wed, 2005-12-07 at 11:00 -0800, Ashok Malhotra wrote:
> > > The current definitions are in sections 7.4.10 and 7.4.11 in
> > > http://www.w3.org/TR/xpath-functions/
> [...]
> > >
> > > Currently encode-for-uri() does NOT escape a "#" sign.
> > >
> > > This seems contrary to the purpose of the function,
> > to wit: "This function should be used to process an xs:string
> to be used as a path segment in a URI."
>
> yes, encode-for-uri("the #1 soft-drink") should be
> 'the%20%231%20soft-drink'
>
> I just did a quick test with python:
>
> >>> urllib.quote("the #1 soft-drink")
> 'the%20%231%20soft-drink'
>
> [...]
> > >
> > > Note: I was alerted to the oddity of the current spec by the test
> > > results for fn-encode-for-uri1args-1 and related tests.
> > > The Saxon implementation currently does escape "#".
>
> Ah. good to see these details showing up in testing.
>
> > > Having looked at this, we should then look at the
> > > iri-to-uri() list as well.
> [... I skipped this stuff; it doesn't seem to be relevant,
> given the subject line.]
>
>
> --
> Dan Connolly, W3C http://www.w3.org/People/Connolly/
> D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
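
For illustration only, the two behaviours discussed in this thread can be sketched in Python 3. This is a rough sketch under stated assumptions, not the text of either spec: the names encode_for_uri, iri_to_uri and needs_escape are hypothetical, the ranges are copied from the ucschar and iprivate productions quoted above, and urllib.parse.quote with safe="" stands in for the proposed encode-for-uri() behaviour (it is the Python 3 counterpart of the urllib.quote call in the quoted test).

```python
from urllib.parse import quote

# Code point ranges copied from the ucschar and iprivate productions
# quoted from RFC 3987 in the message above.
UCSCHAR = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    # %x10000-1FFFD through %xD0000-DFFFD: supplementary planes 1 to 13
    + [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 14)]
    + [(0xE1000, 0xEFFFD)]
)
IPRIVATE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]
ESCAPED_RANGES = UCSCHAR + IPRIVATE


def needs_escape(ch):
    """Return True if ch falls in ucschar or iprivate (hypothetical helper)."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ESCAPED_RANGES)


def iri_to_uri(iri):
    """Percent-encode the UTF-8 octets of every ucschar/iprivate character.

    All other characters, including '#', pass through unchanged.
    """
    out = []
    for ch in iri:
        if needs_escape(ch):
            out.append("".join("%{:02X}".format(b) for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


def encode_for_uri(segment):
    """Escape a string for use as a single path segment.

    Assumption: safe="" is used so that '/' and '#' are both escaped,
    matching the behaviour agreed on in this thread.
    """
    return quote(segment, safe="")


if __name__ == "__main__":
    print(encode_for_uri("the #1 soft-drink"))
    print(iri_to_uri("http://example.org/caf\u00e9#r\u00e9sum\u00e9"))
```

Under these assumptions, encode_for_uri escapes the "#" while iri_to_uri leaves it alone and touches only the non-ASCII characters covered by the two productions. The example IRI is made up for the demonstration.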
Received on Tuesday, 13 December 2005 20:39:56 UTC