- From: Ashok Malhotra <ashok.malhotra@oracle.com>
- Date: Tue, 13 Dec 2005 12:35:16 -0800
- To: Dan Connolly <connolly@w3.org>
- CC: www-tag@w3.org, w3c-xsl-query@w3c.org
Dan:
Thanks! We seem agreed on escaping the # mark in encode-for-uri.
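For illustration, here is a minimal Python sketch of encode-for-uri() with the # mark escaped. It assumes that the function escapes every character outside the RFC 3986 "unreserved" set (ASCII letters, digits, "-", ".", "_", "~") by percent-encoding its UTF-8 octets. That character set and the helper name encode_for_uri are assumptions made for this sketch. They are not quotations from the spec.

```python
# Hypothetical helper: a sketch of encode-for-uri() that escapes "#".
# ASSUMPTION: everything except RFC 3986 "unreserved" characters is escaped.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def encode_for_uri(s: str) -> str:
    """Percent-encode s for use as a single URI path segment (sketch)."""
    out = []
    for byte in s.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(chr(byte))                # unreserved: copy as-is
        else:
            out.append("%{:02X}".format(byte))   # e.g. "#" -> "%23"
    return "".join(out)

print(encode_for_uri("the #1 soft-drink"))  # the%20%231%20soft-drink
```

In Python 3.7 and later, urllib.parse.quote(s, safe="") should give the same output, because it treats the same unreserved set as safe. Its default of safe="/" would leave slashes unescaped, and that matters when the result is meant to be a single path segment.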
Mike had also made a suggestion regarding iri-to-uri(). Any thoughts on this?
Details below:
===========================================================
We would expect to find the spec for iri-to-uri() in RFC 3987, and sure
enough, it's there (Section 3.1). What it says is that every character in
"ucschar" or "iprivate" must be converted to UTF-8 and each resulting octet
%-encoded. Those are defined like this:
ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
/ %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
/ %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
/ %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
/ %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
/ %xD0000-DFFFD / %xE1000-EFFFD
iprivate = %xE000-F8FF / %xF0000-FFFFD / %x100000-10FFFD
which is pretty much the same as saying "non-ASCII characters" (and thus
overlaps considerably with escape-html-uri()).
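The RFC 3987 rule quoted above can be sketched in Python as follows. The helper names needs_escape and iri_to_uri are hypothetical. The code applies only the ucschar/iprivate rule as described here, so every ASCII character (including space) passes through unchanged.

```python
# Sketch of the RFC 3987 IRI-to-URI rule described above: each character
# in ucschar or iprivate is UTF-8 encoded and every resulting octet is
# %-encoded; all other characters are copied unchanged.
UCSCHAR = [
    (0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD), (0x20000, 0x2FFFD), (0x30000, 0x3FFFD),
    (0x40000, 0x4FFFD), (0x50000, 0x5FFFD), (0x60000, 0x6FFFD),
    (0x70000, 0x7FFFD), (0x80000, 0x8FFFD), (0x90000, 0x9FFFD),
    (0xA0000, 0xAFFFD), (0xB0000, 0xBFFFD), (0xC0000, 0xCFFFD),
    (0xD0000, 0xDFFFD), (0xE1000, 0xEFFFD),
]
IPRIVATE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

def needs_escape(ch: str) -> bool:
    """True if ch falls in one of the ucschar or iprivate ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR + IPRIVATE)

def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to a URI per the rule above."""
    out = []
    for ch in iri:
        if needs_escape(ch):
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)

print(iri_to_uri("http://example.org/r\u00e9sum\u00e9#top"))
# http://example.org/r%C3%A9sum%C3%A9#top
```

The non-ASCII code points that this rule leaves alone include the C1 controls U+0080 to U+009F, the surrogates, and noncharacters such as U+FDD0 to U+FDEF and U+FFFE/U+FFFF. That is where "pretty much the same as non-ASCII" stops being exact.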
Since we now have a function called iri-to-uri(), it would seem that it
ought to do what the IRI spec says.
All the best, Ashok
> -----Original Message-----
> From: Dan Connolly [mailto:connolly@w3.org]
> Sent: Thursday, December 08, 2005 7:07 AM
> To: ashok.malhotra@oracle.com
> Cc: www-tag@w3.org
> Subject: Re: FW: Escaping the # mark
>
> On Wed, 2005-12-07 at 11:00 -0800, Ashok Malhotra wrote:
> > > The current definitions are in sections 7.4.10 and 7.4.11 in
> > > http://www.w3.org/TR/xpath-functions/
> [...]
> > >
> > > Currently encode-for-uri() does NOT escape a "#" sign.
> > >
> > > This seems contrary to the purpose of the function,
>
> to wit: "This function should be used to process an xs:string
> to be used as a path segment in a URI."
>
> yes, encode-for-uri("the #1 soft-drink") should be
> 'the%20%231%20soft-drink'
>
> I just did a quick test with Python:
>
> >>> urllib.quote("the #1 soft-drink")
> 'the%20%231%20soft-drink'
>
> [...]
> > >
> > > Note: I was alerted to the oddity of the current spec by the test
> > > results for fn-encode-for-uri1args-1 and related tests.
> > > The Saxon implementation currently does escape "#".
>
> Ah, good to see these details showing up in testing.
>
> > > Having looked at this, we should then look at the
> > > iri-to-uri() list as well.
> [... I skipped this stuff; it doesn't seem to be relevant,
> given the subject line.]
>
>
> --
> Dan Connolly, W3C http://www.w3.org/People/Connolly/
> D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
Received on Tuesday, 13 December 2005 20:39:56 UTC