
RE: FW: Escaping the # mark

From: Ashok Malhotra <ashok.malhotra@oracle.com>
Date: Tue, 13 Dec 2005 12:35:16 -0800
To: Dan Connolly <connolly@w3.org>
CC: www-tag@w3.org, w3c-xsl-query@w3c.org
Message-ID: <20051213123517103.00000003008@amalhotr-pc>

Dan:
Thanks! We seem to agree on escaping the # mark in encode-for-uri().

Mike had also made a suggestion regarding iri-to-uri(). Any thoughts on this?
Details below:

===========================================================
We would expect to find the spec for iri-to-uri() in RFC3987, and sure
enough, it's there. What it says is that every character in "ucschar" or
"iprivate" must be %-encoded. That's defined like this:

   ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD

   iprivate       = %xE000-F8FF / %xF0000-FFFFD / %x100000-10FFFD

which is pretty much the same as saying "non-ASCII characters" (and thus
overlaps considerably with escape-html-uri()).

Since we now have a function called iri-to-uri(), it would seem that it
ought to do what the IRI spec says.
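
To make the rule above concrete, here is a minimal Python sketch of the
mapping as described: each character in "ucschar" or "iprivate" is converted
to UTF-8 and every resulting octet is %-encoded; everything else passes
through untouched. The function name iri_to_uri_sketch and the helper
names are hypothetical, and this illustrates only the RFC 3987 character
ranges quoted above, not the exact text of the XPath function.

```python
# Ranges copied from the ucschar / iprivate productions quoted above.
UCSCHAR = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 through 13: %x10000-1FFFD ... %xD0000-DFFFD
UCSCHAR += [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]
UCSCHAR += [(0xE1000, 0xEFFFD)]

IPRIVATE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def needs_escape(ch):
    """True if ch falls in ucschar or iprivate."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR + IPRIVATE)


def iri_to_uri_sketch(iri):
    """Percent-encode the UTF-8 octets of ucschar/iprivate characters only."""
    out = []
    for ch in iri:
        if needs_escape(ch):
            out.extend("%%%02X" % octet for octet in ch.encode("utf-8"))
        else:
            out.append(ch)  # ASCII (including "#") is left as-is
    return "".join(out)


print(iri_to_uri_sketch("http://example.org/r\u00e9sum\u00e9#frag"))
```

Note that under this reading a "#" is never touched, which is consistent
with iri-to-uri() operating on a whole IRI rather than on a path segment.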

All the best, Ashok
 

> -----Original Message-----
> From: Dan Connolly [mailto:connolly@w3.org] 
> Sent: Thursday, December 08, 2005 7:07 AM
> To: ashok.malhotra@oracle.com
> Cc: www-tag@w3.org
> Subject: Re: FW: Escaping the # mark
> 
> On Wed, 2005-12-07 at 11:00 -0800, Ashok Malhotra wrote:
> > > The current definitions are in sections 7.4.10 and 7.4.11 in 
> > > http://www.w3.org/TR/xpath-functions/
> [...]
> > > 
> > > Currently encode-for-uri() does NOT escape a "#" sign.
> > > 
> > > This seems contrary to the purpose of the function,
> 
> to wit: "This function should be used to process an xs:string 
> to be used as a path segment in a URI."
> 
> yes, encode-for-uri("the #1 soft-drink") should be
>   'the%20%231%20soft-drink'
> 
> I just did a quick test with python:
> 
> >>> urllib.quote("the #1 soft-drink")
> 'the%20%231%20soft-drink'
> 
> [...]
> > > 
> > > Note: I was alerted to the oddity of the current spec by the test 
> > > results for fn-encode-for-uri1args-1 and related tests.
> > > The Saxon implementation currently does escape "#".
> 
> Ah. good to see these details showing up in testing.
> 
> > > Having looked at this, we should then look at the
> > > iri-to-uri() list as well.
> [... I skipped this stuff; it doesn't seem to be relevant, 
> given the subject line.]
> 
> 
> --
> Dan Connolly, W3C http://www.w3.org/People/Connolly/
> D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
> 
> 
> 
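
For anyone repeating the quick test quoted above on a current interpreter:
urllib.quote was the Python 2 name, and in Python 3 the same function is
urllib.parse.quote. The sketch below wraps it in a hypothetical helper,
encode_path_segment, and passes safe="" on the assumption that a "/"
inside a single path segment should be escaped as well.

```python
from urllib.parse import quote


def encode_path_segment(text):
    """Percent-encode text for use as one URI path segment.

    safe="" is an assumption: it makes quote() escape "/" too, which
    the default (safe="/") would leave alone.
    """
    return quote(text, safe="")


print(encode_path_segment("the #1 soft-drink"))
```

For the example string this gives the same result as the Python 2 test,
with both the spaces and the "#" escaped.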
Received on Tuesday, 13 December 2005 20:39:56 UTC
