RE: XML Schema WG comments on Functions and Operators

RFC 2396 says:

   Implementers should be careful not to
   escape or unescape the same string more than once, since unescaping
   an already unescaped string might lead to misinterpreting a percent
   data character as another escaped character, or vice versa in the
   case of escaping an already escaped string.

The rule we have included is designed to implement this guidance as best we
can. There is no answer that will suit everyone here, because the design of
the escape mechanism is frankly lousy. I think it's important that
escape-uri() should be convergent in the sense that when applied to its own
output it has no effect. RFC 2396 waffles about the problems, but we have to
make a decision one way or the other, and I think the way we have defined it
will cause less trouble than if we defined it the other way.
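
To make the trade-off concrete, here is a minimal Python sketch of the
percent-sign rule only. It is an illustration, not the F&O definition: the
function names are hypothetical, and every other character that escape-uri()
deals with is deliberately left alone. The second function models "the other
way" as unconditionally escaping every percent sign, which is an assumption
about what that alternative would look like.

```python
import re

# Assumption: only the "%" rule quoted from F&O is modelled here; all other
# characters that escape-uri() would handle pass through unchanged.
_BARE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape_percent_fo(s: str) -> str:
    """Escape "%" only when it is NOT followed by two hexadecimal digits."""
    return _BARE_PERCENT.sub("%25", s)


def escape_percent_always(s: str) -> str:
    """Hypothetical alternative: escape every "%" unconditionally."""
    return s.replace("%", "%25")


if __name__ == "__main__":
    for name in ("10%GOOD.HTML", "10%BAD.HTML"):
        once = escape_percent_fo(name)
        twice = escape_percent_fo(once)
        print(name, "->", once, "->", twice)
```

Under the first rule, "10%GOOD.HTML" becomes "10%25GOOD.HTML" and a second
application leaves it unchanged, which is the convergence property. The cost
is that "10%BAD.HTML" passes through untouched, because "%BA" looks like an
existing escape. The unconditional variant handles "10%BAD.HTML" on the first
pass ("10%25BAD.HTML") but is not convergent: a second pass yields
"10%2525BAD.HTML".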

XML Linking doesn't have this problem because it only allows you to escape a
wannabe-URI once.

Michael Kay


> -----Original Message-----
> From: Xan Gregg [mailto:xan.gregg@jmp.com] 
> Sent: 08 October 2003 15:14
> To: Ashok Malhotra
> Cc: C. M. Sperberg-McQueen; public-qt-comments@w3.org; Kay, Michael;
>     W3C XML Schema IG
> Subject: Re: XML Schema WG comments on Functions and Operators
> 
> 
>  From XML Schema comments, section 2.8:
> >>     In particular, some members of the XML Schema WG were surprised
> >>     to see that your algorithm escapes the percent sign in some cases
> >>     but not others; this does not seem to be a feature of the
> >>     algorithm given by XML Linking and by the Character Model.
> 
>  From Ashok:
> > ...A little later RFC 2396 says
> >
> > " Because the percent "%" character always has the reserved purpose of
> >    being the escape indicator, it must be escaped as "%25" in order to
> >    be used as data within a URI."
> >
> > Our reading of this rule is that the % must be escaped unless it is
> > the start of an escape sequence %HH.
> >
> > This reading of 2396 was the basis of the rule in the F&O which says
> >
> > ".... The PERCENT SIGN "%" character itself is escaped only if it is
> > not followed by two hexadecimal digits (that is, 0-9, a-f and A-F)."
> 
> I think the group's concern about percent was that the algorithm treats
> all occurrences of %HH as pre-escaped characters which means that some
> strings containing percent cannot be escaped by fn:escape-uri().
> Consider the two resource names:
> 
>     10%GOOD.HTML
>     10%BAD.HTML
> 
> fn:escape-uri() will change the former to "10%25GOOD.HTML", but the
> latter will remain unchanged and won't work when fed to some unescaping
> processor.  This is a pretty unlikely case, and maybe the F&O
> intentionally does not handle it, preferring to assume that the
> incoming string to the escape-uri function is already escaped to some
> degree. (Maybe the F&O function should be called
> "fn:escape-uri-further".)
> 
> As I understand it, both the XML Linking specification and RFC 2396 
> would have the percent converted to "%25" in both names of my example.
> 
> xan
> 

Received on Wednesday, 8 October 2003 19:24:46 UTC