Re: escape-uri-attributes from Werner Donné on 2002-01-28 (xsl-editors@w3.org from January to March 2002)

From: Werner Donné <werner.donne@re.be>
Date: Mon, 28 Jan 2002 11:56:12 +0100
To: Jeni Tennison <jeni@jenitennison.com>
CC: xsl-editors@w3.org
Message-ID: <3C552E4C.30608@re.be>

Hi Jeni,

I think only XLink allows partially escaped URIs, not XML Schema.
Section 3.2.17 of XML Schema Part 2 mandates adherence to RFC 2396
and RFC 2732. It refers to XLink only for the escaping procedure.

IMHO it is unfortunate that XLink allows unescaped URIs for the href
attribute. It serves no purpose and only brings confusion. In general it
doesn't even work, because a URI resolver can't always escape all URI
segments on its own. It can only interpret the URI against the URI
alphabet, while the reserved characters may differ per segment depending
on the medium that generates the URI. In other words, knowledge of the
medium is sometimes needed to determine what to escape, and in some cases
that is more restrictive than what the URI alphabet allows. There has been
a post about this on the XLink list:
http://lists.w3.org/Archives/Public/www-xml-linking-comments/2001JulSep/0032.html
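
To illustrate the point about per-segment knowledge, here is a minimal
Python sketch. The file name and the example.org base URL are made up for
the illustration; quote() is the standard urllib.parse function, and the
"safe" sets chosen are assumptions, not something any of the specs prescribe.

```python
from urllib.parse import quote

# Hypothetical data: a file name that happens to contain URI delimiters.
file_name = "a/b?c#d.xml"

# The generator knows this is ONE path segment, so it escapes every
# delimiter inside it (safe="" means no reserved character is exempt).
generated = "http://example.org/docs/" + quote(file_name, safe="")

# A resolver that receives the unescaped form can only check characters
# against the URI alphabet. "/", "?" and "#" are all legal there, so it
# finds nothing to escape and the segment boundary is lost.
unescaped = "http://example.org/docs/" + file_name
resolver_guess = quote(unescaped, safe=":/?#")

print(generated)
print(resolver_guess)
```

The resolver's result is identical to the unescaped input: it now reads as a
path "docs/a/b", a query "c" and a fragment "d.xml", which is not what the
generator had in mind.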

Double escaping of the percent sign is always the consequence of a URI
analysis error. One must check whether special characters are being used
in their normal URI function before escaping them. So if a valid escape
sequence is encountered it must be left alone, even if the generator meant
a literal percent sign. In that case the generator would have had to
escape the percent sign itself.
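
A rough Python sketch of an escaper that behaves this way follows. The
function name escape_uri and the exact set of characters kept verbatim are
assumptions for the illustration (loosely the RFC 2396 reserved and
unreserved characters plus "[", "]" and "#"). Escaping a stray "%" that does
not start a valid sequence is also my own choice, not something XLink
requires.

```python
import re

# Characters copied through unchanged (an assumption for this sketch).
_KEEP = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'()"    # RFC 2396 unreserved marks
    ";/?:@&=+$,"   # RFC 2396 reserved characters
    "[]#"          # RFC 2732 brackets and the fragment delimiter
)
_VALID_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def escape_uri(value: str) -> str:
    """Escape disallowed characters, leaving valid %HH sequences alone."""
    out = []
    i = 0
    while i < len(value):
        if _VALID_ESCAPE.match(value, i):
            # Already a valid escape sequence: never touch it again.
            out.append(value[i:i + 3])
            i += 3
            continue
        ch = value[i]
        if ch in _KEEP:
            out.append(ch)
        else:
            # Escape each UTF-8 octet of the character as %HH.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        i += 1
    return "".join(out)
```

Applying escape_uri to its own output changes nothing, so no double escaping
can occur. The price is the one described above: a generator that wants the
literal text "%41" in a segment has to write "%2541" itself, because
escape_uri cannot know that "%41" was not meant as an escape.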

Regards,

Werner.

Jeni Tennison wrote:

> Hi Werner,
> 
> 
>>As a consequence, I think URIs should always be serialised in
>>escaped form, otherwise you're not serialising URIs. I therefore
>>fail to see the use of the escape-uri-attributes attribute. Also
>>in HTML and XHTML URIs must be in escaped form, otherwise they are
>>not URIs and validation should fail. Only the processor of the HTML
>>or XHTML file may unescape a URI during some kind of
>>interpretation.
>>
> 
> Hmm... this is a tricky area. The XLink Rec and the XML Schema
> Datatypes Rec imply that partially-escaped URIs can be the lexical
> values of xs:anyURI attributes (e.g. xlink:href). By partially
> escaped, I mean that XML Schema and XLink will accept URIs in which
> non-ASCII characters and some disallowed characters are not escaped.
> See http://www.w3.org/TR/xlink/#link-locators for an exact
> description.
> 
> I think that the effect of the escaping rules from XLink and XML
> Schema is that if you read in a partially-escaped xs:anyURI value,
> then escaped it according to those same rules, then you'd get the same
> xs:anyURI value - you don't get double-escaping problems because %
> isn't amongst the disallowed characters that are escaped according to
> the XLink rules.
> 
> I didn't have a clear idea of why escape-uri-attributes was required
> for HTML and XHTML output methods in the first place (I just thought
> that if it's relevant for them then it's probably relevant for the XML
> output method too). Then I thought it might be to avoid
> double-escaping. But now I don't understand again - the XSLT 2.0 WD
> only talks about escaping non-ASCII characters (not all disallowed
> characters), so presumably processors shouldn't escape % signs anyway,
> and double-escaping isn't an issue.
> 
> So, I think I agree with you. I can't see any reason why all URIs
> serialised by XSLT shouldn't be escaped according to the rules in the
> XLink Rec, and don't understand the requirement for the
> escape-uri-attributes attribute.
> 
> Perhaps one of the WG could explain?
> 
> Cheers,
> 
> Jeni
> 
> ---
> Jeni Tennison
> http://www.jenitennison.com/


-- 
Werner Donné  --  Re BVBA
Engelbeekstraat 8
B-3300 Tienen
tel: (+32) 486 425803	e-mail: werner.donne@re.be
Received on Monday, 28 January 2002 05:56:32 UTC