Re: attribute value escaping

From: Scott Boag/CAM/Lotus <Scott_Boag@lotus.com>
Date: Mon, 12 Jun 2000 12:04:25 -0400
To: xsl-editors@w3.org
Cc: w3c-xsl-wg@w3.org
Message-ID: <OF420ABEEB.803F8D64-ON852568FC.0057D01D@lotus.com>

> If the answer is no, as I think it should be

I think it should be no also.  The part in
http://www.w3.org/TR/xslt#section-HTML-Output-Method where it says "The
html output method should escape non-ASCII characters in URI attribute
values using the method recommended in Section B.2.1 of the HTML 4.0
Recommendation." is highly problematic, from my experience, and this line
should be deprecated.


Mike Brown <mike@skew.org>
Sent by: owner-xsl-list@mulber
To:      xsl-list@mulberrytech.com
cc:      xsl-editors@w3.org, (bcc: Scott Boag/CAM/Lotus)
Subject: Re: attribute value escaping
06/09/2000 08:59 PM
Please respond to

Mike Kay wrote:
> The only characters I escape are non-ASCII characters (specifically,
> characters not in the range 32 to 126) plus space and "%".
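For concreteness, the rule Mike Kay describes could be sketched roughly as below. This is only an illustration, not his processor's actual code: the function name escape_kay is made up, and the choice of UTF-8 bytes written as %HH is an assumption borrowed from HTML 4.0 section B.2.1, since the quoted text does not say which encoding is used.

```python
def escape_kay(value: str) -> str:
    """Escape characters outside 32..126, plus space and '%'.

    Hypothetical helper; UTF-8 as the byte encoding is an assumption.
    """
    out = []
    for ch in value:
        cp = ord(ch)
        if cp < 32 or cp > 126 or ch in " %":
            # Write each UTF-8 byte of the character as %HH.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(escape_kay("my file%.png"))
```

Note that under this rule every "%" is escaped, including one that already begins an escape sequence, so an input of "%20" would come out as "%2520".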

I think the only room for leeway here is in the XSLT spec's statement that
the HTML output method outputs HTML that "conforms to" the Recommendation.
Since the HTML DTDs are part of the Recommendation, and they all state
that certain attribute values (like img src) must be URIs conforming to
RFC 2396, then the question is, should the XSL processor try to ensure
that such attribute values conform to RFC 2396?

If the answer is yes, then not only should non-ASCII characters be escaped
as per the spec, but spaces should be as well. "%" should be escaped if it
is not followed by a pair of hex characters from [0123456789ABCDEF]. This
of course is problematic because it assumes that anything that looks like
an escape sequence must be one. All other characters, including "`" like I
mentioned earlier in this thread, would not be safe to escape, per RFC
2396 sec. 2.4.2., which emphasizes that URIs are by definition "already

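A rough sketch of what the "yes" answer would mean in practice, to show where the ambiguity comes from. The name escape_strict is hypothetical, UTF-8 is assumed as the byte encoding (per HTML 4.0 section B.2.1), and lowercase hex digits are accepted as well as uppercase ones, which is what RFC 2396 allows.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F


def escape_strict(value: str) -> str:
    """Escape non-ASCII characters, spaces, and any '%' that does not
    already look like the start of an escape sequence.

    Hypothetical helper, not taken from any XSLT processor.
    """
    out = []
    for i, ch in enumerate(value):
        if ch == "%":
            pair = value[i + 1 : i + 3]
            looks_escaped = len(pair) == 2 and all(c in HEX_DIGITS for c in pair)
            # A '%' followed by two hex digits is assumed to be an escape.
            out.append("%" if looks_escaped else "%25")
        elif ch == " " or ord(ch) > 127:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(escape_strict("sale 100%.png"))
print(escape_strict("%41.png"))
```

The second call leaves "%41.png" untouched, even if the author meant a literal percent sign followed by "41". The processor has no way to tell the two cases apart.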
If the answer is no, as I think it should be, then only the non-ASCII
characters should be escaped, and spaces, "%" etc should be left alone,
because it is the document author's job to make it be a valid URI.
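The "no" answer amounts to something like the following sketch, which escapes only what HTML 4.0 section B.2.1 asks for (each non-ASCII character converted to UTF-8, each byte written as %HH) and passes everything else through. The function name escape_non_ascii is made up for illustration.

```python
def escape_non_ascii(value: str) -> str:
    """Escape only non-ASCII characters; leave spaces, '%', and all other
    ASCII characters exactly as the document author wrote them.

    Hypothetical helper, not taken from any XSLT processor.
    """
    out = []
    for ch in value:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Write each UTF-8 byte of the character as %HH.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(escape_non_ascii("caf\u00e9 menu%20.png"))
```

Here a space or a stray "%" in the source stays in the output, so an invalid URI in the stylesheet remains an invalid URI in the result.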

Obviously this will not help the poor guy trying to write JHTML that
(apparently) allows an img src to be script data. Even if the HTML spec
allowed an img src to be script data, the XSLT spec's omission of "script
data"-type attribute values from the no-escaping-for-<script>-and-<style>
clause would still cause problems.

I am cc'ing xsl-editors@w3.org because I'd like to know the answer to the
yes/no question above, re: escaping URI-type attribute values in the HTML
output method, and because I would like to know if "script data"-type
attribute values should get the same no-escaping treatment as <script> and
<style> elements.

   - Mike
Mike J. Brown, software engineer at         My XML/XSL resources:
webb.net in Denver, Colorado, USA           http://www.skew.org/xml/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
Received on Monday, 12 June 2000 12:27:29 UTC
