Re: HTML output method: SCRIPT and STYLE escaping

Kay, Michael wrote:
> > XSLT 2.0 section 20.3 says:
> > 
> >   The html output method should not perform escaping for the 
> > content of
> >   the script and style elements.
> > 
> > According to HTML 4.01 section B.3.2, "</" within the content 
> > of a SCRIPT or STYLE slement should be escaped.
> 
> Thanks for alerting me to this (non-normative) appendix.

  'The following notes are informative, not normative. Despite the
  appearance of words such as "must" and "should", all requirements
  in this section appear elsewhere in the specification.'

To me this is saying "we're designating this section as informative even
though it reads like it's normative. The reason we're doing this is because
there's nothing new here; all requirements listed here are also listed 
elsewhere." So the appendix is just further explanation of requirements that 
are more formally defined elsewhere. For example,

  "HTML is an SGML application..."
  (http://www.w3.org/TR/REC-html40/conform.html#h-4.2)

...therefore it is implicit that an HTML document must conform to SGML's 
restrictions.

I don't think we are in disagreement over what constitutes "correct" vs
"widely-accepted but incorrect" HTML. I think you are just misinterpreting my
recommendations (that the spec acknowledge these issues and encourage the
output of correct HTML) being a call to make failure to output correct HTML be
regarded as an error or conformance issue for an XSLT processor. I'm sorry
that I wasn't more clear.

> 
> HTML SCRIPT elements do cause some serious problems. The fact is, users
> frequently do produce the cited "illegal" example:
> 
>     <SCRIPT type="text/javascript">
>       document.write ("<EM>This won't work</EM>")
>     </SCRIPT>
> 
> and they seem to get away with it. What's more, they produce it in two
> different ways, both of which generally work:
> 
> (1) as an element:
> 
>     <xsl:template match="x">
>     <SCRIPT type="text/javascript">
>       document.write ("<EM>This won't work</EM>")
>     </SCRIPT>
>     </xsl:template>
> 
> (2) as characters:
> 
>     <xsl:template match="x">
>     <SCRIPT type="text/javascript">
>       document.write ("&lt;EM&gt;This won't work&lt;/EM&gt;")
>     </SCRIPT>
>     </xsl:template>

The fact that they get away with either of these is testament to the lenience
of HTML user agents in accepting what is, strictly speaking, illegal HTML. It
is much the same as with the characters #x80-#x9F being disallowed, even by
reference, according to HTML's SGML declaration, yet being accepted in common 
practice. The XSLT spec's failure to encourage HTML generators to produce 
correct output just perpetuates the situation.

I feel that an XSLT processor should not be forbidden from producing strictly
conforming SGML HTML, rather than just widely-accepted tag soup; it should be
encouraged, but not required, to make the output as correct as possible.

"strict in what you produce, lenient in what you accept".. or something like 
that.

> > 
> > It also indicates that unencodable characters appearing in a 
> > SCRIPT or STYLE element "must" be escaped according to the 
> > conventions of the script or style language's syntax -- they 
> > cannot be written as character references. This is a bit 
> > unreasonable to impose on an XSLT processor; nevertheless, it 
> > should be acknowledged. Processors should be given the option 
> > of performing such escaping, and a fallback behavior (of 
> > emitting a character reference anyway, I presume) should be defined.
> > 
> 
> I think the right solution for XSLT is that we should pass on this
> requirement to use language-specific escaping to the user. We should include
> an example showing that the correct way to write the above is:
> 
> (3)
>     <xsl:template match="x">
>     <SCRIPT type="text/javascript">
>       document.write ("&lt;EM&gt;This will work&lt;\/EM&gt;")
>     </SCRIPT>
>     </xsl:template>
> 
> I don't think we should make either (1) or (2) an error, even though they
> result in illegal HTML - there are many, many ways to produce illegal HTML
> as the output of an XSLT stylesheet.

If you do provide an example, then it should also demonstrate the unencodable
character issue. The paragraph quoted above is saying that it is illegal to do
something like this in a us-ascii encoded document:

<script type="text/javascript">
  document.write('&#161;Hola!');
</script>

You must do this (I think this is right):

<script type="text/javascript">
  document.write('\u00A1Hola!');
</script>

I think it is probably safest to just recommend that any non-ASCII characters
in a script or style element be written according to the script or style
language's syntax.

Mike

-- 
  Mike J. Brown   |  http://skew.org/~mike/resume/
  Denver, CO, USA |  http://skew.org/xml/

Received on Thursday, 13 February 2003 13:39:25 UTC