Re: Validator should not be interpreting <SCRIPT> tag contents at all from Terje Bless on 2001-02-08 (www-validator@w3.org from February 2001)

From: Terje Bless <link@tss.no>
Date: Thu, 8 Feb 2001 10:50:30 +0100
To: Bryce Nesbitt <bryce@obviously.com>
cc: www-validator@w3.org
Message-ID: <20010208105237-r01010600-d89e5f9b@10.0.0.2>

On 06.02.01 at 10:41, Bryce Nesbitt <bryce@obviously.com> wrote:

>I've read the FAQ.  I understand the problem.  But still, why does this
>validator interpret the contents of a <SCRIPT></SCRIPT> container?

Because it's part of the document. The content of SCRIPT is the SGML token
CDATA, which is defined, largely, as follows:

# CDATA is a sequence of characters from the document character set and
# may include character entities. User agents should interpret attribute
# values as follows:
# 
# * Replace character entities with characters,
# * Ignore line feeds,
# * Replace each carriage return or tab with a single space.
# * User agents may ignore leading and trailing white space in CDATA
#   attribute values (e.g., "   myval   " may be interpreted as "myval").
# 
# Authors should not declare attribute values with leading or trailing
# white space. For some HTML 4 attributes with CDATA attribute values,
# the specification imposes further constraints on the set of legal
# values for the attribute that may not be expressed by the DTD.
# 
# Although the STYLE and SCRIPT elements use CDATA for their data model,
# for these elements, CDATA must be handled differently by user agents.
# Markup and entities must be treated as raw text and passed to the
# application as is. The first occurrence of the character sequence "</"
# (end-tag open delimiter) is treated as terminating the end of the
# element's content. In valid documents, this would be the end tag for
# the element.

>The interpreter can't know what type of script is inside the tag, and
>can't correctly interpret it. The validator is not aware of the syntax
>rules that apply inside <SCRIPT>.  Yet, the validator tries, resulting in
>all sorts of spurious errors.

Nope. There are two levels of "interpretation" going on here. First the
content of the SCRIPT goes through HTML "interpretation" rules and then,
afterwards, it gets fed through the relevant script language's
"interpretation" rules.

>I suggest that the validator would be much more useful if it ignored
>everything inside <SCRIPT></SCRIPT>.  Cluttering the JavaScript code with
>\, as in: document.write("<\/P>"); To hide it from the validator is not a
>realistic solution.

What happens here is that the "\" keeps the HTML processor from seeing the
sequence "</". The backslash means nothing special to HTML; it simply sits
between the "<" and the "/", so the forward slash no longer signifies the
end of the SCRIPT section. The processor then continues on until it reaches
the real end of the SCRIPT section. The resulting block of data is then fed
to the parser for the script language in question, where "\/" in a string
is just an escaped "/".
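
A minimal sketch of the two levels described above, assuming a deliberately
simplified HTML-level rule that only looks for the first "</" (a real SGML
parser does more). The function name and the two sample strings are
hypothetical and only serve as illustration:

```javascript
// Level 1: simplified stand-in for the HTML rule quoted earlier.
// The content of SCRIPT ends at the first occurrence of "</".
function scriptContentUpToEtago(textAfterScriptStartTag) {
  const end = textAfterScriptStartTag.indexOf("</");
  return end === -1
    ? textAfterScriptStartTag
    : textAfterScriptStartTag.slice(0, end);
}

// Everything that follows <SCRIPT> in two hypothetical documents.
// ("\\" in these literals stands for a single backslash in the document.)
const unescaped = 'document.write("</P>");</SCRIPT>';
const escaped = 'document.write("<\\/P>");</SCRIPT>';

// Level 2 would hand the extracted text to the script language's parser.
console.log(scriptContentUpToEtago(unescaped)); // document.write("
console.log(scriptContentUpToEtago(escaped));   // document.write("<\/P>");
```

In the unescaped case the script parser would only ever receive a
truncated fragment, which is why the validator complains.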

Think of it this way: you start out with a script written in, say,
JavaScript. This has its own syntax rules. You then embed the script into
an HTML document. This necessitates some encoding to ensure the syntax rules
for HTML are followed. Fortunately, the applicable syntax rules are very
few. About the only one you'll run into trouble with regularly is that the
sequence "</" will terminate the containing block. To avoid this, you'll
need to make sure your original JavaScript does not contain this sequence
by using the JavaScript escape character to hide it.
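
As a rough illustration of that encoding step, here is a small sketch. The
helper name is hypothetical, and it assumes "</" only occurs inside string
literals (or comments) in the original script, where the extra backslash
is harmless:

```javascript
// Hypothetical helper: rewrite every "</" in script source as "<\/"
// so the text can sit inside a SCRIPT element.
function hideEndTagOpen(scriptSource) {
  return scriptSource.replace(/<\//g, "<\\/");
}

const original = 'document.write("<P>Hello</P>");';
const embeddable = hideEndTagOpen(original);

console.log(embeddable);                // document.write("<P>Hello<\/P>");
console.log(embeddable.includes("</")); // false

// Inside a JavaScript string literal "\/" is just "/", so the script
// still writes the same markup as before.
console.log("<\/P>" === "</P>");        // true
```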

Received on Thursday, 8 February 2001 04:52:50 UTC