
RE: Parsing of "<pre><samp> black && white </samp></pre>"

From: Dave J Woolley <david.woolley@bts.co.uk>
Date: Thu, 26 Apr 2001 11:45:56 +0100
Message-ID: <81E4A2BC03CED111845100104B62AFB50102A695@STAGECOACH>
To: www-amaya@w3.org
> From:	Vladimir G Ivanovic [SMTP:ivanovic@parc.xerox.com]
> The definition of SAMP is "Designates sample output from program,
> scripts, etc." (http://www.w3.org/TR/html401/struct/text.html). Program
> code frequently contains `&' characters.
The only definition of SAMP that matters, from the HTML 4
specification, is:

<!ENTITY % phrase "EM | STRONG | DFN | CODE |
<!ELEMENT (%fontstyle;|%phrase;) - - (%inline;)*>
	[DJW:]  Note that the content model allows any inline
	markup (anchors, emphasis, etc.), not just entities.
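
	As a rough illustration of that point, here is a minimal Python
	sketch. It assumes the standard library's html.parser, which is a
	lenient tokenizer and not an SGML parser, so it does not validate
	against the DTD. The names InlineCollector and parse_fragment are
	made up for this example. It only shows that markup and entity
	references inside SAMP are parsed rather than passed through
	literally.

```python
from html.parser import HTMLParser


class InlineCollector(HTMLParser):
    """Hypothetical helper: records start tags and decoded character data."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &amp; in text.
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.chunks.append(data)


def parse_fragment(markup):
    """Return (list of start tags seen, concatenated text) for a fragment."""
    collector = InlineCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags, "".join(collector.chunks)


# SAMP content holding inline markup (EM) and an entity reference.
tags, text = parse_fragment(
    "<pre><samp>see <em>this</em> &amp; that</samp></pre>"
)
```

	The EM element is reported as a tag and the reference is decoded to
	a single ampersand, which is what a parsed (not CDATA) content
	model implies.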

> Both of these definitions imply to me that unescaped, literal `&'
> characters are NOT to be interpreted, but rather output without change,
> i.e. literally.
	Only the formal DTD fragment matters.

	There used to be an XAMP element, but it was deprecated
	in HTML 2.0 because it is impossible to describe in SGML: only
	an exact match on the end tag is allowed, whereas CDATA allows
	a match on </

> BTW, Netscape 4.76 (Linux), Opera 5.0b7 and Mozilla 0.8b1 all render my
> test case as 
	[DJW:]  And this is one of the reasons why invalid HTML
	is so common.

>     <script type="text/html">
>     <![CDATA[
>     black && white
>     ]]>
	[DJW:]  HTML imposes additional constraints over SGML; one
	of which is that marked sections are not permitted - or at least
	you cannot expect browsers to recognize them.

> I'm not so sure. I don't know if the browser folks decided not to
> implement the standard, didn't implement it correctly, or if your
> interpretation of the HTML standard is wrong. My inclination is to think
> it's bug in Amaya and HTML Tidy.
	[DJW:]  Amaya is correct.  It wouldn't surprise me if nsgmls
	(the core of the validator) is also right, but someone would have
	to buy me a copy of the ISO standard for SGML to allow me to
	look up whether && is an allowed escape for & in parsed text. 
	It is not an allowed escape in HTML, but the validator only checks
	against the DTD, not any additional constraints.
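
	For the original test case, the portable fix is to write each
	literal ampersand as &amp;. A minimal sketch, assuming Python's
	standard html module is used to generate the markup (the variable
	names are only illustrative):

```python
import html

# The literal text the author wants displayed inside <pre><samp>.
raw = "black && white"

# Escape &, < and > so the text is safe as parsed character data.
# quote=False leaves quotation marks alone; they only matter in attributes.
escaped = html.escape(raw, quote=False)

# Build the fragment with the escaped text.
fragment = f"<pre><samp>{escaped}</samp></pre>"

# Round trip: decoding the references gives back the original text.
decoded = html.unescape(escaped)
```

	Each & becomes &amp;, so the fragment no longer depends on how a
	particular browser or validator treats a bare ampersand.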

Received on Thursday, 26 April 2001 06:46:00 UTC
