Re: On ampersands.

"Shane P. McCarron" wrote:
> 
> Gerald Oskoboiny wrote:
> > There are some cases where the ampersands don't need to be
> > escaped, like: <p>foo & bar</p>, or <a href="foo&_bar">
> >
> 
> I don't think I agree.  In SGML, an ampersand always introduces an
> entity reference.  If you want to actually use an ampersand, you are
> required to use &amp;. I don't see any way around this requirement.

Okay...  The XML specification is pretty clear on this, and is available
on-line at http://www.w3.org/TR/REC-xml

It says:

The ampersand character (&) and the left angle bracket (<) may appear in
their literal form only when used as markup delimiters, or within a
comment, a processing instruction, or a CDATA section. They are also
legal within the literal entity value of an internal entity declaration;
see "4.3.2 Well-Formed Parsed Entities". If they are needed elsewhere,
they must be escaped using either numeric character references or the
strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may
be represented using the string "&gt;", and must, for compatibility, be
escaped using "&gt;" or a character reference when it appears in the
string "]]>" in content, when that string is not marking the end of a
CDATA section.
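
As a rough illustration of that rule (not part of the specification), here
is a minimal Python sketch that uses the standard library's
xml.sax.saxutils helpers to apply the escapes the passage describes. The
sample strings are the ones from Gerald's message plus one made-up string
containing "]]>"; the variable names are my own.

```python
from xml.sax.saxutils import escape, quoteattr

# Character data: "&", "<" and ">" are replaced by "&amp;", "&lt;", "&gt;".
escaped_text = escape("foo & bar")

# Attribute value: quoteattr() escapes the value and adds the quotes.
escaped_attr = quoteattr("foo&_bar")

# A made-up string containing "]]>": the ">" is escaped as well.
escaped_marker = escape("a < b ]]> c")

print("<p>" + escaped_text + "</p>")      # <p>foo &amp; bar</p>
print("<a href=" + escaped_attr + "/>")   # <a href="foo&amp;_bar"/>
print(escaped_marker)                     # a &lt; b ]]&gt; c
```

Note that escape() always rewrites ">", which is stricter than the
specification requires but harmless.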

From this I conclude that any ampersand in the PCDATA of a document, in
other words in its text, must introduce an entity reference or a
character reference.  That covers essentially every place where most
people would write one, attribute values included.  The main exception is
a CDATA section (<![CDATA[ stuff ]]>).  You might use a CDATA section to
delimit JavaScript code in a document so that the XML processor does not
parse it as markup, for example.
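
To check this against a real parser, here is a small sketch that assumes
Python's xml.etree.ElementTree (which is built on expat) as the XML
processor. The helper name is_well_formed is hypothetical, and the test
documents are Gerald's two examples plus escaped and CDATA variants.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


bare_text = "<p>foo & bar</p>"               # unescaped ampersand in content
bare_attr = '<a href="foo&_bar"/>'           # unescaped ampersand in attribute
escaped = "<p>foo &amp; bar</p>"             # escaped form
cdata = "<p><![CDATA[foo & bar]]></p>"       # literal ampersand inside CDATA

print(is_well_formed(bare_text))   # False
print(is_well_formed(bare_attr))   # False
print(ET.fromstring(escaped).text) # foo & bar
print(ET.fromstring(cdata).text)   # foo & bar
```

Both of Gerald's examples are rejected, while the escaped and CDATA forms
yield the same text.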
--
Shane P. McCarron                  phone: +1 763 786-8160
ApTest                               fax: +1 763 786-8180
                                  mobile: +1 612 799-6942
                                  e-mail: shane@aptest.com

Received on Thursday, 6 July 2000 11:38:09 UTC