Re: Is markup allowed in attribute values?

Murray Altheim (murray@spyglass.com)
Wed, 26 Jun 1996 20:01:04 -0500


Message-Id: <v02110102adf767a53f47@[140.186.34.50]>
Date: Wed, 26 Jun 1996 20:01:04 -0500
To: Arnoud "Galactus" Engelfriet <galactus@stack.urc.tue.nl>
From: murray@spyglass.com (Murray Altheim)
Subject: Re: Is markup allowed in attribute values?
Cc: amc@cs.wustl.edu (Adam M. Costello), www-html@w3.org

Arnoud "Galactus" Engelfriet <galactus@stack.urc.tue.nl> writes:
>Adam M. Costello <amc@cs.wustl.edu> writes:
>> Is markup allowed in attribute values?  The particular case I'm thinking
>
>No, it's not. An attribute value may only contain character
>data. That is, literal text with -sometimes- entities.

Yes, it is actually.

First, note that not all attributes are declared CDATA. I'm not sure what
you mean by "sometimes" entities. Most HTML attributes can contain entity
references; the question is whether or not they will be processed (i.e.,
replaced).
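
To make "processed (i.e., replaced)" concrete, here's a small Python sketch
using the standard html.unescape function. Keep in mind that it applies the
modern HTML entity table, not the entity sets of the HTML 2.0 DTD, and the
sample literal is made up for the example.

```python
import html

# A made-up attribute value literal containing one entity reference
# and one numeric character reference.
literal = "R&amp;D notes &#169; 1996"

# Processed: references are replaced by the characters they stand for.
processed = html.unescape(literal)

# Not processed: the application sees the literal exactly as written.
unprocessed = literal

print(processed)
print(unprocessed)
```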

>> I've read the HTML 2.0 spec, even tried reading the DTD (I don't
>> actually know SGML), but it doesn't seem to say one way or another.
>
>Actually the spec does, you just have to know SGML to understand it =).
>In the DTD you can see the permitted contents of a tag. For example,
>this is the line for META:
>
><!ELEMENT META - O EMPTY -- Generic Metainformation -->
><!ATTLIST META
>        http-equiv  NAME    #IMPLIED  -- HTTP response header name  --
>        name        NAME    #IMPLIED  -- metainformation name       --
>        content     CDATA   #REQUIRED -- associated information     --
>        >
>
>The "content" is defined as CDATA, which means you can only put
>character data in there, no entities or markup.

Nope. It's not that CDATA attributes can't contain markup characters; it's
that those characters aren't interpreted as markup.

It's perfectly legitimate to use markup characters wherever the attribute
has been declared as CDATA in the DTD. The example META element validates
just fine.
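
As a rough illustration (and certainly not an SGML validator), Python's
standard html.parser module shows how a tolerant HTML tokenizer treats
markup characters inside a quoted attribute value: they stay in the value,
and no start tag is reported for them. The META content below is made up,
and this parser never reads a DTD, so it treats every attribute alike.

```python
from html.parser import HTMLParser

class AttrDump(HTMLParser):
    """Record every start tag and its attributes as the tokenizer sees them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))

doc = '<meta name="description" content="<b>bold</b> claims">'
p = AttrDump()
p.feed(doc)
p.close()

# Only META is reported; the <b> inside the quoted value is just text.
print(p.tags)
```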

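Here are a few of the declared types you might find in a DTD. NAME, NAMES,
ID and CDATA are attribute declared values; RCDATA and PCDATA (written
#PCDATA in a content model) describe element content: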

    NAME    The attribute contains a valid SGML NAME, which in HTML
            consists of a name start character (a-z, A-Z) followed
            by up to 71 letters, digits, hyphens and/or periods. No
            spaces are allowed. The maximum length is set by NAMELEN
            in the SGML declaration.

    NAMES   A space-delimited list of NAME tokens. The CLASS attribute
            in the i18n draft and in the expired HTML 3.0 draft is
            declared as NAMES.

    ID      A unique NAME. There are no ID attributes in the HTML 2.0
            DTD, but the ID attribute in i18n is declared ID.

    CDATA   Character data, which may contain any valid SGML characters
            and should not be interpreted as markup by the parser.

    RCDATA  Similar to CDATA, except that general entity references and
            character references are replaced.

    PCDATA  Parsed Character Data, allowing all valid SGML characters.
            Within PCDATA, all markup (including start and end tags,
            character and entity references, comments) is recognized
            and processed accordingly.
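
The NAME, NAMES and ID rules above are easy to check mechanically. Here's a
minimal Python sketch of those checks; the helper names are hypothetical,
the length limit of 71 follows the description above, and the upper-case
folding of IDs assumes the usual HTML SGML declaration (NAMECASE GENERAL YES).

```python
import re

# NAME: a letter, then up to 71 letters, digits, hyphens or periods.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]{0,71}")

def is_name(value: str) -> bool:
    """True if value is a single NAME token as described above."""
    return NAME_RE.fullmatch(value) is not None

def is_names(value: str) -> bool:
    """NAMES: one or more NAME tokens separated by whitespace."""
    tokens = value.split()
    return bool(tokens) and all(is_name(t) for t in tokens)

def duplicate_ids(values):
    """Return each ID value that repeats an earlier one in the document.

    Comparison is case-insensitive (assumed name-case folding).
    """
    seen, dups = set(), []
    for v in values:
        key = v.upper()
        if key in seen:
            dups.append(v)
        seen.add(key)
    return dups

print(is_name("http-equiv"), is_name("has space"))
print(is_names("note warning"), duplicate_ids(["top", "Intro", "intro"]))
```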

There are more formal definitions, but that should give you a good idea.
Note that there may be some confusion here in HTML: in most of the places
where NAME occurs as an attribute name in the HTML DTD, it is actually
declared as CDATA, not NAME. And the declaration sometimes changes between
different HTML DTDs.
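
Since the declared value can differ from one DTD to the next, it's worth
looking at what the DTD you're validating against actually says. Here's a
rough Python sketch that pulls attribute names and declared values out of
the META ATTLIST quoted earlier in this thread. The declared_values helper
is hypothetical and handles only the simple one-attribute-per-line layout
shown there; it is not a general SGML declaration parser.

```python
import re

# The META attribute list declaration quoted earlier in this thread.
ATTLIST = """
<!ATTLIST META
        http-equiv  NAME    #IMPLIED  -- HTTP response header name  --
        name        NAME    #IMPLIED  -- metainformation name       --
        content     CDATA   #REQUIRED -- associated information     --
        >
"""

def declared_values(attlist: str) -> dict:
    """Map attribute name -> declared value for a simple ATTLIST."""
    body = re.sub(r"--.*?--", "", attlist, flags=re.S)  # drop -- comments --
    result = {}
    for line in body.splitlines():
        parts = line.split()
        # Expect: attribute name, declared value, default value.
        if len(parts) >= 3 and not line.lstrip().startswith("<!"):
            result[parts[0].lower()] = parts[1].upper()
    return result

print(declared_values(ATTLIST))
```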

And regardless of "legality", some browsers may have difficulty with
attributes containing markup. Because the META example is a non-displayed
element in HEAD, there probably won't be many consequences, but using
markup-laden attributes on elements in BODY content may cause more obvious
display problems.

To get an idea of how well a browser understands attribute literals, add
Adam's META element to the HEAD of a document, then open it and view the
source in Netscape. Even though the element is valid markup (check with a
validation tool to be sure), the source view flashes the META element as if
something were wrong. This gives some indication of how much (or how
little) the parser actually understands, and how much it is guessing. No
slur on Netscape -- most non-SGML browsers do the same thing or worse.

Murray

```````````````````````````````````````````````````````````````````````````````
     Murray Altheim, Program Manager
     Spyglass, Inc., Cambridge, Massachusetts
     email: <mailto:murray@spyglass.com>
     http:  <http://www.stonehand.com/murray/murray.html>