- From: Murray Altheim <murray@spyglass.com>
- Date: Wed, 26 Jun 1996 20:01:04 -0500
- To: Arnoud "Galactus" Engelfriet <galactus@stack.urc.tue.nl>
- Cc: amc@cs.wustl.edu (Adam M. Costello), www-html@w3.org
Arnoud "Galactus" Engelfriet <galactus@stack.urc.tue.nl> writes:
>Adam M. Costello <amc@cs.wustl.edu> writes:
>> Is markup allowed in attribute values?  The particular case I'm thinking
>
>No, it's not. An attribute value may only contain character
>data. That is, literal text with -sometimes- entities.

Yes, it is actually. First, note that not all attributes are declared
CDATA. I'm not sure what you mean by "sometimes" entities. Most HTML
attributes can contain entities; the question is whether or not they
will be processed (i.e., replaced).

>> I've read the HTML 2.0 spec, even tried reading the DTD (I don't
>> actually know SGML), but it doesn't seem to say one way or another.
>
>Actually the spec does, you just have to know SGML to understand it =).
>In the DTD you can see the permitted contents of a tag. For example,
>this is the line for META:
>
><!ELEMENT META - O EMPTY      -- Generic Metainformation -->
><!ATTLIST META
>        http-equiv  NAME    #IMPLIED  -- HTTP response header name  --
>        name        NAME    #IMPLIED  -- metainformation name       --
>        content     CDATA   #REQUIRED -- associated information     --
>        >
>
>The "content" is defined as CDATA, which means you can only put
>character data in there, no entities or markup.

Nope. It's not that CDATA attributes can't contain markup characters,
it's that they aren't interpreted. It's perfectly legitimate to use
markup characters wherever the attribute has been declared as CDATA in
the DTD. The example META element validates just fine.

Here are a few types of attribute declarations you might find in a DTD:

  NAME    The attribute contains a valid SGML NAME, which in HTML
          consists of a valid name start character (a-z, A-Z) followed
          by up to 71 alpha, numeric, hyphen and/or period characters.
          No spaces allowed. The length is set by NAMELEN in the SGML
          declaration.

  NAMES   A space-delimited list of NAME tokens. The CLASS attribute in
          i18n and the expired HTML 3.0 draft is declared as NAMES.

  ID      A unique NAME. There are no ID attributes in the HTML 2.0
          DTD, but the ID attribute in i18n is declared ID.

  CDATA   Character Data that allows all valid SGML characters, which
          should not be interpreted by the parser.

  RCDATA  Similar to CDATA except that general and character entity
          replacements should occur.

  PCDATA  Parsed Character Data, allowing all valid SGML characters.
          Within PCDATA, all markup (including start and end tags,
          character and entity references, comments) is recognized and
          processed accordingly.

There are more formal definitions, but that should give you a good idea.
Note that there might be some confusion here in HTML: in most of the
places where NAME occurs as an attribute name in the HTML DTD, it is
actually declared as CDATA, not NAME. And sometimes the declaration
changes between different HTML DTDs.

And regardless of "legality", there may be browsers that have difficulty
with attributes containing markup. Because the META example is a
non-displayed element in HEAD, there probably won't be too many
consequences, but using markup-laden attributes within elements in BODY
content may cause more obvious display problems.

To give you an idea of how well a browser understands attribute
literals, add Adam's META element to the HEAD of a document, then open
it and view source in Netscape. Even though the element is valid markup
(check a validation tool to be sure), the source flashes the META
element as if something were wrong. This may give you some indication of
how much or how little is being understood by the parser, and how much
is being guessed. No slur on Netscape -- most non-SGML browsers do the
same thing or worse.

Murray

```````````````````````````````````````````````````````````````````````````````
     Murray Altheim, Program Manager
     Spyglass, Inc., Cambridge, Massachusetts
     email: <mailto:murray@spyglass.com>
     http:  <http://www.stonehand.com/murray/murray.html>
Received on Wednesday, 26 June 1996 20:14:06 UTC