- From: Murray Altheim <murray@spyglass.com>
- Date: Fri, 28 Jun 1996 18:50:57 -0500
- To: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
- Cc: www-html@w3.org
Paul Prescod <papresco@calum.csclub.uwaterloo.ca> writes:
>At 08:01 PM 6/26/96 -0500, Murray Altheim wrote:
>>First, note that not all attributes are declared CDATA. I'm not sure what
>>you mean by "sometimes" entities. Most HTML attributes can contain
>>entities; the question is whether or not they will be processed (ie.,
>>replaced).
>
>I'm getting confused by this discussion, so let me see if I can clarify.
>_ALL_ SGML/HTML attributes may have entity references in them. _ALL_
>SGML/HTML attributes allow entity expansion/processing/replacement (choose
>your favourite term).

Paul,

Sorry to cause any confusion -- you're "mostly" correct. Goldfarb makes a
point about this being confusing, and I likewise get confused when the
discussion is not precise.

The reason I didn't state "always" is that there are several instances of
attributes declared as NAME, NAMES or ID in various HTML DTDs, and in those
cases ampersand and semicolon characters are disallowed. Because there is
no "reasonable" instance of general or character entities resolving to a
valid NAME, I made that statement. I'll try to explain what I mean by this
below.

In the process of parsing an "attribute value literal" (the text you typed
between quote marks), the parser derives an "attribute value". Any general
or character entities in the _attribute value literal_ are resolved (i.e.,
"expanded/processed/replaced") at this point. Attribute value literals
_can_ contain general or character entities. BUT, if in parsing the
attribute value literal, the derived attribute value doesn't fit the
declared value of the attribute, the markup is invalid.

As an example, note that the "NAME" attribute in META is declared as NAME:

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN" [
        <!ENTITY foo "KEYWORDS">
    ]>
    <HTML>
    <HEAD>
    <META NAME="&foo;" CONTENT="mexico,canada,usa">
    ...

In parsing the attribute value literal "&foo;", the derived attribute value
is "KEYWORDS" (not including the quote marks).
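The derivation just described can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not an SGML parser:
the entity table, the function names and the regular expressions are all
hypothetical, replacement is done in a single pass, and NAME is
approximated as a letter followed by letters, digits, "." or "-" (case
folding and the NAMELEN limit are ignored).

```python
import re

# Hypothetical stand-in for the entities declared in the DTD subset.
GENERAL_ENTITIES = {"foo": "KEYWORDS"}

# Matches "&name;" (general entity) or "&#NNN;" (numeric character reference).
ENTITY_REF = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9.-]*);")

# Approximation of NAME: a letter, then letters, digits, "." or "-".
NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def derive_attribute_value(literal, entities=GENERAL_ENTITIES):
    """Resolve entity references in an attribute value literal (one pass)."""
    def replace(match):
        ref = match.group(1)
        if ref.startswith("#"):
            # Numeric character reference: "#69" -> "E"
            return chr(int(ref[1:]))
        # General entity; raises KeyError if it was never declared.
        return entities[ref]
    return ENTITY_REF.sub(replace, literal)


def is_valid_name(value):
    """Check the derived value against the (approximated) NAME declared value."""
    return NAME.fullmatch(value) is not None


if __name__ == "__main__":
    for literal in ("&foo;", "K&#69;YWORDS", "K&#201;YWORDS", "KEYWORDS_FRENCH"):
        value = derive_attribute_value(literal)
        print(literal, "->", value, "| valid NAME:", is_valid_name(value))
```

The point of splitting the work into two functions is the one made above:
entity replacement always happens first, and only the derived value is
checked against the declared value of the attribute.
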
The META example above is so far valid HTML markup. If you were so
inclined, you could even declare &foo; as "K&#69;YWORDS", since &#69; is
replaced by "E". But had the general entity &foo; been declared as
"KÉYWORDS" (where É is E with acute accent), the derived attribute value
would not be a valid NAME, since the E+acute character is not an allowed
NAME character. Likewise, declaring &foo; as "KEYWORDS_FRENCH" would be
invalid, since the underscore is not a valid NAME character. In essence,
replacing entities in attributes declared as NAME must result in a valid
NAME.

Since there's no good reason to use numeric entity references for valid
NAME characters, I assume that the author would be using a numeric or ISO
character reference (such as "&#201;" or "&Eacute;"), which would result in
an invalid NAME. Since general entity replacement doesn't occur in
mainstream browsers, my example above doesn't work either.

So in no "reasonable" instance can entities occur in HTML attributes
declared as NAME, NAMES, or ID [1]. Technically (in true SGML-conformant
HTML) they can, if their replacement results in a valid NAME. But I don't
see this occurring in mainstream HTML. Hence my statement that attributes
declared as NAME, NAMES or ID can't contain entities.

[...]

>So, as I understand it, entity markup is _always_ allowed in attributes.

I'm not clear on the term "entity markup", but I'm assuming you mean the
presence of general or character entities such as &foo; or &Eacute;. Given
the discussion above, yes, entities are always allowed within attribute
value literals, but their replacement must result in an attribute value
that conforms to the attribute's declared value in the DTD or DTD subset.

>Less than and greater than symbols are _never_ interpreted as markup within
>attributes (just as they are not in "replaceable character data") so it is
>impossible to put elements in attributes although it is possible (in fact
>quite easy) to put less than and greater than characters in attributes.

In this case, technically, a general entity might resolve to a literal
containing markup. If the attribute was declared as CDATA, the markup
wouldn't be interpreted; if RCDATA, the markup would be interpreted. But
given that general entities are declared in a DTD subset, an SGML feature
that isn't supported in mainstream HTML, and that there are no declared
RCDATA attributes in any HTML DTD I'm aware of, your statement is pretty
safe for current HTML practice, but I wouldn't go so far as to say NEVER.
I have quite a number of SGML/HTML documents that do this type of thing.

Murray

[1] Some examples in HTML-i18n would be HTTP-EQUIV and NAME in META,
    %linktype;, and the ID and CLASS attributes.

[p.s. One mistake I made in the last message: technically, PCDATA is not
an attribute declared value, but a reserved name. The #PCDATA keyword is
used to indicate content occurring in a context in which text is parsed
and markup is recognized.]

```````````````````````````````````````````````````````````````````````````````
     Murray Altheim, Program Manager
     Spyglass, Inc., Cambridge, Massachusetts
     email: <mailto:murray@spyglass.com>
     http:  <http://www.stonehand.com/murray/murray.html>
Received on Saturday, 29 June 1996 01:37:27 UTC