- From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
- Date: Tue, 20 Feb 2001 19:24:19 +0800 (CST)
- To: html-tidy <html-tidy@w3.org>
On Tue, 20 Feb 2001, Martyn J Shaw wrote: > I've seen this argument before somewhere. Is the full answer that > the HTML 4.01 spec says that href attribute takes a URI as an value and > a URI is of type CDATA and that a user agent should replace > character entities in CDATA sections? It is useful to be able to put character references in attributes values, because it helps multilingual and non-English HTML. The SGML (RCS) and XML rules are that any attribute value which should have the characters "&" followed by a legitimate SGML (RCS) namestart character (a-zA-Z.-) must be marked up to prevent false delimiter recognition. XML goes further than default (RCS) SGML, in that in XML _all_ occurrences of "&" must be entered by character reference (& or the numeric character reference.) HTML is supposed to follow SGML rules: normally there is no problem because attributes values are often keywords or numbers which don't use "&". However, HTML parsers are likely to do anything, which is why XML is so strict in its requirements. From an SGML perspective, it is probably best to say that HTML processors have a particular error-recovery strategy for handling spurious references: they transmit the text of the references as is. (I think HTML is as much an error-recovery strategy as it is a particular language :-) However, this approach is not suitable for XML (and so XHTML) because in XML parsing is not element-dependent but is separate from the DTD. So it is not really anything to do with the attribute being CDATA or not. Actually, the terminology of CDATA is confusing for people: I have heard Charles Goldfarb (inventor of SGML and generalized markup) say it would be better to rename CDATA sections and CDATA elements "CLEARTEXT" rather than CDATA, because they don't recognise any markup (other than the close delimiters as appropriate) while CDATA attributes must have attribute references recognised (as all attributes must.) (In SGML but not XML we have a declared content type for elements RCDATA which allows references but not comments, PIs, start-tags. CDATA attributes act like RCDATA elements...) Hope this is some use. Cheers Rick Jelliffe
Received on Tuesday, 20 February 2001 06:24:26 UTC