- From: Maciej Stachowiak <mjs@apple.com>
- Date: Sat, 31 Oct 2009 20:02:39 -0700
- To: Shelley Powers <shelley.just@gmail.com>
- Cc: Boris Zbarsky <bzbarsky@mit.edu>, Alexey Proskuryakov <ap@webkit.org>, HTML WG <public-html@w3.org>
On Oct 31, 2009, at 7:37 PM, Shelley Powers wrote:

> On Sat, Oct 31, 2009 at 8:33 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>>
>> On Oct 31, 2009, at 6:22 PM, Shelley Powers wrote:
>>
>>> Yes, how the browsers work when it comes to DTDs and named entities
>>> has come up in the past [1][2].
>>>
>>> Case in point, Firefox, Safari, and Chrome don't allow named entities
>>> in XHTML+RDFa documents, even though the XHTML+RDFa DTD does reference
>>> the named entities.
>>>
>>> Oops
>>>
>>> But, still, we manage. We use numeric entities.
>>
>> I think it's fine to omit named entities from newly minted DTDs. In fact,
>> probably a good idea since it's the strict XML behavior and nothing stops
>> you from using an NCR or just a literal unicode character in new content.
>>
>> But browsers need to handle named entities when some specific XHTML DTDs
>> are present, since there is a body of legacy content that depends on
>> having the XHTML set of entities. Handling content with the XHTML+RDFa
>> DTD does not have this constraint.
>>
>
> I can understand, and not. XHTML from the very beginning had rules
> having to do with named entities, and this has always been a
> constraint.

The problem is that content didn't do a good job of sticking to the
narrow path of these rules. I suspect this problem comes from a few
unusual conditions: (1) XHTML 1.x validators were validating XML
processors, and thus respected the entities and did not flag them as
errors; (2) chameleon content served as HTML to some UAs but XHTML to
others would work fine in HTML mode with entities. I believe this
contributed to pressure for browsers to support the standard XHTML
named entities in XHTML in some form.

On the other hand, as I said, it's not practical for a browser to be a
validating XHTML processor. I think it's a problem with the XHTML specs
that they made named entity processing so unpredictable.
The wisest thing for new content to do is to never use named entities
other than the five predefined by XML. In the meantime, we have some
old content already using named entities in XHTML, and it works today
in Gecko-based and WebKit-based browsers (and thus, in most browsers
that support XHTML at all). (I'm not sure what Opera does offhand.)

>
> Regardless, there is no legacy content for HTML5.

HTML5 recommends using no DTD at all for XHTML5 content, or the short
HTML5 <!doctype html> doctype. I agree that special entity processing
is not necessary (or arguably even desirable) in those cases. However,
when an HTML5 UA is faced with content using an XHTML1 DTD (and
probably a short whitelist of other DTDs), it should do the special
entity handling. This should be defined by a specification. I think
that spec could be HTML5, since it strives to define compatible
processing for older versions of HTML and XHTML, such that you can
implement HTML5 in an existing browser engine without introducing
additional mode switches.

>> Note: we'd rather not have this behavior in WebKit but we added it due to
>> compatibility bugs being filed. I expect any XHTML-capable browser would
>> eventually be pressured to add similar behavior. Non-browser tools that
>> process XHTML from the Web may also benefit from doing the same thing.
>>
>
> But those that don't will respect named entities in RDFa in XHTML,
> while browsers don't. You start bending rules, and you add, rather
> than remove inconsistencies.

As things currently stand, only a validating XML processor would
respect any named entities in RDFa+XHTML (using the RDFa+XHTML
doctype). I think it is not common for any software other than
DTD-based validators to use a validating XML processor.

Regards,
Maciej
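
As a rough illustration of the advice above to stick to NCRs, literal
characters, and the five XML-predefined entities: the sketch below feeds
four small fragments to a non-validating XML parser. It uses Python's
standard xml.etree.ElementTree purely as a stand-in for "an XML
processor that does not read a DTD"; it is not a model of what Gecko or
WebKit do internally. The helper name parses() and the sample strings
are made up for this example.

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if a non-validating parser accepts the fragment.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Sample fragments (invented for this sketch); no DTD is supplied.
named = "<p>caf&eacute;</p>"            # XHTML named entity, undeclared here
numeric = "<p>caf&#233;</p>"            # numeric character reference (NCR)
literal = "<p>caf\u00e9</p>"            # literal Unicode character
predefined = "<p>a &amp; b &lt; c</p>"  # two of the five XML-predefined entities

for label, doc in [("named", named), ("numeric", numeric),
                   ("literal", literal), ("predefined", predefined)]:
    print(label, parses(doc))
```

Only the fragment with the named entity is rejected; the NCR and the
literal character both yield the same text, which is why they are the
safer choice for new content.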
Received on Sunday, 1 November 2009 03:03:21 UTC