- From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
- Date: Fri, 21 May 1999 00:08:48 -0400 (EDT)
- To: <www-html@w3.org>
- Cc: "w3c i18n ig" <w3c-i18n-ig@w3.org>
Here are some comments on XHTML.

1) XHTML 1.0 Last Call Disposition of Comments, http://www.w3.org/MarkUp/Group/1999/xhtml1-lc1-doc-19990506.html, 3.5. LanguageCode Parameter Entity, leaves the language code as CDATA. I would be very interested if you (or Mischa or Martin, if they know) would post a little note to w3c-i18n-ig@w3.org to explain this. CDATA indicates that any characters are allowed, rather than values following RFC 1766.

Suggestion: Declare lang as NMTOKENS. This is more forgiving than NMTOKEN and allows some variant and incorrect use, but will not mislead programmers and educators into thinking that any old text is allowed.

2) XHTML 1.0 Last Call Disposition of Comments, http://www.w3.org/MarkUp/Group/1999/xhtml1-lc1-doc-19990506.html, s2.1.6 SGML Newline Handling Requirements, says:

> Resolution: The document relies upon XML for its definition of whitespace handling. This includes handling of line boundaries. No change to the document is required.

However, XML 1.0 only allows "preserve" or "default" for xml:space, where "default" is undefined, but may be the SGML behaviour. Because of this, the XHTML draft cannot rely on XML for its definition of whitespace handling for "default".

Suggestion: XHTML should follow SGML.

3) Netscape 4.6 still does not support hexadecimal numeric character references in HTML.

Suggestion: Add a caution that hexadecimal numeric character references may not be backward compatible with HTML browsers.

4) Deployed HTML browsers do not nicely allow the following parts of XML:

* Internal subset, and hence entities, notations, additional attribute declarations;
* Hexadecimal character references (see 3 above);
* CDATA marked sections;
* PIs;
* hence, the XML header (which, as the inventor of it, let me say IMHO *is* a PI, since it provides information to software on how to process something, starting at a point: in the same way, all markup declarations are PIs);
* hence, selecting encoding with the XML header.
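As an illustration of points 3 and 4, one possible workaround for browsers that do not understand hexadecimal references is to rewrite them as decimal ones before delivery. The following is only a minimal sketch: the function name `hex_ncrs_to_decimal` is hypothetical, and the regular expression is naive in that it would also rewrite matches inside comments or CDATA marked sections.

```python
import re

# Matches hexadecimal numeric character references such as &#x4E2D;
# (HTML 4 also tolerates an upper-case X, so both are accepted here).
HEX_NCR = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def hex_ncrs_to_decimal(markup: str) -> str:
    """Rewrite every &#xHHHH; reference as the equivalent decimal &#DDDD;."""
    return HEX_NCR.sub(lambda m: "&#%d;" % int(m.group(1), 16), markup)


if __name__ == "__main__":
    # Decimal references and ordinary text are left untouched.
    print(hex_ncrs_to_decimal("&#x4E2D;&#x6587; &#65;"))
```

Decimal references are part of both HTML and XML, so the rewritten text should remain acceptable to either kind of processor.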
By trying to find a subset of XML which is fairly acceptable to HTML browsers, there is a grave danger of setting a course which will disrupt XML. Of course XML was developed to overcome many of the perceived problems of HTML, not to perpetuate them. It worries me a little that the forces to find this subset may be so strong that HTML-browser compatibility becomes a criterion for judging particular XML features. This has already happened to some extent with PIs (notably the use of attributes for Namespace declarations rather than a PI in the header: in that case I think it was a fair call, in that namespaces hang off names, which hang off attributes; they do not hang off documents, entities or random points, which is where PIs are appropriate).

Recommendation: The XHTML effort should split into three parts:

* XHTML, a version of HTML 4.0 which allows all XML features and any new W3C technology. Application vendors should be encouraged to support this. It should have the MIME media-type text/xhtml-xml (the "-xml" suffix is one of the current suggestions for the MIME XML group which is finding favour). XHTML should have one extra requirement beyond XML: the XML header should be mandatory.

* WFHTML, an interim version of XHTML which is compatible with generation 4 and 5 browsers. Users should be encouraged to use this HTML syntax. It should have the MIME media type text/html. WFHTML differs from XML in the following ways:
  i) it only uses elements, data, comments, NCRs, the DOCTYPE declaration and the encoding PI;
  ii) WF errors do not halt parsing;
  iii) the XML header is not mandatory;
  iv) encoding should be determined by the MIME charset; if that is not available, the encoding attribute in the XML header may be used; if that is not available, the META tag may be used; if that is not available, guessing may be used.

* An XHTML-to-WFHTML transformation recommendation.

Webservers should support content-negotiation of XHTML or (WF)HTML.
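The encoding fallback order proposed for WFHTML above (MIME charset, then XML header, then META tag, then guessing) might be sketched as follows. This is a hedged illustration only: the function name `resolve_encoding`, the 1024-byte scan window, and the `iso-8859-1` default guess are all assumptions, and scanning raw bytes only works for ASCII-compatible encodings.

```python
import re
from typing import Optional

# encoding="..." inside the XML header, e.g. <?xml version="1.0" encoding="Big5"?>
XML_HEADER_ENCODING = re.compile(
    rb'<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)
# charset=... inside a META tag, e.g. content="text/html; charset=Shift_JIS"
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z][A-Za-z0-9._-]*)', re.IGNORECASE
)


def resolve_encoding(mime_charset: Optional[str], body: bytes,
                     guess: str = "iso-8859-1") -> str:
    """Pick an encoding name using the fallback order described above."""
    if mime_charset:                       # 1. MIME charset parameter wins
        return mime_charset.lower()
    head = body[:1024]                     # assumed scan window
    m = XML_HEADER_ENCODING.search(head)   # 2. encoding in the XML header
    if m:
        return m.group(1).decode("ascii").lower()
    m = META_CHARSET.search(head)          # 3. META tag
    if m:
        return m.group(1).decode("ascii").lower()
    return guess                           # 4. guessing (here: a fixed default)
```

A real implementation would replace the fixed default with actual detection, for example by inspecting a byte-order mark.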
If a document is available as XHTML but not as (WF)HTML, then some on-the-fly, server-side transformation may be provided: a simple application or an XSL stylesheet, for example. In other words, transformation from XHTML to WFHTML should be transparent to users and to creators of XHTML data. In particular, the transformation should place PIs in comments: <?xml version="1.0"?> should become <!--<?xml version="1.0"?>-->. This would also discourage the deployment of processors which only accept XML subsets: a disastrous development.

It seems to me that, even though this *seems* complicated, it is the only way to reconcile all the different requirements. Furthermore, I think it is technologically sound and practical from a deployment point of view: at most it requires registration of an XHTML handler with webservers, not with browsers (i.e., when XHTML documents are not pre-transformed into WFHTML and content negotiation is used for delivery). This allows people to move to "real" XML, rather than a watered-down version.

Rick Jelliffe
Academia Sinica Computing Centre
Taipei, Taiwan
Received on Friday, 21 May 1999 05:13:34 UTC