- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Sat, 24 May 2003 22:38:43 +0300
- To: www-html@w3.org
On Monday, May 19, 2003, at 19:03 Europe/Helsinki, William F Hammond wrote:

> Ian Hickson <ian@hixie.ch> writes:
>
>> More to the point, XHTML can't make restrictions on XML parsers beyond
>> those of XML. This has to be the case so that arbitrary XML parsers
>> can be re-used in XHTML environments, otherwise XHTML processors must
>> have specialised XML parsers.
>
> It may be useful for interoperability to require that XHTML be
> parsable by arbitrary XML parsers, but it is not inherently impossible
> to bring up XHTML as a markup language that, rather than _being_ an
> XML application, has a canonically associated XML application. That
> way additional requirements can be imposed.

It is *possible* to define a markup language called XHTML 2 that isn't an application of XML and only leverages the 'X' for marketing purposes. But does defining XHTML 2 that way have any technical merit? Why would it make sense to give up the ability to use ready-made off-the-shelf XML tools (most importantly XML processors) in exchange for a larger set of predefined entities, especially considering that the problems the larger set of predefined entities is designed to solve are better solved in a different way?

>> In the case of entities, XML says that non-validating parsers need not
>> recognise anything outside of the five pre-defined entities. Thus, an
>> arbitrary non-validating XML parser will probably not recognise the
>> XHTML entities. By the time the XHTML-specific part of the UA gets
>> involved, it is likely that the entities are long lost.
>
> The 5 entities are "amp", "lt", "gt", "quot", and "apos". The list
> does NOT include "copy".
>
> "©" is an interesting case because it is neither used in markup
> nor (AFAIK) used natively in association with the language of a
> non-ascii locale.

The character encoding is part of the concept of "locale" on some platforms which have the design limitation of tightly coupling characters with bytes.
However, even those platforms don't inherently make it impossible for applications to write UTF-8 to disk.

> It is an exception to the idea that CDATA encoding,
> e.g., the processing route from an author's keystroke to UTF-8, should
> be locale special.

Rather, the processing route from user actions to UTF-8 should depend on the input method in use. I am currently using a keyboard with Apple's Finnish keyboard layout as the logical keyboard layout even though I am writing English. I'm located in Finland and I use an email client that currently shows me the UI strings in U.S. English. If I wanted to, I could switch to the Greek keyboard layout and type some Greek letters for use as variable names. Or I could open the Character Palette, which allows me to pick characters from the Unicode chart. The language being written is not bound to the input method. The input method is not bound to the UI language. (And yes, Apple's Finnish keyboard layout allows me to type the copyright sign. It takes one modifier key and one ordinary key.)

> Named character entities are importantly useful when one needs to
> refer to characters outside of those provided natively in one's
> locale.

Editors that restrict the set of available characters to a subset of Unicode because of the user's locale are ill-suited for editing XML documents. Editors that properly use the Unicode features of Mac OS X and Windows XP don't suffer from that kind of limitation. The main practical problem is that X11 systems aren't quite Unicode-savvy yet. However, I think it doesn't make sense to work around the limitations of X11 systems at the markup language level. History suggests that it takes *at least* six years for a W3C spec to become implemented widely enough on the client side for authors to use the spec features casually. Six years should be enough for X11 to catch up in terms of Unicode-savviness.
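
The entity discussion above can be checked against an off-the-shelf XML parser. The following is only a rough sketch, assuming Python's standard xml.etree.ElementTree module (an expat-based non-validating parser) as the stand-in for "arbitrary XML parser"; the helper name parse_text and the sample documents are made up for illustration and are not taken from any spec or from Mozilla.

```python
import xml.etree.ElementTree as ET


def parse_text(doc):
    """Return the root element's text, or None if the parser rejects the document.

    parse_text is a hypothetical helper for this illustration only.
    """
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


# The five predefined entities are recognised without any DTD.
predefined = parse_text("<p>&amp;&lt;&gt;&quot;&apos;</p>")

# A named XHTML entity with no DTD available is an undefined entity,
# so the document is rejected before any XHTML-specific code sees it.
named = parse_text("<p>&copy; 2003</p>")

# A numeric character reference needs no DTD at all.
numeric = parse_text("<p>&#169; 2003</p>")

# The literal character, written to disk as UTF-8, needs no DTD either.
literal = parse_text("<p>\u00a9 2003</p>".encode("utf-8"))

print(repr(predefined), repr(named), repr(numeric), repr(literal))
```

Under these assumptions only the named entity fails, while both the numeric reference and the literal UTF-8 character survive any conforming parser, which is the "different way" of solving the problem argued for above.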
> I think it desirable for XHTML user agents to provide some form of
> imitation of Mozilla's handling of the 253 character entities defined
> in XHTML 1.0.

Mozilla's approach isn't forward-compatible with, e.g., XHTML 1.1.1, because Mozilla uses a finite list of existing public ids.

-- 
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/
Received on Saturday, 24 May 2003 15:38:53 UTC