- From: Rick Jelliffe <ricko@allette.com.au>
- Date: Fri, 22 Jun 2001 15:39:29 +0800
- To: <www-xml-blueberry-comments@w3.org>
(I also posted this to XML-DEV.)

1. What is XML 1.0's native-language markup for?

There is a hierarchy of suitability for native-language markup:

- free text *must* support native scripts
- choices presented to users *must* have it
- things which users name *should* have it (e.g. directory and file names)
- things which are made for regional/in-house/personal use *may* support as much as possible
- things which are named by central authorities, and over which the user has no control, *must* be limited to the national standard characters or official scripts or Latin script or English, depending on circumstances
- names for things that are needed by a translingual usership (where these users include alphabet users) *should not* use it
- things which are standard keywords *must not* have it (e.g. the "ELEMENT" keyword in XML, or "const" in C++)

The advent of XML Schemas: Datatypes has changed how we might apply these principles (presuming we accept them). In XML 1.0, the only way of providing enumerations was through a DTD. An enumeration value is an XML name. Therefore XML 1.0 names *must* support native scripts thoroughly. But now we have XML Schemas: Datatypes, and we can use it ourselves to make our own token types. So that removes the only *must* from our list.

So I do not believe the proposed Blueberry changes fall into the category of "must" (i.e. XML is unsuitable for some end-users) but into "should" (i.e. XML is unsuitable for some programmers) or even "may". Indeed, I believe some of the characters in question *must not* be allowed as name characters. The purpose of markup is to allow data to be clear for humans to read. An obscure character, or one which an ordinary programmer (who uses the script involved) will find difficult to read, write, pronounce or comprehend, is positively bad markup.

So, I believe there is no current urgency to make the Blueberry changes as far as XML Name characters are concerned.
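
As a rough illustration of the enumeration point: a value list that once had to be a DTD enumeration (and so had to be built from XML name characters) can instead be checked as an ordinary list of tokens, which is roughly what an enumeration facet in XML Schemas: Datatypes provides. The Python sketch below is not a schema validator. The attribute name `status`, the helper `status_is_valid` and the two Ethiopic-script values are made up for the example and carry no particular meaning.

```python
import xml.etree.ElementTree as ET

# Hypothetical enumeration values written in Ethiopic script (U+1200 block).
# XML 1.0 does not allow these characters in names, so they could not be
# declared as a DTD enumeration, but they are fine as attribute *values*.
ALLOWED_STATUS = {"\u1200\u1208", "\u1218\u1228"}

def status_is_valid(xml_text):
    """Return True if the root element's 'status' attribute is an allowed token."""
    root = ET.fromstring(xml_text)
    # Collapse whitespace before comparing, roughly as a token type would.
    value = " ".join(root.get("status", "").split())
    return value in ALLOWED_STATUS

print(status_is_valid('<order status=" \u1200\u1208 "/>'))  # value is in the list
print(status_is_valid('<order status="shipped"/>'))         # value is not in the list
```

The check happens on the parsed attribute value, so the parser's name rules never come into it.
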
XML Schemas: Datatypes allows us to define native-script enumerations, so there is no end-user requirement. Obscure characters are bad markup, so there is no programmer requirement for most of the scripts in question.

I would rather the following approach were adopted: an erratum to XML 1.0 2e should be published saying "It is not a reportable error for a character >= U+10000 (e.g. 𐀀) to appear in a name." This opens the door for a future revision to XML (e.g. a more thoroughgoing one) by reducing cases where new XML documents (with naming rules such as the ones suggested) are rejected by old parsers. Of course, the number of these is likely to be almost 0, so this seems like a case of people creating work for themselves (not that it is a bad thing...it is important for the right message to be given, etc.).

The IBM line-end character is a different issue. Personally, I think it should be magicked away by entity management. The "unnecessary translation phases before and after XML parses and generation" are the most straightforward way out for everyone. If that does not meet IBM's supposed requirement entirely, that is the price of interoperability: XML does not guarantee round-tripping of newline characters (it cannot, because whenever data is sent as text/*, intermediate proxies can change line ends to local conventions).

Cheers
Rick Jelliffe
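
P.S. A minimal sketch of the translation phases mentioned above, taking the IBM line-end character to be NEL (U+0085). The helper names `nel_to_lf` and `lf_to_nel` are invented for the example, and a real entity manager would do this at the byte/encoding layer instead of on decoded strings.

```python
import xml.etree.ElementTree as ET

NEL = "\u0085"  # "next line" control character; assumed here to be the IBM line end

def nel_to_lf(text):
    """Translation phase before parsing: turn NEL line ends into LF."""
    return text.replace(NEL, "\n")

def lf_to_nel(text):
    """Translation phase after generation: turn LF back into NEL for local use."""
    return text.replace("\n", NEL)

# A document that mixes both conventions, as might arrive via a proxy.
incoming = "<a>one\ntwo" + NEL + "three</a>"

root = ET.fromstring(nel_to_lf(incoming))
print(repr(root.text))

# Going back out, every line end becomes NEL, so the original mix is not recovered.
outgoing = lf_to_nel(nel_to_lf(incoming))
print(outgoing == incoming)
```

The second print shows the round-tripping point: once line ends have been normalised, the original mixture cannot be reconstructed.
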
Received on Friday, 22 June 2001 01:43:46 UTC