- From: David G. Durand <dgd@cs.bu.edu>
- Date: Fri, 13 Jun 1997 10:59:17 -0500
- To: w3c-sgml-wg@w3.org
I see that the character set issue is raising its hoary, moss-festooned head again. While I'm not a character set expert, and I don't even play one on TV, I do have a suggestion. It's old, but then the topic is too.

We should stick with the decision that the document coded character set (i.e. the character repertoire, numerical character codes, and preferred binary transmission formats) should stay Unicode. We should allow wider character repertoires to be dealt with by processors, as no finite character set ever seems to be enough. In fact, we should allow as many characters as a user can stomach declaring, and ones that are as weird as they want.

In fact, we should bring back SDATA, _strictly defined_ as "name for character not represented in Unicode, and thus not possible to be directly encoded by a literal character of XML syntax". Since just putting strings in SDATA leaves a bit too much freedom, we will make the following constraints.

Unregistered characters (like unregistered FPIs, _not_ guaranteed unique) are represented by SDATA values containing any string of Unicode characters not starting with a left bracket "[" character.

ISO 10646 characters are represented by SDATA values containing the official name of the character enclosed in a "[[" "]]" pair of delimiters.

SDATA strings delimited by "[" and "]" that are not ISO 10646 character codes (i.e. single brackets instead of a pair) are _reserved syntax_. It is an error to have such an SDATA string in an XML 1.0 document, but they may later be used for encoding registered glyphs when appropriate standards and software exist.

This means that Private-Use characters are allowed, but should probably be discouraged, just as other compatibility characters are discouraged in Unicode applications.

This should be easy to implement: applications that don't care to handle non-Unicode characters can simply treat SDATA entities as strings.
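As a rough illustration of how little work that is, here is a minimal Python sketch of a processor sorting SDATA values into the three cases above. The names `SdataKind`, `classify_sdata` and `render_sdata` are hypothetical, not part of any XML or SGML API. Two points are assumptions on my part rather than anything the constraints spell out: a value that starts with "[" but is not a well-formed "[[name]]" is treated as reserved syntax, and a "[[name]]" whose name Python's `unicodedata` tables do not know is passed through unchanged. The fallback for unregistered characters is the plain-string treatment.

```python
import unicodedata
from enum import Enum


class SdataKind(Enum):
    UNREGISTERED = "unregistered"  # any string not starting with "["
    ISO_10646 = "iso10646"         # official character name inside "[[" ... "]]"
    RESERVED = "reserved"          # other bracketed forms: an error in XML 1.0


def classify_sdata(value: str) -> SdataKind:
    """Sort an SDATA value into one of the three proposed cases."""
    if not value.startswith("["):
        return SdataKind.UNREGISTERED
    # Require a non-empty name between the double brackets.
    if value.startswith("[[") and value.endswith("]]") and len(value) > 4:
        return SdataKind.ISO_10646
    # Assumption: anything else starting with "[" counts as reserved syntax.
    return SdataKind.RESERVED


def render_sdata(value: str) -> str:
    """Turn an SDATA value into text for an application that wants strings."""
    kind = classify_sdata(value)
    if kind is SdataKind.RESERVED:
        raise ValueError(f"reserved SDATA syntax: {value!r}")
    if kind is SdataKind.ISO_10646:
        name = value[2:-2]
        try:
            # unicodedata.lookup raises KeyError for names it does not know.
            return unicodedata.lookup(name)
        except KeyError:
            return value
    # Unregistered character: simply treat the SDATA value as a string.
    return value
```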
It would probably be nicer to tack on additional text to produce something like "[Undefined character: Humlan Vowel Squiggle right]".

I earlier promised not to spontaneously attempt to re-animate the SDATA question, but it is now so apropos of the ongoing discussion that I do so without guilt.

-- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/    \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Friday, 13 June 1997 11:02:56 UTC