- From: Uche Ogbuji <uche@ogbuji.net>
- Date: Sun, 9 Sep 2012 16:43:53 -0600
- To: public-microxml@w3.org
- Message-ID: <CAPJCua26Fxrw8_5Hiu=UGUpf=nRyOKheGbXSp2qi4CtBH2vvPg@mail.gmail.com>
On Fri, Sep 7, 2012 at 8:19 PM, John Cowan <cowan@mercury.ccil.org> wrote:

> I've added a new issue: 12. Are C1 controls and Unicode non-characters
> disallowed?
>
> In XML 1.0 3e, the following text was added to 2.2, Characters:
>
>   The characters defined in the following ranges are discouraged. They
>   are either control characters or permanently undefined Unicode
>   characters:
>
>   [#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
>   [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
>   [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
>   [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
>   [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
>   [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
>   [#x10FFFE-#x10FFFF].
>
> These codepoints are either not very useful in interchange (the C1
> controls [#x7F-#x84] and [#x86-#x9F], because Unicode doesn't say
> what they mean) or are non-characters, code points permanently reserved
> from being assigned to characters and meant for internal use only (all
> the rest).
>
> They couldn't be banned from XML 1.0 because of backward compatibility,
> but I'd like to consider banning them from MicroXML.
>
> Comments?

I asked Tony Graham for his thoughts. His response:

> My first thought is that it's only half a list, since if you're going to
> ban [#xFDD0-#xFDDF], then you might as well also ban #xFFFC, OBJECT
> REPLACEMENT CHARACTER, since it's meant to be meaningless without the
> out-of-stream information about the object it's meant to be replacing, or
> ban #xE0000-#xE007F since they're meant for protocols that don't support
> markup identification.
>
> Has anyone gone through UTR #20, "Unicode in XML and other Markup
> Languages" (http://www.unicode.org/reports/tr20/) to evaluate its
> recommendations w.r.t. what you want from MicroXML?
> In principle, if you disallowed all the characters that UTR #20 says
> browsers should discard, then everything would be simpler (apart from the
> MicroXML parsers that would then have to check that those characters
> weren't present).
>
> The C1 controls are difficult, since they aren't well defined. What's
> gained, other than purity of approach, if they are banned?
>
> Personally, I wouldn't like to see [#xFDD0-#xFDDF] banned since I often
> use one of those characters in XSLT stylesheets, e.g., when joining
> multiple strings together to make a key lookup value, and I'd have to find
> a different technique if there was ever a MicroXML-only XSLT processor
> that didn't allow those characters. If you searched hard enough, you'd
> probably find somebody, somewhere who's using every one of those
> characters or the end-of-plane characters for their own internal use, just
> like it says on the tin.
>
> In fact, just last week I was thinking about using characters from
> #xE0000-#xE007F to spell 'XSpec' for use as the XSpec-specific namespace
> prefix when XSpec munges a user's XSpec tests to make the stylesheet that
> the framework actually runs (on the grounds that there is unlikely to be a
> user's stylesheet that used that particular prefix), so maybe I'd want to
> see them retained, too, despite what I said above.
>
> Hope I haven't muddied the waters too much.
>
> Regards,
>
>
> Tony Graham                                         tgraham@mentea.net
> Consultant                                       http://www.mentea.net
> Mentea       13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
>  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
>     XML, XSL-FO and XSLT consulting, training and programming

--
Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
http://wearekin.org
http://www.thenervousbreakdown.com/author/uogbuji/
http://copia.ogbuji.net
http://www.linkedin.com/in/ucheogbuji
http://twitter.com/uogbuji
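To make concrete what issue 12 would exclude, here is a minimal Python sketch (not from the thread) that tests a code point against the ranges John quoted from XML 1.0 3e. The names `is_discouraged` and `find_discouraged`, and the `extended` flag that adds Tony's suggested U+FFFC and U+E0000 to U+E007F, are hypothetical illustrations, not part of any MicroXML draft. The BMP pair U+FFFE/U+FFFF is left out because XML's Char production already excludes it.

```python
# Ranges as quoted from XML 1.0 3e, section 2.2 (inclusive bounds).
C1_RANGES = [(0x7F, 0x84), (0x86, 0x9F)]  # U+0085 (NEL) is not listed
# Note: Unicode's BMP noncharacters actually run U+FDD0..U+FDEF; this
# sketch follows the quoted list, which stops at U+FDDF.
BMP_NONCHAR_RANGE = (0xFDD0, 0xFDDF)

# Hypothetical additions floated by Tony Graham; not in any spec text.
EXTRA_RANGES = [(0xFFFC, 0xFFFC), (0xE0000, 0xE007F)]


def _in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_discouraged(cp: int, extended: bool = False) -> bool:
    """True if cp is in the quoted discouraged ranges.

    With extended=True, also flag U+FFFC and the TAG characters."""
    if _in_ranges(cp, C1_RANGES):
        return True
    lo, hi = BMP_NONCHAR_RANGE
    if lo <= cp <= hi:
        return True
    # Last two code points of planes 1 through 16: the low 16 bits are
    # 0xFFFE or 0xFFFF, i.e. (cp & 0xFFFE) == 0xFFFE.
    if 0x1FFFE <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE:
        return True
    return extended and _in_ranges(cp, EXTRA_RANGES)


def find_discouraged(text: str, extended: bool = False):
    """Return (index, code point) pairs for every discouraged character."""
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if is_discouraged(ord(ch), extended)]


if __name__ == "__main__":
    sample = "a\u0085b\ufdd0c\U0010ffff"
    for index, cp in find_discouraged(sample):
        print(f"index {index}: U+{cp:04X}")
```

Run on the sample string, this reports U+FDD0 at index 3 and U+10FFFF at index 5, while U+0085 passes because NEL is deliberately absent from the quoted C1 ranges.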
Received on Sunday, 9 September 2012 22:44:20 UTC
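Tony's two use cases can be sketched in the same spirit. His actual usage is in XSLT; the following is only a Python analogue, and the helper names `composite_key` and `to_tag_characters` are hypothetical. The first joins strings with the noncharacter U+FDD0 as a separator, so a composite lookup key can't collide the way plain concatenation can. The second maps printable ASCII onto the TAG characters U+E0020 to U+E007E, which is how a word such as 'XSpec' could be spelled with characters from #xE0000-#xE007F. Both techniques would break under a processor that rejected those code points.

```python
# U+FDD0 is a Unicode noncharacter, reserved for internal use, so it
# should not occur in interchanged text and can serve as a separator.
SEP = "\ufdd0"


def composite_key(*parts: str) -> str:
    """Join parts into one lookup key; reject parts that contain SEP."""
    for part in parts:
        if SEP in part:
            raise ValueError("part contains the separator character")
    return SEP.join(parts)


def to_tag_characters(ascii_text: str) -> str:
    """Map printable ASCII (U+0020..U+007E) to TAG chars (U+E0020..U+E007E)."""
    out = []
    for ch in ascii_text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
        # The TAG block mirrors ASCII at an offset of 0xE0000.
        out.append(chr(0xE0000 + ord(ch)))
    return "".join(out)


if __name__ == "__main__":
    print(composite_key("a", "bc") == composite_key("ab", "c"))
    print([f"U+{ord(c):05X}" for c in to_tag_characters("XSpec")])
```

For example, `"a" + "bc"` and `"ab" + "c"` both give `"abc"`, whereas `composite_key("a", "bc")` and `composite_key("ab", "c")` differ because the separator sits at a different position.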