Re: 12. Are C1 controls and Unicode non-characters disallowed?

Just drawing attention to Tony's "...apart from the MicroXML parsers that
would then have to check that those characters weren't present..."

Is there any way to avoid putting this onus on parsers, etc.? I reckon it
also puts a burden on any code written to take user input from a web form,
say, and wrap it in MicroXML. It's a headache if a lowly web developer
has to worry about such detail (which they probably won't understand
anyway; such characters are seriously obscure to most ordinary web folk).
Kind of defeats the whole point of having a MicroXML, doesn't it?
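
To make the size of that burden concrete, here is a rough Python sketch of
the check a parser (or a form handler) would need. It assumes the ranges
from XML 1.0 section 2.2, taking the BMP non-character block as
#xFDD0-#xFDEF as in the Unicode Standard. The names is_discouraged and
first_discouraged are made up for illustration and do not come from any
MicroXML library.

```python
def is_discouraged(cp: int) -> bool:
    """Return True if code point cp is in the 'discouraged' ranges."""
    # DEL and the C1 controls, except U+0085 (NEL), which stays allowed.
    if 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F:
        return True
    # Non-character block in the Basic Multilingual Plane.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # Last two code points of each supplementary plane (planes 1 to 16).
    return cp >= 0x1FFFE and (cp & 0xFFFF) >= 0xFFFE


def first_discouraged(text: str):
    """Return (index, code point) of the first offender, or None."""
    for i, ch in enumerate(text):
        if is_discouraged(ord(ch)):
            return i, ord(ch)
    return None
```

A form handler could call first_discouraged(user_input) and reject or strip
the input when the result is not None. The check is small, but somebody
still has to know it is needed.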
----
Stephen D Green



On 9 September 2012 23:43, Uche Ogbuji <uche@ogbuji.net> wrote:

> On Fri, Sep 7, 2012 at 8:19 PM, John Cowan <cowan@mercury.ccil.org> wrote:
>
>> I've added a new issue: 12. Are C1 controls and Unicode non-characters
>> disallowed?
>>
>> In XML 1.0 3e, the following text was added to 2.2, Characters:
>>
>>     The characters defined in the following ranges are discouraged. They
>>     are either control characters or permanently undefined Unicode
>>     characters:
>>
>>     [#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF],
>>     [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
>>     [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
>>     [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
>>     [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
>>     [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
>>     [#x10FFFE-#x10FFFF].
>>
>> These codepoints are either not very useful in interchange (the C1
>> controls [#x7F-#x84] and [#x86-#x9F], because Unicode doesn't say
>> what they mean) or are non-characters, code points permanently reserved
>> from being assigned to characters and meant for internal use only (all
>> the rest).
>>
>> They couldn't be banned from XML 1.0 because of backward compatibility,
>> but I'd like to consider banning them from MicroXML.
>>
>> Comments?
>>
>
>
> I asked Tony Graham for his thoughts.  His response:
>
>> My first thought is that it's only half a list, since if you're going to
>> ban [#xFDD0-#xFDEF], then you might as well also ban #xFFFC, OBJECT
>> REPLACEMENT CHARACTER, since it's meant to be meaningless without the
>> out-of-stream information about the object it's meant to be replacing, or
>> ban #xE0000-#xE007F since they're meant for protocols that don't support
>> markup identification.
>>
>> Has anyone gone through UTR #20, "Unicode in XML and other Markup
>> Languages" (http://www.unicode.org/reports/tr20/) to evaluate its
>> recommendations w.r.t. what you want from MicroXML?  In principle, if you
>> disallowed all the characters that UTR #20 says browsers should discard,
>> then everything would be simpler (apart from the MicroXML parsers that
>> would then have to check that those characters weren't present).
>>
>> The C1 controls are difficult, since they aren't well defined.  What's
>> gained, other than purity of approach, if they are banned?
>>
>> Personally, I wouldn't like to see [#xFDD0-#xFDEF] banned since I often
>> use one of those characters in XSLT stylesheets, e.g., when joining
>> multiple strings together to make a key lookup value, and I'd have to find
>> a different technique if there was ever a MicroXML-only XSLT processor
>> that didn't allow those characters.  If you searched hard enough, you'd
>> probably find somebody, somewhere who's using every one of those
>> characters or the end-of-plane characters for their own internal use, just
>> like it says on the tin.
>>
>> In fact, just last week I was thinking about using characters from
>> #xE0000-#xE007F to spell 'XSpec' for use as the XSpec-specific namespace
>> prefix when XSpec munges a user's XSpec tests to make the stylesheet that
>> the framework actually runs (on the grounds that there is unlikely to be a
>> user's stylesheet that used that particular prefix), so maybe I'd want to
>> see them retained, too, despite what I said above.
>>
>> Hope I haven't muddied the waters too much.
>>
>> Regards,
>>
>>
>> Tony Graham                                   tgraham@mentea.net
>> Consultant                                 http://www.mentea.net
>> Mentea       13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
>>  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
>>     XML, XSL-FO and XSLT consulting, training and programming
>
>
>
> --
> Uche Ogbuji                       http://uche.ogbuji.net
> Founding Partner, Zepheira        http://zepheira.com
> http://wearekin.org
> http://www.thenervousbreakdown.com/author/uogbuji/
> http://copia.ogbuji.net
> http://www.linkedin.com/in/ucheogbuji
> http://twitter.com/uogbuji
>
>

Received on Monday, 10 September 2012 07:44:54 UTC