Re: 12. Are C1 controls and Unicode non-characters disallowed?

David Lee scripsit:

> What do prevailing (yes, I know that's not precise) XML parsers and
> processors do today when they encounter this range of discouraged
> Unicode characters?

Nothing special.  Upstream applications may or may not handle them.

> Secondly, is it a goal that 1) if a MicroXML parser successfully
> parses a MicroXML document, then an "off the shelf" XML parser should
> also successfully parse the same document (by "parse" here I mean not
> abort or generate fatal errors)?

Yes, that's a goal.

> Conversely, 2) if an "off the shelf" XML parser parses a MicroXML
> document, then all MicroXML parsers should also parse that document
> without failure.  I guess this one is self-referential.

By definition, a MicroXML parser parses MicroXML documents and fails to
parse, or partially parses, or parses-with-warnings things that are not
MicroXML documents.

> What I am getting at is "what would the user see that is bad if we
> allowed these discouraged characters?"

The purpose of the non-characters is to provide a range of codepoints
that can be represented as part of Unicode strings but can't appear in
the input to a program, and can therefore be used for internal purposes
such as string termination, string segmentation, or the representation
of application-specific magic.
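
For concreteness, here is a minimal sketch of how a processor might
recognize the two discouraged ranges.  The function names are
hypothetical and come from no MicroXML draft; the ranges are the 66
Unicode noncharacters (U+FDD0..U+FDEF plus the last two code points of
each plane) and the C1 controls (U+0080..U+009F).

```python
# Sketch only: these helper names are illustrative assumptions,
# not part of any MicroXML or XML specification.

def is_noncharacter(cp):
    """True for the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_c1_control(cp):
    """True for the C1 control range U+0080..U+009F."""
    return 0x80 <= cp <= 0x9F

def discouraged(text):
    """Return (index, 'U+XXXX') pairs for each discouraged character."""
    found = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if is_noncharacter(cp) or is_c1_control(cp):
            found.append((i, "U+%04X" % cp))
    return found
```

Whether a parser rejects such characters, warns about them, or passes
them through untouched is a policy decision layered on top of a check
like this one.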

The existence of XSLT, whose programs are XML documents, somewhat blurs
the distinction between internal and external uses, but I doubt XSLT
will ever be ported to MicroXML.

-- 
John Cowan        http://ccil.org/~cowan   cowan@ccil.org
Lope de Vega: "It wonders me I can speak at all.  Some caitiff rogue
did rudely yerk me on the knob, wherefrom my wits yet wander."
An Englishman: "Ay, belike a filchman to the nab'll leave you
crank for a spell." --Harry Turtledove, Ruled Britannia

Received on Monday, 10 September 2012 18:43:17 UTC