RE: The form feed characters and other control codes

> [Original Message]
> From: Jukka K. Korpela <jkorpela@cs.tut.fi>
>
> On Thu, 13 May 2004, Ernest Cline wrote:
>
> > In addition XML 1.1 only allows
> > NEL, the rest of the C1's must be present thru the presence
> > of character references only, so any future XML spec will
> > handle the C1's in a manner you think is appropriate.
>
> <snip>
>
> Anyway, at http://www.w3.org/TR/xml11/#charsets
> I read that
>
> [2]    Char    ::=    [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
> /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.
> */
>
> which means a change: all Ascii controls except the nul character #x0
> are allowed. (I wonder why nul is forbidden. It should be especially
> harmless, ignorable.) And "XML processors MUST accept any character in
> the range specified for Char".
>
> Some characters are listed as "discouraged", but as far as I can see,
> XML 1.1 very much _allows_ the entire C1 Controls range.

Although the spec does not make it clear at that point, rule [2a]

[2a] RestrictedChar ::=
        [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]

lists those characters that are allowed only as character references and
not as actual characters in the file. So, for example, if you wanted to
encode a form feed in the content of some element, you could do it in
XML 1.1, but you would have to use either &#xC; or &#12; rather than an
actual form feed character.  The same applies to all of the C1 controls
except NEL.
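
As a rough illustration of rule [2a], here is a minimal Python sketch that
replaces each RestrictedChar with a hexadecimal character reference before
the text is written into an XML 1.1 document. The names RESTRICTED_CHAR and
escape_restricted are my own, not from the spec. The sketch assumes that
markup characters such as & and < are escaped elsewhere, and that the input
contains no #x0, which XML 1.1 does not allow in any form.

```python
import re

# Character class mirroring production [2a] RestrictedChar of XML 1.1:
# [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
# Tab (#x9), LF (#xA), CR (#xD) and NEL (#x85) are deliberately absent.
RESTRICTED_CHAR = re.compile(r"[\x01-\x08\x0B\x0C\x0E-\x1F\x7F-\x84\x86-\x9F]")


def escape_restricted(text):
    """Replace every RestrictedChar with a reference such as &#xC;.

    Hypothetical helper for illustration only; it does not escape
    markup characters and does not reject #x0.
    """
    return RESTRICTED_CHAR.sub(lambda m: "&#x%X;" % ord(m.group()), text)
```

With this, escape_restricted("a\fb") gives a&#xC;b, while tab, LF, CR and
NEL pass through unchanged.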

Received on Thursday, 13 May 2004 10:30:27 UTC