
RE: The form feed characters and other control codes

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Thu, 13 May 2004 16:23:20 +0300 (EEST)
To: www-html@w3.org
Message-ID: <Pine.GSO.4.58.0405131608040.28059@korppi.cs.tut.fi>

On Thu, 13 May 2004, Ernest Cline wrote:

> > [Original Message]
> > From: Jukka K. Korpela <jkorpela@cs.tut.fi>
> >
> > Or is there some imaginable use for C1 Controls in XHTML?
>
> Well, XML uses Unicode for its character set and Unicode
> recognizes the effect of U+0085 NEL as a new line character,
> but the other C1's have no effect in Unicode per se, so the net
> effect is relatively harmless, even in the case of a mistaken
> identity of the character set.

I was primarily thinking about the possibility of trapping some common
errors, like U+0080, which is most probably caused by an attempt to
present the euro sign in a wrong way (character encoding confusion).
While U+0080 has no defined meaning in Unicode, it might actually be
treated as the euro sign by some software, ignored by some, and in the
worst (?) case treated as some control function.
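
A minimal sketch of the confusion described above, assuming the byte 0x80
was written by software using windows-1252 and is then read under an
ISO-8859-1 label. The variable names are illustrative only.

```python
import unicodedata

# A single byte 0x80, as it might arrive in a mislabeled document.
raw = b"\x80"

# Read as windows-1252 (what the author probably meant): the euro sign.
as_cp1252 = raw.decode("cp1252")

# Read as ISO-8859-1 (what the label claims): the C1 control U+0080.
as_latin1 = raw.decode("latin-1")

# Unicode general category of the result; "Cc" means a control character.
category = unicodedata.category(as_latin1)

print(f"cp1252:  U+{ord(as_cp1252):04X}")
print(f"latin-1: U+{ord(as_latin1):04X} (category {category})")
```

A checker that flags U+0080 in parsed text would therefore catch this
particular mislabeling, which is the kind of trap meant here.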

> In addition XML 1.1 only allows
> NEL, the rest of the C1's must be present thru the presence
> of character references only, so any future XML spec will
> handle the C1's in a manner you think is appropriate.

I hadn't realized there is an XML 1.1, especially since the XML 1.0 spec
still points to itself as the latest XML specification. I must admit I'm
puzzled. Does the W3C now recommend two XMLs, 1.0 and 1.1, saying that
XML 1.0 is the latest while both have W3C Recommendation status?
The XHTML specifications are currently based on XML 1.0, of course,
but will future XHTML be based on XML 1.1?

Anyway, at http://www.w3.org/TR/xml11/#charsets
I read that

[2]    Char    ::=    [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

which means a change from XML 1.0: all ASCII control characters except
NUL (#x0) are allowed. (I wonder why NUL is forbidden. It should be
especially harmless and ignorable.) And "XML processors MUST accept any
character in the range specified for Char".

Some characters are listed as "discouraged", but as far as I can see, XML
1.1 very much _allows_ the entire C1 Controls range.
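
The Char production quoted above can be transcribed directly into a small
membership test. This is only a sketch of production [2] as quoted; the
function name is_xml11_char is made up for illustration, and it does not
model the "discouraged" list or any other production of the spec.

```python
def is_xml11_char(cp: int) -> bool:
    """Return True if code point cp matches production [2] Char as quoted."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# The C1 Controls range is U+0080 through U+009F.
c1_controls = range(0x80, 0xA0)
all_c1_allowed = all(is_xml11_char(cp) for cp in c1_controls)

# NUL is the only ASCII control excluded by the production.
ascii_controls = list(range(0x00, 0x20)) + [0x7F]
excluded_ascii = [cp for cp in ascii_controls if not is_xml11_char(cp)]

print(all_c1_allowed)
print(excluded_ascii)
```

Under this reading every C1 code point satisfies Char, and #x0 is the one
ASCII control that does not.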

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Thursday, 13 May 2004 09:23:22 GMT
