Re: internet media types and encoding

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Mon, 14 Apr 2003 11:24:01 -0400
To: www-tag@w3.org
Message-ID: <87adetm7am.fsf@nwalsh.com>

/ Tim Bray <tbray@textuality.com> was heard to say:
| Chris Lilley wrote:
|> Unlike Rick I am not making this argument on the basis of the ease of
|> detecting encoding labelling or conversion errors; rather, on the
|> basis of those non-printing characters having no basis being in a
|> marked up document. I mean, start of string? end of guarded area?
| I profoundly agree with Chris here, but I had thought this issue to
| have been long-since decided.  My vision of XML is that element
| content is text, and text is a string of characters, and characters
| have the semantics that Unicode says they have.  Most of the C0 and C1
| control characters have no useful or agreed-upon semantics, and they
| have no place in XML under any circumstances.  Their inclusion
| substantially decreases interoperability.

I think Rick, Chris, Tim, et al. argue convincingly that the C1
control characters should be excluded from XML 1.1.

But I'd be a little happier, I think, if we could at least acknowledge
that the argument for allowing them isn't completely specious. Imagine
that I have a working production system that has been successfully
(e.g. profitably :-) processing XML documents for years.

I now want to add an Ethiopic element name to one of my documents, so
I change the version to 1.1 and add the name.

And the document becomes not well formed. Ouch.
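
To make the scenario concrete, here is a minimal Python sketch of the
kind of scan involved. It is not taken from any XML processor; the
function name and the sample document are made up for illustration.
It only reports literal C1 control characters (U+0080 through U+009F)
in a string, which are the characters that would turn such a document
from well formed to not well formed if they were excluded.

```python
# Hypothetical helper, for illustration only: report literal C1
# control characters (U+0080..U+009F) found in a piece of text.
C1_FIRST, C1_LAST = 0x80, 0x9F  # Unicode C1 control range


def find_c1_controls(text):
    """Return (offset, code point) pairs for each C1 control in text."""
    return [
        (i, ord(ch))
        for i, ch in enumerate(text)
        if C1_FIRST <= ord(ch) <= C1_LAST
    ]


# Made-up sample: a document carrying U+0086 in element content.
doc = '<?xml version="1.0"?><doc>start\u0086of guarded area</doc>'

for offset, cp in find_c1_controls(doc):
    print(f"offset {offset}: U+{cp:04X}")
```

A real check would run on the decoded character stream, after the
encoding declaration has been applied, not on raw bytes.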

The argument that the document must have been broken under 1.0 because
the C1 control characters don't have logical meanings isn't going to
be very satisfying to me if I've been processing them "correctly" for
years.

And this is the only incompatibility between 1.0 and 1.1. So I don't
think the argument that we should continue to allow things that are
already allowed so that 1.0 is a proper subset of 1.1 is without
merit.

| Do enough of the TAG agree
| that we should take this up officially?  -Tim

And do what? Issue a finding that says C1 control characters should be
excluded from XML 1.1?

                                        Be seeing you,

-- 
Norman.Walsh@Sun.COM    | One of the great misfortunes of mankind is
XML Standards Architect | that even his good qualities are sometimes
Web Tech. and Standards | useless to him, and that the art of employing
Sun Microsystems, Inc.  | and well directing them is often the latest
                        | fruit of his experience.--Chamfort

Received on Monday, 14 April 2003 11:24:34 UTC
