- From: Nick Kew <nick@webthing.com>
- Date: Wed, 4 Dec 2002 20:04:44 +0000 (GMT)
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- cc: public-qa-dev@w3.org
On Wed, 4 Dec 2002, Bjoern Hoehrmann wrote:

>
> I am :-) See section 4.2.2 of XML 1.0,
> http://www.w3.org/TR/REC-xml#dt-sysid
>
> [...]
> * Each disallowed character is converted to UTF-8 [IETF RFC 2279] as
> one or more bytes.

I don't see any disallowed character under #2.1 of rfc2396 in your
testcase. Applying the rather different rules you referenced is going
to lead to deeper bugs than this alleged one.

Your testcase was declared as iso-8859-1, so escaping as UTF-8 is at best
perverse, and breaks commonsense. This is relevant here as OpenSP groks
SGML (and on the web in general where agents grok some form of HTML).
If your testcase had declared a 16-bit charset, then AFAICS that rule
would lead to more brokenness.

I'm thinking as I write: what happens if we apply perverse-XML rules
when OpenSP's -wxml is in force? This avoids breaking SGML, but I'm
not convinced about implementing it.

Terje, how are we applying iconv to incoming documents these days?
ISTM that any document that is converted to utf-8 before being processed
by OpenSP sidesteps this problem altogether (because iconv does the job).

-- 
Nick Kew
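
A minimal sketch of the encoding difference at issue, in Python. The system identifier `café.dtd` and the helper `escape_system_id` are hypothetical, chosen only for illustration; the testcase under discussion is not reproduced here. It shows how the same character escapes differently depending on whether the bytes come from the declared charset (iso-8859-1) or from UTF-8 as XML 1.0 section 4.2.2 prescribes, and that transcoding the document to UTF-8 first (the job iconv would do) makes the two agree.

```python
from urllib.parse import quote

# Characters left unescaped in this sketch (an assumption, not the full
# RFC 2396 reserved set).
SAFE = "/:?#&="


def escape_system_id(system_id: str, charset: str) -> str:
    """Percent-escape non-ASCII characters using the bytes of `charset`."""
    return quote(system_id, safe=SAFE, encoding=charset)


name = "café.dtd"  # hypothetical system identifier

# Escaping the bytes of the declared charset.
latin1_form = escape_system_id(name, "iso-8859-1")

# Escaping the UTF-8 bytes, per XML 1.0 section 4.2.2.
utf8_form = escape_system_id(name, "utf-8")

# Transcoding iso-8859-1 input to UTF-8 before escaping, as iconv would.
raw = name.encode("iso-8859-1")
transcoded = raw.decode("iso-8859-1").encode("utf-8")
after_transcoding = quote(transcoded, safe=SAFE)

print(latin1_form)                     # caf%E9.dtd
print(utf8_form)                       # caf%C3%A9.dtd
print(after_transcoding == utf8_form)  # True
```

Once the input has been transcoded, the parser only ever sees UTF-8 bytes, so the declared charset no longer affects the escaped form.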
Received on Wednesday, 4 December 2002 15:04:47 UTC