- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Wed, 04 Dec 2002 21:23:55 +0100
- To: Nick Kew <nick@webthing.com>
- Cc: public-qa-dev@w3.org
* Nick Kew wrote:
>> I am :-) See section 4.2.2 of XML 1.0,
>> http://www.w3.org/TR/REC-xml#dt-sysid
>>
>> [...]
>> * Each disallowed character is converted to UTF-8 [IETF RFC 2279] as
>> one or more bytes.
>
>I don't see any disallowed character under #2.1 of rfc2396 in your
>testcase.

There is a U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) in it. URIs must
not contain non-ASCII characters, see e.g. section 2.4 of RFC 2396.

>Your testcase was declared as iso-8859-1, so escaping as UTF-8 is
>at best perverse, and breaks commonsense.

Well, it's a normative requirement of XML 1.0, and actually using UTF-8
for disallowed characters in URIs *is* commonsense. You'll find a
similar section in HTML 4.01 (appendix B.2.1); take a look at
http://www.w3.org/International/O-URL-and-ident.html

The requirement in XML 1.0 is, however, only for the system identifier,
and there are XML processors that implement it the way the
recommendation wants it, e.g. http://www.stg.brown.edu/service/xmlvalid/
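
As a rough illustration of the escaping rule quoted above from XML 1.0
section 4.2.2, here is a minimal Python sketch. It is not taken from any
particular XML processor; the function name and the example.org URI are
made up for illustration, and it only handles non-ASCII characters, not
the other characters RFC 2396 excludes. The `encoding` parameter exists
only to contrast the UTF-8 result with what escaping the iso-8859-1
byte would give.

```python
def escape_non_ascii(uri: str, encoding: str = "utf-8") -> str:
    """Percent-escape every non-ASCII character in `uri`.

    Hypothetical helper: each non-ASCII character is converted to bytes
    in `encoding`, and each byte is written as %HH (uppercase hex).
    ASCII characters are passed through unchanged, which is a
    simplification of the full rule.
    """
    out = []
    for ch in uri:
        if ord(ch) > 0x7F:
            out.append("".join(f"%{b:02X}" for b in ch.encode(encoding)))
        else:
            out.append(ch)
    return "".join(out)


# Made-up system identifier containing U+00F6.
sysid = "http://example.org/b\u00f6.dtd"

print(escape_non_ascii(sysid))                # UTF-8 bytes C3 B6
print(escape_non_ascii(sysid, "iso-8859-1"))  # single byte F6
```

With UTF-8 the U+00F6 becomes %C3%B6 regardless of how the document
itself was encoded, whereas escaping the iso-8859-1 byte would give %F6.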
Received on Wednesday, 4 December 2002 15:23:11 UTC