- From: Harald Tveit Alvestrand <Harald.Alvestrand@maxware.no>
- Date: Wed, 29 Jul 1998 09:02:13 +0200
- To: "Martin J. Duerst" <duerst@w3.org>
- Cc: unicore@unicode.org, Multiple Recipients of Unicore <unicore@unicode.org>, kenw@sybase.com, ietf-charsets@iana.org
At 12:48 29.07.98 +0900, Martin J. Duerst wrote:
>At 13:42 98/07/27 +0200, Harald Tveit Alvestrand wrote:
>
>> The BOM is part of the charset that UTF-16 represents.
>> Any application can say anything it wants to *further restricting*
>> what characters can apply where; the part we couldn't tolerate
>> was if XML insisted upon strings that were *illegal* in the registered
>> UTF-16, yet calling the charset "UTF-16".
>
>
>Harald, could you be more precise?
>
>Of course, if XML says e.g. that a character sequence such as
>"<<<<>>>>" is not legal XML, that's its own business.
>
>But e.g. for the use of the "charset" parameter in transcoding
>proxies/gateways for HTTP and email, I'm very afraid that if
>one application (e.g. text/abc) requires the BOM to be present,
>and another (e.g. text/xyz) requires it to be absent, this will
>lead to very undesirable complications.

What I was saying is that if XML states that all valid XML documents
must start with the BOM, that's no more problematic than if HTML states
that all valid HTML documents must start with <!DOCTYPE; this is part of
the application, not part of the charset.

I'm not saying it's a good idea; I strongly suspect that it's not.
But it does not need to have the consent of the charset registration.

                 Harald

--
Harald Tveit Alvestrand, Maxware, Norway
Harald.Alvestrand@maxware.no
Received on Wednesday, 29 July 1998 01:11:15 UTC