- From: Tim Bray <tbray@textuality.com>
- Date: Tue, 22 Oct 1996 14:51:47 -0700
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
At 03:05 PM 10/22/96 -0500, David G. Durand wrote:
>At 11:49 10/22/96, Michael Sperberg-McQueen wrote:
>>The biggest drawback I see, however, is that defining XML entities as
>>beginning with a MIME header means that no existing SGML parser can
>>be used as is on XML documents.
>>
>>That, for me, is a show-stopper.
>
>We need to remember that most of the individuals in the world are not using
>SGML software

Clearly, Michael and David are not going to convince each other on this.
More generally, it seems unlikely that David/Gavin and the SGML ERB are
going to convince each other on this. To reiterate, the ERB feels that:

 o XML parsers should make an aggressive effort to use the right encoding
   to process text entities.
 o To do so, they should of course use MIME headers, resource forks,
   docman metadata, smoke signals, whatever they have.
 o It is valuable to include a way for a document, in its own syntax and
   in its own encoding, to signal what that encoding is: as a reminder to
   the author, and as self-defence against incompetent webmasters and
   overaggressive conversion services.
 o We should not gratuitously put things in XML files that will make them
   unreadable by SGML parsers [the smokescreen about "it's the entity not
   the file" is just that].

As for the argument as to whether picking apart the <?XML at the front of
the file can be proven mathematically correct, of course not; nor will it
help in the case where the processor has never heard of the encoding being
used. But it will work a lot of the time for a lot of standard encodings
and enable otherwise-unreadable data to be read. This is a good thing.

There is, however, one advantage to using a set of MIME headers: if the
processor can't read the encoding the entity is in, at least it can
report, e.g., "couldn't process this because I don't know Shift-JIS". But
I don't think this makes up for the irritation of having to insert a
header that's in a different syntax and encoding from the rest of the
file.

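
The policy argued for above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not anything the ERB has
specified: the function name pick_encoding, the UnknownEncoding error, the
exact shape of the <?XML ... encoding="..."?> label, and the rule that an
external label wins over the in-document one are all hypothetical. The
sniffing step also assumes an ASCII-compatible encoding, which is why it
works "a lot of the time" rather than always.

```python
import codecs
import re
from typing import Optional

# Hypothetical in-document label; the real declaration syntax is an assumption here.
_DECL = re.compile(
    rb'<\?XML[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']',
    re.IGNORECASE,
)


class UnknownEncoding(Exception):
    """Raised when an encoding label is found but the processor cannot decode it."""


def pick_encoding(data: bytes,
                  external_label: Optional[str] = None,
                  default: str = "utf-8") -> str:
    """Choose an encoding for a text entity.

    Order of preference (an assumption of this sketch):
      1. a label from the delivery vehicle (MIME header, file metadata, ...)
      2. a label sniffed from the declaration at the front of the entity
      3. a fallback default
    """
    label = external_label
    if label is None:
        # Only works when the leading bytes are readable as ASCII.
        match = _DECL.match(data[:200])
        if match:
            label = match.group(1).decode("ascii")
    if label is None:
        label = default
    try:
        # codecs.lookup returns the codec's canonical name for a known label.
        return codecs.lookup(label).name
    except LookupError:
        # With a label in hand, the failure can at least name the encoding.
        raise UnknownEncoding(
            f"couldn't process this because I don't know {label}"
        ) from None
```

Whichever source supplies the label, the useful property is the same: when
the processor cannot handle the encoding, it can still say which one it was
asked for.
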
Obviously, external information should be in the correct format for the
external delivery vehicle. Internal information should be in the syntax
and encoding of the document.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167
Received on Tuesday, 22 October 1996 17:53:14 UTC