Re: B.1 and B.2 results

At 03:05 PM 10/22/96 -0500, David G. Durand wrote:
>At 11:49 10/22/96, Michael Sperberg-McQueen wrote:
>>The biggest drawback I see, however, is that defining XML entities as
>>beginning with a MIME header means that no existing SGML parser can
>>be used as is on XML documents.
>>
>>That, for me, is a show-stopper.
>
>We need to remember that most of the individuals in the world are not using
>SGML software

Clearly, Michael and David are not going to convince each other on this.
More generally, it seems unlikely that David/Gavin and the SGML ERB are
going to convince each other on this.  To reiterate, the ERB feels that:

 o XML parsers should make an aggressive effort to use the right encoding
   to process text entities.
 o To do so, they should of course use MIME headers, resource forks,
   docman metadata, smoke signals, whatever they have.
 o It is valuable to include a way for a document, in its own syntax and
   in its own encoding, to signal what that encoding is, both as a
   reminder to the author and as self-defence against incompetent
   webmasters and overaggressive conversion services.
 o We should not gratuitously put things in XML files that will make
   them unreadable by SGML parsers [the smokescreen about "it's the entity
   not the file" is just that].

As for whether picking apart the <?XML at the front of the file can be
proven mathematically correct: of course it can't.  Nor will it help in
the case where the processor has never heard of the encoding being used.
But it will work a lot of the time for a lot of standard encodings, and
enable otherwise-unreadable data to be read.  This is a good thing.
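
To make "picking apart the <?XML" concrete, here is a rough sketch in
Python.  Everything in it is an illustrative assumption rather than
anything the ERB has specified: the function name, the list of candidate
encodings, and the encoding="..." spelling of the internal label.  It
tries a few encoding families on the first bytes of the entity and reads
the label out of whichever one yields a recognizable declaration.

```python
import re

# Hypothetical candidate list: encoding families to try on the first bytes.
# "utf-8" also covers ASCII-compatible encodings such as Latin-1 or Shift-JIS,
# because the declaration itself is expected to be plain ASCII characters.
CANDIDATES = ("utf-8", "utf-16-le", "utf-16-be", "cp500")

# Assumed shape of the internal label: <?XML ... encoding="name" ...
DECL = re.compile(
    r'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
    re.IGNORECASE,
)


def sniff_encoding(data: bytes, prefix_len: int = 200):
    """Return the encoding name declared at the front of data, or None."""
    head = data[:prefix_len]
    for candidate in CANDIDATES:
        # Undecodable bytes are dropped; we only care about the declaration.
        text = head.decode(candidate, errors="ignore").lstrip("\ufeff")
        if not text.lower().startswith("<?xml"):
            continue  # wrong guess: the declaration is not readable this way
        match = DECL.match(text)
        if match:
            return match.group(1)
    return None  # no readable declaration; fall back on external information
```

As the paragraph above says, this is a heuristic and not a proof: it
returns None for anything outside the candidate list, and the name it
returns may still be one the processor cannot decode.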

There is, however, one advantage to using a set of MIME headers: if the
processor can't read the encoding the entity is in, at least it can
report, e.g. "couldn't process this because I don't know Shift-JIS".
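
A minimal Python sketch of that advantage, under stated assumptions: the
entity is taken to start with an ASCII header block ended by a blank
line, the label is taken to live in a Content-Type charset parameter, and
read_entity is a made-up name.  Because the header is always readable,
the processor can name the encoding it doesn't know.

```python
import codecs
import re
from email import message_from_string


def read_entity(raw: bytes) -> str:
    """Decode an entity that begins with a MIME-style header block."""
    # Assumption: headers are ASCII and end at the first blank line.
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    if len(parts) != 2:
        raise ValueError("no header block found")
    head, body = parts

    headers = message_from_string(head.decode("ascii"))
    label = headers.get_content_charset()  # lowercased charset, or None
    if label is None:
        raise ValueError("header block carries no charset label")

    try:
        codecs.lookup(label)
    except LookupError:
        # The label is readable even though the body is not.
        raise ValueError(
            f"couldn't process this because I don't know {label}"
        ) from None
    return body.decode(label)
```

For example, a header of "Content-Type: text/xml; charset=iso-8859-1"
lets the body be decoded, while an unrecognized charset produces the
error message naming it.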

But I don't think this makes up for the irritation of having to insert
a header that's in a different syntax and encoding from the rest of the
file.  Obviously, external information should be in the correct format
for the external delivery vehicle.  Internal information should be in
the syntax and encoding of the document.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

Received on Tuesday, 22 October 1996 17:53:14 UTC