- From: David G. Durand <dgd@cs.bu.edu>
- Date: Mon, 21 Oct 1996 18:33:48 -0400
- To: w3c-sgml-wg@w3.org
At 12:05 PM 10/21/96, Gavin Nicol wrote:
>>OK, I see now. You are suggesting that we put a MIME header in the
>>document in all cases. I think this is an excellent suggestion.
>
>.... this is *precisely* what my *.mim file format (suggested to
>HTML-WG and also out in an expired RFC) *is*.

Well, my suggestion is that we put a MIME header in the file only when we can't transmit the MIME header information over the channel. We don't want to have to send 2 headers to be XML conformant when going over HTTP.

>>Note that many existing web servers (including Apache) cope with
>>files containing MIME headers, and may even emit those headers in
>>response to an HTTP HEAD request. Apache is said (independently) to
>>represent over 30% of all running web servers.
>
>Right, but the *.mim file format is different to Apache (or at least
>the last version I looked at) in that Apache sends the file *verbatim*
>and does not necessarily add missing headers... which means that the
>author must understand the entire set of required headers. The
>proposal I put forth only requires headers that will be overriding
>those generated by the server.

This would be essential for XML, as we don't want to force applications to maintain HTTP-specific information like Content-length, et al.

>As I noted before on this list, and also in HTML-WG, most software
>that will be dealing with the WWW will *already* have MIME header
>parsers built into them.... probably as a message stream module, so
>you can *reuse* that code for the local and distributed case.
>
>Again, I seem to be talking to myself.

Well, perhaps to only a few people.

>The headers are in US-ASCII, which is a nuisance if your file is UCS-2
>(your editor would need to have MIME parsing capabilities built in),
>which is a boundary case, but an important one. This is one reason I
>prefer catalog or FSI based solutions. In most practical situations,
>this will not be an overly large concern though.

I think we are better off defining our own convention for "self-identifying files", as there is none in common use. If a common, robust convention for metadata is implemented, then systems that implement it are entitled to the same slack (omission of the redundant header) that we should afford HTTP. Given the facts of life with multibyte encoding, and the desire that files be maximally self-revealing, we should probably use the character-length determination hack I suggested, rather than put 8-bit characters at the front of multibyte files.

>>At a minimum, you would need
>>   Mime-version: 1.0
>>   Content-type: text/x-xml;version=1.0;charset=utf-8?
>
>In the *.mim file format, the minimum you would need would be CRLF,
>and for non-ISO-8859-1 documents
>
>   Content-type: text/x-xml;charset=shift-jis
>
>>Instead of requiring the full MIME CR-LF at the end of each line (which
>>is a pain to maintain on some platforms, e.g. Mac and Unix), I would
>>suggest documenting a format in which
>...
>
>I would just reference the HTTP specs (though HTTP 1.1 is becoming
>more restrictive), though I could easily be convinced that strict MIME
>compatibility be preserved.

This is a minor issue. Implementations will accept all three line-end conventions for a long time, as it's so easy, and implementations are so bad about line ends generally.

>
>The PI hack is a HACK. It is a header hiding under syntax that will
>confuse everyone, or at least cause people to assume that you could do
>something clever like:
>
><?XML-CHARSET SJIS>
>....
><?XML-CHARSET BIG5>
>....
><?XML-CHARSET UTF8>
>
>and we all know *that* is totally bogus.

Because you can't parse the character set specification without knowing what character set to parse in... This is the most infamous of the SGML declaration's problems with automatic processing: why revisit it on XML users?

   -- David

RE delenda est.
I am not a number. I am an undefined character.
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
http://www.dynamicdiagrams.com/services_map_main.html
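
For illustration only, here is a rough sketch of how a reader might handle the in-file header convention discussed in this message: a US-ASCII header block, any of the three line-end conventions, a blank line as terminator, and an optional charset parameter on Content-type. The function name split_header and the ISO-8859-1 default are assumptions made for this sketch (the default is inferred from the remark about non-ISO-8859-1 documents); nothing here is taken from the *.mim proposal text itself.

```python
import re

# Accept CRLF, bare CR (Mac) and bare LF (Unix); CRLF is tried first so
# that it is consumed as one line end rather than two.
_LINE_END = re.compile(rb"\r\n|\r|\n")


def split_header(data: bytes, default_charset: str = "iso-8859-1"):
    """Hypothetical helper: split a US-ASCII header block from the body.

    Returns (headers, text), where headers maps lower-cased field names
    to values and text is the body decoded with the declared charset.
    """
    headers = {}
    pos = 0
    while True:
        match = _LINE_END.search(data, pos)
        if match is None:
            raise ValueError("header block is not terminated by an empty line")
        line = data[pos:match.start()]
        pos = match.end()
        if not line:  # an empty line ends the header block
            break
        # The header is plain ASCII, so it can be read before the
        # document's own character set is known.
        name, sep, value = line.decode("ascii").partition(":")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        headers[name.strip().lower()] = value.strip()

    # Look for a charset parameter on Content-type, if one was given.
    charset = default_charset
    for param in headers.get("content-type", "").split(";")[1:]:
        key, _, val = param.partition("=")
        if key.strip().lower() == "charset":
            charset = val.strip().strip('"')
    return headers, data[pos:].decode(charset)


if __name__ == "__main__":
    sample = (b"Content-type: text/x-xml;charset=shift-jis\r\n\r\n"
              + "<doc>\u65e5\u672c\u8a9e</doc>".encode("shift_jis"))
    print(split_header(sample))
```

Because only the header is scanned for line ends, the body may be in any byte-oriented encoding. A UCS-2 file would still need its header written in single-byte ASCII, which is the nuisance raised in the quoted text above.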
Received on Monday, 21 October 1996 18:34:01 UTC