- From: Gavin Nicol <gtn@ebt.com>
- Date: Tue, 22 Oct 1996 17:34:59 -0400
- To: U35395@UICVM.UIC.EDU
- CC: w3c-sgml-wg@w3.org
>The biggest drawback I see, however, is that defining XML entities as
>beginning with a MIME header means that no existing SGML parser can
>be used as is on XML documents. Every parser will require either a
>prosthetic filter to strip the MIME header off, or a modification to
>make it understand and handle the MIME header as a packaging device.
>Every one.
>
>That, for me, is a show-stopper.

This depends on whether this will be a generally useful (i.e. widely used) feature in the future.

>I also think it needs to be in a form that users can produce
>using their normal tools, without jumping through hoops; that seems
>to mean it needs to be in the same character set it's declaring.

Coded character set *and* encoding.

>Gavin, and now David, have repeatedly claimed that the PI label
>relies on a vicious circle: you have to know what it says to read
>it.

It's true. You have to sniff at the data, and the sniffing may not always succeed. That's reason #1 for calling it a hack. A more objectionable one is that you will *require* people to add to their data (a header pretending to be data). This may seem pedantic, but I find this *semantically* objectionable, or counterintuitive.

>Gavin and David have pointed out, correctly, that it is possible to
>construct a coded character set for which the PI label is not
>unambiguous. This would involve an encoding for which some, but not
>all, of the characters A to Z and a to z would share positions with
>ASCII or EBCDIC or ISO 10646, while the rest would be rearranged so
>as to render it possible to misread an XML character-encoding
>declaration without detecting the misreading. This strikes me as a
>low-probability development, given the importance of ASCII (er,
>I mean ISO 646!), but it is indubitably possible.

You miss one important case: the case where there is no ASCII compatibility area in the lower 127 code points. This will also fail, in that you will be unable to parse the label at all. I forget exactly which they are, but there *are* such encodings in existence (Rick, do you remember if JOHAB is one?)

Another case (also of low probability) is having a file that is encoded in a manner that might confuse the sniffing logic (e.g. a compressed file whose header looks like the signature for UCS-2).

>Losing the entire notion of in-file labels would (a) expose XML
>processors to undetectable errors when external metadata is faulty or
>missing, (b) allow the user of arbitrary character encodings
>(implementor is responsible for getting it right, it's not our
>problem), (c) allow us to end this discussion before it crosses the
>boundary from the laughable to the intolerable.

If you replace "in-file" with "in-data", this is my preferred method. Meta-data should live *beside* the data, not *inside* it. Let's call a header a header. A PI by any other name would parse as well...
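
The sniffing step discussed above can be sketched roughly as follows. This is a minimal illustration and not code from the thread: the function name `sniff_encoding_family` and the returned family names are hypothetical, and it assumes the in-data label is a processing instruction that opens with `<?`. It only looks at the first few bytes, so it shows both failure modes raised here: input with no recognizable prefix yields no guess, and unrelated binary data that happens to start with a UCS-2 signature is misidentified.

```python
from typing import Optional


def sniff_encoding_family(data: bytes) -> Optional[str]:
    """Guess an encoding family from the first bytes of an entity.

    Returns a hypothetical family name, or None when nothing matches.
    Assumes the label, if present, starts with the two characters "<?".
    """
    # UCS-2 signature (byte order mark), big-endian then little-endian.
    if data[:2] == b"\xfe\xff":
        return "ucs-2-be"
    if data[:2] == b"\xff\xfe":
        return "ucs-2-le"
    # "<?" stored as 16-bit code units without a signature.
    if data[:4] == b"\x00<\x00?":
        return "ucs-2-be"
    if data[:4] == b"<\x00?\x00":
        return "ucs-2-le"
    # "<?" in any encoding that keeps ASCII positions for these characters.
    if data[:2] == b"<?":
        return "ascii-compatible"
    # "<?" in EBCDIC (0x4C 0x6F in code page 037).
    if data[:2] == b"\x4c\x6f":
        return "ebcdic"
    # No ASCII- or EBCDIC-compatible prefix: the label cannot be read.
    return None


if __name__ == "__main__":
    print(sniff_encoding_family(b"<?label"))
    print(sniff_encoding_family("<?label".encode("cp037")))
    # Arbitrary binary data that merely begins with FF FE is misread.
    print(sniff_encoding_family(b"\xff\xfe\x12\x34\x56"))
    # Bytes with no recognizable prefix give no answer at all.
    print(sniff_encoding_family(b"\x8a\x41\x90\x61"))
```

Metadata carried beside the data, rather than inside it, would make this guessing step unnecessary, at the cost of depending on that external metadata being present and correct.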
Received on Tuesday, 22 October 1996 17:36:38 UTC