Re: B.1 and B.2 results
>The biggest drawback I see, however, is that defining XML entities as
>beginning with a MIME header means that no existing SGML parser can
>be used as is on XML documents. Every parser will require either a
>prosthetic filter to strip the MIME header off, or a modification to
>make it understand and handle the MIME header as a packaging device.
>That, for me, is a show-stopper.
This depends on whether this will be a generally useful (i.e. widely
used) feature in the future.
>I also think it needs to be in a form that users can produce
>using their normal tools, without jumping through hoops; that seems
>to mean it needs to be in the same character set it's declaring.
Coded character set *and* encoding.
>Gavin, and now David, have repeatedly claimed that the PI label
>relies on a vicious circle: you have to know what it says to read
It's true. You have to sniff at the data, and the sniffing may not
always succeed. That's reason #1 for calling it a hack. A more
objectionable one is that you will *require* people to add something to
their data (a header pretending to be data). This may seem pedantic,
but I find this *semantically* objectionable, or counterintuitive.
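To make the "sniffing" concrete: before the label can be read, a processor has to guess the encoding family from the first few bytes. Below is a minimal sketch in Python, assuming the label is a processing instruction that begins with `<?` at the very start of the entity. The signature table and the name `sniff_family` are hypothetical, and the table is illustrative rather than complete. Anything it does not anticipate falls through to `None`, which is the failure case described above.

```python
from typing import Optional

# Hypothetical signature table mapping leading bytes to an encoding family.
# Longer, more specific prefixes are listed before shorter ones.
SIGNATURES = [
    (b"\xfe\xff", "utf-16-be"),          # byte order mark, big-endian
    (b"\xff\xfe", "utf-16-le"),          # byte order mark, little-endian
    (b"\x00\x3c\x00\x3f", "utf-16-be"),  # "<?" in UCS-2, big-endian, no mark
    (b"\x3c\x00\x3f\x00", "utf-16-le"),  # "<?" in UCS-2, little-endian, no mark
    (b"\x3c\x3f", "ascii-compatible"),   # "<?" in ISO 646 / ASCII supersets
    (b"\x4c\x6f", "ebcdic"),             # "<?" in EBCDIC (e.g. code page 037)
]


def sniff_family(data: bytes) -> Optional[str]:
    """Guess the encoding family from the first bytes, or None if no match."""
    for prefix, family in SIGNATURES:
        if data.startswith(prefix):
            return family
    # Sniffing failed: the label cannot even be located, let alone read.
    return None
```

For example, `sniff_family("<?XML".encode("utf-32-be"))` returns `None`, because UCS-4 is not in the table. Only after a family is guessed can the label itself be decoded, which is the circularity being objected to.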
>Gavin and David have pointed out, correctly, that it is possible to
>construct a coded character set for which the PI label is not
>unambiguous. This would involve an encoding for which some, but not
>all, of the characters A to Z and a to z would share positions with
>ASCII or EBCDIC or ISO 10646, while the rest would be rearranged so
>as to render it possible to misread an XML character-encoding
>declaration without detecting the misreading. This strikes me as a
>low-probability development, given the importance of ASCII (er,
>I mean ISO 646!), but it is indubitably possible.
You've missed one important case: the case where there is no ASCII
compatibility area in the lower 128 code points (0-127). This will also
fail, in that you will be unable to parse the label at all. I forget
exactly which they are, but there *are* such encodings in existence
(Rick, do you remember if JOHAB is one?)
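A quick way to see the problem: the label's characters only show up as the expected bytes in encodings that keep ASCII in the lower 128 positions. The sketch below compares a label's bytes in a few standard Python codecs against its ASCII bytes. `LABEL` and `label_is_ascii_compatible` are hypothetical names and the label text is an assumed example. EBCDIC (`cp037`) and UCS-2 (`utf-16-be`) stand in for non-ASCII-compatible encodings; a sniffer can special-case those two, but an encoding that matches none of its known patterns leaves it nothing to find.

```python
# Assumed example label; the exact syntax is illustrative only.
LABEL = '<?XML ENCODING="ISO-8859-1"?>'


def label_is_ascii_compatible(codec: str, label: str = LABEL) -> bool:
    """True if the label encodes to exactly the same bytes as in ASCII."""
    return label.encode(codec) == label.encode("ascii")


if __name__ == "__main__":
    for codec in ["ascii", "latin-1", "utf-8", "cp037", "utf-16-be"]:
        print(f"{codec:10} {label_is_ascii_compatible(codec)}")
```

Running it prints `True` for `ascii`, `latin-1` and `utf-8`, and `False` for `cp037` and `utf-16-be`.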
Another case (also of low probability) is a file that is encoded
in a way that might confuse the sniffing logic (e.g. a compressed
file whose header looks like the signature for UCS-2).
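The compressed-file case is a false positive rather than a failure, and nothing complains when it happens. A minimal sketch, assuming a naive check that trusts a leading byte order mark; `looks_like_ucs2` and the sample bytes are hypothetical (they are not a real compressed stream):

```python
def looks_like_ucs2(data: bytes) -> bool:
    """Naive check: treat a leading byte order mark as proof of UCS-2/UTF-16."""
    return data[:2] in (b"\xfe\xff", b"\xff\xfe")


# Hypothetical binary blob that happens to start with the bytes FE FF.
blob = b"\xfe\xff\x1f\x8b\x08\x00"

if looks_like_ucs2(blob):
    # No exception is raised: the mark is consumed and every remaining
    # 16-bit unit maps to some character, so the misreading goes unnoticed.
    text = blob.decode("utf-16")
    print(repr(text))
```

The decode succeeds and yields two unrelated characters (U+1F8B and U+0800), so the error is not detected at the decoding step.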
>Losing the entire notion of in-file labels would (a) expose XML
>processors to undetectable errors when external metadata is faulty or
>missing, (b) allow the use of arbitrary character encodings
>(implementor is responsible for getting it right, it's not our
>problem), (c) allow us to end this discussion before it crosses the
>boundary from the laughable to the intolerable.
If you replace "in-file" with "in-data", this is my preferred
method. Meta-data should live *beside* the data, not *inside* it.
Let's call a header a header.
A PI by any other name would parse as well...
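For contrast, "beside the data" might look like the following sketch, where the encoding travels in a separate header and the entity bytes are left untouched. Using a MIME-style `Content-Type` header as the side channel is an assumption for illustration (the post names no mechanism), `decode_with_external_label` is a hypothetical name, and `text/xml` is a placeholder media type. `email.message.Message.get_content_charset()` is from the Python standard library and returns the `charset` parameter lower-cased, or `None` if it is absent.

```python
from email.message import Message


def decode_with_external_label(body: bytes, content_type: str) -> str:
    """Decode an entity using a charset carried beside it, not inside it."""
    headers = Message()
    headers["Content-Type"] = content_type
    charset = headers.get_content_charset()  # lower-cased, or None if absent
    if charset is None:
        # Missing metadata: fail loudly instead of guessing from the bytes.
        raise ValueError("no charset in external metadata")
    return body.decode(charset)


# Raw entity bytes with no header of any kind inside them.
body = "caf\u00e9".encode("iso-8859-1")
print(decode_with_external_label(body, "text/xml; charset=ISO-8859-1"))
```

Missing metadata raises an error rather than falling back to sniffing. Faulty metadata (say, `charset=utf-8` for these Latin-1 bytes) surfaces here as a `UnicodeDecodeError`, though a wrong label that happens to decode cleanly would not be caught.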