re: Determination of Encoding

>>>Consider a proxy server that performs code conversion without rewriting
>>>the PI.  Consider a WWW browser or robot that does not understand XML.
>>>Such browsers or robots certainly exist now and will not disappear in
>>>the near future.  If they save a transferred XML document in a file,
>>>the header information will disappear and the PI will remain incorrect.
>>>Then, an XML parser is likely to fail.
>>
>>Precisely why I say that we must rely on the HTTP header. I'm starting to
>>think that Rick's proposal of requiring servers to remove the PI
>>is a good idea.
>
>How will relying on the external header fix matters?  The problem is
>that it is always possible to get a transcoding server that doesn't
>understand the format it's transcoding (one reason sending binary files
>via Bitnet was always such an adventurous experience if one of the nodes
>involved was an ASCII site).

In the context of HTTP, the charset parameter on the Content-Type field
is the only thing that can be used to correctly detect the encoding.
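
As a rough illustration of what "using the charset parameter" means for
a receiver, here is a minimal Python sketch. The function name is
hypothetical, and it assumes the raw Content-Type value is already in
hand; it only extracts the label and does not validate it.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter, or None if it is absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() unquotes the parameter and lowercases it.
    return msg.get_content_charset()
```

A value such as "text/xml; charset=Shift_JIS" yields the label, while a
bare "text/xml" yields None, which is the case where a receiver has
nothing external to go on.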

>The best that can be hoped for is to have some chance at noticing that
>there is a discrepancy -- particularly important given the frequency
>with which transcoders garble the data (at least ASCII/EBCDIC
>transcoders do -- perhaps the transcoders for CJK character encodings
>work flawlessly all the time).
>
>To do that, you need to have the PI retained.

Most receiving systems will be able to parse the PI and detect the
difference, sure. The problem is that the transcoding *server* cannot
stop them from getting false negatives unless it rewrites the PI. The
probability of HTTP being changed to require this for XML is
vanishingly small. I believe it is also vanishingly small for any
MIME-based protocol (including email).
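
For concreteness, the receiver-side check under discussion might look
something like the Python sketch below. The function names are
hypothetical, and it makes several simplifying assumptions: the PI
sits at the very start of the body, the body is in an ASCII-compatible
encoding so the PI can be read as bytes, and encoding labels are
compared by name only, with no alias handling.

```python
import re

# Encoding pseudo-attribute of a leading XML PI, e.g.
# <?xml version="1.0" encoding="EUC-JP"?>  (case-insensitive on purpose).
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']',
    re.IGNORECASE,
)


def declared_encoding(body):
    """Return the lowercased encoding named in the leading PI, or None."""
    m = _DECL.match(body[:200])
    return m.group(1).decode("ascii").lower() if m else None


def encodings_disagree(header_charset, body):
    """True only when both labels are present and their names differ."""
    declared = declared_encoding(body)
    if header_charset is None or declared is None:
        return False  # nothing to compare, so no discrepancy can be seen
    return header_charset.lower() != declared
```

Note what this can and cannot tell you: it reports that the two labels
differ, not which of them (if either) matches the bytes actually
received.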

Taking the failure cases and making them canonical doesn't remove the
problem: it just increases the number of failures.
 

Received on Monday, 23 June 1997 14:13:32 UTC