re: Determination of Encoding from Gavin Nicol on 1997-06-23 (w3c-sgml-wg@w3.org from June 1997)

From: Gavin Nicol <gtn@eps.inso.com>
Date: Mon, 23 Jun 1997 14:12:50 -0400
To: w3c-sgml-wg@w3.org
Message-Id: <199706231812.OAA01506@nathaniel.eps.inso.com>

>>>Consider a proxy server that performs code conversion without rewriting
>>>the PI.  Consider a WWW browser or robot that does not understand XML.
>>>Such browsers or robots certainly exist now and will not disappear in
>>>the near future.  If they save a transferred XML document in a file,
>>>the header information will disappear and the PI will remain incorrect.
>>>Then, an XML parser is likely to fail.
>>
>>Precisely why I say that we must rely on the HTTP header. I'm starting to
>>think that Rick's proposal of requiring servers to remove the PI
>>is a good idea.
>
>How will relying on the external header fix matters?  The problem is
>that it is always possible to get a transcoding server that doesn't
>understand the format it's transcoding (one reason sending binary files
>via Bitnet was always such an adventurous experience if one of the nodes
>involved was an ASCII site).

In the context of HTTP, the charset parameter on the Content-Type field
is the only thing that can be used to correctly detect the encoding.
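As an illustration only (not part of the original message), here is a minimal sketch of how a receiver might read the charset parameter from a Content-Type value. It reuses Python's standard `email.message.Message` MIME header parsing; the helper name `header_charset` is hypothetical. Charset names are case-insensitive, so the result is lowercased, and `None` means the header carried no charset parameter at all.

```python
from typing import Optional
from email.message import Message


def header_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None.

    Hypothetical helper: it reuses the stdlib MIME parser, which handles
    quoting and parameter syntax for us.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset()


print(header_charset('text/xml; charset="EUC-JP"'))  # euc-jp
print(header_charset("text/xml"))                    # None
```

Going through the MIME parser rather than splitting on ";" by hand keeps quoted parameter values from tripping up the lookup.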

>The best that can be hoped for is to have some chance at noticing that
>there is a discrepancy -- particularly important given the frequency
>with which transcoders garble the data (at least ASCII/EBCDIC
>transcoders do -- perhaps the transcoders for CJK character encodings
>work flawlessly all the time).
>
>To do that, you need to have the PI retained.

Most receiving systems will be able to parse the PI and detect the
difference, sure. The problem is that the transcoding *server* cannot
stop them from getting false negatives unless it rewrites the PI. The
probability of HTTP being changed to require this for XML is
vanishingly small. I believe it to also be vanishingly small for any
MIME based protocol (including email).
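A rough sketch of the receiver-side check discussed above follows, under stated assumptions. Treat it as illustrative, not as anyone's specification. It assumes the PI has the XML 1.0 `<?xml ... encoding="..."?>` form, matched case-insensitively to tolerate older `<?XML` drafts. It also assumes the document starts in an ASCII-compatible encoding; UTF-16 would need BOM sniffing, which is omitted here. Following the position taken in this message, the HTTP charset wins when present, and the PI is used only to flag a discrepancy. The names `declared_encoding`, `same_encoding`, `resolve_encoding`, and the `utf-8` fallback are all hypothetical choices.

```python
import codecs
import re
from typing import Optional, Tuple

# Assumed PI shape: <?xml ... encoding="NAME" ...?> at the very start of the data.
_DECL = re.compile(
    rb'<\?xml\b[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
    re.IGNORECASE,
)


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the PI (lowercased), or None if absent."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    m = _DECL.match(data)
    return m.group(1).decode("ascii").lower() if m else None


def same_encoding(a: str, b: str) -> bool:
    """Compare two charset names, resolving aliases where Python knows them."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return a.lower() == b.lower()


def resolve_encoding(header: Optional[str], data: bytes,
                     default: str = "utf-8") -> Tuple[str, bool]:
    """Return (encoding_to_use, mismatch); the HTTP charset takes precedence."""
    declared = declared_encoding(data)
    if header:
        mismatch = declared is not None and not same_encoding(declared, header)
        return header.lower(), mismatch
    # No header charset: fall back to the PI, then to an assumed default.
    return (declared or default), False


# A transcoding proxy rewrote the header but left the PI untouched:
print(resolve_encoding("EUC-JP", b'<?xml version="1.0" encoding="Shift_JIS"?><a/>'))
```

The check can tell a receiver that the two labels disagree. It cannot tell which one is correct, which is why the precedence rule has to come from somewhere other than the document itself.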

Taking the failure cases and making them canonical doesn't remove the
problem: it just increases the number of failures.

Received on Monday, 23 June 1997 14:13:32 UTC