
re: Determination of Encoding

From: Gavin Nicol <gtn@eps.inso.com>
Date: Mon, 23 Jun 1997 14:12:50 -0400
Message-Id: <199706231812.OAA01506@nathaniel.eps.inso.com>
To: w3c-sgml-wg@w3.org
>>>Consider a proxy server that performs code conversion without rewriting
>>>the PI.  Consider a WWW browser or robot that does not understand XML.
>>>Such browsers or robots certainly exist now and will not disappear in
>>>the near future.  If they save a transferred XML document in a file,
>>>the header information will disappear and the PI will remain incorrect.
>>>Then, an XML parser is likely to fail.
>>Precisely why I say that we must rely on the HTTP header. I'm starting to
>>think that Rick's proposal of requiring servers to remove the PI
>>is a good idea.
>How will relying on the external header fix matters?  The problem is
>that it is always possible to get a transcoding server that doesn't
>understand the format it's transcoding (one reason sending binary files
>via Bitnet was always such an adventurous experience if one of the nodes
>involved was an ASCII site).

In the context of HTTP, the charset parameter on the Content-Type field
is the only thing that can be used to correctly detect the encoding.
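
As a rough illustration of reading that parameter, here is a minimal
Python sketch. It assumes the receiver already holds the raw value of the
Content-Type field as a string; the function name is hypothetical, and
the parsing is delegated to the standard library's email package.

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the lowercased charset parameter, or None if it is absent."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() handles quoting and lowercases the result.
    return msg.get_content_charset()


print(charset_from_content_type("text/xml; charset=Shift_JIS"))
print(charset_from_content_type("text/xml"))
```

When the parameter is missing the function returns None, and the receiver
has to fall back on some other rule.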

>The best that can be hoped for is to have some chance at noticing that
>there is a discrepancy -- particularly important given the frequency
>with which transcoders garble the data (at least ASCII/EBCDIC
>transcoders do -- perhaps the transcoders for CJK character encodings
>work flawlessly all the time).
>To do that, you need to have the PI retained.

Most receiving systems will be able to parse the PI and detect the
difference, sure. The problem is that the transcoding *server* cannot
stop them from getting false negatives unless it rewrites the PI. The
probability of HTTP being changed to require this for XML is
vanishingly small. I believe it to also be vanishingly small for any
MIME based protocol (including email).
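
The receiver-side check could be sketched roughly as follows in Python.
This is only an illustration under stated assumptions: the entity body
is available as bytes in an ASCII-compatible encoding, the encoding label
sits in a declaration at the very start of the document, and both
function names are hypothetical. Labels are compared as plain lowercased
strings, so aliases of the same encoding would be reported as a mismatch.

```python
import re

# Assumption: the declaration is the first thing in the document and is
# readable as ASCII. IGNORECASE also accepts an upper-case "<?XML" target.
_DECL = re.compile(
    rb"""^<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""",
    re.IGNORECASE,
)


def declared_encoding(body):
    """Return the lowercased encoding label from the PI, or None."""
    match = _DECL.match(body[:200])
    return match.group(1).decode("ascii").lower() if match else None


def labels_disagree(header_charset, body):
    """True only when both labels are present and differ."""
    declared = declared_encoding(body)
    if header_charset is None or declared is None:
        return False
    return header_charset.lower() != declared


# A proxy transcoded the body and updated the header, but not the PI.
body = b'<?xml version="1.0" encoding="EUC-JP"?><doc/>'
print(labels_disagree("shift_jis", body))
```

Once the document is saved to a file and the header is gone, only the
declared label is left, and nothing remains to compare it against.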

Taking the failure cases and making them canonical doesn't remove the
problem: it just increases the number of failures.
Received on Monday, 23 June 1997 14:13:32 UTC
