Re: Invasion of the pseudo-people: character encoding in tedious detail

Rick,

Thank you very much for your summary.  I believe that your mail is 
very constructive and provides a sound basis for further discussion.

I am still collecting input from my colleagues in the SGML 
community and the W3C project at Keio, and so I am not yet ready for a full 
proposal.  But allow me to clarify my point.

Rick Jelliffe writes:
>PSEUDO-MAKOTOSAN
>
>Let me invent another person called Pseudo-Makotosan. He sees the
>need in these terms:
>
>1) there should only be one primary method for a document to describe
>itself; other methods are only in case of failure. PIs are the only way
>to do this.

Yes.  What I should have stated more clearly is that the document itself 
is the most reliable source of information about its own encoding.  (This 
principle has been clearly stated by colleagues such as Hiyama-san and 
Matsuda-san, and none of the members of the W3C ML at Keio disagree.)

Servers and proxy servers must only echo what the 
document says.  Proxy servers that perform code conversion are disappearing, 
and servers have no reliable information other than the document itself.  
(In the past, DeleGate servers that always attached "charset=ISO-2022-JP"
caused problems for ASCII documents, according to Ishikawa-san at Keio.)
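To make the precedence concrete, here is a minimal sketch of "trust the
document first, the server only as a fallback".  It is an illustration under
assumptions, not a proposal: the function name choose_encoding, the regular
expression for an XML-style encoding declaration (<?xml ... encoding="..."?>),
and the UTF-8 default are all hypothetical.  It also assumes the declaration
sits at the very start of the document in ASCII-compatible bytes (true for
ISO-2022-JP, EUC-JP and Shift_JIS), so it would not handle UTF-16 without
further detection.

```python
import re
from typing import Optional

# Hypothetical pattern for an encoding declaration at the start of a document,
# e.g. <?xml version="1.0" encoding="ISO-2022-JP"?>.  Matching is done on raw
# bytes, which works only if the declaration is in ASCII-compatible bytes.
_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][\w.-]*)["\']',
    re.IGNORECASE,
)


def choose_encoding(doc: bytes, server_charset: Optional[str] = None,
                    default: str = "UTF-8") -> str:
    """Return the encoding to use for doc, trusting the document itself first."""
    match = _DECL.match(doc[:256])
    if match:
        # 1) The document's own declaration always wins.
        return match.group(1).decode("ascii")
    if server_charset:
        # 2) A server- or proxy-supplied charset is used only as a fallback.
        return server_charset
    # 3) Last resort when nobody says anything.
    return default
```

With this ordering, a server that blindly attaches "charset=ISO-2022-JP" can
no longer override a document that declares its own encoding, although it can
still mislabel a document that declares nothing.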

Gavin Nicol writes:

>I think his stance is a bit further afield than that. Seems like they
>want all kinds of autodetection in there.

It is true that I listed many possible heuristics, and that I did not 
make the basic principle really clear.  But my main point is the one 
Rick correctly observes.

By the way, I heard from Ishikawa-san that 
RFC 2070 (HTML-I18N) allows the element type "A" to have the CHARSET 
attribute, but the present version of Cougar does not.

Regards,

Murata, Makoto
Fuji Xerox Information Systems
 
Tel: 044-812-7230   Fax: 044-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp

Received on Tuesday, 10 June 1997 00:49:42 UTC