
Re: Requesting a revision of RFC3023

From: MURATA Makoto <murata@hokkaido.email.ne.jp>
Date: Thu, 18 Sep 2003 01:30:54 +0900
To: ietf-xml-mime@imc.org
Cc: WWW-Tag <www-tag@w3.org>
Message-Id: <20030918011830.E21F.MURATA@hokkaido.email.ne.jp>

First, Simon and I were asked by the W3C team not to take any action
on RFC 3023.  This is because the MIME type registration procedure was
expected to change (see [1] and [2]).  So, Simon, Dan, and I can't do 
anything right now.

As for the charset parameter, I am still uneasy about disallowing or
deprecating it.  But I agree to make "clear that nobody sending a
media-type should send a charset for an XML media-type unless it
REALLY REALLY KNOWS what it's sending," and to deprecate text/xml not
because the charset parameter is harmful but because most XML is not
text for casual users.

I have repeatedly asked (e.g., [3]) what the TAG's position is on
charset detection for non-XML formats.  The latest version of the TAG
finding "Client handling of MIME headers" appears to say that

	(1) non-self-describing data formats should rely on the
	    charset parameter, and

	(2) self-describing data formats should introduce their own
	    mechanism for specifying charsets.

This implies that XQuery should introduce its own mechanism (see [4])
and that the compact syntax of RELAX NG should introduce another.
CSS, HTML, and XML already have different mechanisms.  I personally
think that this approach will make the current situation even worse.
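To illustrate what even one such format-specific mechanism involves,
here is a minimal Python sketch of in-band detection for XML, loosely
following the (non-normative) autodetection in Appendix F of XML 1.0.
It is an assumption-laden simplification: the helper name
detect_xml_encoding is hypothetical, it recognizes only a few byte-order
marks and an XML declaration written in an ASCII-compatible encoding, and
it ignores UTF-32, EBCDIC, and any external charset information.  CSS,
HTML, XQuery, etc. would each need different code.

```python
import re

# Byte-order marks handled by this simplified sketch (no UTF-32, no EBCDIC).
_BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# encoding="..." or encoding='...' inside an XML declaration that is
# written in an ASCII-compatible encoding.
_DECL = re.compile(rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def detect_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its own bytes (in-band only)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)  # the declaration must start the entity
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: with no external information,
    # XML 1.0 treats the entity as UTF-8.
    return "utf-8"
```

For example, detect_xml_encoding(b'<?xml version="1.0" encoding="Shift_JIS"?><a/>')
returns "shift_jis", while a document with neither a BOM nor a
declaration falls back to "utf-8".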

Much textual data on the WWW requires charset detection.  For example:

1) plain text, XML, HTML, CSS, XQuery, VBScript, JavaScript, JSP, Perl,
   RELAX NG compact-syntax schemas, etc. on the server side,

2) textual data generated by CGI programs, Servlets, Applets, XSLT
   stylesheets, etc. on the server side,

3) text typed into HTML forms and sent as multipart/form-data via
   HTTP.

At present, different technologies have introduced different ad-hoc
solutions [5].  As a result, it is VERY HARD to create well-internationalized
WWW applications: you have to use several mechanisms correctly.  Nobody
is trying to provide a generalized solution.

As far as I know, the charset parameter is the only generic mechanism.  I
know the charset parameter is not working well, but I do not see any other
generic mechanism.
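By contrast, reading the generic mechanism is the same for every format.
A minimal Python sketch, assuming the Content-Type value is already in
hand; charset_from_content_type is a hypothetical helper name, and the
parsing is delegated to the standard library's email.message.Message.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() unquotes and lowercases the parameter,
    # and returns None when no charset parameter is present.
    return msg.get_content_charset()
```

The same call works for text/css, text/html, application/xml, or any
other media type, which is exactly what the format-specific mechanisms
above cannot offer.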

[1] http://lists.w3.org/Archives/Public/public-ietf-w3c/2003Jul/0000.html
[2] http://www.ietf.org/internet-drafts/draft-freed-mime-p4-02.txt
[3] http://lists.w3.org/Archives/Public/www-tag/2003Apr/0104.html
[4] http://www.w3.org/TR/xquery/#xquery-encoding
[5] http://www.asahi-net.or.jp/~eb2m-mrt/charsetDetection.html


MURATA Makoto <murata@hokkaido.email.ne.jp>
Received on Friday, 19 September 2003 08:18:35 UTC
