Re: internet media types and encoding

MURATA Makoto wrote:

> Ideally, there should be either

>  1) a single in-band encoding declaration mechanism for all 
>     textual formats

> or

>  2) a single out-band encoding declaration mechanism for 
>     all protocols,


I agree that an in-band encoding-declaration mechanism should be developed
for other text formats, and that XML's three-stage mechanism (endian/charsize 
signature determination, ASCII/EBCDIC family determination, then reading the 
encoding name in a minimal literal repertoire) is now a proven approach.  I don't see 
a particular reason for XML to change its syntax. (It doesn't have the problem, so 
it shouldn't have to bear any cost.) It is not as important that everyone uses the 
same encoding-header *syntax* as it is that every text format has the
same in-band signalling mechanism available/required, so I hope we won't
be too distracted by XML aspects: XML is the least of it.
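
For anyone who has not implemented it, the three stages can be sketched roughly
as below. This is only an illustration in Python, not normative: the function
name sniff_encoding, the 256-byte window, the use of cp037 as the stand-in for
the EBCDIC family, and the fallback to UTF-8 are all my assumptions.

```python
import re

# Stages 1 and 2: byte signatures. BOMs first, then the byte patterns that
# "<?" or "<?xm" produce in each character-size/endianness/family.
# Longer signatures come first so UTF-32 is not mistaken for UTF-16.
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
    (b"\x4c\x6f\xa7\x94", "cp037"),  # "<?xm" in EBCDIC (assumed stand-in codec)
    (b"<?xm", "ascii"),
]

# Stage 3: the declaration itself uses only a minimal ASCII-like repertoire.
DECL = re.compile(
    r'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return a lower-cased encoding name guessed from the start of data."""
    provisional = None
    for signature, family in SIGNATURES:
        if data.startswith(signature):
            provisional = family
            break
    if provisional is None:
        return default  # no signature and no declaration: assume the default

    # Decode just enough of the document, using the provisional family,
    # to be able to read the declared encoding name.
    head = data[:256].decode(provisional, errors="replace")
    match = DECL.search(head)
    if match:
        return match.group(1).lower()
    # No declaration: a bare ASCII-family signature falls back to the default,
    # otherwise the signature alone decides.
    return default if provisional == "ascii" else provisional
```

For example, sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>')
gives "iso-8859-1". The point is that nothing in the procedure is specific
to XML except the regular expression in the last stage.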

The layering problem here is not a W3C one, but in part an IETF one: it goes
to the poverty of text/* and application/*.  The old ways of thinking about text
(that we can all use ASCII, that our data is local and so will all use the same 
encoding, that our data is international and so will all use a single encoding, 
that some other magical application layer will look after encoding) are at their 
expiry dates. 

To amplify Murata-san: W3C should develop a new text format (perhaps called
"itext") which has an XML-ish encoding header in-band, and perhaps some of the
following as well: numeric character references, newline correction, normalization
and C0/C1 code redundancy! W3C should also press the IETF to establish a new
content-type branch, "itext/*", to which text/* and application/*+xml could migrate.

>  If all textual formats (including Javascript, Perl, ruby, etc.) had adopted 
> the same mechanism for in-band encoding declarations, the current situation 
> should have been at least more consistent.

When we worked on XML, Gavin Nicol proposed adopting a format similar to
the ASCII MIME header, which could be prepended to any text.  If the requirement
is to reform text/* then Gavin's proposal could be considered too. It ups the ante
by allowing any sort of MIME-ish metadata, which has a great appeal too. (On 
the other hand, now that XML and SOAP are established, that kind of metadata
consideration may not be so important.)
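
To make the MIME-ish idea concrete, here is a rough sketch of how a reader
might handle text with such a header on top. It is not Gavin's actual proposal:
the function name read_headed_text, the choice of Content-Type/charset as the
field carrying the encoding, and the rule that a header is always present and
ends at the first blank line are all assumptions for illustration.

```python
import re
from email.parser import BytesHeaderParser


def read_headed_text(data: bytes, default: str = "us-ascii"):
    """Split an ASCII MIME-style header from the text and decode the text.

    Returns (headers, text). Assumes the header is always present and is
    separated from the body by the first blank line, as in MIME.
    """
    parts = re.split(rb"\r?\n\r?\n", data, maxsplit=1)
    if len(parts) != 2:
        raise ValueError("no header block found")
    head, body = parts

    # Parse only the header lines; the body stays as raw bytes until the
    # declared charset is known.
    message = BytesHeaderParser().parsebytes(head + b"\n\n")
    charset = message.get_content_charset() or default
    return dict(message.items()), body.decode(charset)


sample = (
    b"Content-Type: text/plain; charset=ISO-8859-1\n"
    b"Content-Language: fr\n"
    b"\n"
    b"caf\xe9\n"
)
headers, text = read_headed_text(sample)
```

Here the Content-Language field rides along with the charset at no extra cost,
which is the appeal: once a header block exists, any metadata can go in it.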

Cheers
Rick Jelliffe

Received on Monday, 21 April 2003 01:59:46 UTC