Re: 3023 update (was Re: Agenda TAG Telcon: 8th Nov 2004)

At 05:01 04/11/09, Chris Lilley wrote:

 >disagreement
 >
 >- charset. Still says optional but strongly recommended, TAG wants
 >optional and only supplied if correct. TAG wants default to be 'see xml
 >encoding declaration'. So, some media types could omit the parameter
 >entirely. However, attempt to register image/svg+xml without a charset
 >(backed up by pointing to TAG findings) met resistance. Still therefore
 >a conflict with TAG findings.
 >http://www.imc.org/ietf-xml-mime/mail-archive/msg00978.html

I think it is very important to distinguish two levels:

1) What is required, recommended, or allowed on the type registration level
2) What is required, recommended, or allowed on the level of each document

The "Good Practice" in the Web Architecture Document, as far as I understand,
refers to 2):

 >>>>>>>>
Good practice: XML and character encodings

In general, a representation provider SHOULD NOT specify the character 
encoding for XML data in protocol headers since the data is self-describing.
 >>>>>>>>

It speaks explicitly about the representation provider, which I can
only interpret as 2).

With respect to 1), the "In general" and the "SHOULD NOT" in the
"Good Practice" seem to imply that there may be valid reasons to
specify the character encoding for XML data in protocol headers.
This in turn implies that it is a good thing to allow the 'charset'
parameter in mime type registrations.

So I don't see any conflict between the "Good Practice" in the
Web Architecture Document and requesting that image/svg+xml allow
a charset parameter. Indeed, in my understanding, the two work very
well together. After all, the reasons why one may want to use
a charset parameter (APIs and databases that do transcoding on
output and set the parameter automatically,...) are very much
orthogonal to the specific mime type.


So back to the RFC 3023 update.

 >- charset. Still says optional but strongly recommended, TAG wants
 >optional and only supplied if correct.

Nobody wants wrong information, anyway. I don't think there is
any disagreement there. I don't think RFC 3023 says "supply it,
even if it's wrong".

I think that on the level of instances (see 2) above), "strongly
recommended" may be too strong. On the level of mime type registrations
(see 1) above), I think it is appropriate to keep "strongly recommended".

The main exception that I can see is formats (mostly used on a protocol
level) that for efficiency and interoperability reasons are restricted
to UTF-8 only. In that case, the mime type registration can very well
say that there is no charset parameter, because it supplies
no additional information. Even generic XML processors will be able to
deal with this, without any misinterpretation. And it is hoped that
mime-type specific implementations don't blow up in the case that
the type is served with an accidental 'charset'.


 >TAG wants default to be 'see xml encoding declaration'.

What do you (or the TAG) mean by "default" in this context?
The XML Recommendation as well as RFC 3023,... give a clear priority
to the charset information in the Content-Type header. To switch
this around after having it defined like this for about 10 years
(starting with HTML i18n or HTML 4.0 or so) would be a very bad
idea.
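
To make the priority order concrete, here is a minimal sketch in Python of
how a receiver might pick the encoding for an application/xml-style entity.
The function names are hypothetical, the declaration sniffing only handles
ASCII-compatible encodings, and the text/xml rule in RFC 3023 (no charset
means us-ascii) is deliberately left out; this is an illustration of the
ordering, not a complete implementation.

```python
import re
from email.message import Message

# Simplified: only finds the declaration in ASCII-compatible encodings.
XML_DECL = re.compile(
    rb'<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")

def effective_encoding(content_type, body):
    """Hypothetical helper: header charset first, then the entity itself."""
    charset = charset_from_content_type(content_type)
    if charset:
        # The charset parameter, when present, takes priority.
        return charset.lower()
    # No charset parameter: fall back to what the entity says about itself.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    match = XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: XML's own default.
    return "utf-8"
```

With this ordering, a body that declares encoding="utf-8" but is served as
"application/xml; charset=iso-8859-1" is treated as iso-8859-1, which is
exactly why a wrong charset parameter is harmful and a missing one is not.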


 >Still therefore
 >a conflict with TAG findings.
 >http://www.imc.org/ietf-xml-mime/mail-archive/msg00978.html

I'm confused. First, that mail points to
http://www.w3.org/2001/tag/2002/0129-mime#char-encoding,
whereas there is a newer version of this document at
http://www.w3.org/2001/tag/2004/0430-mime, which is also
the one pointed to from http://www.w3.org/2001/tag/findings.

I don't see much of a difference between the respective
sections (numbered differently in the above versions because
another section was removed). And I don't see a big conflict
between the TAG finding on the one hand, and updating RFC 3023
or using a charset parameter for image/svg+xml on the other hand.

In particular, that section of the TAG finding, overall, seems
to suggest replacing

 >>>>>>>>
The use of the charset parameter is STRONGLY RECOMMENDED, since this
information can be used by XML processors to determine authoritatively
the charset of the XML MIME entity.
 >>>>>>>>

with

 >>>>>>>>
The use of the charset parameter, when the charset is reliably known and
agrees with the encoding declaration, is RECOMMENDED, since this information
can be used by non-XML processors to determine authoritatively the charset
of the XML MIME entity.
 >>>>>>>>

I do not see any very big problem with such a change, but of course
the details of the wording should be discussed on the relevant mailing
list rather than prescribed by the TAG.
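
As an illustration of what "reliably known and agrees with the encoding
declaration" could mean on the sending side, here is a rough sketch in
Python. The function names and the known_charset argument are hypothetical,
and treating a missing declaration as an implied UTF-8 is a simplifying
assumption (it ignores UTF-16 with a BOM); it is one possible reading of
the wording, not something the TAG finding or RFC 3023 spells out.

```python
import re

# Simplified: only finds the declaration in ASCII-compatible encodings.
XML_DECL = re.compile(
    rb'<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_encoding(body):
    """Return the encoding named in the XML declaration, or None."""
    match = XML_DECL.match(body)
    return match.group(1).decode("ascii") if match else None

def content_type_header(media_type, body, known_charset=None):
    """Hypothetical helper: add charset only when it is known and agrees."""
    if known_charset is None:
        # Encoding not reliably known: say nothing rather than guess.
        return media_type
    # Assumption: no declaration is treated as an implied UTF-8.
    declared = declared_encoding(body) or "utf-8"
    if declared.lower() != known_charset.lower():
        # Disagreement: omit the parameter instead of sending a wrong one.
        return media_type
    return f"{media_type}; charset={known_charset}"
```

A transcoding server that knows it has just written UTF-8 would pass
known_charset="utf-8" and get "image/svg+xml; charset=utf-8"; a static file
server that has no idea would pass nothing and send the bare type.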

I do not see any way to deduce from the above text proposed by the TAG
that it is a good idea to disallow the 'charset' parameter on certain
media types. On the contrary, the above wording seems to suggest to
me that it is a good idea to have such a parameter. There ARE
implementations out there that are actually sure about what character
encoding their data is in, and there is a benefit for non-XML processors
to be able to determine that encoding.


Regards,     Martin. 

Received on Thursday, 11 November 2004 06:29:14 UTC