PEX5 XML encoding detection in parse="text" from Paul Grosso on 2005-03-23 (public-xml-core-wg@w3.org from March 2005)

From: Paul Grosso <pgrosso@arbortext.com>
Date: Wed, 23 Mar 2005 12:39:38 -0500
To: <public-xml-core-wg@w3.org>
Message-ID: <F13E1BF26B19BA40AF3C0DE7D4DA0C0303BDF4FD@ati-mail01.arbortext.local>
PEX5 XML encoding detection in parse="text"
-------------------------------------------
The original comment is at
http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2004Dec/0006
wherein the commenter says:

 http://www.w3.org/TR/2004/PR-xinclude-20040930/ states in section 4.3:

[...]
  * if the media type of the resource is text/xml, application/xml, or
    matches the conventions text/*+xml or application/*+xml as described
    in XML Media Types [IETF RFC 3023], the encoding is recognized as
    specified in XML, otherwise 
[...]

It is not clear whether this also applies to other Media Types such as
"message" or "image", e.g. for Message/Email+XML or image/svg+xml.
Please clearly indicate to which types this applies.
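
To make the question concrete, here is a minimal sketch of a literal
reading of the rule quoted above. The function name is_xml_media_type is
hypothetical, and restricting the match to the text and application
top-level types is an assumption about the intended meaning, not
something the specification text settles.

```python
def is_xml_media_type(content_type: str) -> bool:
    """Literal reading of the quoted rule (illustrative assumption only)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if "/" not in media_type:
        return False
    top_level, subtype = media_type.split("/", 1)
    # Assumption: only text/* and application/* are covered by the rule.
    if top_level not in ("text", "application"):
        return False
    # Matches "xml" itself or any non-empty name followed by "+xml".
    return subtype == "xml" or (subtype.endswith("+xml") and len(subtype) > 4)


for ct in ("text/xml", "application/xhtml+xml; charset=utf-8",
           "image/svg+xml", "text/plain"):
    print(ct, "->", is_xml_media_type(ct))
```

Under this reading image/svg+xml is not treated as XML for encoding
detection, which is one of the two outcomes the comment asks the
specification to choose between.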

I am concerned that future revisions of RFC 3023, or the registration of
MIME types that differ from those registered so far, might contradict
the requirements of the document. For example, it has been proposed
that there be no charset parameter for image/svg+xml; without special
knowledge of the image/svg+xml MIME type, XInclude processors would then
seem to be required to consider an illegal charset parameter for
image/svg+xml resources, which would render them non-conforming to the
image/svg+xml registration. RFC 3023 might also be revised to make it a
fatal error if, e.g., an application/xml resource has a charset
parameter that differs from the encoding that would be determined by
XML rules; XInclude would seem to contradict such a requirement. Please
include a discussion of how such events will be handled for XInclude.

As indicated in

 
 http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2004Dec/0005.html

the processing of the encoding attribute does not seem well-defined. For
example, HTTP/1.1 requires implementations to assume ISO-8859-1 encoding
for all text/* resources that lack a charset parameter. This means that
for all text/* resources an encoding can be determined without further
processing of the content; thus, from the first definition of the
encoding attribute, it would seem that the attribute is ignored for all
text/* types. One could, however, read this section so that the HTTP
default is not considered external encoding information, in which case
the encoding attribute would apply to e.g. text/plain resources. A good
first step to improve the definition of the attribute would be to
reference section 4.3 for its definition rather than defining it in two
places.
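
The two readings can be contrasted with a small sketch. The helper names
and the http_default_is_external flag are hypothetical; the sketch only
covers non-XML text resources and deliberately returns None when neither
a charset nor an encoding attribute is available, rather than guessing
at a further fallback.

```python
def charset_param(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return None


def encoding_for_text_include(content_type, encoding_attr,
                              http_default_is_external):
    """Pick an encoding for a parse="text" inclusion (illustrative only)."""
    charset = charset_param(content_type)
    if charset:
        # An explicit charset parameter is external encoding information.
        return charset
    top_level = content_type.split("/", 1)[0].strip().lower()
    if http_default_is_external and top_level == "text":
        # Reading 1: the HTTP/1.1 default for text/* counts as external
        # information, so the encoding attribute is never consulted.
        return "iso-8859-1"
    # Reading 2: no external information, so the attribute applies.
    return encoding_attr.lower() if encoding_attr else None


for reading in (True, False):
    print(reading,
          encoding_for_text_include("text/plain", "Shift_JIS", reading))
```

With the same text/plain resource and the same encoding attribute, the
first reading yields iso-8859-1 and the second yields the attribute
value, which is the ambiguity described above.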

It is not clear how text/xml resources without a charset parameter are
to be processed. The text is, again,

[...]
  * if the media type of the resource is text/xml, application/xml, or
    matches the conventions text/*+xml or application/*+xml as described
    in XML Media Types [IETF RFC 3023], the encoding is recognized as
    specified in XML, otherwise 
[...]

Processing text/xml resources according to XML would mean processing the
resource as if it were application/xml, which would be inconsistent with
RFC 3023. Please state clearly what the actual processing requirements
are and indicate clearly whether they are consistent with MIME, HTTP/1.1,
and RFC 3023. Note that RFC 3023 contradicts HTTP/1.1, as described in
RFC 3023. This would include providing a more precise definition of
what is considered "external encoding information".
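
As a rough illustration of why this matters, the sketch below lists the
encodings that three different rule sets would assign to the same
text/xml resource served without a charset parameter. The function names
are hypothetical, and xml_declared_encoding is a simplified stand-in for
XML's own detection (byte order mark, then encoding declaration, then
UTF-8), not a complete implementation of it.

```python
import re


def xml_declared_encoding(data: bytes) -> str:
    """Simplified XML-style detection for ASCII-compatible input."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    match = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
        data,
    )
    # Without a declaration, XML defaults to UTF-8.
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def candidates_for_text_xml(data: bytes) -> dict:
    """Encodings implied by each rule set when no charset is sent."""
    return {
        "XML rules (as for application/xml)": xml_declared_encoding(data),
        "RFC 3023 default for text/xml": "us-ascii",
        "HTTP/1.1 default for text/*": "iso-8859-1",
    }


doc = '<?xml version="1.0" encoding="iso-8859-15"?><p>\u20ac</p>'.encode(
    "iso-8859-15")
for rule, encoding in candidates_for_text_xml(doc).items():
    print(rule, "->", encoding)
```

The three candidates disagree for this input, so a processor needs to be
told which rule set takes precedence.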

Please include a strong warning that this processing can result in
choosing the wrong encoding for many resources, since inline encoding
information and type-specific defaults are ignored.
Received on Wednesday, 23 March 2005 18:06:41 UTC