The parse="text" mode from Bjoern Hoehrmann on 2004-12-11 (www-xml-xinclude-comments@w3.org from December 2004)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sat, 11 Dec 2004 07:10:11 +0100
To: www-xml-xinclude-comments@w3.org
Message-ID: <41c08601.672271968@smtp.bjoern.hoehrmann.de>
Dear XML Core Working Group,

  http://www.w3.org/TR/2004/PR-xinclude-20040930/ states in section 4.3:

[...]
  * if the media type of the resource is text/xml, application/xml, or
    matches the conventions text/*+xml or application/*+xml as described
    in XML Media Types [IETF RFC 3023], the encoding is recognized as
    specified in XML, otherwise 
[...]

It is not clear whether this also applies to other top-level media types
such as "message" or "image", for example Message/Email+XML or
image/svg+xml. Please state clearly which types it applies to.
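Below is a minimal sketch of a literal reading of the quoted rule. It assumes that only the top-level types "text" and "application" are meant. The function names and the alternative "broad" reading are hypothetical. Under the literal reading image/svg+xml is outside the rule, while a broader "any type with a +xml suffix" reading would include it. That is the difference the question asks to have settled.

```python
import re

# The rule as literally quoted: text/xml, application/xml, text/*+xml,
# application/*+xml. Parameters such as "; charset=..." are ignored.
_XINCLUDE_XML = re.compile(r"(?:text|application)/(?:xml|[^/;\s]+\+xml)")
# Hypothetical broader reading: any top-level type with an xml subtype
# or a +xml suffix.
_ANY_PLUS_XML = re.compile(r"[^/;\s]+/(?:xml|[^/;\s]+\+xml)")


def _essence(media_type: str) -> str:
    # Media type names are case-insensitive, so normalise to lower case.
    return media_type.split(";", 1)[0].strip().lower()


def literal_reading(media_type: str) -> bool:
    return _XINCLUDE_XML.fullmatch(_essence(media_type)) is not None


def broad_reading(media_type: str) -> bool:
    return _ANY_PLUS_XML.fullmatch(_essence(media_type)) is not None


for mt in ["application/xml", "text/xml; charset=utf-8",
           "application/xhtml+xml", "image/svg+xml", "Message/Email+XML"]:
    print(f"{mt:28} literal={literal_reading(mt)!s:5} broad={broad_reading(mt)}")
```

The two functions disagree only for +xml types outside text/* and application/*, such as image/svg+xml.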

I am concerned that future revisions of RFC 3023, or the registration of
MIME types that differ from the types registered so far, might
contradict the requirements of the document. For example, it has been
proposed that image/svg+xml have no charset parameter. Without special
knowledge of the image/svg+xml MIME type, XInclude processors would then
seem to be required to take into account a charset parameter that is
illegal for image/svg+xml resources, which would make such processors
non-conforming to the image/svg+xml registration. RFC 3023 might also be
revised to make it a fatal error if, for example, an application/xml
resource has a charset parameter that differs from the encoding that XML
rules would determine; XInclude would seem to contradict such a
requirement. Please include a discussion of how such events will be
handled for XInclude.
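The conflict described above can be illustrated with a hypothetical check that compares a charset parameter with the encoding declaration inside the resource. This is a simplified sketch and does not come from RFC 3023 or XInclude. It only recognizes an ASCII-compatible XML declaration at the very start of the body. It compares names case-insensitively without resolving aliases, so "latin1" and "ISO-8859-1" would be reported as a conflict.

```python
import re
from typing import Optional

# Simplified: only an ASCII-compatible XML declaration at offset 0 is seen.
_DECL = re.compile(
    rb"<\?xml[^>]*?\sencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']")


def declared_encoding(body: bytes) -> Optional[str]:
    m = _DECL.match(body)
    return m.group(1).decode("ascii") if m else None


def charset_conflicts(charset: Optional[str], body: bytes) -> bool:
    """True if the charset parameter and the declaration name different encodings.

    Names are compared case-insensitively; aliases are NOT resolved.
    """
    declared = declared_encoding(body)
    if charset is None or declared is None:
        return False
    return charset.lower() != declared.lower()


body = b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'
print(charset_conflicts("utf-8", body))       # True: mismatch
print(charset_conflicts("iso-8859-1", body))  # False: same encoding
```

Under a rule like the hypothetical RFC 3023 revision, the first case would be a fatal error. A processor that simply applies the first item of section 4.3 would use the charset parameter instead.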

As indicated in

  http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2004Dec/0005.html

the processing of the encoding attribute does not seem to be well
defined. For example, HTTP/1.1 requires implementations to assume
ISO-8859-1 for all text/* resources without a charset parameter. This
means that an encoding can be determined for every text/* resource
without further processing of its content, so under the first
definition of the encoding attribute it would seem that the attribute is
ignored for all text/* types. However, one could also read this section
so that the HTTP/1.1 default is not considered external encoding
information, in which case the encoding attribute would apply to, for
example, text/plain resources. A good first step toward improving the
definition of the attribute would be to reference section 4.3 for its
definition rather than defining it in two places.
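The two readings can be contrasted in a small sketch. It assumes that section 4.3 applies its items in this order: external encoding information, the XML media type rule, the encoding attribute, and finally UTF-8. It models only a non-XML type such as text/plain, so the XML step is skipped. The function and flag names are hypothetical.

```python
from typing import Optional


def effective_encoding(media_type: str,
                       charset: Optional[str],
                       encoding_attr: Optional[str],
                       http_default_is_external: bool) -> str:
    """Encoding chosen for a non-XML media type such as text/plain."""
    external = charset
    # First reading: the HTTP/1.1 ISO-8859-1 default for text/* counts as
    # external encoding information, so some external value always exists.
    if (external is None and http_default_is_external
            and media_type.lower().startswith("text/")):
        external = "ISO-8859-1"
    if external is not None:       # external encoding information
        return external
    # The XML media type item does not apply to text/plain.
    if encoding_attr is not None:  # the encoding attribute
        return encoding_attr
    return "UTF-8"                 # final default


# text/plain served over HTTP without a charset, with encoding="ISO-8859-7"
print(effective_encoding("text/plain", None, "ISO-8859-7", True))   # ISO-8859-1
print(effective_encoding("text/plain", None, "ISO-8859-7", False))  # ISO-8859-7
```

With the flag set, the encoding attribute can never take effect for text/* resources retrieved over HTTP. That is the consequence of the first reading described above.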

It is not clear how text/xml resources without a charset parameter are
to be processed. The text is, again,

[...]
  * if the media type of the resource is text/xml, application/xml, or
    matches the conventions text/*+xml or application/*+xml as described
    in XML Media Types [IETF RFC 3023], the encoding is recognized as
    specified in XML, otherwise 
[...]

Processing text/xml resources according to XML would mean processing the
resource as if it were application/xml, which would be inconsistent with
RFC 3023: for text/xml without a charset parameter, RFC 3023 requires
the us-ascii default. Please state clearly what the actual processing
requirements are, and indicate clearly whether they are consistent with
MIME, HTTP/1.1, and RFC 3023. This would include providing a more
precise definition of what is considered "external encoding
information". Note that RFC 3023 contradicts HTTP/1.1, which defaults
text/* to ISO-8859-1, as RFC 3023 itself points out.
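One way to see how far the candidate rules diverge is to decode a single text/xml body, sent without a charset parameter, under each of them. The sample body is invented for illustration. It has no BOM and no encoding declaration, so XML rules fall back to UTF-8. RFC 3023 prescribes us-ascii, and HTTP/1.1's default for text/* is ISO-8859-1.

```python
# An invented text/xml body: no BOM, no XML declaration, "é" stored as UTF-8.
body = "<p>café</p>".encode("utf-8")

candidates = {
    "xml": "utf-8",          # XML 1.0 rules: no BOM, no declaration -> UTF-8
    "rfc3023": "us-ascii",   # RFC 3023 default for text/xml without charset
    "http": "iso-8859-1",    # HTTP/1.1 default for text/* without charset
}


def decode_or_error(data: bytes, encoding: str) -> str:
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


results = {rule: decode_or_error(body, enc) for rule, enc in candidates.items()}
for rule, text in results.items():
    print(f"{rule:8} {text}")
```

The three rules produce three different outcomes for the same bytes. One yields the intended text, one a decoding error, and one the mis-decoded "<p>cafÃ©</p>".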

Please include a strong warning that this processing can result in the
wrong encoding being chosen for many resources, for example because
inline encoding information or type-specific defaults are ignored.

Please define what happens if UTF-8 has been determined by the last item
in the list and the resource starts with U+FEFF, i.e., whether this is
to be considered a byte order mark or a character.
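The two interpretations correspond to two UTF-8 decoders in Python's standard library. The plain "utf-8" codec keeps a leading U+FEFF as a character, while "utf-8-sig" treats it as a byte order mark and drops it. The sample bytes are made up for illustration.

```python
data = b"\xef\xbb\xbfHello"   # UTF-8 encoding of U+FEFF followed by "Hello"

as_character = data.decode("utf-8")      # U+FEFF kept as the first character
as_bom = data.decode("utf-8-sig")        # leading U+FEFF treated as a BOM, dropped

print(repr(as_character))  # '\ufeffHello'
print(repr(as_bom))        # 'Hello'
```

Under the first interpretation, the included text node would begin with an invisible U+FEFF character.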

It is not clear what happens if the encoding attribute has an illegal
value. For example, encoding="ISO_8859-7:1987" would be illegal, as the
EncName production in XML 1.0 does not allow ":". This is not described
as a fatal error in the specification. Thus, if the third item applies
and the encoding attribute has an illegal value, applications might
either ignore the attribute and continue with the last item, or ignore
that the attribute value is illegal and process the document using the
ISO-8859-7 encoding. Please define processing in this case more clearly.
(Note that ISO_8859-7:1987 is a legal, registered name, so it would not
be a resource error due to an unsupported encoding if the implementation
supports the encoding and the registered name.)
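The sketch below checks a value against the EncName production and then applies one of the two behaviours described above, or raises an error. The policy names, including the third "error" option, are hypothetical. The regular expression transcribes EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* from XML 1.0.

```python
import re
from typing import Optional

# EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*   (XML 1.0)
_ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_encname(value: str) -> bool:
    return _ENC_NAME.fullmatch(value) is not None


def encoding_from_attribute(value: str, policy: str) -> Optional[str]:
    """Return the encoding to use, or None to fall through to UTF-8.

    The policy names are hypothetical: "ignore" drops an illegal value,
    "lenient" uses it anyway, anything else treats it as an error.
    """
    if is_valid_encname(value):
        return value
    if policy == "ignore":
        return None
    if policy == "lenient":
        return value
    raise ValueError(f"illegal encoding attribute value: {value!r}")


print(is_valid_encname("ISO-8859-7"))       # True
print(is_valid_encname("ISO_8859-7:1987"))  # False: ':' is not allowed
```

"ignore" and "lenient" are the two behaviours the specification currently seems to permit for the same input. They select different encodings.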

Regards.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Saturday, 11 December 2004 06:10:30 UTC