Re: SVG 1.2 Comment: image/svg+xml;charset=""

From: Thomas DeWeese <Thomas.DeWeese@Kodak.com>
Date: Wed, 24 Nov 2004 07:44:13 -0500
Message-ID: <41A4821D.7090407@Kodak.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: Robin Berjon <robin.berjon@expway.fr>, www-svg@w3.org

Bjoern Hoehrmann wrote:

> * Robin Berjon wrote:
>>Take for instance:
>>[~]$ HEAD http://expway.com/robin/foo.xml.sjis | grep Content-Type
>>Content-Type: application/xml; charset=shift_jis
>>[~]$ xmllint http://expway.com/robin/foo.xml.sjis
>><?xml version="1.0" encoding="UTF-8"?>
>>Is that conformant? What do you think most XML parsers do?
> <http://www.bjoernsworld.de/temp/utf8-or-iso-8859-1.svg>, what do you
> think SVG implementations like Batik do? They consider it ISO-8859-1.
> So does the W3C Markup Validator and even MSXML4 does. What was your
> point exactly?

BTW from the Batik source that handles this case:

         // now looking for a charset encoding in the content type such
         // as "image/svg+xml; charset=iso8859-1" this is not official
         // for image/svg+xml yet! only for text/xml and maybe
         // for application/xml

The other problem is that even this code only works if you give
the actual URL to our document factory. With Batik it is quite
common for the parser to simply be given an InputStream (a binary
stream), from which the XML parser constructs its Reader (a char
stream) based only on the xml encoding. We may even be handed a
preconstructed DOM, where it is totally unclear where the encoding
came from. This means that seemingly trivial changes in the way
Batik is called can cause the same content to suddenly fail ("now
I want to tweak the DOM before it's processed; oops, suddenly
things stop working, I wonder why?").
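
To make the InputStream-versus-URL difference concrete, here is a
minimal sketch using plain JAXP rather than Batik's own document
factory. The class and method names are made up for illustration,
and the sample document is assumed to be ISO-8859-1 bytes with no
encoding declaration. Given only the byte stream, the parser has
nothing but the XML rules to go on; given a Reader built from an
externally supplied charset, the same bytes decode as intended.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical illustration; not taken from the Batik source.
class EncodingPaths {

    // An XML document encoded as ISO-8859-1 with no encoding declaration.
    static byte[] sample() {
        return "<?xml version=\"1.0\"?><t>\u00e9</t>"
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    // Path 1: the parser sees only bytes and must pick the encoding itself.
    static String parseFromStream(byte[] bytes) {
        try {
            DocumentBuilder b =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return b.parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            return "error: " + e.getClass().getSimpleName();
        }
    }

    // Path 2: the caller decodes the bytes with a charset learned elsewhere
    // (for example from a Content-Type header) and hands over characters.
    static String parseFromReader(byte[] bytes, String charset) {
        try {
            DocumentBuilder b =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Reader r = new InputStreamReader(
                    new ByteArrayInputStream(bytes), charset);
            return b.parse(new InputSource(r))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            return "error: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("stream: " + parseFromStream(sample()));
        System.out.println("reader: " + parseFromReader(sample(), "ISO-8859-1"));
    }
}
```

The two calls differ only in how the content is handed to the
parser, which is the kind of seemingly trivial change described
above.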

I would be happy to remove the code if the resolution was that
charset was to be ignored for image/svg+xml.

I personally think that the only reasonable thing to do here
is to state that if a charset is provided and it doesn't match
the xml encoding, then the response is ill-formed and the
behavior is implementation dependent.  Then the only people
who have work to do are those sending content with contradictory
charset and xml encoding specifications, which is exactly where
the burden of resolving this issue should lie.
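
As a rough sketch of what such a check could look like (this is
not Batik code; the class name, method names, and regular
expressions are assumptions made for illustration), the following
compares the charset parameter of a Content-Type value with the
encoding pseudo-attribute of the XML declaration. It compares
names case-insensitively only, does not resolve charset aliases,
and treats a missing charset or a missing encoding declaration as
"no conflict detected".

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names and patterns are illustrative only.
class CharsetConflict {

    // Matches a non-empty charset parameter, quoted or unquoted.
    private static final Pattern CHARSET = Pattern.compile(
            ";\\s*charset\\s*=\\s*\"?([^\";\\s]+)\"?",
            Pattern.CASE_INSENSITIVE);

    // Matches the encoding pseudo-attribute of a leading XML declaration.
    private static final Pattern XML_ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Optional<String> charsetOf(String contentType) {
        Matcher m = CHARSET.matcher(contentType);
        return m.find()
                ? Optional.of(m.group(1).toLowerCase(Locale.ROOT))
                : Optional.empty();
    }

    static Optional<String> declaredEncoding(String documentStart) {
        Matcher m = XML_ENCODING.matcher(documentStart);
        return m.find()
                ? Optional.of(m.group(1).toLowerCase(Locale.ROOT))
                : Optional.empty();
    }

    // True only when both values are present and their names differ.
    static boolean conflicts(String contentType, String documentStart) {
        Optional<String> header = charsetOf(contentType);
        Optional<String> declared = declaredEncoding(documentStart);
        return header.isPresent() && declared.isPresent()
                && !header.get().equals(declared.get());
    }

    public static void main(String[] args) {
        // The header/declaration pair from Robin's example quoted above.
        System.out.println(conflicts(
                "application/xml; charset=shift_jis",
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"));
    }
}
```

What an implementation does once conflicts() returns true would,
under the proposal above, be left implementation dependent.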
Received on Wednesday, 24 November 2004 12:44:25 UTC