- From: Thomas DeWeese <Thomas.DeWeese@Kodak.com>
- Date: Wed, 24 Nov 2004 07:44:13 -0500
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- CC: Robin Berjon <robin.berjon@expway.fr>, www-svg@w3.org
Bjoern Hoehrmann wrote:
> * Robin Berjon wrote:
>
>>Take for instance:
>>
>>[~]$ HEAD http://expway.com/robin/foo.xml.sjis | grep Content-Type
>>Content-Type: application/xml; charset=shift_jis
>>[~]$ xmllint http://expway.com/robin/foo.xml.sjis
>><?xml version="1.0" encoding="UTF-8"?>
>><foo>יגגיייי</foo>
>>
>>Is that conformant? What do you think most XML parsers do?
>
> <http://www.bjoernsworld.de/temp/utf8-or-iso-8859-1.svg>, what do you
> think SVG implementations like Batik do? They consider it ISO-8859-1.
> So does the W3C Markup Validator and even MSXML4 does. What was your
> point exactly?

BTW, from the Batik source that handles this case:

    // now looking for a charset encoding in the content type such
    // as "image/svg+xml; charset=iso8859-1" this is not official
    // for image/svg+xml yet! only for text/xml and maybe
    // for application/xml

The other problem with this is that even this code will only work if you give the actual URL to our document factory. At least for Batik, it is quite common for the parser to be simply given an InputStream to read from (binary stream), from which the XML parser will construct its Reader (char stream) based only on the xml encoding, or even for us to be given a preconstructed DOM, where it is totally unclear where the encoding came from.

This means that seemingly trivial changes in the way Batik is called can lead to the same content suddenly failing (now I want to tweak the DOM before it's processed - oops, suddenly things stop working, I wonder why?).

I would be happy to remove the code if the resolution was that charset was to be ignored for image/svg+xml.

I personally think that the only reasonable thing to do here is to state that if a charset is provided and it doesn't match the xml encoding, then the response is ill-formed and the behavior is implementation dependent. Then the only people who have work to do are people who are sending content with contradictory charset and xml encoding specifications.
Which is exactly where the burden of resolving this issue should lie.
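
To make the proposed rule concrete, here is a rough sketch in Python (not Batik code, and not taken from any spec) of what "charset is provided and doesn't match the xml encoding" could look like as a check. The function names are made up for illustration, the XML declaration is read with a simple regex that assumes an ASCII-compatible encoding and no BOM, and treating a missing charset or missing declaration as "no conflict" is my assumption, not something settled in this thread.

```python
import codecs
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the document. Assumption: the bytes are ASCII-compatible and
# there is no byte order mark in front of the declaration.
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def _canonical(name):
    """Map encoding aliases (UTF-8, utf8, ...) to one comparable name."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Unknown to Python: fall back to a case-insensitive comparison.
        return name.lower()


def http_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def xml_declared_encoding(body):
    """Return the encoding named in the XML declaration, or None."""
    match = _XML_DECL_ENCODING.match(body)
    return match.group(1).decode("ascii") if match else None


def charsets_conflict(content_type, body):
    """True only when both sources name an encoding and they disagree."""
    header = http_charset(content_type)
    declared = xml_declared_encoding(body)
    if header is None or declared is None:
        return False  # assumption: nothing to contradict
    return _canonical(header) != _canonical(declared)
```

With Robin's example, `charsets_conflict("application/xml; charset=shift_jis", b'<?xml version="1.0" encoding="UTF-8"?><foo/>')` reports a conflict. Note that a check like this needs the Content-Type at hand, so it cannot be done at all once the parser has only been handed a stream or a DOM.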
Received on Wednesday, 24 November 2004 12:44:25 UTC