- From: Chris Lilley <chris@w3.org>
- Date: Wed, 24 Nov 2004 21:57:06 +0100
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: www-svg@w3.org
On Wednesday, November 24, 2004, 5:23:20 PM, Bjoern wrote:

BH> * Chris Lilley wrote:
>>BH> Are you saying that the proposed registration would avoid that, i.e.,
>>BH> that http://www.bjoernsworld.de/temp/utf8-or-iso-8859-1.svg must be
>>BH> considered UTF-8 encoded by all implementations?
>>
>>Tell, me, what happens if you read that file from local disk on your
>>server, or run xslt on it? How do you tell the implementations to ignore
>>the well formedness error?

BH> XML 1.0 defines no constraint (neither wf-constraint nor fatal errors)
BH> that would consider the document in this state non-compliant, the doc
BH> is both legal UTF-8 and legal ISO-8859-1.

Yes, when I read that I had not realized that the document had been
carefully constructed to avoid using byte sequences illegal in either
encoding. So the content would 'merely' be mangled, rather than
producing a well formedness error.

In the general case that won't happen, which is why I asked what you
would do when reading from local disk.

BH> Regarding the other questions,
BH> I would not do that,

Uh-huh. I should perhaps have said 'what do you propose the software
does'.

BH> but if I did I would tell the relevant tool of the
BH> higher-level encoding information

So basically, you propose never needing or using the xml encoding
declaration *at all*?

BH> just like I do when processing HTML

Ah, I see. As Robin earlier suggested, just because HTML had many
problems in this area is no reason to foist them on XML. To quote from a
quote on Anne's blog:

  We threw away everything but the pieces that were known to work and
  added pretty-good Unicode support, i.e. something else that had been
  proven to work.

I'm trying to preserve the pretty good support, which you already agreed
had good interoperability, and you seem to be trying to move it towards
the known less interoperable area for all processing, not just over the
wire processing.

BH> documents, e.g. when using HTML Tidy on a UTF-8 encoded document I use
BH> the -utf8 command line switch

And you know this is the encoding how?

BH> as I have not yet implemented something
BH> else. For other tools, I've implemented encoding detection for HTML/XML
BH> documents in the HTML::Encoding Perl module (which, btw, would consider
BH> the document cited above ISO-8859-1, honoring the charset parameter).

You get a charset parameter from local disk? What filesystem are you
using, BeOS?

-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 Member, W3C Technical Architecture Group
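
[Illustration, not part of the original message.] The distinction drawn above between content that is 'merely' mangled and content that triggers an error can be sketched in Python. This is a minimal sketch under stated assumptions: the byte strings and the helper name `decode_both` are hypothetical, and the bytes are not taken from the test file cited in the thread. Python's `UnicodeDecodeError` stands in here for the fatal error an XML processor would report on an illegal byte sequence.

```python
from typing import Optional, Tuple


def decode_both(data: bytes) -> Tuple[Optional[str], str]:
    """Decode the same bytes as UTF-8 and as ISO-8859-1.

    Returns (utf8_text, latin1_text). utf8_text is None when the bytes
    are not a legal UTF-8 sequence. ISO-8859-1 assigns a character to
    every byte value, so that decode never fails.
    """
    try:
        utf8_text = data.decode("utf-8")
    except UnicodeDecodeError:
        utf8_text = None
    latin1_text = data.decode("iso-8859-1")
    return utf8_text, latin1_text


# Hypothetical sample: legal in both encodings, but the two readings differ.
# Whichever reading the author did not intend is silently mangled.
ambiguous = b"caf\xc3\xa9"

# Hypothetical sample: the general case. A lone 0xE9 byte is legal
# ISO-8859-1 but an illegal (truncated) UTF-8 sequence.
latin1_only = b"caf\xe9"

print(decode_both(ambiguous))
print(decode_both(latin1_only))
```

With the first sample both decodes succeed and disagree, so nothing in the bytes alone says which reading is right. With the second sample only one reading is possible, which is the general case referred to in the message.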
Received on Wednesday, 24 November 2004 20:57:07 UTC