- From: Chris Lilley <chris@w3.org>
- Date: Mon, 29 Apr 2002 05:12:46 +0200
- To: "Jim Ley" <jim@jibbering.com>
- CC: www-svg@w3.org, svg-developers@yahoogroups.com
On Sunday, April 28, 2002, 11:14:42 PM, Jim wrote:

JL> "Chris Lilley" <chris@w3.org>

>> If it has a suitable fallback font pre-configured.

JL> This is completely different to my understanding, if a character is not
JL> found in the browsers font, it is to find the character in any available
JL> font,

In general that is implemented by picking a suitable 'last chance' font that has wide coverage. I am not aware of implementations that proceed to search every font installed - perhaps many hundreds - on the off chance that a missing glyph is found, mainly for performance reasons.

JL> my understanding was brought from such pages as
JL> http://ppewww.ph.gla.ac.uk/~flavell/charset/fontface-harmful.html and
JL> related (follow the links.)

Yes, I read that page many years ago and have referred to it from some papers at earlier Unicode conferences. It's a prime reason why, in CSS font-family, as opposed to the vendor-HTML FONT tag:

a) The font family is a list, not a single value
b) It is a priority-ordered list
c) CSS2 added font descriptors, which allow the browser vendor and the user to contribute to the 'font database'
d) CSS2 added the unicode-range descriptor in particular (a small illustration follows below).

Also, a prime difference from the FONT tag is that in CSS, you cannot put (for example) Symbol, or one of the many fonts that associate glyph indices with characters on a one-to-one basis, onto ASCII text and have it come out looking like a foreign language. Instead, a conformant CSS processor will make it look like a bunch of 'missing glyph' markers. Glyphs are assigned based on Unicode characters, not on random glyph indices.

>> JL> AIUI, it does in HTML with modern browsers.
>>
>> No, only if the user picks an appropriate font.
>>
>> JL> How are
>> JL> we to know which fonts a user has that contains a particular
>> JL> character?

We don't. That's why it's not in the author stylesheet but the user stylesheet (or, if you prefer, the user configuration settings).

>> JL> (I don't send image/svg+xml; charset=utf-8
>> JL> which perhaps I should,
>>
>> There is no charset parameter defined for image types.

JL> http://www.w3.org/TR/charmod/#sec-Encodings
JL> (a draft, and I don't follow the exact issues, so my interpretation is
JL> potentially wrong...)

JL> Says:

JL> "Because encoded text cannot be interpreted and processed without knowing
JL> the encoding, it is vitally important that the character encoding [...] is
JL> known at all times and places where text is exchanged or processed."

Yes, correct. You have demonstrated that the character encoding scheme needs to be transmitted. I agree. You then assert that this can only be transmitted, or is best transmitted, as a MIME charset parameter. I disagree, very strongly.

Since SVG is written in XML, the character encoding is known exactly at all times. It's what the encoding declaration says it is. If there is no encoding declaration, then it is either UTF-8 or UTF-16, a choice which is easily resolvable by looking at the first few bytes of the file for a BOM, as defined in the XML specification (sketched below).

Note that this method is robust - the encoding is the same whether the SVG file is read from local storage, over HTTP, FTP, POP, whatever. Note also that if the encoding declaration is *wrong*, then the XML parser will give a well-formedness error and halt. Thus, there is no bad data around.

JL> Which seems to be saying to me that when text is transmitted via a
JL> protocol such as http you need to include what the CES is to allow for it
JL> to be processed correctly

Yes.
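To return briefly to the font-family points (a) to (d) above, here is a small, made-up illustration (the font names and the Greek range are invented for the example; any faces with the relevant coverage would do):

  @font-face {
    font-family: "Wide Coverage";       /* a 'last chance' face */
    src: local("SomeBigUnicodeFont");
    unicode-range: U+0370-03FF;         /* say, only worth using for Greek */
  }

  text {
    /* a priority-ordered list, not a single value */
    font-family: "Frutiger", "Wide Coverage", sans-serif;
  }

The author states an ordered preference, and the descriptors tell the formatter what each face can usefully be asked to cover - rather than the FONT-tag approach of naming one font and hoping.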
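And to make the in-band answer to the encoding question concrete (the snippet is invented; the byte signatures are the ones given in Appendix F of the XML 1.0 specification):

  <?xml version="1.0" encoding="iso-8859-1"?>
  <svg xmlns="http://www.w3.org/2000/svg">
    <text x="10" y="20">détail</text>
  </svg>

says what it is encoded in wherever it ends up - on a server, in a cache, or saved to disk. And when there is no encoding declaration at all, the first few bytes settle the UTF-8/UTF-16 choice:

  FE FF         UTF-16, big-endian (BOM)
  FF FE         UTF-16, little-endian (BOM)
  EF BB BF      UTF-8, with a BOM
  otherwise     UTF-8 (e.g. 3C 3F 78 6D, which is "<?xm")

No HTTP header is needed for any of this.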
JL> (It seems to me that saying inside the file is
JL> too late if you're using 8 or 16 or whatever byte CES's)

No, in fact that is very well defined.

So, consider the alternative, a charset parameter copied from the text/* types and (unwisely) foisted on the application/* types. This is fragile, out-of-band information. It raises the possibility that the charset parameter and the encoding declaration may differ. In that case, one either has to establish a precedence or declare this to be an error. The RFC for XML media types establishes a precedence, so it is not an error if they conflict. The downside of this is that the simple act of saving a file locally now involves rewriting the file, otherwise it will fail with a WF error next time it is read.

It also means that it is not possible to do server-side processing on the file - because its encoding declaration might be wrong, but overridden to the correct value in some server config for HTTP. Now, there is rather a lot of server-side XML processing. Breaking it seems like a really, really bad idea.

Lastly, if one relies on a charset parameter then it has to be generated, either by some per-server naming convention - for example, I use .htm8 on the W3C server to force XHTML files to be served with a charset parameter (since they are served as a text/* type). Who knows about my particular naming convention? How would an authoring tool know what to generate? Maybe it would be foo.svg.utf8 or /utf8/foo.svg or ... too many possibilities. Whereas with an encoding declaration, it is very clear and totally independent of the server config, which an authoring tool cannot know. Just generate correct XML according to the XML spec and voila! it all works.

Alternatively, if one uses a charset parameter, another way to generate it would be to have the server parse each XML file as it is served, read the encoding declaration, and generate the HTTP headers from that - which is both inefficient and redundant.

JL> http://lists.w3.org/Archives/Public/www-svg/2001Oct/0067.html indicates
JL> you were discussing the registration including the charset issue,

Yes, and the above is a summary of the discussion.

JL> it
JL> clearly needs one as an XML document,

No, it clearly does not.

JL> rfc3023 says this "In particular,
JL> the charset parameter SHOULD be used in the same manner, as described in
JL> Section 7.1, in order to enhance interoperability."

The problem is that it *decreases* interoperability (except for text/* types, where I agree it is absolutely required, due to the baroque way the text/* type is defined with a US-ASCII fallback when there is no charset parameter).

JL> - Okay it was only
JL> SHOULD, but I'd like to see some very good justification of why you're
JL> going against this.

See above.

JL> (Perhaps in the registration of image/svg+xml)

Yes, you will see those same arguments deployed in that registration.

-- 
Chris                          mailto:chris@w3.org
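P.S. To spell out the failure mode, purely as an illustration (the header and markup below are invented): if image/svg+xml took a charset parameter, a file could end up being served as

  Content-Type: image/svg+xml; charset=iso-8859-1

  <?xml version="1.0" encoding="utf-8"?>
  <svg xmlns="http://www.w3.org/2000/svg"> ... </svg>

with the header and the declaration in disagreement. A precedence rule makes that legal rather than an error, but save the file to disk and the header is gone, leaving a mislabelled document behind unless the saving application rewrites it. And the naming-convention hack I mentioned amounts, on an Apache server, to a couple of configuration lines along the lines of

  AddType text/html .htm8
  AddCharset utf-8 .htm8

(illustrative only - the exact directives on any particular server will differ), which no authoring tool could be expected to guess.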
Received on Sunday, 28 April 2002 23:15:25 UTC