- From: David Carlisle <davidc@nag.co.uk>
- Date: Mon, 7 Jan 2008 13:35:48 GMT
- To: hsivonen@iki.fi
- CC: www-math@w3.org
> Therefore, I think it would be a mistake and Bad-for-the-Web if any WG
> of the W3C tried to push a DTD change or a new DTD for Web deployment.
> I find http://www.w3.org/TR/2007/WD-xml-entity-names-20071214/ very
> alarming if the intent is to serve those entities over the wire.

We do actually echo your arguments against using entity references in
the current MathML 3 draft, and all examples in the MathML 3 draft use
numeric references (with a comment giving the Unicode name) rather than
a named entity reference. I word the arguments against using entities
rather more strongly in my blog entry where I introduced that draft:

http://dpcarlisle.blogspot.com/2007/11/xml-entities-definitions-for-characters.html

Apart from gcedil in this thread, I notice a recent request on
public-html for sub[123]:

http://lists.w3.org/Archives/Public/public-html/2007Dec/0228.html

Personally I'm very much against either adding or removing any entity
names. The names that we have are a somewhat arbitrary collection, but I
don't see how changing the set of names in any way can make the
situation better. I do think, however, that changing the definitions can
improve the situation, using characters that were not previously
available; and certainly it is not possible to get a consistent set of
definitions across HTML/MathML/DocBook/TEI unless _someone_ changes.

For various reasons the entity names will persist for some time yet in
several contexts, and if they are going to persist in several places,
personally I think it is worth the effort of getting a consistent set of
definitions to Unicode. It's _much_ easier to safely replace entity
references by character data if there are agreed universal definitions,
as that makes the change essentially reversible. If different
vocabularies don't agree on the definition of phi, then it is very hard
to make a global change that expands out phi, as this may or may not
lose or corrupt information.

> Mnemonic character input should be between the author and his/her
> MathML converter. What goes over the wire to the browser should be
> unescaped UTF-8.

That, actually, I don't agree with. I can't see any reason not to use
numeric references if you so choose (apart from file size), and using
numeric references has several advantages.

1) A document that uses numeric references is much more likely to be
served with a correct encoding in the HTTP headers. It _ought_ to be
easy to get your HTTP server to serve documents with the correct
encoding, but experience shows that this is wrong as often as it's
right. A document that's ASCII + numeric references will almost always
be correctly served; a document that's UTF-8 will as often as not end up
being served from somewhere as Latin 1, with resulting mangling of the
character data. (I wish this wasn't true, but it's what I observe.)

Note that the author of the document often has no control over, or even
knowledge of, the web server being used. For example, we put
documentation on CD which end users may (or may not) put on a web
server, or may just read it off the file system. Using ASCII + NCR (and
no DTD!) simplifies the installation instructions enormously.

2) A document with numeric character references is self-describing in
tutorial examples. If you see a document with &#1234; in its source and
you want to generate a similar document, then you can see how to
generate it. If you see a document with some explicit character, then
you may not know how to recreate that character (except by cut and
paste, if that's available).

David
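A minimal sketch of the round trip argued for above, in Python. The
html module's HTML5 entity table (a later development than this
message) stands in here for an agreed, universal set of definitions;
the element and entity names are purely illustrative:

    import html

    # With an agreed definition (HTML5 maps &phi; to U+03C6), expanding
    # a named entity into character data is safe and reversible.
    src = "<mi>&phi;</mi>"
    expanded = html.unescape(src)        # '<mi>\u03c6</mi>'

    # Re-serialising as ASCII + decimal NCRs survives a mislabelled
    # charset: every byte is ASCII, so Latin 1 / UTF-8 confusion cannot
    # mangle the character data.
    wire = expanded.encode("ascii", "xmlcharrefreplace")
    print(wire.decode("ascii"))          # <mi>&#966;</mi>

If two vocabularies defined phi differently, the unescape step would not
be reversible in this way, which is the corruption risk described above.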
Received on Monday, 7 January 2008 13:36:38 UTC