- From: Chris Lilley <chris@w3.org>
- Date: Tue, 24 Feb 2004 00:57:52 +0100
- To: mrose@dbc.mtview.ca.us
- Cc: www-archive@w3.org, fujisawa.jun@canon.co.jp, duerst@w3.org, ishida@w3.org, <mimasa@w3.org>
Hello, Working with xml2rfc, http://xml.resource.org/ I came across a problem. (At least) one of the bibliographic files is not well formed and thus cannot be processed by an XML system. I would like to help correct this. The file in question is http://xml.resource.org/public/rfc/bibxml4/reference.W3C.REC-SVG11-20030114.xml I suspect that whatever processed the original W3C Rec http://www.w3.org/TR/2003/REC-SVG11-20030114 to produce this file made the invalid assumption that the Rec was encoded in 8859-1 and that the 'initial' of an editors name could be obtained by taking the first byte of the name. In this case, however a) the file is in UTF-8 b) the editors name is in Japanese c) Japanese names are in the order FAMILY given d) the program took the first byte of the first character of the family name (Fujisawa in Japanese) e) In UTF-8, such a byte must be followed by another f) the closing quote and the space separating this attribute from the next get gobbled up and the XML is not well formed. Having dealt with why the file is incorrect, the next problem is what the correct contents should be. Three possibilities present themselves a) Since an 'initial' is the first letter of the given or personal name, and since the name in this case consists of one letter (Japanese for 'Jun'), use it. (This is email, I can't paste it here ;-) b) Pretend that XML is limited to ASCII and use "J" as the initial and "Fujisawa" in English as the surname. c) Ask the editor (copied), or some W3C Internationalization experts competent in Japanese (copied) if there is any equivalent for 'initial' in Japanese. I appreciate that all three of these options require human intervention rather than machine processing (yes, even option b, because the script still has to skip over the Japanese text to get to the English text). To offset this, I am happy to submit a revised and manually constructed replacement entry once I know what to put in it. I suspect that some other SVG specifications (WDs, CR, PR) and some HTML specifications edited by Masayasu Ishikawa (copied) will also be affected. -- Chris Lilley mailto:chris@w3.org Chair, W3C SVG Working Group Member, W3C Technical Architecture Group
Received on Monday, 23 February 2004 18:57:53 UTC