W3C home > Mailing lists > Public > www-archive@w3.org > February 2004

Issue with mangled Japanese names in XML binbliographic entries

From: Chris Lilley <chris@w3.org>
Date: Tue, 24 Feb 2004 00:57:52 +0100
Message-ID: <1836390012.20040224005752@w3.org>
To: mrose@dbc.mtview.ca.us
Cc: www-archive@w3.org, fujisawa.jun@canon.co.jp, duerst@w3.org, ishida@w3.org, <mimasa@w3.org>

Hello,

Working with xml2rfc,
http://xml.resource.org/

I came across a problem. (At least) one of the bibliographic files is
not well formed and thus cannot be processed by an XML system. I would
like to help correct this.

The file in question is
http://xml.resource.org/public/rfc/bibxml4/reference.W3C.REC-SVG11-20030114.xml

I suspect that whatever processed the original W3C Rec
http://www.w3.org/TR/2003/REC-SVG11-20030114

to produce this file made the invalid assumption that the Rec was
encoded in 8859-1 and that the 'initial' of an editors name could be
obtained by taking the first byte of the name.

In this case, however
a) the file is in UTF-8
b) the editors name is in Japanese
c) Japanese names are in the order FAMILY given
d) the program took the first byte of the first character
of the family name (Fujisawa in Japanese)
e) In UTF-8, such a byte must be followed by another
f) the closing quote and the space separating this attribute from the
next get gobbled up and the XML is not well formed.

Having dealt with why the file is incorrect, the next problem is what
the correct contents should be. Three possibilities present themselves

a) Since an 'initial' is the first letter of the given or personal
name, and since the name in this case consists of one letter (Japanese
for 'Jun'), use it. (This is email, I can't paste it here ;-)

b) Pretend that XML is limited to ASCII and use "J" as the initial and
"Fujisawa" in English as the surname.

c) Ask the editor (copied), or some W3C Internationalization experts
competent in Japanese (copied) if there is any equivalent for
'initial' in Japanese.

I appreciate that all three of these options require human
intervention rather than machine processing (yes, even option b,
because the script still has to skip over the Japanese text to get to
the English text).

To offset this, I am happy to submit a revised and manually
constructed replacement entry once I know what to put in it. I suspect
that some other SVG specifications (WDs, CR, PR) and some HTML
specifications edited by Masayasu Ishikawa (copied) will also be
affected.

-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 Member, W3C Technical Architecture Group
Received on Monday, 23 February 2004 18:57:53 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Wednesday, 7 November 2012 14:17:39 GMT