- From: xietao <xietao@datatrans.com.cn>
- Date: Sun, 31 Mar 2002 10:39:53 +0800
- To: "www-zig@w3.org" <www-zig@w3.org>
Dear all,

I have a question about UTF-8 encoding in the UNIMARC (ISO 2709) format.

---

In the USMARC format, leader position 09 is a single character that indicates the character coding scheme:

    09 - Character coding scheme
    Identifies the character coding scheme used in the record.
    # - MARC-8
    a - UCS/Unicode
    (http://lcweb.loc.gov/marc/bibliographic/ecbdldrd.html)

And in http://lcweb.loc.gov/marc/specifications/speccharucs.html:

    ...
    Encoding
    The encoding of Unicode characters will be according to the rules of UTF-8 (UCS Transformation Formats-8) which uses designated bits to indicate whether a UCS/Unicode character is represented by 1 octet (8-bits) or multiple octets. This encoding has the advantage of allowing the Basic Latin (ASCII) subset of the MARC 21 repertoire to be encoded the same as in MARC-8 (with 1 octet), thus preserving the basic structural elements of the MARC 21 record, while enabling record content to be multiscript. A brief description of UTF-8 encoding follows, but a fuller description is carried in the UCS and Unicode standards.
    ...

So I can identify the encoding of an ISO 2709 record from its first 24 bytes (the record leader).

---

But in the UNIMARC format, leader position 09 is undefined:

    9 Undefined
    Contains a blank.
    (http://www.ifla.org/ifla/VI/3/p1996-1/uni.htm)

My questions are:

Can I use leader position 09 in UNIMARC?

If the information about the character encoding, such as UTF-8, is stored in a field instead, which field is it? Does that mean I have to read the field content before I know the character encoding?

If I export a record whose character set is UCS-2 to an ISO 2709 file encoded in UTF-8, must I change the contents of any fields that describe the character encoding?

Thanks.

2002-03-31
XieTao
DataTrans Software Corp. Ltd.
http://www.datatrans.com.cn
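
For illustration, the leader check described above for USMARC/MARC 21 might look roughly like the sketch below. It assumes the record is available as raw bytes and covers only the two leader values quoted in the message (blank for MARC-8, "a" for UCS/Unicode carried as UTF-8). The function names marc21_coding_scheme and ucs2_to_utf8 are hypothetical, and the conversion helper assumes big-endian UCS-2 input without a byte order mark. Nothing here answers the UNIMARC question: since position 09 is undefined there, a blank in a UNIMARC record says nothing about its encoding.

```python
LEADER_LENGTH = 24  # the ISO 2709 record leader is 24 octets long


def marc21_coding_scheme(record: bytes) -> str:
    """Return the coding scheme named by MARC 21 leader position 09.

    Only the two values quoted in the message are recognised; anything
    else is reported as "unknown" rather than guessed.
    """
    if len(record) < LEADER_LENGTH:
        raise ValueError("record is shorter than the 24-octet leader")
    flag = record[9:10]  # leader positions are counted from 0
    if flag == b" ":
        return "MARC-8"
    if flag == b"a":
        return "UTF-8"  # UCS/Unicode, encoded as UTF-8 per the LC spec
    return "unknown"


def ucs2_to_utf8(data: bytes) -> bytes:
    """Transcode field data from UCS-2 to UTF-8.

    Assumption: the input is big-endian UCS-2 with no byte order mark.
    Valid UCS-2 can be decoded as UTF-16, which Python supports directly.
    """
    return data.decode("utf-16-be").encode("utf-8")
```

Note that transcoding the field data changes its length in octets, so the record length in the leader and the field lengths and offsets in the directory would have to be recomputed when the record is written out.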
Received on Saturday, 30 March 2002 21:34:16 UTC