A question about UTF-8 encoding in UNIMARC from xietao on 2002-03-31 (www-zig@w3.org from March 2002)

From: xietao <xietao@datatrans.com.cn>
Date: Sun, 31 Mar 2002 10:39:53 +0800
To: "www-zig@w3.org" <www-zig@w3.org>
Message-Id: <200203310234.VAA19545@www19.w3.org>

Dear all,

I have a question about UTF-8 encoding in the UNIMARC (ISO 2709) format.

---

In the USMARC format, leader position 09 holds a one-character code that indicates the character coding scheme:

09 - Character coding scheme
Identifies the character coding scheme used in the record.
# - MARC-8
a - UCS/Unicode
(http://lcweb.loc.gov/marc/bibliographic/ecbdldrd.html)

And in
http://lcweb.loc.gov/marc/specifications/speccharucs.html
...
Encoding
The encoding of Unicode characters will be according to the rules of UTF-8 (UCS Transformation Formats-8) which uses designated bits to indicate whether a UCS/Unicode character is represented by 1 octet (8-bits) or multiple octets. This encoding has the advantage of allowing the Basic Latin (ASCII) subset of the MARC 21 repertoire to be encoded the same as in MARC-8 (with 1 octet), thus preserving the basic structural elements of the MARC 21 record, while enabling record content to be multiscript. A brief description of UTF-8 encoding follows, but a fuller description is carried in the UCS and Unicode standards.
...

So I can identify the encoding of an ISO 2709 record from its first 24 bytes (the record leader).
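
A minimal sketch of that check in Python, assuming the record is available as raw bytes and that the "#" in the MARC documentation stands for a blank (space) character. The function name marc21_coding_scheme is hypothetical and only for illustration:

```python
LEADER_LENGTH = 24  # the ISO 2709 record leader occupies the first 24 bytes


def marc21_coding_scheme(record: bytes) -> str:
    """Return the coding scheme named by leader position 09 (0-based).

    Assumes USMARC/MARC 21 semantics: blank means MARC-8 and "a" means
    UCS/Unicode. Any other value is reported as "unknown".
    """
    if len(record) < LEADER_LENGTH:
        raise ValueError("record is shorter than the 24-byte leader")
    code = record[9:10]  # one byte at leader position 09
    if code == b" ":
        return "MARC-8"
    if code == b"a":
        return "UCS/Unicode"
    return "unknown"


# Build two illustrative leaders that differ only at position 09.
leader_marc8 = bytearray(b" " * LEADER_LENGTH)
leader_unicode = bytearray(b" " * LEADER_LENGTH)
leader_unicode[9] = ord("a")

print(marc21_coding_scheme(bytes(leader_marc8)))
print(marc21_coding_scheme(bytes(leader_unicode)))
```

Only the leader is inspected here, so no field has to be parsed before the encoding is known.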

---

But in the UNIMARC format, leader position 09 is undefined:

9 Undefined
Contains a blank.
(http://www.ifla.org/ifla/VI/3/p1996-1/uni.htm)

Can I use leader position 09 in UNIMARC for this purpose?

If the information about the character encoding, such as UTF-8, is stored in a field instead, which field is it? Would I then have to read the field content before I know the character encoding?

If I export a record (whose character set is UCS-2) to an ISO 2709 file that is encoded in UTF-8, must I change the contents of any fields that describe the character encoding?
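
To make the export case concrete, here is a small Python sketch of the re-encoding step alone. It assumes the UCS-2 data is big-endian and contains no surrogate pairs, so Python's "utf-16-be" codec can stand in for UCS-2. The helper name ucs2_to_utf8 is hypothetical:

```python
def ucs2_to_utf8(field_data: bytes) -> bytes:
    """Re-encode one field's content from UCS-2 to UTF-8.

    Assumption: big-endian UCS-2 without surrogates, decoded here
    with the utf-16-be codec.
    """
    return field_data.decode("utf-16-be").encode("utf-8")


ucs2_field = "中文".encode("utf-16-be")  # 2 characters, 2 bytes each
utf8_field = ucs2_to_utf8(ucs2_field)

print(len(ucs2_field), len(utf8_field))  # byte lengths before and after
```

The byte length of the field changes with the encoding, so I assume that any length or offset stored in the leader and directory as a byte count would have to be recomputed after such a conversion, whatever the answer about the coded fields turns out to be.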

Thanks.

2002-03-31
XieTao
DataTrans Software Corp. Ltd.
http://www.datatrans.com.cn

Received on Saturday, 30 March 2002 21:34:16 UTC