A question about UTF-8 encoding in UNIMARC

Dear all,

I have a question about UTF-8 encoding in the UNIMARC (ISO 2709) format.

---

In the USMARC format, leader position 09 is a one-character code that indicates the character coding scheme:

09 - Character coding scheme
Identifies the character coding scheme used in the record. 
# - MARC-8
a - UCS/Unicode
(http://lcweb.loc.gov/marc/bibliographic/ecbdldrd.html)

And in
http://lcweb.loc.gov/marc/specifications/speccharucs.html
...
Encoding
The encoding of Unicode characters will be according to the rules of UTF-8 (UCS Transformation Formats-8) which uses designated bits to indicate whether a UCS/Unicode character is represented by 1 octet (8-bits) or multiple octets. This encoding has the advantage of allowing the Basic Latin (ASCII) subset of the MARC 21 repertoire to be encoded the same as in MARC-8 (with 1 octet), thus preserving the basic structural elements of the MARC 21 record, while enabling record content to be multiscript. A brief description of UTF-8 encoding follows, but a fuller description is carried in the UCS and Unicode standards. 
...
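
To illustrate the quoted passage, here is a small Python sketch of how the number of octets per character varies in UTF-8. The helper name `utf8_octets` and the sample characters are my own choices for illustration and do not come from the specification.

```python
def utf8_octets(ch: str) -> int:
    """Return how many octets UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


# Sample characters chosen for illustration: Basic Latin, Latin-1, CJK
for ch in ["A", "\u00e9", "\u4e2d"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_octets(ch)} octet(s), bytes {encoded.hex(' ')}")
```

A Basic Latin (ASCII) character keeps its single-octet value, which is why the structural elements of the record are unaffected.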

So I can identify the encoding of an ISO 2709 record from its first 24 bytes (the record leader).
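
As a rough sketch of what I mean, the check could look like this in Python. The function name `leader_coding_scheme` and the sample leader are hypothetical and made up for illustration; the two code values come from the leader/09 definition quoted above, where "#" stands for a blank.

```python
LEADER_LENGTH = 24  # the record leader is the first 24 bytes of the record

# Code values from the leader/09 definition quoted above ("#" means blank)
CODING_SCHEMES = {" ": "MARC-8", "a": "UCS/Unicode"}


def leader_coding_scheme(record: bytes) -> str:
    """Return the coding scheme named by leader position 09 of a record."""
    if len(record) < LEADER_LENGTH:
        raise ValueError("record is shorter than the 24-byte leader")
    code = chr(record[9])  # positions are counted from 0
    return CODING_SCHEMES.get(code, f"unknown ({code!r})")


# Hypothetical leader, invented for this example only
sample_leader = b"00714cam a2200205 a 4500"
print(leader_coding_scheme(sample_leader))
```

This only needs the leader, so no field has to be parsed before the encoding is known.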

---

But in the UNIMARC format, leader position 09 is undefined:

9 Undefined
Contains a blank. 
(http://www.ifla.org/ifla/VI/3/p1996-1/uni.htm)

Can I use leader position 09 in UNIMARC?

If the character encoding (such as UTF-8) is recorded in a field instead, which field is it? And would I then have to read that field's content before I know the character encoding of the record?

If I export a record whose character set is UCS-2 to an ISO 2709 file encoded in UTF-8, must I change the contents of any fields that describe the character encoding?
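
For context, the conversion step itself might be sketched like this in Python. I am assuming little-endian UCS-2 without a byte order mark, and the helper name `ucs2_to_utf8` and the sample text are hypothetical. Python has no dedicated UCS-2 codec, so the sketch uses the UTF-16 codec, which reads the same bytes for characters in the Basic Multilingual Plane.

```python
def ucs2_to_utf8(data: bytes) -> bytes:
    """Transcode UCS-2 data (assumed little-endian, no BOM) to UTF-8."""
    # For Basic Multilingual Plane characters UTF-16 matches UCS-2
    return data.decode("utf-16-le").encode("utf-8")


# Hypothetical field content, invented for this example only
text = "\u4e2d\u6587 MARC"
field_ucs2 = text.encode("utf-16-le")
field_utf8 = ucs2_to_utf8(field_ucs2)

# The same text occupies a different number of octets in each encoding
print(len(field_ucs2), len(field_utf8))
```

The octet count changes after conversion, so I assume that any length value counted in octets would have to be recalculated on export, separately from whatever the record says about its character encoding.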

Thanks.


 	
2002-03-31			
XieTao
DataTrans Software Corp. Ltd.
http://www.datatrans.com.cn

Received on Saturday, 30 March 2002 21:34:16 UTC