- From: Olle Jarnefors <ojarnef@admin.kth.se>
- Date: Thu, 15 Dec 1994 23:59:48 +0100
- To: i18n@dkuug.dk, ietf-charsets@INNOSOFT.COM, insoft-l@trans2.b30.ingr.com, iso8859@jhuvm.hcf.jhu.edu, iso10646@jhuvm.hcf.jhu.edu, rustex-l@ubvm.cc.buffalo.edu, Datorpostteknik i Norden <nordpost@nada.kth.se>, Projektet SUNET-MIME <sunet-mime@sunet.se>, tc304p09@dkuug.dk, tc304wg4@dkuug.dk, teckenbok@sics.se, wg-char@rare.nl, wg-msg@rare.nl, tref@vhs.se
- Cc: Olle Jarnefors <ojarnef@nada.kth.se>
Trans-European Research and Education              ANNOUNCEMENT
Networking Association (TERENA)                    Prototype Ap45
Coded Character Set Conversion Task-Force (C3-TF)  1994-12-15


      ALPHA TEST RELEASE OF THE C3 SYSTEM FOR CODED CHARACTER SET
                              CONVERSION


TERENA (formerly RARE) has supported the development of better tools
for conversion between the continuously growing number of coded
character sets in use in academic computer networks in Europe. The
intention is to produce a general and flexible system for Coded
CHaracter set Conversion, called

                        >>> The C3 System <<<

This is the announcement of the alpha test release of software (for
Unix) and tables for the C3 System for *limited* distribution amongst
interested implementors, system administrators and users.

+-------------------------------------------------------+
! Notice that this is a pre-release of software under   !
! development, which has not yet been thoroughly tested !
! and is not intended for production use.               !
+-------------------------------------------------------+

The package consists of:

  > ANSI C code for a software library implementing parts of the C3
    API (see below).

  > ANSI C code for a program "ccconv", which can be used either as a
    character stream conversion filter or as a file conversion
    program.

  > Binary files for this software, compiled for SunOS 4.3.x.

  > Approximation table (see below).

  > Definition tables for the following coded character sets:

      ASCII                                     ANSI X3.4
      Swedish general 7-bit character set       SS 63 61 27
      Swedish 7-bit character set for names     SS 63 61 27
      Norwegian 7-bit character set             NS 4551
      UK 7-bit character set                    BS 4730
      Croatian/Slovene 7-bit character set      JUS I.Bl. 002
      Latin-1 8-bit character set               ISO 8859-1
      Latin-2 8-bit character set               ISO 8859-2
      Latin-Cyrillic 8-bit character set        ISO 8859-5
      Original IBM PC character set             IBM CP437
      International IBM PC character set        IBM CP850
      Macintosh Extended Roman character set
      UCS in 2-octet form at level 1            ISO 10646

  > Documentation files:

      Introduction to the C3 System                          (8 pages)
      Directions for the installation of the C3 System       (2 pages)
      How to use the "ccconv" file conversion utility        (4 pages)
      How to use the C3 library of C functions              (18 pages)
      Explanation of identifiers and names used in C3        (2 pages)
      Specification of the C3 API for conversion functions  (37 pages)

The software is developed with the GNU gcc compiler, but any C compiler
allowing "const" and ANSI C function prototypes should work.

The latest C3 distribution and other C3 information are available on
the World Wide Web through

  <URL:http://www.nada.kth.se/i18n/c3/>

or by anonymous FTP to ftp.nada.kth.se, directory "pub/i18n/c3", i.e.

  <URL:ftp://ftp.nada.kth.se/pub/i18n/c3/>

Email addresses:

  <c3-questions@nada.kth.se>     Questions, comments, bug reports, etc.
  <c3-info-request@nada.kth.se>  Subscription to info-about-C3 list
  <c3-request@nada.kth.se>       Subscription to discussion-about-C3 list
  <c3@nada.kth.se>               Contribution to discussion-about-C3 list

Features list:

  + Full _generality_: conversion can be done in any direction between
    any pair of the coded character sets included in the system.

  + _Approximate conversion_ when exact conversion is impossible: there
    is no arbitrary identification of different characters in the
    source and the target character sets. If the target character set
    lacks a source character, the best possible replacement character
    or string is used.

  + Can handle not only simple 7-bit and 8-bit coded character sets,
    but also _advanced character sets_ such as the 16-bit ISO 10646
    character set (on implementation level 1) and stateful character
    sets like ISO 6937/T.61.
    Incomplete character sets, character sets lacking control
    characters, indeterministic character sets, and ambiguous character
    sets are also supported.

  + _Easy to use_ for the unsophisticated user (by means of carefully
    chosen defaults).

  + _Flexible_ and fully configurable for the sophisticated user,
    system administrator or application developer.

  + _Conversion parameters_ control the exact conversions performed;
    different needs or restrictions in different situations are easily
    handled by means of
      - the three conversion types (one-to-one, legible, reversible)
      - separate specification of the conversion of line breaks
      - the factor system (for varying cultural expectations affecting
        preferable approximate conversions).

  + _Easy to customize_: the conversion tables use a format optimized
    for human readability, which uses only the subset of 82 graphic
    characters available in all coded character sets; ISO 10646
    hexadecimal values are used to refer to other characters. Different
    full sets of conversion tables can be used in parallel.

  + _Simple to extend_: to add a new coded character set, only provide
    a definition table for it and approximate conversions for any
    character in it that isn't included in any already defined coded
    character set.

  + _Scalable_: to fully define the N(N-1) possible conversion paths
    between N different coded character sets, only N+1 conversion
    tables are needed. How conversion is to be done is defined by means
    of ISO 10646 as a common interface, but the actual conversion is a
    direct transformation from source character set to target character
    set, not involving a 10646 representation as an intermediate step.
    Temporary files are not needed.

What's unique in the C3 System?

The approximation table is the most innovative element in the C3
approach to character set conversion.
It specifies, for each character in any of the character sets for which
definition tables are given, how it is to be represented approximately
(by fall-back) in a target character set that does _not_ include it.
Several alternative representations are specified for some characters,
to take advantage of the different character repertoires of different
target character sets.

The conversion tables use only the invariant part of ASCII. To indicate
other characters, the hexadecimal form of their coded representations
in UCS is used. No information specific to a certain coded character
set is included in the approximation table.

The approximation table defines three types of conversion which the
user can choose from:

  Type 1  converts one source character to one target character (best
          for tables and fields with length restrictions).

  Type 2  converts characters to a more understandable approximate
          representation, which may consist of one or a few target
          characters (best for prose).

  Type 3  is a reversible one-character-to-many-characters conversion,
          which is based on the mnemonics defined by RFC 1345.

The C3 Task Force within TERENA consists of:

  Borka Jerman-Blazic  <jerman-blazic@ijs.si>
  Olle Jarnefors       <ojarnef@admin.kth.se>
  Peter Svanberg       <psv@nada.kth.se>
  Keld Simonsen        <keld@dkuug.dk>
Received on Thursday, 15 December 1994 15:02:46 UTC