RE: Transliteration from Carrasco Benitez Manuel on 1998-10-20 (www-international@w3.org from October to December 1998)

From: Carrasco Benitez Manuel <manuel.carrasco@emea.eudra.org>
Date: Tue, 20 Oct 1998 12:15:33 +0100
To: "'Harald Tveit Alvestrand'" <Harald.Alvestrand@maxware.no>, www-international@w3.org, tc46sc2@elot.gr
Message-Id: <5DFB753C1329D1119DEC00805F15C34260809B@WS015>

>> 1) The objective is to find a way to tag the
>>    "language transformation".
>>
>> 2) "Language transformation" is expressing
>>     a (source) language in another form with
>>     some relation to a (target) second language.
>>     Traditionally, transliteration (transformation
>>     of writing) or transcription (transformation
>>     of sound).
>>  
>>     It is not translation; the text is always in
>>     the source language, but somehow transformed.
>
>In this case, I would seriously doubt that the set of names
>of languages is a useful second identifier; for example, the
>Pinyin and new-chinese rules for latinizing of Chinese are
>very different (Peking vs Beijing), but neither bears much
>relation to a single language except Chinese.

Is your point that source language and scheme are sufficient?
e.g.,

  el-tran-foo   (Greek transformed using scheme foo)

as opposed to

  el-tran-en  (Greek transformed for English using the default scheme)
  el-tran-en-foo (Greek transformed for English using the scheme foo)

i.e., that the scheme implies a language (a missing scheme
would have to be clarified).
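
To make the two forms concrete, here is a minimal sketch of how a reader of such tags might split them. The rule that a two-letter subtag after "tran" is an ISO 639 target language and anything longer is a scheme name is an assumption made only for this illustration, and `parse_tran_tag` is a hypothetical helper, not part of any proposal or of RFC 1766.

```python
from typing import NamedTuple, Optional


class Transformation(NamedTuple):
    source: str            # source language, e.g. "el"
    target: Optional[str]  # target language, if the tag carries one
    scheme: Optional[str]  # scheme name, None meaning "default scheme"


def parse_tran_tag(tag: str) -> Transformation:
    """Split tags of the form el-tran-foo, el-tran-en, el-tran-en-foo."""
    parts = tag.lower().split("-")
    if len(parts) not in (3, 4) or parts[1] != "tran" or not all(parts):
        raise ValueError(f"not a 'tran' tag: {tag!r}")
    source, rest = parts[0], parts[2:]
    # Assumption: a two-letter subtag is a target language,
    # anything longer is a scheme name.
    if len(rest[0]) == 2:
        scheme = rest[1] if len(rest) == 2 else None
        return Transformation(source, rest[0], scheme)
    if len(rest) == 2:
        raise ValueError(f"unexpected subtag after scheme: {tag!r}")
    return Transformation(source, None, rest[0])
```

Under that assumption, "el-tran-foo" yields no target language at all, which is the simpler form.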

In the original discussion, there was a tendency to have the second
language. Personally, I prefer *not* to have the second language.

It would be:
  - Simpler syntactically
  - *Less* confusing (it is clear that it is Greek and nothing else)
  - The information would be carried by the scheme

>> 3) The reason for proposing the extension
>>     of RFC-1766 is because:
>>
>>      3.1) It does *not*  break RFC-1766.

>If you regard the data as a "dialect" of the original language,

This is a good way to view it.

>it may indeed be possible to fit it within the RFC 1766 mindset.

Good.

>There have been proposals in the past to encode within RFC 1766
>the various scripts in which a language is commonly written, such
>as Arabic, Cyrillic and Latin script for some of the former Soviet
>republics' languages.

This is the general intended use. Perhaps expanding "script variations"
to "transformation" or something along these lines.

>But the specific scheme put forward, with its required "tran" tag,
>use of language as second discriminator, and nonregistered schemes,
>is not what I would call the "Right Thing".
>To use your language above, it "feels" wrong.

It has to "feel" right. Any reasonable syntax is fine as long as the
objective is achieved.  If it is possible to "expand the semantics"
without changing the syntax, even better.

The information needed would be:

  - Indication that it is a transformation
  - Source language
  - Transformation scheme
  - Target language

The syntax could be simplified.  For example, the "target language"
could be eliminated from the syntax and carried by the "transformation
scheme", where some schemes have no target language.

e.g.

 foo : Greek for English transformation
 faa : Greek "general" transformation (no target language)
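
One way to picture the scheme carrying the target language is a small lookup table. This is only a sketch: "foo" and "faa" are the placeholder scheme names from the example above, and the `SCHEMES` table and `target_language` helper are hypothetical, standing in for whatever registry of schemes would exist.

```python
from typing import Optional

# Hypothetical registry: each scheme records its source language and,
# optionally, the target language it was designed for.
SCHEMES = {
    "foo": {"source": "el", "target": "en"},  # Greek for English
    "faa": {"source": "el", "target": None},  # Greek "general", no target
}


def target_language(scheme: str) -> Optional[str]:
    """Return the target language implied by a scheme, or None."""
    return SCHEMES[scheme]["target"]
```

With such a table the tag itself only needs the source language and the scheme; the target language, when there is one, is looked up.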

I could write a new proposal using a (new) "t" in the primary 
language tag.

e.g.,

 t-el        (Greek transformation using the default scheme)
 t-el-foo   (Greek transformation using the scheme foo)

With the danger that this implies, ISO-639 could be used
to name transformations:

e.g.
 t-el-en    (Greek transformation for English, using the
               default scheme for this language pair)
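
The danger can be seen in a rough sketch of a reader for the "t" forms. The rule used here, that a two-letter third subtag is taken as an ISO 639 code and anything longer as a scheme name, is an assumption for illustration only; it also means no scheme could ever have a two-letter name. `parse_t_tag` is a hypothetical helper.

```python
from typing import NamedTuple, Optional


class TTag(NamedTuple):
    source: str            # source language, e.g. "el"
    target: Optional[str]  # target language, when named via ISO 639
    scheme: Optional[str]  # scheme name, None meaning "default scheme"


def parse_t_tag(tag: str) -> TTag:
    """Split tags of the form t-el, t-el-foo, t-el-en."""
    parts = tag.lower().split("-")
    if parts[0] != "t" or len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"not a 't' tag: {tag!r}")
    source = parts[1]
    if len(parts) == 2:
        return TTag(source, None, None)
    third = parts[2]
    # Assumption: two letters means an ISO 639 target language,
    # otherwise the subtag names a scheme.
    if len(third) == 2:
        return TTag(source, third, None)
    return TTag(source, None, third)
```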

Regards
Tomas

Received on Tuesday, 20 October 1998 07:15:31 UTC