Re: Transcoding Tamil in the presence of markup

On Sun, 7 Dec 2003, Martin Duerst wrote:

> At 23:16 03/12/07 +0900, Jungshik Shin wrote:
>
> >On Sun, 7 Dec 2003, Peter Jacobi wrote:
>
> > > So, I'm still wondering whether Unicode and HTML4 will consider
> > >   <span style='color:#00f'>&#x0BB2;</span>&#x0BBE;
> > > valid and it is the task of the user agent to make the best out of it.
> >
> >   I think this is valid.
>
> I agree. It is the task of the user agent to make the best out of it,
> and different user agents may currently do different things with it.
> Because this is related to rendering and styling, it seems to make
> sense that this is clarified in the CSS spec (either 2.1 or 3.0).

  Are you going to bring this up on the CSS list(s)?

> >A more interesting case has to do with
> >W3C CHARMOD, in which NFC is required/recommended (it's not yet complete
> >and the W3C I18N-WG has been discussing it).  Consider the following case.
> >
> >   &#x0BB2;<span class="left_part">&#x0BC7;</span>
> >  <span class="right_part">&#x0BBE;</span>
> >
> >Because <U+0BC7, U+0BBE> is equivalent to U+0BCB, we couldn't use
> >the above if NFC is required even though in legacy TSCII encoding,
> >it's possible.
>
> Yes, this is a bad idea. But there is Web technology that can do
> this (see below).

  Ahah, that's neat.


> Similar examples exist in almost any other script. For most
> intents and purposes, most people are okay with what they
> can and can't do, but occasionally, we come close to the
> dividing line, and some of us are quite surprised. But somehow,
> we have to agree on what's a character and what's only a glyph,
> and we have to agree which combinations are canonically equivalent.

  I agree.
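
  For the Tamil case quoted above, the equivalence can be checked with a
short sketch. It assumes Python's standard unicodedata module, which nobody
in this thread has mentioned; the variable names are only illustrative.

```python
import unicodedata

# TAMIL LETTER LA followed by the two-part vowel sign written as two pieces.
decomposed = "\u0BB2\u0BC7\u0BBE"   # LA + VOWEL SIGN EE + VOWEL SIGN AA
precomposed = "\u0BB2\u0BCB"        # LA + VOWEL SIGN OO

# NFC joins the two pieces into U+0BCB; NFD splits them apart again.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print([f"U+{ord(c):04X}" for c in nfc])
print([f"U+{ord(c):04X}" for c in nfd])

# Normalizing each piece on its own (as if each sat in a separate text
# node inside its own <span>) changes nothing: only the adjacent pair composes.
left_alone = unicodedata.normalize("NFC", "\u0BC7")
right_alone = unicodedata.normalize("NFC", "\u0BBE")
```

  So whether the two halves survive depends on whether normalization sees
them as one run of text or as separate pieces.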

> >The same is true of Korean syllables (see below) as
> >Philippe pointed out.
> >
> >   &#x1100;<span class="vowel">&#x1161;</span>&#x11a8;
>
> Yes. Korean is particularly difficult because it is the most
> logical, well-designed script in the world. It has more
> clearly identifiable hierarchical levels than any other
> script. It is very difficult to agree on which level
> characters should be.

  Absolutely. The multi-level representability of the Korean script
demonstrates its 'advanced' status as a script (invented only 5.5
centuries ago, it must have been able to build upon more than 2,000
years of writing-system history), but at the same time it has been a
continuous source of "trouble" because it's hard to agree on which level
to use.
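
  To make two of those levels concrete: the three conjoining jamo in the
example quoted above and the single precomposed syllable are canonically
equivalent, and the mapping is plain arithmetic. Below is a minimal sketch
in Python; the function name compose_syllable is my own, and the constants
are the ones used in the Unicode Hangul composition arithmetic.

```python
import unicodedata

# Bases of the syllable block and of the leading, vowel and trailing jamo.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_syllable(lead, vowel, trail=None):
    """Compose conjoining jamo into one precomposed Hangul syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    # Trailing index 0 means "no final consonant".
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

jamo = "\u1100\u1161\u11A8"            # the sequence from the quoted example
syllable = compose_syllable(*jamo)
print(f"U+{ord(syllable):04X}")

# The standard library's NFC implementation agrees with the arithmetic.
same_as_nfc = syllable == unicodedata.normalize("NFC", jamo)
```

  Levels below the jamo (individual strokes) have no such mapping in
Unicode, which is where the disagreement starts.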


> As an example, the vowel pairs a/ya, o/yo, u/yu, and so on
> are distinguished by changing from one small stroke to two
> small strokes. A Web page for children or foreigners may
> want to color these strokes separately. With the current
> encoding(s) in Unicode this is not possible, but I'm sure
> somebody has designed an encoding where this would be possible.

  Perhaps the internal (intermediate) encoding of Korean mobile phones
bears some resemblance to what you described above. Korean mobile
phones have only three keys for vowels (vertical stroke, horizontal
stroke and dot, the very three components described as the building blocks
of Korean vowels in the 15th-century book). To enter 'a', you have to type
a vertical stroke and a dot. Entering another dot gives you 'ya'. This is
rather intuitive because it is the same as the stroke order in handwriting.
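
  A toy sketch of that kind of input in Python follows. The key names and
the table are hypothetical (I don't know the phones' actual internal
encoding), and only the two sequences described above are filled in.

```python
import unicodedata

# Hypothetical names for the three vowel keys: "|" vertical stroke,
# "-" horizontal stroke, "." dot. Only the two sequences mentioned in
# the text are listed; a real table would cover every vowel.
VOWEL_SEQUENCES = {
    ("|", "."): "\u314F",        # vertical stroke + dot -> 'a'
    ("|", ".", "."): "\u3151",   # one more dot          -> 'ya'
}

def vowel_for(keys):
    """Return the vowel letter for a key sequence, or None if unknown."""
    return VOWEL_SEQUENCES.get(tuple(keys))

for keys in (["|", "."], ["|", ".", "."]):
    vowel = vowel_for(keys)
    print("".join(keys), "->", unicodedata.name(vowel))
```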


> So while this does not solve Peter's immediate problem,
> starting to change Unicode to color characters, glyphs,
> or character parts would be an extremely slippery slope.

  Yes, I (and virtually all the people in the thread) agree there's little,
if anything, to change in Unicode, although I still miss the forever-lost
canonical decomposition of complex Korean letters.


> suited to do the job. And such technology actually is
> already around. It's part of SVG. Chris Lilley had a
> very nice example once, but it got lost in a HD crash.
> Chris, any chance of getting a new example?
...
> Here is more or less how it works (as far as I understand it):

(summary snipped ...)

Thanks for the info and the summary. I'm looking forward to seeing
Chris' example 'revived'.

 Jungshik

Received on Sunday, 7 December 2003 16:24:20 UTC