encodings, and "publishing documents" [Re: Are the public HTML DTDs valid XML?]

On Friday 07 December 2001 13:17, Christian Wolfgang Hujer wrote:
|   >
|   > Hello Christian!
|   >
|   > I guess you have never used Cyrillic - as your advice (quoted above) is
|   > absolutely useless for Cyrillic-based alphabets.
|
|   that's partially not true :)

ok! :-)

|   I haven't used Cyrillic that much, I only use Cyrillic, next to Klingon
| and Bopomofo, in XML courses to demonstrate the power of Unicode to students.
| But my advice is definitely not useless; it is also very useful for all
| non-Latin alphabets.

Now I should ask you what Klingon and Bopomofo are. :-)

Well, I know that it is quite common practice to encode *non-ASCII*
characters as references (using &xxxx; ).
I found MS Word guilty of this broken practice, and Macromedia products have
the same problem (not always, but quite often). Allaire HomeSite tends to do
this as well (and often doesn't understand cut'n'pasted Cyrillic for this
reason). I refer here to the Windows versions of those programs.
BTW, it partially explains why I do not use Windows anymore :-))
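
To make the practice concrete, here is a minimal Python sketch of what such
tools effectively do: every non-ASCII character is replaced by a numeric
character reference. The function name to_ascii_references and the sample word
are mine, chosen only for illustration; this is not how any of the programs
above is actually implemented.

```python
import html


def to_ascii_references(text: str) -> str:
    """Replace each non-ASCII character with a decimal reference like &#1055;."""
    # "xmlcharrefreplace" is a standard error handler of str.encode().
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


word = "Привет"  # six Cyrillic letters
published = to_ascii_references(word)

print(published)
print(len(word), "characters become", len(published), "ASCII bytes")

# A browser (or html.unescape) turns the references back into letters,
# but a plain-text editor shows only the &#...; noise.
print(html.unescape(published) == word)
```

The text survives the round trip, but for a Cyrillic document every single
letter becomes a seven-byte reference, which is why the source is unreadable
in an ordinary editor.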

|
|   To be precise, I didn't mention I meant *publishing*, not *writing*. Now I
|   say it.
|   I mean the encoding for publishing, not the encoding for writing.

OK, now I am confused.
Anyway, let me explain how I see typical publishing of *documents* (by
*documents* I mean a typical business memo, a newspaper article, etc.).
You type the article/text in a word processor. Then you "Save As HTML". You
get an HTML or XHTML file as output.
What encoding you get in such HTML/XHTML depends on your word processor.
I use KWord in such cases, and save docs as XHTML Strict with CSS2 formatting.
MS IE, Mozilla, Netscape and Konqueror have no problems opening/rendering
such docs.
The default encoding for such HTML exported from KWord is Unicode/UTF-8.

[...]
|   > For all other cases, you should use Unicode (UTF-8).
|   > Unicode TTF fonts are widely available nowadays, so I see no problem
|   > with transition to Unicode. Windows 2000 has good support for Unicode,
|   > KDE (Linux, UNIX, FreeBSD) supports Unicode natively and I guess
|   > MacOS X too.
|   > So all major platforms completed migration and supporting
|   > *legacy* techniques like &uuml; for Umlaut makes no sense anymore.
|
|   That's where I cannot agree.

why?
|
|   - Does your cell phone have Unicode/UTF-8 support?

It's pretty well known that current models of mobile phones are terrible.
I hope you don't use some "recent model with WAP support", do you?
Anyway, until 3G cellular networks become common, mobile phone users will not
use the Internet from their phones.

|   - Do Opera 5, 4, 3.6, Voyager, iBrowse, AWeb have Unicode/UTF-8 support?

It's known that Opera 5 has problems with Unicode support.
IIRC this was one of the (official) reasons why MSN blocked access for the
Opera browser. (You can check some links on my web page,
http://kde2.newmail.ru)
Please don't get me wrong: I like the Opera browser, it has nice features.
But the fact that Opera 5 can't support Unicode correctly is a problem of the
company named Opera Software.
I use Konqueror, it has good Unicode support.
As for Voyager, iBrowse, AWeb - I guess these are some minor/experimental
browsers? I haven't heard of them.
If they do not support Unicode, then they should get that support ASAP.
Otherwise they will disappear before they mature :-)
 
|   - How many users do Amiga OS, Atari, BeOS, Mac OS 9 and older, some older
|   Linuxes, BSDs etc. have?

My recent research on nation-wide sites (in Russia) shows that web surfers
with MacOS have a 0.9% market share, and Linux users from 1.3% to 2.6%,
depending on the method of calculation. This means that Windows has about
96.5% market share. I am going to write an article about it, but that article
is not yet ready.
Amiga OS, Atari, BeOS - that is history.
// Don't get me wrong, I was programming on the Atari ST in 1988. But again,
that's history.

|
|   So
|   a) Legacy encodings are bad for known reasons
|   b) UTF-8 is still not supported enough
|   What's left?
|   Yes, ASCII.

Not for Cyrillic users.
There are approx. 300 million people using Cyrillic. Its use includes the
Russian language but is not limited to it. Bulgarian, Serbian, Macedonian,
Ukrainian and Belarusian, as well as languages of other ex-USSR countries,
use the Cyrillic alphabet.
ASCII knows nothing about Cyrillic.
What options are left for us? Well, people in Russia use the windows-1251
(cp1251) encoding, invented and implemented by Microsoft.
ASCII has no use here. So, frankly speaking, your (my) choice is between
2 options: cp1251 and Unicode. I prefer Unicode but agree that cp1251 is
dominant, due to the fact that Microsoft software is installed on more than
90% of desktops.
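
A small Python sketch of that choice, under the assumption that Python's
built-in "ascii", "cp1251" and "utf-8" codecs stand in for what real software
does. The helper try_encode and the sample strings are hypothetical, made up
for this illustration.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


cyrillic = "Кириллица"        # nine Cyrillic letters
mixed = "Кириллица и Größe"   # Cyrillic plus German letters in one document

ascii_bytes = try_encode(cyrillic, "ascii")    # ASCII has no Cyrillic at all
cp1251_bytes = try_encode(cyrillic, "cp1251")  # one byte per letter
utf8_bytes = try_encode(cyrillic, "utf-8")     # two bytes per Cyrillic letter

print("ascii: ", ascii_bytes)
print("cp1251:", len(cp1251_bytes), "bytes")
print("utf-8: ", len(utf8_bytes), "bytes")

# cp1251 covers Cyrillic but not the German umlauts; UTF-8 covers both.
print("mixed in cp1251:", try_encode(mixed, "cp1251"))
print("mixed in utf-8: ", try_encode(mixed, "utf-8") is not None)
```

So cp1251 is compact for pure Cyrillic text, while Unicode is the only one of
the three that handles a document mixing alphabets.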

|
|   Of course I do not suggest you *write* using ASCII, that can be annoying,
|   even in German, where it is required to use ä, Ä, ö,
| Ö, ü, Ü and ß. How annoying must it be in Chinese!
|   I suggest write in whatever encoding you like.
|
|   I suggest you *publish* in ASCII because that's always supported.

Well, let me bring XML back in here (while I understand that it can be
partially off-topic on the www-html mailing list).
The default encoding for XML is UTF-8. So, frankly speaking, I do not
understand why you want to use ASCII when UTF-8 is the default (standard).
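
As a rough illustration, assuming Python's standard xml.etree.ElementTree
module as the XML parser (the <memo> element is a made-up example): a document
with no encoding declaration is read as UTF-8, and the ASCII form with
character references parses to exactly the same text.

```python
import xml.etree.ElementTree as ET

# No XML declaration and no byte order mark: the bytes are treated as UTF-8.
utf8_doc = "<memo>Привет</memo>".encode("utf-8")
from_utf8 = ET.fromstring(utf8_doc)

# Serialising the same tree as US-ASCII turns each Cyrillic letter
# into a &#NNNN; character reference.
ascii_doc = ET.tostring(from_utf8, encoding="us-ascii")
from_ascii = ET.fromstring(ascii_doc)

print(ascii_doc.decode("ascii"))
print(from_utf8.text == from_ascii.text)
```

For an XML parser the two forms are equivalent, so the ASCII version buys
nothing there; it only matters for software that is not a conforming XML
parser.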

As I have mentioned, I use KWord for documents. KWord's native format is XML.
XML documents are encoded in UTF-8. To save disk space, all XML files are
gzipped. (.tar.gz)
KWord "publishes" docs in HTML, XHTML, PostScript or PDF. New export filters
are coming, but the current list covers more than 99% of typical usage.

So thanks for the proposed conversion method, but I think it's rather useless
for me, as I use more advanced techniques ;-)
 
[...]
|
|   I understand your protest, but your protest is not necessary.
|

Hmm, so far I have not been protesting against anything...
|
|   Greetings
|
|   Christian

BR,
-- 

Vadim Plessky
http://kde2.newmail.ru  (English)
33 Window Decorations and 6 Widget Styles for KDE
http://kde2.newmail.ru/kde_themes.html
KDE mini-Themes
http://kde2.newmail.ru/themes/

Received on Friday, 7 December 2001 11:29:23 UTC