Re: encodings, and "publishing documents" [Re: Are the public HTML DTDs valid XML?]

From: Vadim Plessky <lucy-ples@mtu-net.ru>
Date: Sat, 8 Dec 2001 01:31:02 +0000
Message-Id: <200112072309.fB7N9GH30849@post.cnt.ru>
To: "Christian Wolfgang Hujer" <Christian.Hujer@itcqis.com>, <www-html@w3.org>
On Friday 07 December 2001 17:01, Christian Wolfgang Hujer wrote:
|   > I refer here to Windows versions of those programs.
|   > BTW it partialy explains why I do not use Windows anymore :-))
|   What about that practice is broken?

You mean, "practice of not using Windows"?
I would be very surprised to hear that. That's the only practice I can 
recommend so far - as a Mac is enormously expensive, and SGI and SPARC 
workstations are even more expensive than Macs.

|   It is good practice to encode all non-ASCII-characters using &xxxx;
| because then every software that is capable of reading ASCII is capable of
| reading the document.

OK, let me do a simple arithmetic exercise then.
In Russian we have 33 letters in the alphabet; with capital letters, the 
total number of "Russian Cyrillic" letters is 66.
In typical Russian text, *none* of the English letters/glyphs is used.

You propose to encode each Cyrillic glyph as &cccc; - 6 bytes per glyph.
That's 6 times the size of the cp1251 encoding, and 3 times more than UTF-8.
I can't call it good practice - a page 50K in size will increase to 150K - 
300K. That's just not normal.
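
The size argument can be checked with a small sketch. It assumes Python's built-in codecs and a made-up six-letter sample string; `encoded_sizes` is a hypothetical helper, not something from this thread. Note that a decimal reference for a basic Cyrillic letter (for example `&#1055;`) is 7 bytes, so the real overhead is slightly worse than the 6 bytes per glyph estimated above.

```python
def encoded_sizes(text):
    """Return the byte length of `text` under three publishing options."""
    return {
        # Legacy single-byte Cyrillic encoding: 1 byte per letter
        "cp1251": len(text.encode("cp1251")),
        # UTF-8: 2 bytes per basic Cyrillic letter
        "utf-8": len(text.encode("utf-8")),
        # Pure ASCII with decimal numeric character references (&#NNNN;)
        "ascii+refs": len(text.encode("ascii", "xmlcharrefreplace")),
    }


sample = "Привет"  # hypothetical sample text: six Cyrillic letters
sizes = encoded_sizes(sample)
print(sizes)
```

For text that is almost entirely Cyrillic, the ratios between these three numbers are what decide how much a page grows when it is republished as ASCII with references.
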
|   If you use UTF-8, even Internet Explorer 6.0 will get XHTML documents
| wrong when the <meta/> declaration for the charset is missing.
|   And sometimes Internet Explorer does not interpret the charset
| declaration in a <meta/> element.

OK, I was not aware of these facts.
But I, personally, couldn't care less - as I don't use Windows. :-)

|   > What encoding you get in such HTML/XHTML depends on your word
|   > processor. I use KWord in such cases, and save docs as XHTML Strict
|   > with
|   > CSS2 formatting.
|   > MS IE, Mozilla, Netscape, Konqueror do not have problems
|   > opening/rendering
|   > such docs.
|   > Default encoding for such HTML exported from KWord is Unicode/UTF8.
|   If *publishing* is *saving* them from a "word processor" (ouch, that's
| not the tool to generate HTML, especially if its name is Microsoft Word),
| then you're right.

Well, I can't call MS Word a good example of a word processor. And the HTML 
generated by MS Word just sucks.
KWord is an *excellent* generator of HTML - as it generates *XHTML strict*, 
*with CSS2 formatting*. (more info at http://www.koffice.org)
The quality of the generated code is superior to what you can find on 99.9% 
of all web pages.
|   But for me, even Macromedia DreamWeaver is not the tool to create XHTML
|   documents. I want valid documents. So I threw away HomeSite, Fusion,
|   FrontPage, DreamWeaver and all those.

Yes, MS Write or Wordpad is superior to the programs you mentioned in terms 
of writing good HTML. And I use KWrite/KATE (KDE Advanced Text Editor); you 
can think of it as Wordpad on steroids (syntax highlighting, MDI interface, 
easily handles 1.5MB files, etc.)
I just need to add to your list Adobe GoLive, Adobe Pagemaker, Adobe 
InDesign, Quark Xpress. Plus all of the about 20 HTML editors I have tried 
on the Windows platform and threw away.
|   I write XHTML by hand and use Transformation for all tasks like adding
| tocs, headers, footers, style and so on.

Good!  Yep, really good!

|   >
|   > why?
|   Mac OS 9 and older have a big market share in USA and a small (but not
| too small) market share in Europe. Unicode is a problem for Browsers on Mac
| OS.

Aha, one more Mac-specific problem. I thought that MacIE5 doesn't have it.
If the problem still exists, even in MacIE5 (which, as I heard, is a very 
good browser) - then Mac users should install Linux for PowerPC.
As I wrote, the Unicode problem is solved on Linux. (And my favourite Linux, 
Linux-Mandrake, offers a PowerPC port. It has the same KDE 2.2 which I run 
on Intel i586 architecture.)

|   > I use Konqueror, it has good Unicode support.
|   > as about Voyager, iBrowse, AWeb - I guess these are some
|   > minor/experimental
|   > browsers? I haven't heard about those ones.
|   > If they do not support Unicode - than they should get support
|   > ASAP. Otherwise
|   > they will disappear earlier than they matured :-)
|   Well, they are all about 8 years old, except for AWeb, which is a bit
|   younger.
|   Their OS just has no big market share: Amiga OS.

Is it [Amiga OS] still alive?
I know that there are 2 Mac-specific browsers, OmniWeb and iCab.
But I didn't know that there are Amiga-specific browsers.
(and, BTW, I can't consider Sun's HotJava a "browser", as its market share 
is below 0.001% - and I haven't even seen it in server logs at all)
IMO a browser becomes visible when it hits a user base of at least 1 million.
Note - *users*, not *1 million downloads*.
That's why it's always funny to me when I read that the "number of downloads 
for Opera5 achieved 5 million".
OK, I downloaded Opera, and even twice - once for Linux and once for Windows.
And guess what? I don't even use it online. I use it only offline, for 
testing purposes.

|   > Amiga OS, Atari, BeOS - is history.
|   > // don't get me wrong I was programming on AtariST in 1988. But
|   > again' that's
|   > history.
|   Oldtimers are also history, but streets are still built in a way that old
|   timers can drive on it.
|   Amiga OS, Atari, BeOS might be history to *you*, but not to the freaks
| that "still" use them.
|   But that's not the place to discuss that.
|   But as much as you would like to see support for Linux, they would like
| to see support of their platforms, just by using standards and a chance for
| migrating to newer technologies step by step.

Well, I do not expect that somebody will bring me what I need "automatically".
I fight for web standards, and develop code which will make my life (and that 
of many other people) better.
Frankly speaking, I do not understand people who continue to use computers 
with 16MHz processors (which IIRC was the speed of the CPU of my AtariST in 
1988).
You can buy a 1GHz computer for $350 nowadays, or assemble your own even 
cheaper. It's not a problem even for people with low income, not to speak 
of the middle class.
|   > |   So
|   > |   a) Legacy encodings are bad for known reasons
|   > |   b) UTF-8 is still not supported enough
|   > |   What's left?
|   > |   Yes, ASCII.
|   >
|   > Not for Cyrillic users.
|   Yes, for all users. That's what character entities are for.
|   Of course, as already said, I do not request you to *write* them. I just
|   said that UTF-8 as an encoding isn't supported enough, so in general it's
|   best to use ASCII and character entities for publishing.

OK, then an easy solution is to check on the server what the user agent 
supports, and deliver "special" (legacy) encodings to Mac users and old 
browsers.
// and charge those users of legacy software extra $$$ for accessing your web 
site :))
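
A minimal sketch of that server-side check, assuming the server can read the User-Agent header. The marker list, the function names `pick_charset` and `build_response`, and the choice of windows-1251 as the legacy charset are illustrative assumptions, not a tested compatibility table.

```python
# Hypothetical substrings that mark a browser as "legacy" for this sketch.
LEGACY_MARKERS = ("Mac_PowerPC", "AWeb", "IBrowse", "Voyager")


def pick_charset(user_agent, legacy_charset="windows-1251"):
    """Choose a response charset from the User-Agent string."""
    ua = (user_agent or "").lower()
    if any(marker.lower() in ua for marker in LEGACY_MARKERS):
        return legacy_charset
    return "utf-8"


def build_response(text, user_agent):
    """Encode `text` for the client and return (headers, body)."""
    charset = pick_charset(user_agent)
    # Characters the legacy charset cannot represent fall back to &#NNNN;
    body = text.encode(charset, "xmlcharrefreplace")
    headers = {
        "Content-Type": f"text/html; charset={charset}",
        # Tell caches that the response depends on the requesting browser
        "Vary": "User-Agent",
    }
    return headers, body


headers, body = build_response(
    "Привет", "Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)"
)
print(headers["Content-Type"], len(body))
```

Sniffing the user agent is fragile, so a real deployment would probably want to look at the Accept-Charset request header as well.
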

|   > What options are left for us? Well, people in Russia use windows-cp1251
|   > encoding, invented and implemented by Microsoft.
|   To me that's not an "encoding" at all, it's ******** to me ;)

I see ;)
But believe me it's used on 99.9% of all web sites here.
For example, I use free hosting (newmail.ru) for my site - and I have no 
All pages are served by default in "windows-cp1251".

|   > well, let me back here XML (while I understand that it can be partially
|   > off-topic on www-html mailing list)
|   > default encoding for XML is UTF-8. So frankly speaking I do not
|   > understand why you want to use ASCII when UTF-8 is default (standard)
|   UTF-8 and UTF-16 are default. That way, ASCII automatically is default,
| too. What's the point about "default" then?

don't know. :-)
But I know people who were very surprised to learn that XML supports 
something other than Unicode (we discussed this problem on the KOffice list 
some time ago). I mean - people who wrote hundreds of thousands of lines of 
code (not 
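
The remark that ASCII is "automatically default" follows from ASCII being a subset of UTF-8. A small sketch, assuming Python's standard `xml.etree.ElementTree` parser and a made-up one-element document: with no encoding declaration the parser assumes UTF-8, and an ASCII document using character references yields the same text.

```python
import xml.etree.ElementTree as ET

text = "Привет"  # hypothetical sample content

# The same document, once as raw UTF-8 and once as ASCII with &#NNNN; references
utf8_doc = f"<p>{text}</p>".encode("utf-8")
ascii_doc = f"<p>{text}</p>".encode("ascii", "xmlcharrefreplace")

# Pure ASCII bytes decode identically as ASCII and as UTF-8
assert ascii_doc.decode("utf-8") == ascii_doc.decode("ascii")

# Neither document carries an encoding declaration; both parse as UTF-8
parsed_utf8 = ET.fromstring(utf8_doc).text
parsed_ascii = ET.fromstring(ascii_doc).text
print(parsed_utf8 == parsed_ascii == text)
```

A document in a legacy encoding such as cp1251, by contrast, has to name that encoding explicitly in its XML declaration.
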

|   > As I have mentioned, I use KWord for documents. KWord's native
|   > format is XML.
|   > XML documents are encoded in UTF8. To save disk space, all XML files
|   > are gzipped. (.tar.gz)
|   > KWord "publishes" docs in HTML, XHTML, PostScript or PDF. New
|   > export filters
|   > are coming, but the current list covers more than 99% of typical usage.
|   Great. So what's the point?

UTF-8 is the default encoding, that's the point.
As the number of KWord users is increasing rapidly (say, from close to zero 
a year ago to thousands nowadays, and possibly more than a million in one 
year), in a year's time you will face a situation where *suddenly* there are 
many Unicode HTML pages (or text files) everywhere.
And using Unicode will (again, *suddenly*) have become common practice.
My point is also that we can reduce this time from one year to 8-10 months, 
if we all promote Unicode together.
|   > So thanks for proposed conversion method but I think it's rather
|   > useless for me, as I use more advanced technics ;-)
|   "more advanced" - you should tell that James J. Clark, the inventor of

if he is on this list, it would be nice to hear his comment.

|   KWord won't automatically add icons for off-site or
| foreign-language-links, so what shall be "more advanced"?
|   Anyway, I prefer vim ;) (which is capable of Unicode and UTF-8, which I
| also use for writing, but I use ASCII for publishing).

Ah, *freedom of choice* - that's what we all need.
|   Of course the final solution is UTF-8. But until many major browsers,
|   including IE, have problems regarding UTF-8, an intermediate but
| compatible solution is required. And ASCII is compatible to everything
| except EBCDIC and -alikes.

If MS IE sucks (at least in the way you describe it) - we can reduce the 
transition time to Unicode by migrating Windows desktops to Linux.
So, the encoding problem will be fixed automatically, as soon as you replace 
Windows 2000 with KDE  :))

But IMO MS IE is superior to Opera (in terms of National Language/Encoding 
support) - so there is no good alternative to MS IE if you run Windows.
(I test Mozilla regularly as well; so far its memory requirements are 
enormous and I can't recommend it to an average user.)

|   Greetings
|   Christian

P.S. What a nice discussion, I am pretty happy that I subscribed to the 
www-HTML list, in addition to the www-CSS and www-DOM lists which I have 
already been reading for a year :)) 


Vadim Plessky
http://kde2.newmail.ru  (English)
33 Window Decorations and 6 Widget Styles for KDE
KDE mini-Themes
Received on Friday, 7 December 2001 18:09:52 GMT