Re: HTML - i18n / NCR & charsets

In message <9611261813.AA08437@ jrc.it>, "Dirk.vanGulik" writes:
>
>As HTML is often transported using HTTP, the current proposal
>for an internationalized version of HTML causes several
>conflicts with widespread existing problems and 'out-of-HTML-
>band' communicated charset information on HTTP level; or the
>default latin1 assumption.

The HTML and HTTP specs are internally and mutually consistent.  That
non-standard practices exist is a sad fact. The standards process for
these documents bends over backwards to accommodate existing
practice. But as the system is growing exponentially, we have to weigh
the future value of a reasonably clean system against the value of
backwards compatibility.

>In the actual world people have taken to using so called Numerical
>Glyph/Character references within their HTML documents, such as &#160; 
>which are simply indexes into the 'defined' character set.
>
>In the i18n proposal these numerical references are taken to be
>indexes into the Unicode set, so-called 'codepoints'. This regardless
>of the character set announced in the header (or in an http-equiv
>in the actual body).
>
>Currently most of these numerical references are intended by their
>authors to be indexes into latin1 or, if a charset is announced in
>the http header by the server, as an index into that set.

I understand many broken software packages contributed to
the delinquency of these authors. Sad but true...
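
To make the difference concrete, here is a minimal Python sketch of
the two readings of a numeric character reference described in the
quoted text. The function names are made up for illustration, and the
second function assumes a single-byte charset and a reference number
below 256; it is not taken from any spec or browser.

```python
def ncr_as_unicode(n: int) -> str:
    """Read &#n; as a Unicode code point (the reading the specs give)."""
    return chr(n)


def ncr_as_charset_index(n: int, charset: str) -> str:
    """Read &#n; as a byte value in the announced charset.

    Hypothetical helper modelling the non-standard reading; only
    meaningful for single-byte charsets and 0 <= n < 256.
    """
    return bytes([n]).decode(charset)


# &#160; is a no-break space under both readings when the charset is latin1.
same = ncr_as_unicode(160) == ncr_as_charset_index(160, "latin-1")

# &#225; differs once another charset is announced:
as_unicode = ncr_as_unicode(225)                    # LATIN SMALL LETTER A WITH ACUTE
as_greek = ncr_as_charset_index(225, "iso-8859-7")  # GREEK SMALL LETTER ALPHA
```

For latin1 the two readings give the same character for every value
from 0 to 255, since the first 256 Unicode code points coincide with
latin1. They diverge only when some other charset is announced.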

I'm tempted to institute a PICS rating system for software that
violates standards and entices users to do so. PICS labels could be
used like traffic tickets to cite spec violators. They didn't call me
the "SGML cop" for nothing!

In any case: nobody ever promised these authors that their
documents would work:

>Effectively HTML has been upgraded to a new and better version,

Not so: what HTML standards there are have been consistent on
this issue throughout.

Dan

Received on Tuesday, 3 December 1996 20:06:50 UTC