W3C home > Mailing lists > Public > w3c-sgml-wg@w3.org > June 1997

Re: Comments on XML Part 1 from Japanese experts

From: Gavin Nicol <gtn@eps.inso.com>
Date: Mon, 9 Jun 1997 10:09:25 -0400
Message-Id: <199706091409.KAA12041@nathaniel.ebt>
To: w3c-sgml-wg@w3.org
>After the meeting, I spoke with Internet experts and HTML experts 
>, and asked their opinion about the ideographic space character 
>and HTTP content labelling.

Who did you talk to? I'd be *most* interested to know.

>1) The ideographic space
>They unanimously agree with me in that the ideographic 
>space character must not be a delimiter.  Period.
>A plea to the ERB: please remove the ideographic space from the 
>definition of S.

The ideographic space is not a delimiter, and that's the whole point.
I would argue for including *more* characters into S (I like Rick's
"if it looks like a space, it's a SPACECHAR" test).

>2) HTTP content labelling
>Although they (and I) unanimously disagree with Gavin, it is not 
>easy to write a complete proposal right now.

I would *really* like to know who your "experts" are.

>Addition: we might propose to add encoding declarations 
>as a part of external entity declarations. (See I18N of HTML.)

BAD (Broken As Designed). What happens if the encoding of the external
entity is altered, but everyone's system doesn't invalidate the
cache entry for the DTD? What happens to security if you have to
rewrite the DTD if the document passes through a transcoder? Do you
really want caching proxies to have to parse DTD's?


>Priority 1:  If BOM or an encoding declaration exists at the beginning 
>		of this external entity or document entity, they 
>		are used to detect the encoding.

First, you should separate the BOM, and encoding declarations. They
are two distinct issues/methods, and should separate entires in your
priority tower.

Second, this requires that transcoding proxies include an XML parser, and
rewriting of the document, which could invalidate caches and security.
This also means that *normal* caching proxies will have to include an
XML parser, so that they can do equivalence testing.

>Priority 2:  If this entity is not a document entity but an external 
>		entity, then:
>	2-1: If an encoding declaration is attached to the external 
>		entity declaration that declares this external 
>		entity, this encoding declaration is used.
Same problems as above.

>	2-2: The encoding of the external entity 
>		or document entity that refers to this external entity is 
>		used.

Worse. First we had cascading stylesheets that don't, and now we have a
proposal for cascading encodings.... BAD.

>Priority 3:  HTTP content labelling

I cannot say this strongly enough: external, protocol-level
signalling, should be given highest priority. If an application cannot
bide by requirements of the transport protocol, it is broken. This is
your guarantee. You have no other.

To give you a *perfect* example of the kind of problems your solution
will run into, I suggest you take a look at HTML forms. I guess that
you and your "experts" have a *deterministic* solution to that problem
that doesn't involve the protocol? Have a chat to Frank Tang....

Most of the I18N problems in the WWW today are caused by applications
that do not respect the protocols, or half-baked solutions to half the
problem that don't work in the general case. Let's not repeat such
mistakes for XML!!

I can accept the ENCODING parameter on the XML declaration as being of
*informative* value, but if you have anything more reliable to use, it
should be given priority.

Maybe we should just require that XML *always* be in utf8? (I diasgree
on a personal level, but from one viewpoint, this has a lot in it's

Received on Monday, 9 June 1997 10:10:08 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 20:25:10 UTC