Responses to two Character Model last call issues

Dear DSig WG,

[note to the i18n IG: the XML DSig mailing list is public]

I have been actioned (see e.g.
by the I18N WG to have a look at part of
your last call comments to the Character Model, identified
on our part as Issues 94 and 95.

These are not yet the formal dispositions from the I18N WG, but
they will become formal dispositions automatically if these
issues are not reopened by another i18n WG member.

If you disagree with the dispositions below, please say so at
your earliest convenience, so that we can try to find a better
solution or can register your disagreement.

issue 94, you wrote:

 > 3.6.1 Character Encoding Identification
 > Because of the layered Web architecture (e.g. formats used over protocols),
 > there may be multiple and at times conflicting information about character
 > encoding. Specifications MUST define conflict-resolution mechanisms (e.g.
 > priorities) for these cases, and implementers and content developers MUST
 > follow them carefully.
 > This requirement can be relevant to dsig that there is a type attribute (of
 > type URI) that could identify the encoding of an identified resource being
 > signed. However, the signature text speaks of dsig types, not MIME types
 > though MIME types when represented as a URI could be included:
 > 4.3.3 The Reference Element
 > . The Type attribute facilitates the processing of referenced data. For
 > example, while this specification makes no requirements over external data,
 > an application may wish to signal that the referent is a Manifest.
 > If someone did use this to describe the MIME type, the dsig spec does not
 > address how to resolve conflicting information and leaves it to the
 > application.

It is not clear to us whether you want us to change the relevant text
in the Character Model, and if yes, how, or whether you are asking us
whether we think the Character Model applies to the DSig spec in this
case or not, or something else. [This is a problem that applies to some
others of your comments, too, and that has confused our WG quite a bit.]

Anyway, having had a look at the latest version of the spec (PR Aug 20),
I have first been a bit confused by the fact that there is more about
the 'Type' attribute in the example section 2.2
than in, but I think
I have understood what you mean by a DSig type.

Given this, I think the scenario you describe (a mime type that may
include a 'charset' parameter) is in theory possible. But as far as
the 'Type' attribute is an open field for use by applications, it is
impossible to forsee and forbid (or define priorities for) any potentially
dangerous values. In particluar, as the spec says "Note that Type is
advisory and no action based on it or checking of its correctness is
required by core behavior.", it would be the responsibility of
any specific application of DSig that used the Type attribute in
a specific way to make sure it does not conflict with the Character

As such, the response of the I18N WG is that we accept your comment
(in the sense that you have brought up an issue worth to consider),
but that no actual action is needed on either side.

Now to the next issue, our LCI-95 at You write:

 > 3.6.2 Private Use Code Points
 > The recommendation that private-use code points be allowed but
 > prohibition against any mechanism to facilitate private agreements
 > concerning these code points in any protocol seems bizarre.  Why not
 > leave it up to protocol designers to determine if they will include a
 > mechanism for private extensions or negotiation of privately defined
 > options?

We have to disagree with you on this point. I will try to explain why.

While some mechanisms for extensibility, and options, and private
agreements can be part of most well-designed specifications, it is
important that these mechanisms are themselves well-defined.
However, there are many problems with using the private-use area,
and these problems get more and more severe in the context of
the WWW with its worldwide interaction.

One main problem is the interaction with general XML processing
tools. These tools would not know how to correctly conserve
(or modify) the additional information in the protocol, and
so the protocol and the codepoints could easily get out of sync.
You may think that this is not much different from other kind
of markup, but typing/cutting/pasting characters is something
that will work for everything except the private use zone.

A related problem is that we try to create an architecture where
e.g. various XML namespaces can be combined in innovative ways.
In a document with two different private use area extensions,
this will lead to all kinds of problems and inconsistencies.
If the interpretation of private use characters depends on
the namespace of the element they are contained in, this
breaks the fundamental assumption of formats such as HTML
and XML that each entity has only one character encoding.
It also brings us back into ISO 2022-like hodgepodges, except
that there is much more of a layer violation between markup
and character codes with 2022 character escapes.

Another issue is that it might look easy to misuse the private use
area for all kinds of pseudo-clever tagging and compression schemes.

In addition, there is the danger that some assignments in the
private use zone lead to pressure to include these codes in
Unicode. The W3C is not the place to create new encoded characters,
this is much better left to the expert organizations in this field.

In some sense, the private use area is very similar to the x-
convention for languages, charsets,... in the IETF. It is there for
experimental use, but not for use as part of a specification.

I hope this explains why we prohibit private-use agreement mechanisms
for specifications that want to conform to the character model, and
why we therefore reject your comment.

Regards,    Martin.

Received on Wednesday, 29 August 2001 05:14:23 UTC