
Re: Error in xmldsig REC

From: Martin Duerst <duerst@w3.org>
Date: Sat, 01 Jun 2002 10:32:30 +0900
Message-Id: <>
To: reagle@w3.org, <w3c-ietf-xmldsig@w3.org>
Cc: xml-encryption@w3.org, w3c-i18n-ig@w3.org

At 15:59 02/05/31 -0400, Joseph Reagle wrote:

>While looking at xenc's use of the MimeType and Encoding attributes I
>noticed that in the text we say Encoding="base64" but the schema says
><attribute name='Encoding' type='anyURI' use="optional"/>.

Hello Joseph,

Your mail is a bit out of context. Are you talking about XML Sig,
or XML Enc? And which element(s)?

I just went back to both specs, and my impression was that the
word 'encoding' appears in too many contexts, and in particular
it's unqualified in too many contexts. Having 'encoding' in
the XML declaration and 'Encoding' mean different things is
bad enough, but it may be too late to be changed. But I think
it is very worthwhile to make sure that in the text, every
single instance of the word 'encoding' is qualified
(e.g. 'character encoding', 'transfer encoding',...), and
that the differences are pointed out clearly.
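
To make the distinction concrete, here is a minimal Python sketch, not taken from either spec. It assumes a simplified, namespace-free Object element with a made-up payload. The URI is the base64 identifier defined by xmldsig. The 'encoding' in the XML declaration is a character encoding that the parser consumes, while the 'Encoding' attribute names a transfer encoding that the application has to undo itself.

```python
import base64
import xml.etree.ElementTree as ET

# The base64 identifier from xmldsig; the element and payload are illustrative only.
BASE64_URI = "http://www.w3.org/2000/09/xmldsig#base64"

doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    f'<Object Encoding="{BASE64_URI}">aGVsbG8=</Object>'
).encode("utf-8")

# Character encoding: handled by the XML parser when it reads the bytes.
root = ET.fromstring(doc)

# Transfer encoding: purely informational to the parser; the application decodes it.
transfer_encoding = root.get("Encoding")
if transfer_encoding == BASE64_URI:
    payload = base64.b64decode(root.text)
else:
    payload = root.text.encode("utf-8")

print(transfer_encoding, payload)
```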

I'm sorry that such a comment comes at such a late stage,
but I hope it's still possible to apply it. I'm glad to
help if necessary.

Regards,    Martin.

>I went to check
>what xmldsig says, and it unfortunately says the same thing! So we need to
>answer two questions:
>1. Confirm our intent is to specify an informational (no action need be
>taken) TES: Transfer Encoding Syntax (e.g., base64, uuencode, BinHex,
>quoted-printable, gzip, etc.) and not a CEF: Character Encoding Form (e.g.,
>UTF-8, UTF-16), or CES: Character Encoding Scheme (e.g., UTF-16BE,
>UTF-16LE). Given we say Encoding, the example uses "base64", and we're
>using this to encode various objects (like a PDF, Word file, etc.), I think
>it is safe to say we mean a Transfer Encoding Syntax.
>2. How do we want to represent this, as a string or URI?
>A. Is there a registry for TES?
>B. We have a URI for base64 which is easy to use.
>C. I think it's easier to change the text to an example using a URI, than
>to change the schema of the REC...
>[1] http://www.unicode.org/unicode/reports/tr17/
>    The five levels can be summarized as:
>      * ACR: Abstract Character Repertoire
>           + the set of characters to be encoded, e.g., some alphabet or
>             symbol set
>      * CCS: Coded Character Set
>           + a mapping from an abstract character repertoire to a set of
>             non-negative integers
>      * CEF: Character Encoding Form
>           + a mapping from a set of non-negative integers (from a CCS) to
>             a set of sequences of particular code units of some specified
>             width, such as bytes
>      * CES: Character Encoding Scheme
>           + a mapping from a set of sequences of code units (from one or
>             more CEFs) to a serialized sequence of bytes
>      * TES: Transfer Encoding Syntax
>           + a reversible transform of encoded data. This data may or may
>             not contain textual data
>Character Repertoire (CR) = a set of abstract characters
>Coded Character Set (CCS) = a mapping of code values (space, points,
>     positions) to a Character Repertoire
>Character Encoding Scheme (CES) = scheme for representing a character
>     repertoire in a code space. Frequently, |CR| > |code space|, so one
>     has to do various extensions and escaping to represent those extra
>     characters. UTF-8 is a CES.
>Charset = CCS + CES
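
The levels quoted above can be walked through for a single character. The following is a rough Python sketch under my own assumptions: the helper name utf16_code_units and the choice of the euro sign are illustrative and come from neither spec nor TR17. It shows a code point (CCS), code units (CEF), serialized bytes (CES), and finally a reversible base64 transform (TES).

```python
import base64

def utf16_code_units(text):
    """Return the 16-bit UTF-16 code units of text (hypothetical helper)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

ch = "\u20ac"  # EURO SIGN, one abstract character from the repertoire

# CCS: abstract character -> non-negative integer (code point).
code_point = ord(ch)

# CEF: code point -> sequence of code units of a fixed width.
utf8_units = list(ch.encode("utf-8"))  # 8-bit units
utf16_units = utf16_code_units(ch)     # 16-bit units

# CES: code units -> serialized bytes; byte order matters for 16-bit units.
utf16_be = ch.encode("utf-16-be")
utf16_le = ch.encode("utf-16-le")

# TES: reversible transform of already-encoded bytes, textual or not.
utf8_bytes = ch.encode("utf-8")
tes = base64.b64encode(utf8_bytes)

print(hex(code_point), utf8_units, utf16_units, utf16_be, utf16_le, tes)
```

Only the last step is what an 'Encoding' attribute with a base64 value would describe; the earlier steps are what 'encoding' in the XML declaration refers to.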
Received on Saturday, 1 June 2002 15:46:37 UTC
