Error in xmldsig REC from Joseph Reagle on 2002-05-31 (w3c-ietf-xmldsig@w3.org from April to June 2002)

From: Joseph Reagle <reagle@w3.org>
Date: Fri, 31 May 2002 15:59:00 -0400
To: <w3c-ietf-xmldsig@w3.org>
Cc: xml-encryption@w3.org, i18n-ig@w3.org
Message-Id: <20020531195901.6876F85A0C@aeon.w3.org>

While looking at xenc's use of the MimeType and Encoding attributes I 
noticed that in the text we say Encoding="base64" but the schema says
<attribute name='Encoding' type='anyURI' use="optional"/>. I went to check 
what xmldsig says, and it unfortunately says the same thing! So we need to 
answer two questions:

1. Confirm our intent is to specify an informational (no action need be 
taken) TES: Transfer Encoding Syntax (e.g., base64, uuencode, BinHex, 
quoted-printable, gzip, etc.) and not a CEF: Character Encoding Form (e.g., 
UTF-8, UTF-16) or CES: Character Encoding Scheme (e.g., UTF-16BE, 
UTF-16LE). Given we say Encoding, the example uses "base64", and we're 
using this to encode various objects (like a PDF, Word file, etc.), I think 
it is safe to say we mean a Transfer Encoding Syntax.
2. How do we want to represent this, as a string or URI?
A. Is there a registry for TES?
B. We have a URI for base64 which is easy to use.
C. I think it's easier to change the text to an example using a URI, than 
to change the schema of the REC...
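
To make option C concrete, here is a rough sketch of what a consumer might do 
with a URI-valued Encoding attribute. The sample element, the helper names 
(is_absolute_uri, read_object) and the fallback behaviour are assumptions 
made for illustration, not anything the REC specifies; only the xmldsig 
namespace and its base64 identifier come from the spec.

```python
import base64
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"
BASE64_URI = DSIG_NS + "base64"  # the base64 identifier defined by xmldsig

# Hypothetical sample: an Object whose Encoding is given as a URI.
SAMPLE = f'''<Object xmlns="{DSIG_NS}"
        MimeType="text/plain"
        Encoding="{BASE64_URI}">aGVsbG8=</Object>'''


def is_absolute_uri(value: str) -> bool:
    """True if the value carries a URI scheme.

    A bare token such as "base64" still parses as a relative URI
    reference, so this check only separates the two spellings.
    """
    return bool(urlparse(value).scheme)


def read_object(xml_text: str) -> bytes:
    """Return the payload, decoding it only when Encoding names base64.

    Assumption: Encoding is informational, so an unknown or absent
    value leaves the character content untouched.
    """
    elem = ET.fromstring(xml_text)
    text = (elem.text or "").strip()
    if elem.get("Encoding") == BASE64_URI:
        return base64.b64decode(text)
    return text.encode("utf-8")


if __name__ == "__main__":
    print(is_absolute_uri(BASE64_URI), is_absolute_uri("base64"))
    print(read_object(SAMPLE))
```

Comparing against the full URI keeps the check a plain string match and 
avoids needing a separate registry lookup for short names.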

[1] http://www.unicode.org/unicode/reports/tr17/
   The five levels can be summarized as:
     * ACR: Abstract Character Repertoire
          + the set of characters to be encoded, e.g., some alphabet or
            symbol set
     * CCS: Coded Character Set
          + a mapping from an abstract character repertoire to a set of
            non-negative integers
     * CEF: Character Encoding Form
          + a mapping from a set of non-negative integers (from a CCS) to
            a set of sequences of particular code units of some specified
            width, such as bytes
     * CES: Character Encoding Scheme
          + a mapping from a set of sequences of code units (from one or
            more CEFs) to a serialized sequence of bytes
     * TES: Transfer Encoding Syntax
          + a reversible transform of encoded data. This data may or may
            not contain textual data
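
As a rough illustration of how these levels stack (my own sketch, not taken 
from TR17), the following walks one character through a coded character 
set, two encodings, and finally a transfer encoding. The choice of 
character and the variable names are arbitrary.

```python
import base64

ch = "\u00e9"                      # abstract character: e with acute accent
code_point = ord(ch)               # CCS: Unicode assigns it the integer 0xE9

utf8 = ch.encode("utf-8")          # UTF-8: code units are already bytes
utf16be = ch.encode("utf-16-be")   # CES: UTF-16 code unit, big-endian bytes
utf16le = ch.encode("utf-16-le")   # CES: same code unit, little-endian bytes

tes = base64.b64encode(utf8)       # TES: reversible transform of the bytes

if __name__ == "__main__":
    print(hex(code_point), utf8, utf16be, utf16le, tes)
    # The transfer encoding is undone without knowing anything about text.
    print(base64.b64decode(tes) == utf8)
```

The base64 step works on bytes alone, which is why it fits data such as a 
PDF as well as text.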

Character Repertoire (CR) = a set of abstract characters
Coded Character Set (CCS) = a mapping of code values (space, points,
    positions) to a Character Repertoire
Character Encoding Scheme (CES) = scheme for representing a character
    repertoire in a code space. Frequently |CR| > |code space|, so one
    has to do various extensions and escaping to represent those extra
    characters. UTF-8 is a CES.
Charset = CCS + CES
http://www.faqs.org/rfcs/rfc1521.html

http://www.iana.org/assignments/character-sets
http://czyborra.com/utf/

Received on Friday, 31 May 2002 16:00:17 UTC