- From: Joseph Reagle <reagle@w3.org>
- Date: Fri, 31 May 2002 15:59:00 -0400
- To: <w3c-ietf-xmldsig@w3.org>
- Cc: xml-encryption@w3.org, i18n-ig@w3.org
While looking at xenc's use of the MimeType and Encoding attributes I
noticed that in the text we say Encoding="base64" but the schema says
<attribute name='Encoding' type='anyURI' use="optional"/>. I went to check
what xmldsig says, and it unfortunately says the same thing! So we need to
answer two questions:
1. Confirm our intent is to specify an informational (no action need be
taken) TES: Transfer Encoding Syntax (e.g., base64, uuencode, BinHex,
quoted-printable, gzip) and not a CEF: Character Encoding Form (e.g.,
UTF-8, UTF-16) or a CES: Character Encoding Scheme (e.g., UTF-16BE,
UTF-16LE). Given that we say Encoding, the example uses "base64", and we're
using this to encode various objects (like a PDF, Word file, etc.), I think
it is safe to say we mean a Transfer Encoding Syntax.
2. How do we want to represent this, as a string or URI?
A. Is there a registry for TES?
B. We have a URI for base64 which is easy to use.
C. I think it's easier to change the text to an example using a URI than
to change the schema of the REC...
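To make option C concrete, here is a minimal sketch of what a URI-valued Encoding attribute could look like on a dsig Object element. It assumes the existing xmldsig base64 identifier is the URI meant in B; the helper name make_object and the sample payload are hypothetical, chosen only for illustration.

```python
import base64
import xml.etree.ElementTree as ET

# xmldsig namespace and its existing base64 algorithm identifier.
DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"
BASE64_URI = DSIG_NS + "base64"


def make_object(payload: bytes, mime_type: str) -> ET.Element:
    """Hypothetical helper: wrap a binary payload in a dsig Object element.

    Encoding carries a URI (matching the schema's anyURI type) rather
    than the bare string "base64" used in the current prose example.
    """
    obj = ET.Element(
        f"{{{DSIG_NS}}}Object",
        {"MimeType": mime_type, "Encoding": BASE64_URI},
    )
    # base64 is the transfer encoding applied to the payload bytes.
    obj.text = base64.b64encode(payload).decode("ascii")
    return obj


# Illustrative payload only; any byte string would do.
obj = make_object(b"%PDF-1.4 example", "application/pdf")
serialized = ET.tostring(obj, encoding="unicode")
```

Since the attribute is informational, a receiver is free to ignore it; one that does act on it can compare the value against the known URI and reverse the transform with base64.b64decode.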
[1] http://www.unicode.org/unicode/reports/tr17/
The five levels can be summarized as:
* ACR: Abstract Character Repertoire
+ the set of characters to be encoded, e.g., some alphabet or
symbol set
* CCS: Coded Character Set
+ a mapping from an abstract character repertoire to a set of
non-negative integers
* CEF: Character Encoding Form
+ a mapping from a set of non-negative integers (from a CCS) to
a set of sequences of particular code units of some specified
width, such as bytes
* CES: Character Encoding Scheme
+ a mapping from a set of sequences of code units (from one or
more CEFs) to a serialized sequence of bytes
* TES: Transfer Encoding Syntax
+ a reversible transform of encoded data. This data may or may
not contain textual data
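
The levels summarized above can be walked through with a two-character string. This is only a sketch: the sample text and variable names are made up for illustration, and UTF-16 is picked because it separates the CEF and CES steps visibly.

```python
import base64

# Abstract characters: LATIN CAPITAL LETTER A and EURO SIGN.
text = "A\u20ac"

# CCS: each abstract character maps to a non-negative integer.
code_points = [ord(ch) for ch in text]

# CEF: code points map to sequences of fixed-width code units.
# UTF-16 uses 16-bit units; both characters here fit in one unit each.
utf16_be = text.encode("utf-16-be")
code_units = [
    int.from_bytes(utf16_be[i:i + 2], "big")
    for i in range(0, len(utf16_be), 2)
]

# CES: code units are serialized to bytes; byte order is decided here.
ces_big_endian = text.encode("utf-16-be")
ces_little_endian = text.encode("utf-16-le")

# TES: a reversible transform of already-encoded data. It works on any
# bytes and knows nothing about characters.
tes = base64.b64encode(ces_big_endian)
round_trip = base64.b64decode(tes).decode("utf-16-be")
```

The base64 step would work just as well on the bytes of a PDF or Word file, which is why it belongs to the TES level rather than to character encoding.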
Character Repertoire (CR) = a set of abstract characters
Coded Character Set (CCS) = a mapping of code values (space, points,
positions) to a Character Repertoire
Character Encoding Scheme (CES) = scheme for representing a character
repertoire in a code space. Frequently |CR| > |code space|, so one
has to do various extensions and escaping to represent those extra
characters. UTF-8 is a CES.
Charset = CCS + CES
http://www.faqs.org/rfcs/rfc1521.html
http://www.iana.org/assignments/character-sets
http://czyborra.com/utf/
Received on Friday, 31 May 2002 16:00:17 UTC