- From: Joseph Reagle <reagle@w3.org>
- Date: Fri, 31 May 2002 15:59:00 -0400
- To: <w3c-ietf-xmldsig@w3.org>
- Cc: xml-encryption@w3.org, i18n-ig@w3.org
While looking at xenc's use of the MimeType and Encoding attributes, I noticed that the text says Encoding="base64" but the schema says <attribute name='Encoding' type='anyURI' use="optional"/>. I went to check what xmldsig says, and unfortunately it says the same thing! So we need to answer two questions:

1. Confirm that our intent is to specify an informational (no action need be taken) TES: Transfer Encoding Syntax (e.g., base64, uuencode, BinHex, quoted-printable, gzip, etc.) and not a CEF: Character Encoding Form (e.g., UTF-8, UTF-16) or a CES: Character Encoding Scheme (e.g., UTF-16BE, UTF-16LE). Given that we say Encoding, that the example uses "base64", and that we're using this to encode various objects (like a PDF, Word file, etc.), I think it is safe to say we mean a Transfer Encoding Syntax.

2. How do we want to represent this, as a string or a URI?
   A. Is there a registry for TES?
   B. We have a URI for base64, which is easy to use.
   C. I think it's easier to change the text to an example using a URI than to change the schema of the REC...

[1] http://www.unicode.org/unicode/reports/tr17/

The five levels can be summarized as:
* ACR: Abstract Character Repertoire
  + the set of characters to be encoded, e.g., some alphabet or symbol set
* CCS: Coded Character Set
  + a mapping from an abstract character repertoire to a set of non-negative integers
* CEF: Character Encoding Form
  + a mapping from a set of non-negative integers (from a CCS) to a set of sequences of particular code units of some specified width, such as bytes
* CES: Character Encoding Scheme
  + a mapping from a set of sequences of code units (from one or more CEFs) to a serialized sequence of bytes
* TES: Transfer Encoding Syntax
  + a reversible transform of encoded data. This data may or may not contain textual data

Character Repertoire (CR) = a set of abstract characters
Coded Character Set (CCS) = a mapping of code values (space, points, positions) to a Character Repertoire
Character Encoding Scheme (CES) = a scheme for representing a character repertoire in a code space. Frequently |CR| > |code space|, so one has to use various extensions and escaping to represent those extra characters. UTF-8 is a CES.
Charset = CCS + CES

http://www.faqs.org/rfcs/rfc1521.html
http://www.iana.org/assignments/character-sets
http://czyborra.com/utf/
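
To make the CES/TES distinction above concrete, here is a minimal Python sketch; it is an illustration only, not text from either specification. It first serializes characters to bytes (the CES step, UTF-8 here) and then applies a reversible transform to those bytes (the TES step, base64). The function names are hypothetical. The URI is the base64 identifier that xmldsig already defines; using it as the value of the Encoding attribute is the option proposed in 2.C, not a settled decision.

```python
import base64
from typing import Tuple

# Identifier xmldsig defines for base64. Using it as the Encoding
# attribute value is an assumption that follows option 2.C above.
BASE64_URI = "http://www.w3.org/2000/09/xmldsig#base64"


def to_transfer_form(text: str, charset: str = "utf-8") -> Tuple[str, str]:
    """Hypothetical helper: apply a CES, then a TES."""
    octets = text.encode(charset)            # CES: characters -> bytes
    payload = base64.b64encode(octets)       # TES: bytes -> base64 bytes
    return BASE64_URI, payload.decode("ascii")


def from_transfer_form(encoding_uri: str, payload: str,
                       charset: str = "utf-8") -> str:
    """Hypothetical helper: undo the TES, then the CES."""
    if encoding_uri != BASE64_URI:
        raise ValueError("unsupported transfer encoding: " + encoding_uri)
    octets = base64.b64decode(payload)       # reverse the TES
    return octets.decode(charset)            # reverse the CES


uri, payload = to_transfer_form("h\u00e9llo")
restored = from_transfer_form(uri, payload)
```

The point of the sketch is that the Encoding value only names the outer, reversible transform; the charset of any textual content is a separate matter and would have to be conveyed some other way (for example through MimeType).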
Received on Friday, 31 May 2002 16:00:17 UTC