clarifications on XML Schema base64Binary type

In both the XML-Signature WG and in the XML Protocol WG, questions
have come up about the lexical forms of base64 data, when the builtin
base64Binary simple type of XML Schema [1] is used.  See, for example,
the email at [2].  In particular, the following questions arise:

   - are newlines required every 76 characters, as described in
     RFC 2045?
   - are characters outside the base64 alphabet defined in RFC 2045
     legal (and ignored) or illegal (do conforming XML Schema
     processors have to raise an error if they see them?
   - are XML Schema processors expected to enforce the rules about
     equals-signs and character count which are implicit in RFC 2045?
     (Namely: there may be zero, one, or two equals signs; if any
     equals signs appear in the lexical form, they must be the last
     characters in the base64 alphabet which appear; the total number
     of base64 characters, including equals signs, which appear in
     the lexical form must be a multiple of 4.)  I.e. is it a type
     error, or an application error, if a document contains a
     base64Binary value which reads "a=b=c"?
   - what is the canonical form for base64Binary values?

Of these, I believe the first seems most urgent to the XML-Signature
and XML Protocol WGs.

The XML Schema WG discussed these problems in a recent telcon [3],
and agreed to insert a clarification into the XML Schema Errata
document.  The erratum has not yet been formulated, so it's not
in the document yet, but the gist is simple:

1 The base64Binary type has no line length restriction on its lexical
forms.  So it's not a type error if the value contains (for example)
800 characters without any newline sequence.

2 The lexical form of base64Binary data can contain any of the 65
characters in the "Base64 Alphabet", plus any characters recognized by
the XML spec as whitespace.  Other characters are not legal and a
conforming XML Schema processor should raise a type error if they

(Note that this creates a dependency between base64Binary data and the
NEL question being considered by the Core WG: if … is added to
the list of XML whitespace characters, it will become legal in
base64Binary data.  Opinions about whether it should become legal
automatically, without any change to XML Schema, or whether it should
wait for a new version of XML Schema, are solicited; send them to the
XML Schema comments list [4].)

3 We reached no consensus on whether XML Schema processors should
enforce the rules about equals signs and base-64 character count or on
what the canonical form should be.  The XML Schema WG leans toward
enforcing the equals-signs-and-character-count rules by writing an
appropriate regular expression or BNF for the lexical space of
base64Binary.  On canonical form, the XML Schema WG is currently
leaning toward either

   (a) 76 characters from the base64 alphabet, then a newline sequence;
       repeat as needed; last line of more than 0, less than 76
       characters, also terminated by newline sequence,
   (b) 4 characters of base64 alphabet, blank, repeat; replace
       every fifteenth blank with a newline sequence.  Replace any
       final blank with a newline sequence.  (So the result is
       lines of 74 characters containing 15 blank-separated
       quartets of base-64 characters, and a final shorter line.)

On these two questions, we would like to consult people actually using
base64 data to see whether you have opinions on this.  If you do, we
would like to know what they are.  Please send comments to the
XML Schema comments list [4].  Some sample data values which will
be affected by our decision on the equals-sign question are at [5].


-C. M. Sperberg-McQueen
  Co-chair, W3C XML Schema WG

     (W3C member-only link)

Received on Friday, 27 July 2001 14:00:18 UTC