Re: Base64 input

At 2001-07-27 13:07, Rich Salz wrote:
>As for canonical form, I don't see why adding fourteen internal spaces
>per line is noticeably better than not doing so, but I don't care all
>that much.

No one seems to care that much.  (The internal spaces are intended
to make life marginally easier for humans who end up having to
hand-check this stuff for some reason, but that's a fairly remote
prospect in real life.  So -- no one much cares.)

>As far as the equal sign padding, I have a much stronger position.  The
>padding is required.  The RFC is quite clear, and padding is a very
>different subject from whitespace, where there is significant history of
>leniency.
>
>Among the packages with which I am familiar, Python, OpenSSL, and
>OpenLDAP (dating back to the first UMich distributions) all require the
>padding. If you make it optional, then you have supersetted the spec in
>a fairly powerful way, and it would be misleading to still call it
>base64.

Just to clarify: no one proposes to say that equals signs mean anything
different in base64Binary data than they do in data encoded according
to RFC 2045.  The string "a=b" won't be correct base64 data, no matter
what XML Schema does.  The question is solely this:  if I put the string
"a=b" into an element declared as having datatype base64Binary, is it

   (a) an XML Schema type error (which a conforming XML Schema processor
       must detect and report)?

or

   (b) an application error (the XML Schema processor having handed the
       data off to a base64 decoder, which actually does the barfing)?

The question arose -- well, why? I asked it, I think, because neither
the current XML Schema spec nor RFC 2045 says that you have to (or should,
or even might) raise an error if an equals sign appears before the end
of the data.  RFC 2045 says only "the occurrence of any '=' characters
may be taken as evidence that the end of the data has been reached
(without truncation in transit)."  It does NOT say "so if more characters
in the base64 alphabet are encountered, it might be appropriate to
raise a warning" or anything of the sort.

What do existing implementations do with a string like "abc=de=="?
Do they reject it, or do they treat it as identical to "abc=", i.e.
as an encoding of the two octets 01101001 10110111?
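To make the lenient alternative concrete, here is a minimal sketch of one possible reading. It assumes a decoder that takes the first '=' as the end of the data (which the RFC 2045 sentence quoted above allows) and ignores characters outside the base64 alphabet (which RFC 2045 also says to do). The function name lenient_decode is hypothetical. It is not a description of what Python, OpenSSL or OpenLDAP actually do.

```python
# Hypothetical lenient decoder: treats the first '=' as end of data and
# silently ignores everything after it. Not modelled on any real library.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
VALUES = {ch: i for i, ch in enumerate(ALPHABET)}


def lenient_decode(text: str) -> bytes:
    bits = 0    # bit buffer holding not-yet-emitted bits
    nbits = 0   # how many bits in the buffer are valid
    out = bytearray()
    for ch in text:
        if ch == "=":
            break  # first '=' taken as evidence that the data has ended
        if ch not in VALUES:
            continue  # characters outside the alphabet are ignored
        bits = (bits << 6) | VALUES[ch]
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1  # keep only the unconsumed low bits
    return bytes(out)  # any leftover (< 8) bits are discarded


print(lenient_decode("abc="))      # b'i\xb7'
print(lenient_decode("abc=de=="))  # b'i\xb7' too: "de==" is never looked at
```

Under this reading both strings yield the same two octets, 0x69 0xB7. A decoder like this would also accept unpadded input such as "Zm9vCg". That is exactly the kind of silent superset that strict checking is meant to rule out.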

>There is an even stronger argument: what is the "canonical" form?  I can
>easily deal with whitespace -- ignore it, as the spec says.  But which
>of the following are legal base64 encodings of foo?
>         Zm9vCg
>         Zm9vCg=
>         Zm9vCg==
>         Zm9vCg===
>         Zm9vCg====== (6 ='s)
>
>If padding can be elided, why can't it be added?
>
>Keep it clear, follow the spec, don't break installed code: leave the
>padding as the RFC says.
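Rich's list can be checked mechanically. Below is a minimal sketch of the kind of lexical check option (a) would require. It assumes the processor first discards whitespace, as he suggests, and then insists that the data come in complete 4-character groups, with '=' appearing only as one or two trailing pad characters. The regular expression and the name is_strict_base64 are illustrative. They are not taken from the XML Schema spec or from any existing processor.

```python
import re

# Hypothetical strict lexical check: complete 4-character groups, with '='
# allowed only as the last one or two characters of the final group.
_STRICT = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*"                      # full groups, no padding
    r"(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"  # optional padded final group
)


def is_strict_base64(text: str) -> bool:
    compact = "".join(text.split())  # drop all whitespace before checking
    return _STRICT.fullmatch(compact) is not None


for s in ["Zm9vCg", "Zm9vCg=", "Zm9vCg==", "Zm9vCg===", "Zm9vCg======", "abc=de=="]:
    print(f"{s!r:16} {is_strict_base64(s)}")
```

Only "Zm9vCg==" passes. Strictly speaking, it decodes to "foo" followed by a newline, not to "foo" alone. The string "abc=de==" is rejected because an '=' appears before the end of the data. The sketch does not check that the unused low-order bits of the final group are zero, which a stricter canonical-form rule might also require.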

I think I'm hearing you say that yes, you think it's worthwhile
for XML Schema processors to check that the equal signs are where
they ought to be in correct data, and nowhere else (so that
"abc=de==" raises a type error right away).

Thanks for the input.

-CMSMcQ

Received on Thursday, 2 August 2001 00:05:29 UTC