Re: base64Binary lexical/octet length

Hi Sperberg-McQueen,

The idea of 3 octets being encoded as 4 base64 chars is what gave me the
wrong impression that octet lengths have to be a multiple of 3. I had not
accounted for the padding chars used when encoding to base64. With 1 or 2
padding chars, the base64-encoded string, once stripped of whitespace,
always has a length that is a multiple of 4. It's clear now!
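This padding behaviour is easy to check with Python's standard `base64` module. It is used here only as an illustration: `base64.b64encode` uses the standard alphabet with `=` padding, which matches the padding rule discussed below. The sketch encodes inputs of 1 to 6 zero octets and records each encoded length and its number of padding characters.

```python
import base64

rows = []
for n in range(1, 7):
    encoded = base64.b64encode(bytes(n))  # bytes(n) is a run of n zero octets
    padding = len(encoded) - len(encoded.rstrip(b"="))  # trailing '=' count
    rows.append((n, len(encoded), padding))
    print(f"{n} octets -> {len(encoded)} chars ({padding} padding)")
```

Every encoded length comes out as 4 or 8. Inputs of 1 and 4 octets get two `=` characters, and inputs of 2 and 5 octets get one.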

Thanks for the detailed and patient reply! It was both very insightful
and helpful.

-- 
Best Regards,
Satya Prakash Tripathi


On Mon, Apr 11, 2011 at 9:50 PM, C. M. Sperberg-McQueen
<cmsmcq@blackmesatech.com> wrote:

>
> On Apr 9, 2011, at 2:49 PM, xmlplus custodians wrote:
>
> > Hi
> >
> > The XSD 1.1 Datatypes spec, in the base64Binary section, gives the
> > following pseudo-code for calculating the octet length of a base64Binary
> > encoded string:
> >
> >
> > ---------------------------------------------------------------------------------
> > 1) lex2   := killwhitespace(lexform)    -- remove whitespace characters
> > 2) lex3   := strip_equals(lex2)         -- strip padding characters at end
> > 3) length := floor (length(lex3) * 3 / 4)         -- calculate length
> > ---------------------------------------------------------------------------------
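A minimal Python sketch of the three-step pseudo-code above follows. The function name `base64_octet_length` is hypothetical. The sketch assumes that any whitespace character may be removed in step 1, and it does not check that the remaining characters are valid base64 digits.

```python
def base64_octet_length(lexform: str) -> int:
    """Octet length of a base64Binary lexical form (hypothetical helper).

    Follows the three steps of the pseudo-code; no validation is done.
    """
    lex2 = "".join(lexform.split())  # 1) killwhitespace: drop all whitespace
    lex3 = lex2.rstrip("=")          # 2) strip_equals: drop trailing padding
    return len(lex3) * 3 // 4        # 3) floor(length(lex3) * 3 / 4)


print(base64_octet_length("AAAAAAAAAA=="))  # 10 non-padding chars -> 7
```

Integer floor division (`//`) matches `floor(... / 4)` for non-negative lengths, so it avoids floating-point arithmetic.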
> >
> >
> > My understanding is that, for a base64Binary encoded string, its lexical
> > length would be a multiple of 4 and its octet length would be a multiple
> > of 3.
>
> It's been a while since I read the base64 spec, but my recollection is that
> base64 encodes octet sequences of any length, not just octet sequences whose
> length is a multiple of three.
>
> The lexical length (ignoring whitespace) will indeed always be a multiple of
> four; the padding characters are added at the end in order to ensure that
> this is so.
>
> >
> > As an example, take a base64Binary encoded string which doesn't contain
> > whitespace or padding chars (=), so that lexform is the same as lex3 in the
> > above code. If lex3 has length 10 then, according to the above code, the
> > octet length would be 7 (not a multiple of 3).
>
> Yes, precisely. If the lexical form, ignoring whitespace, is twelve
> characters long and the last two characters are equals signs, then what you
> have is two clusters of four characters, each of which encodes three octets,
> followed by a final cluster of two non-padding characters, which encodes the
> final octet.
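The cluster arithmetic can be checked by decoding a concrete twelve-character form. The sample string `"QUJDREVGRw=="` is chosen only for illustration (it is the standard base64 encoding of the ASCII text `ABCDEFG`). The sketch decodes each four-character cluster separately with Python's `base64.b64decode`, then decodes the whole form.

```python
import base64

lexform = "QUJDREVGRw=="  # 12 chars; the last two are padding
clusters = [lexform[i:i + 4] for i in range(0, len(lexform), 4)]
for cluster in clusters:
    print(cluster, base64.b64decode(cluster))  # QUJD -> ABC, REVG -> DEF, Rw== -> G

octets = base64.b64decode(lexform)
print(len(octets), octets)  # 7 b'ABCDEFG'
```

The two full clusters yield three octets each, and the final padded cluster yields one, for seven octets in total.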
>
> > Are octet lengths that are not a multiple of 3 valid for a base64Binary
> > encoded string?
>
> Yes.
>
> > Also, what should be the formula for calculating the lexical length from
> > the octet length of a base64Binary string?
> > Should it be something like this:
> >
> > lexical-length := ceil( octet-length*4/3)
> >
> > If we take an example with octet-length=10, the lexical length is not a
> > multiple of 4.
> > I am clueless here. I would appreciate your help with this.
>
> In base64 encoding, any input octet stream is subdivided into 24-bit
> (i.e. three-octet) groups, each of which is encoded in four base64
> digits. If there are fewer than 24 bits in the final group of bits, then
> padding characters are used.  So if you wish to calculate the minimum
> length of the base64 encoding for an arbitrary sequence of octets (i.e.
> the length of an encoding without any white space), then I think the
> formula you want will be 4 * ceil( octet-length / 3).  It is a good idea,
> though, to follow the recommendations in the RFC for adding
> whitespace and newlines; it makes debugging problems easier, if
> nothing else.
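The two formulas from this exchange can be compared against an actual encoder. In the Python sketch below, the function names are hypothetical, and integer arithmetic stands in for `ceil` to avoid floating-point rounding. `min_lexical_length` is the 4 * ceil(octet-length / 3) formula from the reply. `unpadded_estimate` is the ceil(octet-length*4/3) formula from the question. On these inputs, the second formula matches the count of non-padding characters, not the full lexical length.

```python
import base64


def min_lexical_length(octet_length: int) -> int:
    # 4 * ceil(n / 3): every started 3-octet group becomes a 4-char cluster.
    return 4 * ((octet_length + 2) // 3)


def unpadded_estimate(octet_length: int) -> int:
    # ceil(n * 4 / 3): the formula from the question; padding is not counted.
    return (4 * octet_length + 2) // 3


for n in (1, 2, 3, 7, 10):
    actual = len(base64.b64encode(bytes(n)))  # whitespace-free encoding
    print(n, min_lexical_length(n), unpadded_estimate(n), actual)
```

For 10 octets this prints `10 16 14 16`: the encoding is 16 characters long, and 2 of them are `=` padding.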
>
> You may find it helpful to read RFC 3548, which is normatively referenced
> by the XSD spec.
>
> http://www.ietf.org/rfc/rfc3548.txt
>
> I hope this helps.
>
>
> --
> ****************************************************************
> * C. M. Sperberg-McQueen, Black Mesa Technologies LLC
> * http://www.blackmesatech.com
> * http://cmsmcq.com/mib
> * http://balisage.net
> ****************************************************************

Received on Monday, 11 April 2011 18:55:20 UTC